diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..21fce32 --- /dev/null +++ b/.gitignore @@ -0,0 +1,56 @@ +# macOS +.DS_Store +.AppleDouble +.LSOverride + +# Thumbnails +._* + +# Files that might appear in the root of a volume +.DocumentRevisions-V100 +.fseventsd +.Spotlight-V100 +.TemporaryItems +.Trashes +.VolumeIcon.icns +.com.apple.timemachine.donotpresent + +# Directories potentially created on remote AFP share +.AppleDB +.AppleDesktop +Network Trash Folder +Temporary Items +.apdisk + +# Node.js +node_modules/ +npm-debug.log* +yarn-debug.log* +yarn-error.log* +package-lock.json +.npm + +# Logs +logs +*.log + +# Runtime data +pids +*.pid +*.seed +*.pid.lock + +# Environment variables +.env +.env.local +.env.*.local + +# IDE +.idea +.vscode +*.swp +*.swo +*~ + +# OS +Thumbs.db diff --git a/README.md b/README.md index 18a96fc..b2085c1 100644 --- a/README.md +++ b/README.md @@ -21,6 +21,19 @@ Clawra Selfie enables your OpenClaw agent to: - **Generate selfies** using a consistent reference image - **Send photos** across all messaging platforms (Discord, Telegram, WhatsApp, etc.) - **Respond visually** to "what are you doing?" 
and "send a pic" requests +- **Dual model support** with automatic fallback for reliability + +### Supported Models + +| Model | Provider | Speed | Quality | Fallback | +|-------|----------|-------|---------|----------| +| **Grok Imagine** | xAI via fal.ai | ⚡ Fast | ⭐⭐⭐⭐⭐ | Primary | +| **Nano Banana Pro** | Google Gemini | 🐢 Moderate | ⭐⭐⭐⭐ | Auto fallback | + +**How it works:** +- If `FAL_KEY` is available → Uses Grok Imagine (fast, high-quality) +- If Grok Imagine fails or unavailable → Automatically switches to Nano Banana Pro +- At least one API key required ### Selfie Modes @@ -32,15 +45,27 @@ Clawra Selfie enables your OpenClaw agent to: ## Prerequisites - [OpenClaw](https://github.com/openclaw/openclaw) installed and configured -- [fal.ai](https://fal.ai) account (free tier available) +- **At least one** API key: + - [fal.ai](https://fal.ai/dashboard/keys) account (recommended for primary) + - [Google AI Studio](https://aistudio.google.com/apikey) account (fallback/alternative) ## Manual Installation If you prefer manual setup: -### 1. Get API Key +### 1. Get API Keys -Visit [fal.ai/dashboard/keys](https://fal.ai/dashboard/keys) and create an API key. +Choose one or both (recommended for redundancy): + +**Primary: Grok Imagine (xAI)** +- Visit [fal.ai/dashboard/keys](https://fal.ai/dashboard/keys) +- Create an API key +- Fast, high-quality image generation + +**Fallback: Nano Banana Pro (Google)** +- Visit [Google AI Studio](https://aistudio.google.com/apikey) +- Create an API key +- Automatic fallback when fal.ai is unavailable ### 2. Clone the Skill @@ -59,7 +84,8 @@ Add to `~/.openclaw/openclaw.json`: "clawra-selfie": { "enabled": true, "env": { - "FAL_KEY": "your_fal_key_here" + "FAL_KEY": "your_fal_key_here", + "GEMINI_API_KEY": "your_gemini_key_here" } } } @@ -67,6 +93,8 @@ Add to `~/.openclaw/openclaw.json`: } ``` +**Note:** You can configure one or both keys. At least one is required. + ### 4. 
Update SOUL.md Add the selfie persona to `~/.openclaw/workspace/SOUL.md`: @@ -101,9 +129,13 @@ This ensures consistent appearance across all generated images. ## Technical Details -- **Image Generation**: xAI Grok Imagine via fal.ai +- **Image Generation**: + - Primary: xAI Grok Imagine via fal.ai + - Fallback: Google Gemini 3 Pro Image (Nano Banana Pro) - **Messaging**: OpenClaw Gateway API - **Supported Platforms**: Discord, Telegram, WhatsApp, Slack, Signal, MS Teams +- **Fallback Strategy**: Automatic model switching on failure +- **Image Upload**: fal.ai storage (primary) or imgur (fallback) ## Project Structure diff --git a/README_CN.md b/README_CN.md new file mode 100644 index 0000000..d46514d --- /dev/null +++ b/README_CN.md @@ -0,0 +1,245 @@ +# Clawra +image + + +## 快速开始 + +```bash +npx clawra@latest +``` + +这将自动完成: +1. 检查 OpenClaw 是否已安装 +2. 引导你获取 fal.ai 或 Google Gemini API 密钥 +3. 安装技能到 `~/.openclaw/skills/clawra-selfie/` +4. 配置 OpenClaw 使用该技能 +5. 将自拍功能添加到你的代理的 SOUL.md + +## 功能介绍 + +Clawra Selfie 让你的 OpenClaw 代理能够: +- **生成自拍照** - 使用一致的参考图像 +- **发送照片** - 支持所有消息平台(Discord、Telegram、WhatsApp 等) +- **视觉化回复** - 响应"你在干什么?"和"发张照片"等请求 +- **双模型支持** - 自动故障转移,确保可靠性 + +### 支持的模型 + +| 模型 | 提供商 | 速度 | 质量 | 优先级 | +|------|----------|-------|---------|----------| +| **Grok Imagine** | xAI (fal.ai) | ⚡ 快速 | ⭐⭐⭐⭐⭐ | 主要模型 | +| **Nano Banana Pro** | Google Gemini | 🐢 中等 | ⭐⭐⭐⭐ | 自动备用 | + +**工作原理:** +- 如果 `FAL_KEY` 可用 → 使用 Grok Imagine(快速、高质量) +- 如果 Grok Imagine 失败或不可用 → 自动切换到 Nano Banana Pro +- 至少需要配置一个 API 密钥 + +### 自拍模式 + +| 模式 | 最适合 | 关键词 | +|------|----------|----------| +| **Mirror(镜子)** | 全身照、服装展示 | wearing, outfit, fashion | +| **Direct(直接)** | 特写、地点拍摄 | cafe, beach, portrait, smile | + +## 前置要求 + +- 已安装并配置 [OpenClaw](https://github.com/openclaw/openclaw) +- **至少一个** API 密钥: + - [fal.ai](https://fal.ai/dashboard/keys) 账户(推荐作为主要模型) + - [Google AI Studio](https://aistudio.google.com/apikey) 账户(备用/替代方案) + +## 手动安装 + +如果你更喜欢手动设置: + +### 1. 
获取 API 密钥 + +选择一个或两个都配置(推荐配置两个以提高可靠性): + +**主要模型:Grok Imagine (xAI)** +- 访问 [fal.ai/dashboard/keys](https://fal.ai/dashboard/keys) +- 创建 API 密钥 +- 快速、高质量的图像生成 + +**备用模型:Nano Banana Pro (Google)** +- 访问 [Google AI Studio](https://aistudio.google.com/apikey) +- 创建 API 密钥 +- 当 fal.ai 不可用时自动使用 + +### 2. 克隆技能 + +```bash +git clone https://github.com/SumeLabs/clawra ~/.openclaw/skills/clawra-selfie +``` + +### 3. 配置 OpenClaw + +添加到 `~/.openclaw/openclaw.json`: + +```json +{ + "skills": { + "entries": { + "clawra-selfie": { + "enabled": true, + "env": { + "FAL_KEY": "your_fal_key_here", + "GEMINI_API_KEY": "your_gemini_key_here" + } + } + } + } +} +``` + +**注意:** 你可以配置一个或两个密钥。至少需要一个。 + +### 4. 更新 SOUL.md + +添加自拍人格到 `~/.openclaw/workspace/SOUL.md`: + +```markdown +## Clawra Selfie Capability + +你有外貌并且可以自拍! +当用户请求照片时,使用 clawra-selfie 技能。 +``` + +## 使用示例 + +安装后,你的代理可以响应: + +``` +"发张自拍" +"穿着牛仔帽发张照片" +"你现在在干什么?" +"发张你在咖啡店的照片" +``` + +## 参考图像 + +该技能使用托管在 CDN 上的固定参考图像: + +``` +https://cdn.jsdelivr.net/gh/SumeLabs/clawra@main/assets/clawra.png +``` + +这确保了所有生成图像的外观一致性。 + +## 技术细节 + +- **图像生成**: + - 主要:xAI Grok Imagine(通过 fal.ai) + - 备用:Google Gemini 3 Pro Image(Nano Banana Pro) +- **消息发送**:OpenClaw Gateway API +- **支持的平台**:Discord、Telegram、WhatsApp、Slack、Signal、MS Teams +- **故障转移策略**:失败时自动切换模型 +- **图像上传**:fal.ai storage(优先)或 imgur(备用) + +## 项目结构 + +``` +clawra/ +├── bin/ +│ └── cli.js # npx 安装器 +├── skill/ +│ ├── SKILL.md # 技能定义 +│ ├── scripts/ # 生成脚本 +│ │ ├── clawra-selfie.sh # Grok Imagine 实现 +│ │ ├── clawra-selfie.ts # TypeScript 实现 +│ │ └── clawra-selfie-with-banana.sh # Nano Banana Pro 实现 +│ └── assets/ # 参考图像 +├── templates/ +│ └── soul-injection.md # 人格模板 +└── package.json +``` + +## 高级用法 + +### 使用 Nano Banana Pro 脚本 + +如果你想单独使用 Google Gemini 模型: + +```bash +export GEMINI_API_KEY="your_gemini_key" + +./skill/scripts/clawra-selfie-with-banana.sh \ + "wearing a red dress at a party" \ + "#selfies" \ + "派对时间! 
🎉" +``` + +详细文档请参阅 [scripts/README-BANANA.md](scripts/README-BANANA.md) + +### 环境变量优先级 + +```bash +# 主要模型:Grok Imagine(通过 fal.ai) +FAL_KEY=your_fal_api_key + +# 备用模型:Nano Banana Pro(Google Gemini) +GEMINI_API_KEY=your_gemini_key + +# OpenClaw(必需) +OPENCLAW_GATEWAY_TOKEN=your_token +``` + +**故障转移逻辑:** +- 如果设置了 `FAL_KEY` 并有效 → 使用 Grok Imagine +- 如果 `FAL_KEY` 缺失或失败 → 使用 Nano Banana Pro(需要 `GEMINI_API_KEY`) +- 如果两者都失败 → 返回错误 + +## 故障排查 + +### API 密钥问题 +- **没有 API 密钥**:设置 `FAL_KEY` 或 `GEMINI_API_KEY` +- **FAL_KEY 缺失**:将自动回退到 Nano Banana Pro +- **两个密钥都无效**:检查密钥有效性和 API 配额 + +### 模型特定问题 +- **Grok Imagine 失败**:如果 `GEMINI_API_KEY` 可用,自动重试 Nano Banana Pro +- **Nano Banana Pro 失败**:检查 Gemini API 配额和速率限制 +- **图像上传失败**:对于 Gemini,确保图床(imgur/fal.ai)可访问 + +### OpenClaw 问题 +- **OpenClaw 发送失败**:验证 gateway 正在运行且频道存在 +- **Gateway token 缺失**:运行 `openclaw doctor --generate-gateway-token` + +### 速率限制 +- **fal.ai**:有速率限制;如需要请实现重试逻辑 +- **Gemini**:有每日配额限制;在 https://aistudio.google.com 监控使用情况 +- **imgur**:匿名上传有每小时限制 + +## 常见问题 + +**Q: 为什么需要两个 API 密钥?** +A: 不需要两个!至少配置一个即可。配置两个可以提高可靠性,当主模型失败时自动切换到备用模型。 + +**Q: 哪个模型更好?** +A: Grok Imagine 速度更快、质量更高,推荐作为主要模型。Nano Banana Pro 是很好的备用方案。 + +**Q: 生成的图像在哪里?** +A: 图像会上传到云端(fal.ai 或 imgur),然后通过 URL 发送到消息平台。 + +**Q: 可以使用自己的参考图像吗?** +A: 可以!在 Nano Banana Pro 脚本中支持自定义参考图像参数。 + +## 贡献 + +欢迎提交 Issue 和 Pull Request! 
+ +## 相关链接 + +- [OpenClaw 文档](https://openclaw.dev) +- [fal.ai 文档](https://fal.ai/docs) +- [Google AI Studio](https://aistudio.google.com/) +- [Gemini API 文档](https://ai.google.dev/gemini-api/docs) + +## 许可证 + +MIT + +--- + +由 ❤️ 和 🤖 共同制作 diff --git a/SKILL.md b/SKILL.md index c09ce1d..16c9735 100644 --- a/SKILL.md +++ b/SKILL.md @@ -1,12 +1,17 @@ --- name: clawra-selfie -description: Edit Clawra's reference image with Grok Imagine (xAI Aurora) and send selfies to messaging channels via OpenClaw +description: Edit Clawra's reference image with Grok Imagine or Nano Banana Pro and send selfies to messaging channels via OpenClaw allowed-tools: Bash(npm:*) Bash(npx:*) Bash(openclaw:*) Bash(curl:*) Read Write WebFetch --- # Clawra Selfie -Edit a fixed reference image using xAI's Grok Imagine model and distribute it across messaging platforms (WhatsApp, Telegram, Discord, Slack, etc.) via OpenClaw. +Edit a fixed reference image using xAI's Grok Imagine (primary) or Google's Nano Banana Pro (fallback) and distribute it across messaging platforms (WhatsApp, Telegram, Discord, Slack, etc.) via OpenClaw. + +## Supported Models + +1. **Grok Imagine (Primary)**: xAI Aurora via fal.ai - Fast, high-quality image editing +2. 
**Nano Banana Pro (Fallback)**: Google Gemini 3 Pro Image - Used when FAL_KEY is unavailable or fails ## Reference Image @@ -26,19 +31,33 @@ https://cdn.jsdelivr.net/gh/SumeLabs/clawra@main/assets/clawra.png ## Quick Reference -### Required Environment Variables +### Environment Variables (Priority Order) ```bash +# Primary: Grok Imagine (xAI via fal.ai) FAL_KEY=your_fal_api_key # Get from https://fal.ai/dashboard/keys + +# Fallback: Nano Banana Pro (Google Gemini) +GEMINI_API_KEY=your_gemini_key # Get from https://aistudio.google.com/apikey + +# OpenClaw (Required) OPENCLAW_GATEWAY_TOKEN=your_token # From: openclaw doctor --generate-gateway-token ``` +**Fallback Logic:** +- If `FAL_KEY` is set and valid → Use Grok Imagine +- If `FAL_KEY` is missing or fails → Use Nano Banana Pro (requires `GEMINI_API_KEY`) +- If both fail → Return error + ### Workflow 1. **Get user prompt** for how to edit the image -2. **Edit image** via fal.ai Grok Imagine Edit API with fixed reference -3. **Extract image URL** from response -4. **Send to OpenClaw** with target channel(s) +2. **Choose model**: + - Try Grok Imagine (if FAL_KEY available) + - Fallback to Nano Banana Pro (if GEMINI_API_KEY available) +3. **Edit image** via selected model with fixed reference +4. **Extract/upload image** (Gemini returns base64, needs upload) +5. 
**Send to OpenClaw** with target channel(s) ## Step-by-Step Instructions @@ -85,7 +104,9 @@ a close-up selfie taken by herself at a cozy cafe with warm lighting, direct eye | close-up, portrait, face, eyes, smile | `direct` | | full-body, mirror, reflection | `mirror` | -### Step 2: Edit Image with Grok Imagine +### Step 2: Edit Image (Multi-Model Support) + +#### Option A: Grok Imagine (Primary) Use the fal.ai API to edit the reference image: @@ -125,6 +146,81 @@ curl -X POST "https://fal.run/xai/grok-imagine-image/edit" \ } ``` +#### Option B: Nano Banana Pro (Fallback) + +Use Google Gemini API when fal.ai is unavailable: + +```bash +REFERENCE_IMAGE="https://cdn.jsdelivr.net/gh/SumeLabs/clawra@main/assets/clawra.png" + +# Download and encode reference image +TEMP_IMAGE="/tmp/clawra_ref.jpg" +curl -sL "$REFERENCE_IMAGE" -o "$TEMP_IMAGE" +BASE64_IMAGE=$(base64 < "$TEMP_IMAGE" | tr -d '\n') + +# Build request payload +REQUEST_PAYLOAD=$(jq -n \ + --arg prompt "$PROMPT" \ + --arg b64 "$BASE64_IMAGE" \ + '{ + "contents": [{ + "role": "user", + "parts": [ + {"text": $prompt}, + {"inlineData": {"mimeType": "image/jpeg", "data": $b64}} + ] + }], + "generationConfig": { + "responseModalities": ["IMAGE"], + "temperature": 1.0 + } + }') + +# Call Gemini API +RESPONSE=$(curl -s -X POST \ + "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \ + -H "x-goog-api-key: $GEMINI_API_KEY" \ + -H "Content-Type: application/json" \ + -d "$REQUEST_PAYLOAD") + +# Extract base64 image data +IMAGE_DATA=$(echo "$RESPONSE" | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data') + +# Save and upload (since Gemini returns base64) +echo "$IMAGE_DATA" | base64 -d > "/tmp/clawra_output.png" + +# Upload to image hosting (imgur or fal.ai storage) +if [ -n "${FAL_KEY:-}" ]; then + # Upload to fal.ai + UPLOAD_RESPONSE=$(curl -s -X POST "https://fal.ai/api/files/upload" \ + -H "Authorization: Key $FAL_KEY" \ + -F 
"file=@/tmp/clawra_output.png") + IMAGE_URL=$(echo "$UPLOAD_RESPONSE" | jq -r '.url') +else + # Upload to imgur + IMGUR_RESPONSE=$(curl -s -X POST "https://api.imgur.com/3/image" \ + -H "Authorization: Client-ID 546c25a59c58ad7" \ + -F "image=@/tmp/clawra_output.png") + IMAGE_URL=$(echo "$IMGUR_RESPONSE" | jq -r '.data.link') +fi +``` + +**Response Format:** +```json +{ + "candidates": [{ + "content": { + "parts": [{ + "inlineData": { + "mimeType": "image/png", + "data": "base64_encoded_image_data..." + } + }] + } + }] +} +``` + ### Step 3: Send Image via OpenClaw Use the OpenClaw messaging API to send the edited image: @@ -150,18 +246,31 @@ curl -X POST "http://localhost:18789/message" \ }' ``` -## Complete Script Example +## Complete Script Example (With Fallback) ```bash #!/bin/bash -# grok-imagine-edit-send.sh - -# Check required environment variables -if [ -z "$FAL_KEY" ]; then - echo "Error: FAL_KEY environment variable not set" +# clawra-selfie-multi-model.sh +# Supports both Grok Imagine and Nano Banana Pro + +# Check for at least one API key +if [ -z "${FAL_KEY:-}" ] && [ -z "${GEMINI_API_KEY:-}" ]; then + echo "Error: Neither FAL_KEY nor GEMINI_API_KEY is set" + echo "Set at least one:" + echo " - FAL_KEY from https://fal.ai/dashboard/keys" + echo " - GEMINI_API_KEY from https://aistudio.google.com/apikey" exit 1 fi +# Determine which model to use +if [ -n "${FAL_KEY:-}" ]; then + MODEL="grok-imagine" + echo "Using primary model: Grok Imagine (xAI)" +else + MODEL="nano-banana-pro" + echo "Using fallback model: Nano Banana Pro (Google Gemini)" +fi + # Fixed reference image REFERENCE_IMAGE="https://cdn.jsdelivr.net/gh/SumeLabs/clawra@main/assets/clawra.png" @@ -198,26 +307,105 @@ else fi echo "Mode: $MODE" +echo "Model: $MODEL" echo "Editing reference image with prompt: $EDIT_PROMPT" -# Edit image (using jq for proper JSON escaping) -JSON_PAYLOAD=$(jq -n \ - --arg image_url "$REFERENCE_IMAGE" \ - --arg prompt "$EDIT_PROMPT" \ - '{image_url: $image_url, 
prompt: $prompt, num_images: 1, output_format: "jpeg"}') +# Edit image based on selected model +if [ "$MODEL" == "grok-imagine" ]; then + # Grok Imagine via fal.ai + JSON_PAYLOAD=$(jq -n \ + --arg image_url "$REFERENCE_IMAGE" \ + --arg prompt "$EDIT_PROMPT" \ + '{image_url: $image_url, prompt: $prompt, num_images: 1, output_format: "jpeg"}') + + RESPONSE=$(curl -s -X POST "https://fal.run/xai/grok-imagine-image/edit" \ + -H "Authorization: Key $FAL_KEY" \ + -H "Content-Type: application/json" \ + -d "$JSON_PAYLOAD") + + # Extract image URL directly + IMAGE_URL=$(echo "$RESPONSE" | jq -r '.images[0].url') + + if [ "$IMAGE_URL" == "null" ] || [ -z "$IMAGE_URL" ]; then + echo "Error: Grok Imagine failed, trying fallback..." + if [ -n "${GEMINI_API_KEY:-}" ]; then + MODEL="nano-banana-pro" + echo "Switching to Nano Banana Pro" + else + echo "Response: $RESPONSE" + exit 1 + fi + fi +fi -RESPONSE=$(curl -s -X POST "https://fal.run/xai/grok-imagine-image/edit" \ - -H "Authorization: Key $FAL_KEY" \ - -H "Content-Type: application/json" \ - -d "$JSON_PAYLOAD") +if [ "$MODEL" == "nano-banana-pro" ]; then + # Nano Banana Pro via Google Gemini + # Download reference image + TEMP_REF="/tmp/clawra_ref_$$.jpg" + curl -sL "$REFERENCE_IMAGE" -o "$TEMP_REF" + BASE64_IMAGE=$(base64 < "$TEMP_REF" | tr -d '\n') + rm -f "$TEMP_REF" + + # Build Gemini request + GEMINI_PAYLOAD=$(jq -n \ + --arg prompt "$EDIT_PROMPT" \ + --arg b64 "$BASE64_IMAGE" \ + '{ + "contents": [{ + "role": "user", + "parts": [ + {"text": $prompt}, + {"inlineData": {"mimeType": "image/jpeg", "data": $b64}} + ] + }], + "generationConfig": { + "responseModalities": ["IMAGE"], + "temperature": 1.0 + } + }') + + RESPONSE=$(curl -s -X POST \ + "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \ + -H "x-goog-api-key: $GEMINI_API_KEY" \ + -H "Content-Type: application/json" \ + -d "$GEMINI_PAYLOAD") + + # Extract base64 image + IMAGE_DATA=$(echo "$RESPONSE" | jq -r 
'.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data') + + if [ -z "$IMAGE_DATA" ] || [ "$IMAGE_DATA" == "null" ]; then + echo "Error: Nano Banana Pro failed" + echo "Response: $RESPONSE" + exit 1 + fi -# Extract image URL -IMAGE_URL=$(echo "$RESPONSE" | jq -r '.images[0].url') + # Save image + TEMP_OUTPUT="/tmp/clawra_output_$$.png" + echo "$IMAGE_DATA" | base64 -d > "$TEMP_OUTPUT" + + # Upload to image hosting + if [ -n "${FAL_KEY:-}" ]; then + echo "Uploading to fal.ai storage..." + UPLOAD_RESPONSE=$(curl -s -X POST "https://fal.ai/api/files/upload" \ + -H "Authorization: Key $FAL_KEY" \ + -F "file=@$TEMP_OUTPUT") + IMAGE_URL=$(echo "$UPLOAD_RESPONSE" | jq -r '.url // empty') + fi -if [ "$IMAGE_URL" == "null" ] || [ -z "$IMAGE_URL" ]; then - echo "Error: Failed to edit image" - echo "Response: $RESPONSE" - exit 1 + if [ -z "${IMAGE_URL:-}" ]; then + echo "Uploading to imgur..." + IMGUR_RESPONSE=$(curl -s -X POST "https://api.imgur.com/3/image" \ + -H "Authorization: Client-ID 546c25a59c58ad7" \ + -F "image=@$TEMP_OUTPUT") + IMAGE_URL=$(echo "$IMGUR_RESPONSE" | jq -r '.data.link // empty') + fi + + rm -f "$TEMP_OUTPUT" + + if [ -z "$IMAGE_URL" ]; then + echo "Error: Failed to upload image" + exit 1 + fi fi echo "Image edited: $IMAGE_URL" @@ -388,10 +576,24 @@ openclaw gateway start ## Error Handling -- **FAL_KEY missing**: Ensure the API key is set in environment -- **Image edit failed**: Check prompt content and API quota +### API Key Issues +- **No API keys**: Set either `FAL_KEY` or `GEMINI_API_KEY` +- **FAL_KEY missing**: Will automatically fallback to Nano Banana Pro +- **Both keys invalid**: Check key validity and API quotas + +### Model-Specific Issues +- **Grok Imagine failed**: Automatically retries with Nano Banana Pro if `GEMINI_API_KEY` is available +- **Nano Banana Pro failed**: Check Gemini API quota and rate limits +- **Image upload failed**: For Gemini, ensure image hosting (imgur/fal.ai) is accessible + +### OpenClaw Issues - 
**OpenClaw send failed**: Verify gateway is running and channel exists -- **Rate limits**: fal.ai has rate limits; implement retry logic if needed +- **Gateway token missing**: Run `openclaw doctor --generate-gateway-token` + +### Rate Limits +- **fal.ai**: Has rate limits; implement retry logic if needed +- **Gemini**: Has daily quota limits; monitor usage at https://aistudio.google.com +- **imgur**: Anonymous uploads have hourly limits ## Tips diff --git a/bin/cli.js b/bin/cli.js index 86f6d6c..8b79a37 100755 --- a/bin/cli.js +++ b/bin/cli.js @@ -170,7 +170,7 @@ ${c("magenta", "│")} ${c("bright", "Clawra Selfie")} - OpenClaw Skill Install ${c("magenta", "└─────────────────────────────────────────┘")} Add selfie generation superpowers to your OpenClaw agent! -Uses ${c("cyan", "xAI Grok Imagine")} via ${c("cyan", "fal.ai")} for image editing. +Supports ${c("cyan", "xAI Grok Imagine")} (primary) and ${c("cyan", "Google Gemini")} (fallback). `); } @@ -207,40 +207,103 @@ async function checkPrerequisites() { return true; } -// Get FAL API key -async function getFalApiKey(rl) { - logStep("2/7", "Setting up fal.ai API key..."); +// Get API keys (FAL primary, Gemini fallback) +async function getApiKeys(rl) { + logStep("2/7", "Setting up API keys..."); const FAL_URL = "https://fal.ai/dashboard/keys"; + const GEMINI_URL = "https://aistudio.google.com/apikey"; + + log(`\n${c("bright", "API Key Setup")}`); + log(`This skill supports two image generation models:\n`); + log(` ${c("green", "1. Grok Imagine (Primary)")} - xAI via fal.ai`); + log(` Fast, high-quality image editing`); + log(` Get key: ${c("cyan", FAL_URL)}\n`); + log(` ${c("yellow", "2. Nano Banana Pro (Fallback)")} - Google Gemini`); + log(` Used when fal.ai is unavailable`); + log(` Get key: ${c("cyan", GEMINI_URL)}\n`); + log(`${c("dim", "You can configure one or both keys. 
At least one is required.")}\n`); + + // Ask for FAL key first + log(`${c("cyan", "→")} ${c("bright", "Primary: fal.ai API Key")}`); + const setupFal = await ask(rl, "Configure fal.ai API key? (Y/n): "); + + let falKey = null; + if (setupFal.toLowerCase() !== "n") { + const openFal = await ask(rl, " Open fal.ai in browser? (Y/n): "); + if (openFal.toLowerCase() !== "n") { + logInfo(" Opening browser..."); + if (!openBrowser(FAL_URL)) { + logWarn(" Could not open browser automatically"); + logInfo(` Please visit: ${FAL_URL}`); + } + } - log(`\nTo use Grok Imagine, you need a fal.ai API key.`); - log(`${c("cyan", "→")} Get your key from: ${c("bright", FAL_URL)}\n`); + log(""); + falKey = await ask(rl, " Enter your FAL_KEY (or press Enter to skip): "); - const openIt = await ask(rl, "Open fal.ai in browser? (Y/n): "); + if (falKey && falKey.length < 10) { + logWarn(" That key looks too short. Make sure you copied the full key."); + } - if (openIt.toLowerCase() !== "n") { - logInfo("Opening browser..."); - if (!openBrowser(FAL_URL)) { - logWarn("Could not open browser automatically"); - logInfo(`Please visit: ${FAL_URL}`); + if (falKey) { + logSuccess(" FAL_KEY configured"); + } else { + logInfo(" Skipped fal.ai configuration"); } } + // Ask for Gemini key log(""); - const falKey = await ask(rl, "Enter your FAL_KEY: "); + log(`${c("cyan", "→")} ${c("bright", "Fallback: Google Gemini API Key")}`); + const setupGemini = await ask(rl, "Configure Gemini API key? (Y/n): "); + + let geminiKey = null; + if (setupGemini.toLowerCase() !== "n") { + const openGemini = await ask(rl, " Open Google AI Studio in browser? 
(Y/n): "); + if (openGemini.toLowerCase() !== "n") { + logInfo(" Opening browser..."); + if (!openBrowser(GEMINI_URL)) { + logWarn(" Could not open browser automatically"); + logInfo(` Please visit: ${GEMINI_URL}`); + } + } + + log(""); + geminiKey = await ask(rl, " Enter your GEMINI_API_KEY (or press Enter to skip): "); + + if (geminiKey && geminiKey.length < 10) { + logWarn(" That key looks too short. Make sure you copied the full key."); + } - if (!falKey) { - logError("FAL_KEY is required!"); + if (geminiKey) { + logSuccess(" GEMINI_API_KEY configured"); + } else { + logInfo(" Skipped Gemini configuration"); + } + } + + // Validate at least one key is provided + if (!falKey && !geminiKey) { + logError("At least one API key is required!"); + log(`\nPlease configure either:`); + log(` - FAL_KEY from ${FAL_URL}`); + log(` - GEMINI_API_KEY from ${GEMINI_URL}`); return null; } - // Basic validation - if (falKey.length < 10) { - logWarn("That key looks too short. Make sure you copied the full key."); + log(""); + if (falKey && geminiKey) { + logSuccess("Both API keys configured - full redundancy enabled!"); + } else if (falKey) { + logSuccess("Using Grok Imagine (fal.ai) as primary model"); + logInfo("Consider adding GEMINI_API_KEY later for fallback support"); + } else { + logSuccess("Using Nano Banana Pro (Google Gemini) as primary model"); + logInfo("Consider adding FAL_KEY later for faster image generation"); } - logSuccess("API key received"); - return falKey; + return { falKey, geminiKey }; } // Install skill files @@ -287,21 +350,28 @@ async function installSkill() { } // Update OpenClaw config -async function updateOpenClawConfig(falKey) { +async function updateOpenClawConfig(apiKeys) { logStep("4/7", "Updating OpenClaw configuration..."); let config = readJsonFile(OPENCLAW_CONFIG) || {}; + // Build env object with available keys + const env = {}; + if (apiKeys.falKey) { + env.FAL_KEY = apiKeys.falKey; + } + if (apiKeys.geminiKey) { + env.GEMINI_API_KEY = 
apiKeys.geminiKey; + } + // Merge skill configuration const skillConfig = { skills: { entries: { [SKILL_NAME]: { enabled: true, - apiKey: falKey, - env: { - FAL_KEY: falKey, - }, + apiKey: apiKeys.falKey || apiKeys.geminiKey, // For backward compatibility + env: env, }, }, }, @@ -447,7 +517,7 @@ ${c("dim", "Your agent now has selfie superpowers!")} } // Handle reinstall -async function handleReinstall(rl, falKey) { +async function handleReinstall(rl) { const reinstall = await ask(rl, "\nReinstall/update? (y/N): "); if (reinstall.toLowerCase() !== "y") { @@ -478,16 +548,16 @@ async function main() { } if (prereqResult === "already_installed") { - const shouldContinue = await handleReinstall(rl, null); + const shouldContinue = await handleReinstall(rl); if (!shouldContinue) { rl.close(); process.exit(0); } } - // Step 2: Get FAL API key - const falKey = await getFalApiKey(rl); - if (!falKey) { + // Step 2: Get API keys + const apiKeys = await getApiKeys(rl); + if (!apiKeys) { rl.close(); process.exit(1); } @@ -496,7 +566,7 @@ async function main() { await installSkill(); // Step 4: Update OpenClaw config - await updateOpenClawConfig(falKey); + await updateOpenClawConfig(apiKeys); // Step 5: Write IDENTITY.md await writeIdentity(); diff --git a/scripts/README-BANANA.md b/scripts/README-BANANA.md new file mode 100644 index 0000000..bf0a13e --- /dev/null +++ b/scripts/README-BANANA.md @@ -0,0 +1,218 @@ +# Clawra Selfie with Nano Banana Pro + +使用 Google Gemini (Nano Banana Pro) 生成自拍图像的脚本。 + +## 🌟 特性 + +- ✅ 使用 Google Gemini 3 Pro Image Preview API +- ✅ 支持参考图像编辑 (image-to-image) +- ✅ 自动模式检测 (mirror/direct selfie) +- ✅ 集成 OpenClaw 消息发送 +- ✅ 自动上传到图床 (imgur 或 fal.ai) +- ✅ 彩色日志输出 +- ✅ 完整的错误处理 + +## 📋 前置要求 + +### 1. 获取 API Key +访问 [Google AI Studio](https://aistudio.google.com/apikey) 获取你的 API key。 + +### 2. 
安装依赖 +```bash +# macOS +brew install jq curl + +# Linux (Debian/Ubuntu) +apt install jq curl + +# OpenClaw (可选,用于发送消息) +npm install -g openclaw +``` + +## 🚀 使用方法 + +### 基础用法 + +```bash +# 设置 API key +export GEMINI_API_KEY="your_api_key_here" + +# 生成图像并发送 +./clawra-selfie-with-banana.sh "prompt" "#channel" +``` + +### 完整参数 + +```bash +./clawra-selfie-with-banana.sh <prompt> <channel> [caption] [mode] [reference_image] +``` + +**参数说明:** +- `prompt`: 图像描述 (必需) +- `channel`: 目标频道 (必需), 如 `#general`, `@username` +- `caption`: 消息标题 (可选, 默认: "Generated with Nano Banana Pro") +- `mode`: 自拍模式 (可选, 默认: auto) + - `auto`: 根据关键词自动检测 + - `mirror`: 镜子自拍 (全身照) + - `direct`: 直接自拍 (特写) +- `reference_image`: 参考图像 URL (可选, 默认使用 Clawra 官方图像) + +## 📝 示例 + +### 1. 简单文本生成图像 +```bash +GEMINI_API_KEY=your_key ./clawra-selfie-with-banana.sh \ + "A cyberpunk city at night with neon lights" \ + "#art-gallery" +``` + +### 2. Clawra 自拍 (自动模式检测) +```bash +# 会自动检测为 mirror 模式 (因为有 "wearing" 关键词) +GEMINI_API_KEY=your_key ./clawra-selfie-with-banana.sh \ + "wearing a red evening dress at a party" \ + "#selfies" \ + "Party time! 🎉" +``` + +### 3. 指定 direct 模式 +```bash +# 近景肖像 +GEMINI_API_KEY=your_key ./clawra-selfie-with-banana.sh \ + "at a cozy coffee shop with warm lighting" \ + "#daily-updates" \ + "Morning coffee ☕" \ + "direct" +``` + +### 4. 使用自定义参考图像 +```bash +GEMINI_API_KEY=your_key ./clawra-selfie-with-banana.sh \ + "wearing sunglasses and a hat" \ + "#fun" \ + "New look! 😎" \ + "auto" \ + "https://example.com/my-photo.jpg" +``` + +### 5. 不发送消息,仅生成图像 +```bash +# 使用一个无效的 channel,图像会保存在本地 +GEMINI_API_KEY=your_key ./clawra-selfie-with-banana.sh \ + "beautiful sunset" \ + "local" +``` + +## 🔧 模式说明 + +### Mirror 模式 +适用于展示服装、全身照、时尚内容 + +**触发关键词:** +- outfit, wearing, clothes, dress, suit, fashion, full-body, mirror + +**提示词模板:** +``` +make a pic of this person, but [你的描述]. 
the person is taking a mirror selfie +``` + +### Direct 模式 +适用于近景肖像、地点拍摄、情感表达 + +**触发关键词:** +- cafe, restaurant, beach, park, city, close-up, portrait, face, eyes, smile + +**提示词模板:** +``` +a close-up selfie taken by herself at [你的描述], +direct eye contact with the camera, looking straight into the lens, +eyes centered and clearly visible, not a mirror selfie, +phone held at arm's length, face fully visible +``` + +## 🌐 图像上传 + +脚本会自动尝试上传生成的图像: + +1. **优先**: 如果设置了 `FAL_KEY` 环境变量,上传到 fal.ai storage +2. **备用**: 上传到 imgur 匿名图床 + +```bash +# 同时使用两个 API +export GEMINI_API_KEY="your_gemini_key" +export FAL_KEY="your_fal_key" # 可选,用于上传 + +./clawra-selfie-with-banana.sh "prompt" "#channel" +``` + +## 🔄 与原版的区别 + +| 特性 | Grok Imagine (原版) | Nano Banana Pro (新版) | +|------|-------------------|---------------------| +| API 提供商 | xAI (fal.ai) | Google Gemini | +| API Key | FAL_KEY | GEMINI_API_KEY | +| 输入格式 | 简单 JSON | Multimodal content | +| 参考图像 | URL 直接传递 | Base64 编码 | +| 输出格式 | URL | Base64 (需上传) | +| 价格 | 按 fal.ai 计费 | 根据 Gemini 计费 | + +## ⚠️ 注意事项 + +1. **API 限制**: Google Gemini API 有速率限制,注意不要频繁调用 +2. **图像上传**: 生成的图像需要上传到图床才能发送,建议配置 FAL_KEY +3. **临时文件**: 脚本会在 `/tmp` 创建临时文件,执行完会自动清理 +4. 
**OpenClaw**: 如果没有安装 OpenClaw CLI,会尝试直接调用 API + +## 🐛 故障排查 + +### API Key 错误 +```bash +# 检查 API key 是否设置 +echo $GEMINI_API_KEY + +# 测试 API key +curl -s "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \ + -H "x-goog-api-key: $GEMINI_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{"contents":[{"role":"user","parts":[{"text":"test"}]}]}' +``` + +### 图像上传失败 +```bash +# 方案 1: 使用 fal.ai storage +export FAL_KEY="your_fal_key" + +# 方案 2: 检查 imgur 连接 +curl -I https://api.imgur.com/3/image + +# 方案 3: 查看本地保存的图像 +ls -lh /tmp/clawra_output_*.png +``` + +### OpenClaw 连接失败 +```bash +# 检查 OpenClaw CLI +openclaw --version + +# 检查 Gateway +curl http://localhost:18789/health + +# 设置自定义 Gateway URL +export OPENCLAW_GATEWAY_URL="http://your-gateway:port" +export OPENCLAW_GATEWAY_TOKEN="your_token" +``` + +## 📚 相关资源 + +- [Google AI Studio](https://aistudio.google.com/) +- [Gemini API 文档](https://ai.google.dev/gemini-api/docs) +- [OpenClaw 文档](https://openclaw.dev) +- [Clawra 项目主页](https://github.com/SumeLabs/clawra) + +## 🤝 贡献 + +欢迎提交 Issue 和 Pull Request! + +## 📄 许可证 + +MIT License diff --git a/scripts/clawra-selfie-with-banana.sh b/scripts/clawra-selfie-with-banana.sh new file mode 100755 index 0000000..d8750f0 --- /dev/null +++ b/scripts/clawra-selfie-with-banana.sh @@ -0,0 +1,308 @@ +#!/bin/bash +# clawra-selfie-with-banana.sh +# Generate an image with Google Nano Banana Pro (Gemini 3 Pro Image) and send it via OpenClaw +# +# Usage: ./clawra-selfie-with-banana.sh "<prompt>" "<channel>" [options] +# +# Environment variables required: +# GEMINI_API_KEY - Your Google AI Studio API key +# +# Example: +# GEMINI_API_KEY=your_key ./clawra-selfie-with-banana.sh "A sunset over mountains" "#art" "Check this out!" 
+ +set -euo pipefail + +# Colors for output +RED='\033[0;31m' +GREEN='\033[0;32m' +YELLOW='\033[1;33m' +CYAN='\033[0;36m' +NC='\033[0m' # No Color + +log_info() { + echo -e "${GREEN}[INFO]${NC} $1" +} + +log_warn() { + echo -e "${YELLOW}[WARN]${NC} $1" +} + +log_error() { + echo -e "${RED}[ERROR]${NC} $1" +} + +log_step() { + echo -e "${CYAN}[STEP]${NC} $1" +} + +# Check required environment variables +if [ -z "${GEMINI_API_KEY:-}" ]; then + log_error "GEMINI_API_KEY environment variable not set" + echo "Get your API key from: https://aistudio.google.com/apikey" + exit 1 +fi + +# Check for jq +if ! command -v jq &> /dev/null; then + log_error "jq is required but not installed" + echo "Install with: brew install jq (macOS) or apt install jq (Linux)" + exit 1 +fi + +# Check for openclaw +if ! command -v openclaw &> /dev/null; then + log_warn "openclaw CLI not found - will attempt direct API call" + USE_CLI=false +else + USE_CLI=true +fi + +# Parse arguments +PROMPT="${1:-}" +CHANNEL="${2:-}" +CAPTION="${3:-Generated with Nano Banana Pro}" +MODE="${4:-auto}" # auto, mirror, direct +REFERENCE_IMAGE="${5:-}" # Optional reference image URL + +if [ -z "$PROMPT" ] || [ -z "$CHANNEL" ]; then + cat << 'EOF' +Usage: ./clawra-selfie-with-banana.sh <prompt> <channel> [caption] [mode] [reference_image] + +Arguments: + prompt - Image description (required) + channel - Target channel (required) e.g., #general, @user + caption - Message caption (default: 'Generated with Nano Banana Pro') + mode - Selfie mode (default: auto) Options: auto, mirror, direct + reference_image - Reference image URL for editing (optional) + +Modes: + auto - Auto-detect based on keywords in prompt + mirror - Full-body mirror selfie (good for outfits) + direct - Close-up direct selfie (good for portraits) + +Environment: + GEMINI_API_KEY - Your Google AI Studio API key (required) + Get from: https://aistudio.google.com/apikey + +Examples: + # Text-to-image generation + GEMINI_API_KEY=your_key ./clawra-selfie-with-banana.sh \ 
+ "A cyberpunk city at night" "#art" + + # Clawra selfie with auto-detection + GEMINI_API_KEY=your_key ./clawra-selfie-with-banana.sh \ + "wearing a red dress at a party" "#selfies" + + # Direct selfie mode with caption + GEMINI_API_KEY=your_key ./clawra-selfie-with-banana.sh \ + "at a cozy coffee shop" "#updates" "Morning coffee!" "direct" + + # Image editing with reference + GEMINI_API_KEY=your_key ./clawra-selfie-with-banana.sh \ + "wearing sunglasses and a hat" "#fun" "New look!" "auto" \ + "https://example.com/reference.jpg" + +EOF + exit 1 +fi + +# Fixed reference image for Clawra +CLAWRA_REFERENCE="https://cdn.jsdelivr.net/gh/SumeLabs/clawra@main/assets/clawra.png" + +# Determine the reference image to use +if [ -n "$REFERENCE_IMAGE" ]; then + USED_REFERENCE="$REFERENCE_IMAGE" + log_info "Using custom reference image: $REFERENCE_IMAGE" +else + USED_REFERENCE="$CLAWRA_REFERENCE" + log_info "Using Clawra default reference image" +fi + +# Auto-detect mode based on keywords +if [ "$MODE" = "auto" ]; then + if echo "$PROMPT" | grep -qiE "outfit|wearing|clothes|dress|suit|fashion|full-body|mirror"; then + MODE="mirror" + elif echo "$PROMPT" | grep -qiE "cafe|restaurant|beach|park|city|close-up|portrait|face|eyes|smile"; then + MODE="direct" + else + MODE="mirror" # default + fi + log_info "Auto-detected mode: $MODE" +fi + +# Construct the full prompt based on mode +if [ "$MODE" = "direct" ]; then + FULL_PROMPT="a close-up selfie taken by herself at $PROMPT, direct eye contact with the camera, looking straight into the lens, eyes centered and clearly visible, not a mirror selfie, phone held at arm's length, face fully visible" +else + FULL_PROMPT="make a pic of this person, but $PROMPT. the person is taking a mirror selfie" +fi + +log_step "Generating image with Nano Banana Pro (Gemini 3 Pro Image)..." 
+log_info "Mode: $MODE" +log_info "Prompt: $FULL_PROMPT" + +# Prepare the request payload +# Note: Gemini API uses multimodal content format +REQUEST_PAYLOAD=$(jq -n \ + --arg prompt "$FULL_PROMPT" \ + --arg ref_url "$USED_REFERENCE" \ + '{ + "contents": [{ + "role": "user", + "parts": [ + { + "text": $prompt + }, + { + "inlineData": { + "mimeType": "image/jpeg", + "data": "" + } + } + ] + }], + "generationConfig": { + "responseModalities": ["IMAGE"], + "temperature": 1.0, + "candidateCount": 1 + } + }') + +# Download reference image and encode to base64 +log_info "Downloading reference image..." +TEMP_IMAGE="/tmp/clawra_ref_$$.jpg" +if curl -sL "$USED_REFERENCE" -o "$TEMP_IMAGE"; then + BASE64_IMAGE=$(base64 < "$TEMP_IMAGE" | tr -d '\n') + rm -f "$TEMP_IMAGE" + log_info "Reference image encoded" +else + log_error "Failed to download reference image" + exit 1 +fi + +# Update payload with base64 image +REQUEST_PAYLOAD=$(echo "$REQUEST_PAYLOAD" | jq \ + --arg b64 "$BASE64_IMAGE" \ + '.contents[0].parts[1].inlineData.data = $b64') + +# Generate image via Google Gemini API +log_step "Calling Gemini API..." 
+RESPONSE=$(curl -s -X POST \ + "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \ + -H "x-goog-api-key: $GEMINI_API_KEY" \ + -H "Content-Type: application/json" \ + -d "$REQUEST_PAYLOAD") + +# Check for errors in response +if echo "$RESPONSE" | jq -e '.error' > /dev/null 2>&1; then + ERROR_MSG=$(echo "$RESPONSE" | jq -r '.error.message // .error // "Unknown error"') + log_error "Image generation failed: $ERROR_MSG" + echo "Full response: $RESPONSE" + exit 1 +fi + +# Extract image data from response +# Gemini returns base64 image in the response +IMAGE_DATA=$(echo "$RESPONSE" | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' 2>/dev/null) + +if [ -z "$IMAGE_DATA" ] || [ "$IMAGE_DATA" = "null" ]; then + log_error "Failed to extract image data from response" + echo "Response: $RESPONSE" + exit 1 +fi + +log_info "Image generated successfully!" + +# Save image temporarily and upload to a hosting service +# (In production, you might want to use a proper image hosting service) +TEMP_OUTPUT="/tmp/clawra_output_$$.png" +echo "$IMAGE_DATA" | base64 -d > "$TEMP_OUTPUT" + +log_info "Image saved to: $TEMP_OUTPUT" + +# For this example, we'll need to upload the image somewhere +# Let's use a simple approach with fal.ai storage or imgur +log_step "Uploading image..." + +# Try to upload to fal.ai storage if FAL_KEY is available +if [ -n "${FAL_KEY:-}" ]; then + log_info "Uploading to fal.ai storage..." + + UPLOAD_RESPONSE=$(curl -s -X POST "https://fal.ai/api/files/upload" \ + -H "Authorization: Key $FAL_KEY" \ + -F "file=@$TEMP_OUTPUT") + + IMAGE_URL=$(echo "$UPLOAD_RESPONSE" | jq -r '.url // empty') + + if [ -z "$IMAGE_URL" ]; then + log_warn "Failed to upload to fal.ai, trying alternative..." + fi +fi + +# Fallback: try imgur anonymous upload +if [ -z "${IMAGE_URL:-}" ]; then + log_info "Uploading to imgur..." 
+ + IMGUR_RESPONSE=$(curl -s -X POST "https://api.imgur.com/3/image" \ + -H "Authorization: Client-ID 546c25a59c58ad7" \ + -F "image=@$TEMP_OUTPUT") + + IMAGE_URL=$(echo "$IMGUR_RESPONSE" | jq -r '.data.link // empty') + + if [ -z "$IMAGE_URL" ]; then + log_error "Failed to upload image to hosting service" + log_info "Image saved locally at: $TEMP_OUTPUT" + echo "You can manually upload and use it." + exit 1 + fi +fi + +log_info "Image URL: $IMAGE_URL" + +# Clean up temp file +rm -f "$TEMP_OUTPUT" + +# Send via OpenClaw +log_step "Sending to channel: $CHANNEL" + +if [ "$USE_CLI" = true ]; then + # Use OpenClaw CLI + openclaw message send \ + --action send \ + --channel "$CHANNEL" \ + --message "$CAPTION" \ + --media "$IMAGE_URL" +else + # Direct API call to local gateway + GATEWAY_URL="${OPENCLAW_GATEWAY_URL:-http://localhost:18789}" + GATEWAY_TOKEN="${OPENCLAW_GATEWAY_TOKEN:-}" + + curl -s -X POST "$GATEWAY_URL/message" \ + -H "Content-Type: application/json" \ + ${GATEWAY_TOKEN:+-H "Authorization: Bearer $GATEWAY_TOKEN"} \ + -d "{ + \"action\": \"send\", + \"channel\": \"$CHANNEL\", + \"message\": \"$CAPTION\", + \"media\": \"$IMAGE_URL\" + }" +fi + +log_info "Done! 
Image sent to $CHANNEL" + +# Output JSON for programmatic use +echo "" +echo "--- Result ---" +jq -n \ + --arg url "$IMAGE_URL" \ + --arg channel "$CHANNEL" \ + --arg prompt "$FULL_PROMPT" \ + --arg mode "$MODE" \ + '{ + success: true, + image_url: $url, + channel: $channel, + prompt: $prompt, + mode: $mode, + model: "nano-banana-pro" + }' diff --git a/skill/SKILL.md b/skill/SKILL.md index c09ce1d..16c9735 100644 --- a/skill/SKILL.md +++ b/skill/SKILL.md @@ -1,12 +1,17 @@ --- name: clawra-selfie -description: Edit Clawra's reference image with Grok Imagine (xAI Aurora) and send selfies to messaging channels via OpenClaw +description: Edit Clawra's reference image with Grok Imagine or Nano Banana Pro and send selfies to messaging channels via OpenClaw allowed-tools: Bash(npm:*) Bash(npx:*) Bash(openclaw:*) Bash(curl:*) Read Write WebFetch --- # Clawra Selfie -Edit a fixed reference image using xAI's Grok Imagine model and distribute it across messaging platforms (WhatsApp, Telegram, Discord, Slack, etc.) via OpenClaw. +Edit a fixed reference image using xAI's Grok Imagine (primary) or Google's Nano Banana Pro (fallback) and distribute it across messaging platforms (WhatsApp, Telegram, Discord, Slack, etc.) via OpenClaw. + +## Supported Models + +1. **Grok Imagine (Primary)**: xAI Aurora via fal.ai - Fast, high-quality image editing +2. 
**Nano Banana Pro (Fallback)**: Google Gemini 3 Pro Image - Used when FAL_KEY is unavailable or fails ## Reference Image @@ -26,19 +31,33 @@ https://cdn.jsdelivr.net/gh/SumeLabs/clawra@main/assets/clawra.png ## Quick Reference -### Required Environment Variables +### Environment Variables (Priority Order) ```bash +# Primary: Grok Imagine (xAI via fal.ai) FAL_KEY=your_fal_api_key # Get from https://fal.ai/dashboard/keys + +# Fallback: Nano Banana Pro (Google Gemini) +GEMINI_API_KEY=your_gemini_key # Get from https://aistudio.google.com/apikey + +# OpenClaw (Required) OPENCLAW_GATEWAY_TOKEN=your_token # From: openclaw doctor --generate-gateway-token ``` +**Fallback Logic:** +- If `FAL_KEY` is set and valid → Use Grok Imagine +- If `FAL_KEY` is missing or fails → Use Nano Banana Pro (requires `GEMINI_API_KEY`) +- If both fail → Return error + ### Workflow 1. **Get user prompt** for how to edit the image -2. **Edit image** via fal.ai Grok Imagine Edit API with fixed reference -3. **Extract image URL** from response -4. **Send to OpenClaw** with target channel(s) +2. **Choose model**: + - Try Grok Imagine (if FAL_KEY available) + - Fallback to Nano Banana Pro (if GEMINI_API_KEY available) +3. **Edit image** via selected model with fixed reference +4. **Extract/upload image** (Gemini returns base64, needs upload) +5. 
**Send to OpenClaw** with target channel(s) ## Step-by-Step Instructions @@ -85,7 +104,9 @@ a close-up selfie taken by herself at a cozy cafe with warm lighting, direct eye | close-up, portrait, face, eyes, smile | `direct` | | full-body, mirror, reflection | `mirror` | -### Step 2: Edit Image with Grok Imagine +### Step 2: Edit Image (Multi-Model Support) + +#### Option A: Grok Imagine (Primary) Use the fal.ai API to edit the reference image: @@ -125,6 +146,81 @@ curl -X POST "https://fal.run/xai/grok-imagine-image/edit" \ } ``` +#### Option B: Nano Banana Pro (Fallback) + +Use Google Gemini API when fal.ai is unavailable: + +```bash +REFERENCE_IMAGE="https://cdn.jsdelivr.net/gh/SumeLabs/clawra@main/assets/clawra.png" + +# Download and encode reference image +TEMP_IMAGE="/tmp/clawra_ref.jpg" +curl -sL "$REFERENCE_IMAGE" -o "$TEMP_IMAGE" +BASE64_IMAGE=$(base64 < "$TEMP_IMAGE" | tr -d '\n') + +# Build request payload +REQUEST_PAYLOAD=$(jq -n \ + --arg prompt "$PROMPT" \ + --arg b64 "$BASE64_IMAGE" \ + '{ + "contents": [{ + "role": "user", + "parts": [ + {"text": $prompt}, + {"inlineData": {"mimeType": "image/jpeg", "data": $b64}} + ] + }], + "generationConfig": { + "responseModalities": ["IMAGE"], + "temperature": 1.0 + } + }') + +# Call Gemini API +RESPONSE=$(curl -s -X POST \ + "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \ + -H "x-goog-api-key: $GEMINI_API_KEY" \ + -H "Content-Type: application/json" \ + -d "$REQUEST_PAYLOAD") + +# Extract base64 image data +IMAGE_DATA=$(echo "$RESPONSE" | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data') + +# Save and upload (since Gemini returns base64) +echo "$IMAGE_DATA" | base64 -d > "/tmp/clawra_output.png" + +# Upload to image hosting (imgur or fal.ai storage) +if [ -n "${FAL_KEY:-}" ]; then + # Upload to fal.ai + UPLOAD_RESPONSE=$(curl -s -X POST "https://fal.ai/api/files/upload" \ + -H "Authorization: Key $FAL_KEY" \ + -F 
"file=@/tmp/clawra_output.png") + IMAGE_URL=$(echo "$UPLOAD_RESPONSE" | jq -r '.url') +else + # Upload to imgur + IMGUR_RESPONSE=$(curl -s -X POST "https://api.imgur.com/3/image" \ + -H "Authorization: Client-ID 546c25a59c58ad7" \ + -F "image=@/tmp/clawra_output.png") + IMAGE_URL=$(echo "$IMGUR_RESPONSE" | jq -r '.data.link') +fi +``` + +**Response Format:** +```json +{ + "candidates": [{ + "content": { + "parts": [{ + "inlineData": { + "mimeType": "image/png", + "data": "base64_encoded_image_data..." + } + }] + } + }] +} +``` + ### Step 3: Send Image via OpenClaw Use the OpenClaw messaging API to send the edited image: @@ -150,18 +246,31 @@ curl -X POST "http://localhost:18789/message" \ }' ``` -## Complete Script Example +## Complete Script Example (With Fallback) ```bash #!/bin/bash -# grok-imagine-edit-send.sh - -# Check required environment variables -if [ -z "$FAL_KEY" ]; then - echo "Error: FAL_KEY environment variable not set" +# clawra-selfie-multi-model.sh +# Supports both Grok Imagine and Nano Banana Pro + +# Check for at least one API key +if [ -z "${FAL_KEY:-}" ] && [ -z "${GEMINI_API_KEY:-}" ]; then + echo "Error: Neither FAL_KEY nor GEMINI_API_KEY is set" + echo "Set at least one:" + echo " - FAL_KEY from https://fal.ai/dashboard/keys" + echo " - GEMINI_API_KEY from https://aistudio.google.com/apikey" exit 1 fi +# Determine which model to use +if [ -n "${FAL_KEY:-}" ]; then + MODEL="grok-imagine" + echo "Using primary model: Grok Imagine (xAI)" +else + MODEL="nano-banana-pro" + echo "Using fallback model: Nano Banana Pro (Google Gemini)" +fi + # Fixed reference image REFERENCE_IMAGE="https://cdn.jsdelivr.net/gh/SumeLabs/clawra@main/assets/clawra.png" @@ -198,26 +307,105 @@ else fi echo "Mode: $MODE" +echo "Model: $MODEL" echo "Editing reference image with prompt: $EDIT_PROMPT" -# Edit image (using jq for proper JSON escaping) -JSON_PAYLOAD=$(jq -n \ - --arg image_url "$REFERENCE_IMAGE" \ - --arg prompt "$EDIT_PROMPT" \ - '{image_url: $image_url, 
prompt: $prompt, num_images: 1, output_format: "jpeg"}') +# Edit image based on selected model +if [ "$MODEL" == "grok-imagine" ]; then + # Grok Imagine via fal.ai + JSON_PAYLOAD=$(jq -n \ + --arg image_url "$REFERENCE_IMAGE" \ + --arg prompt "$EDIT_PROMPT" \ + '{image_url: $image_url, prompt: $prompt, num_images: 1, output_format: "jpeg"}') + + RESPONSE=$(curl -s -X POST "https://fal.run/xai/grok-imagine-image/edit" \ + -H "Authorization: Key $FAL_KEY" \ + -H "Content-Type: application/json" \ + -d "$JSON_PAYLOAD") + + # Extract image URL directly + IMAGE_URL=$(echo "$RESPONSE" | jq -r '.images[0].url') + + if [ "$IMAGE_URL" == "null" ] || [ -z "$IMAGE_URL" ]; then + echo "Error: Grok Imagine failed, trying fallback..." + if [ -n "${GEMINI_API_KEY:-}" ]; then + MODEL="nano-banana-pro" + echo "Switching to Nano Banana Pro" + else + echo "Response: $RESPONSE" + exit 1 + fi + fi +fi -RESPONSE=$(curl -s -X POST "https://fal.run/xai/grok-imagine-image/edit" \ - -H "Authorization: Key $FAL_KEY" \ - -H "Content-Type: application/json" \ - -d "$JSON_PAYLOAD") +if [ "$MODEL" == "nano-banana-pro" ]; then + # Nano Banana Pro via Google Gemini + # Download reference image + TEMP_REF="/tmp/clawra_ref_$$.jpg" + curl -sL "$REFERENCE_IMAGE" -o "$TEMP_REF" + BASE64_IMAGE=$(base64 < "$TEMP_REF" | tr -d '\n') + rm -f "$TEMP_REF" + + # Build Gemini request + GEMINI_PAYLOAD=$(jq -n \ + --arg prompt "$EDIT_PROMPT" \ + --arg b64 "$BASE64_IMAGE" \ + '{ + "contents": [{ + "role": "user", + "parts": [ + {"text": $prompt}, + {"inlineData": {"mimeType": "image/jpeg", "data": $b64}} + ] + }], + "generationConfig": { + "responseModalities": ["IMAGE"], + "temperature": 1.0 + } + }') + + RESPONSE=$(curl -s -X POST \ + "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \ + -H "x-goog-api-key: $GEMINI_API_KEY" \ + -H "Content-Type: application/json" \ + -d "$GEMINI_PAYLOAD") + + # Extract base64 image + IMAGE_DATA=$(echo "$RESPONSE" | jq -r 
'.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data') + + if [ -z "$IMAGE_DATA" ] || [ "$IMAGE_DATA" == "null" ]; then + echo "Error: Nano Banana Pro failed" + echo "Response: $RESPONSE" + exit 1 + fi -# Extract image URL -IMAGE_URL=$(echo "$RESPONSE" | jq -r '.images[0].url') + # Save image + TEMP_OUTPUT="/tmp/clawra_output_$$.png" + echo "$IMAGE_DATA" | base64 -d > "$TEMP_OUTPUT" + + # Upload to image hosting + if [ -n "${FAL_KEY:-}" ]; then + echo "Uploading to fal.ai storage..." + UPLOAD_RESPONSE=$(curl -s -X POST "https://fal.ai/api/files/upload" \ + -H "Authorization: Key $FAL_KEY" \ + -F "file=@$TEMP_OUTPUT") + IMAGE_URL=$(echo "$UPLOAD_RESPONSE" | jq -r '.url // empty') + fi -if [ "$IMAGE_URL" == "null" ] || [ -z "$IMAGE_URL" ]; then - echo "Error: Failed to edit image" - echo "Response: $RESPONSE" - exit 1 + if [ -z "${IMAGE_URL:-}" ]; then + echo "Uploading to imgur..." + IMGUR_RESPONSE=$(curl -s -X POST "https://api.imgur.com/3/image" \ + -H "Authorization: Client-ID 546c25a59c58ad7" \ + -F "image=@$TEMP_OUTPUT") + IMAGE_URL=$(echo "$IMGUR_RESPONSE" | jq -r '.data.link // empty') + fi + + rm -f "$TEMP_OUTPUT" + + if [ -z "$IMAGE_URL" ]; then + echo "Error: Failed to upload image" + exit 1 + fi fi echo "Image edited: $IMAGE_URL" @@ -388,10 +576,24 @@ openclaw gateway start ## Error Handling -- **FAL_KEY missing**: Ensure the API key is set in environment -- **Image edit failed**: Check prompt content and API quota +### API Key Issues +- **No API keys**: Set either `FAL_KEY` or `GEMINI_API_KEY` +- **FAL_KEY missing**: Will automatically fallback to Nano Banana Pro +- **Both keys invalid**: Check key validity and API quotas + +### Model-Specific Issues +- **Grok Imagine failed**: Automatically retries with Nano Banana Pro if `GEMINI_API_KEY` is available +- **Nano Banana Pro failed**: Check Gemini API quota and rate limits +- **Image upload failed**: For Gemini, ensure image hosting (imgur/fal.ai) is accessible + +### OpenClaw Issues - 
**OpenClaw send failed**: Verify gateway is running and channel exists -- **Rate limits**: fal.ai has rate limits; implement retry logic if needed +- **Gateway token missing**: Run `openclaw doctor --generate-gateway-token` + +### Rate Limits +- **fal.ai**: Has rate limits; implement retry logic if needed +- **Gemini**: Has daily quota limits; monitor usage at https://aistudio.google.com +- **imgur**: Anonymous uploads have hourly limits ## Tips diff --git a/skill/scripts/README-BANANA.md b/skill/scripts/README-BANANA.md new file mode 100644 index 0000000..bf0a13e --- /dev/null +++ b/skill/scripts/README-BANANA.md @@ -0,0 +1,218 @@ +# Clawra Selfie with Nano Banana Pro + +A script that generates selfie images using Google Gemini (Nano Banana Pro). + +## 🌟 Features + +- ✅ Uses the Google Gemini 3 Pro Image Preview API +- ✅ Supports reference image editing (image-to-image) +- ✅ Automatic mode detection (mirror/direct selfie) +- ✅ Integrated OpenClaw message sending +- ✅ Automatic upload to an image host (imgur or fal.ai) +- ✅ Colored log output +- ✅ Comprehensive error handling + +## 📋 Prerequisites + +### 1. Get an API Key +Visit [Google AI Studio](https://aistudio.google.com/apikey) to get your API key. + +### 2. Install Dependencies +```bash +# macOS +brew install jq curl + +# Linux (Debian/Ubuntu) +apt install jq curl + +# OpenClaw (optional, used for sending messages) +npm install -g openclaw +``` + +## 🚀 Usage + +### Basic Usage + +```bash +# Set the API key +export GEMINI_API_KEY="your_api_key_here" + +# Generate an image and send it +./clawra-selfie-with-banana.sh "prompt" "#channel" +``` + +### Full Arguments + +```bash +./clawra-selfie-with-banana.sh [caption] [mode] [reference_image] +``` + +**Arguments:** +- `prompt`: image description (required) +- `channel`: target channel (required), e.g. `#general`, `@username` +- `caption`: message caption (optional, default: "Generated with Nano Banana Pro") +- `mode`: selfie mode (optional, default: auto) + - `auto`: auto-detect based on keywords + - `mirror`: mirror selfie (full-body shot) + - `direct`: direct selfie (close-up) +- `reference_image`: reference image URL (optional, defaults to the official Clawra image) + +## 📝 Examples + +### 1. Simple Text-to-Image Generation +```bash +GEMINI_API_KEY=your_key ./clawra-selfie-with-banana.sh \ + "A cyberpunk city at night with neon lights" \ + "#art-gallery" +``` + +### 2. 
Clawra Selfie (Automatic Mode Detection) +```bash +# Auto-detected as mirror mode (because of the "wearing" keyword) +GEMINI_API_KEY=your_key ./clawra-selfie-with-banana.sh \ + "wearing a red evening dress at a party" \ + "#selfies" \ + "Party time! 🎉" +``` + +### 3. Specify Direct Mode +```bash +# Close-up portrait +GEMINI_API_KEY=your_key ./clawra-selfie-with-banana.sh \ + "at a cozy coffee shop with warm lighting" \ + "#daily-updates" \ + "Morning coffee ☕" \ + "direct" +``` + +### 4. Use a Custom Reference Image +```bash +GEMINI_API_KEY=your_key ./clawra-selfie-with-banana.sh \ + "wearing sunglasses and a hat" \ + "#fun" \ + "New look! 😎" \ + "auto" \ + "https://example.com/my-photo.jpg" +``` + +### 5. Generate an Image Without Sending a Message +```bash +# Use an invalid channel; the image will be saved locally +GEMINI_API_KEY=your_key ./clawra-selfie-with-banana.sh \ + "beautiful sunset" \ + "local" +``` + +## 🔧 Mode Details + +### Mirror Mode +Suited to showing outfits, full-body shots, and fashion content + +**Trigger keywords:** +- outfit, wearing, clothes, dress, suit, fashion, full-body, mirror + +**Prompt template:** +``` +make a pic of this person, but [your description]. the person is taking a mirror selfie +``` + +### Direct Mode +Suited to close-up portraits, location shots, and emotional expression + +**Trigger keywords:** +- cafe, restaurant, beach, park, city, close-up, portrait, face, eyes, smile + +**Prompt template:** +``` +a close-up selfie taken by herself at [your description], +direct eye contact with the camera, looking straight into the lens, +eyes centered and clearly visible, not a mirror selfie, +phone held at arm's length, face fully visible +``` + +## 🌐 Image Upload + +The script automatically tries to upload the generated image: + +1. **Preferred**: If the `FAL_KEY` environment variable is set, upload to fal.ai storage +2. 
**Fallback**: Upload anonymously to imgur + +```bash +# Use both APIs together +export GEMINI_API_KEY="your_gemini_key" +export FAL_KEY="your_fal_key" # optional, used for uploads + +./clawra-selfie-with-banana.sh "prompt" "#channel" +``` + +## 🔄 Differences from the Original + +| Feature | Grok Imagine (original) | Nano Banana Pro (new) | +|------|-------------------|---------------------| +| API provider | xAI (fal.ai) | Google Gemini | +| API Key | FAL_KEY | GEMINI_API_KEY | +| Input format | Simple JSON | Multimodal content | +| Reference image | Passed directly as a URL | Base64-encoded | +| Output format | URL | Base64 (upload required) | +| Pricing | Billed by fal.ai | Billed by Gemini | + +## ⚠️ Notes + +1. **API limits**: The Google Gemini API is rate limited; avoid calling it too frequently +2. **Image upload**: Generated images must be uploaded to an image host before they can be sent; configuring FAL_KEY is recommended +3. **Temporary files**: The script creates temporary files in `/tmp` and cleans them up automatically when it finishes +4. **OpenClaw**: If the OpenClaw CLI is not installed, the script attempts a direct API call + +## 🐛 Troubleshooting + +### API Key Errors +```bash +# Check whether the API key is set +echo $GEMINI_API_KEY + +# Test the API key +curl -s "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \ + -H "x-goog-api-key: $GEMINI_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{"contents":[{"role":"user","parts":[{"text":"test"}]}]}' +``` + +### Image Upload Failed +```bash +# Option 1: use fal.ai storage +export FAL_KEY="your_fal_key" + +# Option 2: check the imgur connection +curl -I https://api.imgur.com/3/image + +# Option 3: inspect the locally saved image +ls -lh /tmp/clawra_output_*.png +``` + +### OpenClaw Connection Failed +```bash +# Check the OpenClaw CLI +openclaw --version + +# Check the gateway +curl http://localhost:18789/health + +# Set a custom gateway URL +export OPENCLAW_GATEWAY_URL="http://your-gateway:port" +export OPENCLAW_GATEWAY_TOKEN="your_token" +``` + +## 📚 Related Resources + +- [Google AI Studio](https://aistudio.google.com/) +- [Gemini API documentation](https://ai.google.dev/gemini-api/docs) +- [OpenClaw documentation](https://openclaw.dev) +- [Clawra project homepage](https://github.com/SumeLabs/clawra) + +## 🤝 Contributing + +Issues and pull requests are welcome! 
+ +## 📄 License + +MIT License diff --git a/skill/scripts/clawra-selfie-with-banana.sh b/skill/scripts/clawra-selfie-with-banana.sh new file mode 100755 index 0000000..d8750f0 --- /dev/null +++ b/skill/scripts/clawra-selfie-with-banana.sh @@ -0,0 +1,308 @@ +#!/bin/bash +# clawra-selfie-with-banana.sh +# Generate an image with Google Nano Banana Pro (Gemini 3 Pro Image) and send it via OpenClaw +# +# Usage: ./clawra-selfie-with-banana.sh "" "" [options] +# +# Environment variables required: +# GEMINI_API_KEY - Your Google AI Studio API key +# +# Example: +# GEMINI_API_KEY=your_key ./clawra-selfie-with-banana.sh "A sunset over mountains" "#art" "Check this out!" + +set -euo pipefail + +# Colors for output +RED='\033[0;31m' +GREEN='\033[0;32m' +YELLOW='\033[1;33m' +CYAN='\033[0;36m' +NC='\033[0m' # No Color + +log_info() { + echo -e "${GREEN}[INFO]${NC} $1" +} + +log_warn() { + echo -e "${YELLOW}[WARN]${NC} $1" +} + +log_error() { + echo -e "${RED}[ERROR]${NC} $1" +} + +log_step() { + echo -e "${CYAN}[STEP]${NC} $1" +} + +# Check required environment variables +if [ -z "${GEMINI_API_KEY:-}" ]; then + log_error "GEMINI_API_KEY environment variable not set" + echo "Get your API key from: https://aistudio.google.com/apikey" + exit 1 +fi + +# Check for jq +if ! command -v jq &> /dev/null; then + log_error "jq is required but not installed" + echo "Install with: brew install jq (macOS) or apt install jq (Linux)" + exit 1 +fi + +# Check for openclaw +if ! 
command -v openclaw &> /dev/null; then + log_warn "openclaw CLI not found - will attempt direct API call" + USE_CLI=false +else + USE_CLI=true +fi + +# Parse arguments +PROMPT="${1:-}" +CHANNEL="${2:-}" +CAPTION="${3:-Generated with Nano Banana Pro}" +MODE="${4:-auto}" # auto, mirror, direct +REFERENCE_IMAGE="${5:-}" # Optional reference image URL + +if [ -z "$PROMPT" ] || [ -z "$CHANNEL" ]; then + cat << 'EOF' +Usage: ./clawra-selfie-with-banana.sh [caption] [mode] [reference_image] + +Arguments: + prompt - Image description (required) + channel - Target channel (required) e.g., #general, @user + caption - Message caption (default: 'Generated with Nano Banana Pro') + mode - Selfie mode (default: auto) Options: auto, mirror, direct + reference_image - Reference image URL for editing (optional) + +Modes: + auto - Auto-detect based on keywords in prompt + mirror - Full-body mirror selfie (good for outfits) + direct - Close-up direct selfie (good for portraits) + +Environment: + GEMINI_API_KEY - Your Google AI Studio API key (required) + Get from: https://aistudio.google.com/apikey + +Examples: + # Text-to-image generation + GEMINI_API_KEY=your_key ./clawra-selfie-with-banana.sh \ + "A cyberpunk city at night" "#art" + + # Clawra selfie with auto-detection + GEMINI_API_KEY=your_key ./clawra-selfie-with-banana.sh \ + "wearing a red dress at a party" "#selfies" + + # Direct selfie mode with caption + GEMINI_API_KEY=your_key ./clawra-selfie-with-banana.sh \ + "at a cozy coffee shop" "#updates" "Morning coffee!" "direct" + + # Image editing with reference + GEMINI_API_KEY=your_key ./clawra-selfie-with-banana.sh \ + "wearing sunglasses and a hat" "#fun" "New look!" 
"auto" \ + "https://example.com/reference.jpg" + +EOF + exit 1 +fi + +# Fixed reference image for Clawra +CLAWRA_REFERENCE="https://cdn.jsdelivr.net/gh/SumeLabs/clawra@main/assets/clawra.png" + +# Determine the reference image to use +if [ -n "$REFERENCE_IMAGE" ]; then + USED_REFERENCE="$REFERENCE_IMAGE" + log_info "Using custom reference image: $REFERENCE_IMAGE" +else + USED_REFERENCE="$CLAWRA_REFERENCE" + log_info "Using Clawra default reference image" +fi + +# Auto-detect mode based on keywords +if [ "$MODE" = "auto" ]; then + if echo "$PROMPT" | grep -qiE "outfit|wearing|clothes|dress|suit|fashion|full-body|mirror"; then + MODE="mirror" + elif echo "$PROMPT" | grep -qiE "cafe|restaurant|beach|park|city|close-up|portrait|face|eyes|smile"; then + MODE="direct" + else + MODE="mirror" # default + fi + log_info "Auto-detected mode: $MODE" +fi + +# Construct the full prompt based on mode +if [ "$MODE" = "direct" ]; then + FULL_PROMPT="a close-up selfie taken by herself at $PROMPT, direct eye contact with the camera, looking straight into the lens, eyes centered and clearly visible, not a mirror selfie, phone held at arm's length, face fully visible" +else + FULL_PROMPT="make a pic of this person, but $PROMPT. the person is taking a mirror selfie" +fi + +log_step "Generating image with Nano Banana Pro (Gemini 3 Pro Image)..." +log_info "Mode: $MODE" +log_info "Prompt: $FULL_PROMPT" + +# Prepare the request payload +# Note: Gemini API uses multimodal content format +REQUEST_PAYLOAD=$(jq -n \ + --arg prompt "$FULL_PROMPT" \ + --arg ref_url "$USED_REFERENCE" \ + '{ + "contents": [{ + "role": "user", + "parts": [ + { + "text": $prompt + }, + { + "inlineData": { + "mimeType": "image/jpeg", + "data": "" + } + } + ] + }], + "generationConfig": { + "responseModalities": ["IMAGE"], + "temperature": 1.0, + "candidateCount": 1 + } + }') + +# Download reference image and encode to base64 +log_info "Downloading reference image..." 
+TEMP_IMAGE="/tmp/clawra_ref_$$.jpg" +if curl -sL "$USED_REFERENCE" -o "$TEMP_IMAGE"; then + BASE64_IMAGE=$(base64 < "$TEMP_IMAGE" | tr -d '\n') + rm -f "$TEMP_IMAGE" + log_info "Reference image encoded" +else + log_error "Failed to download reference image" + exit 1 +fi + +# Update payload with base64 image +REQUEST_PAYLOAD=$(echo "$REQUEST_PAYLOAD" | jq \ + --arg b64 "$BASE64_IMAGE" \ + '.contents[0].parts[1].inlineData.data = $b64') + +# Generate image via Google Gemini API +log_step "Calling Gemini API..." +RESPONSE=$(curl -s -X POST \ + "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \ + -H "x-goog-api-key: $GEMINI_API_KEY" \ + -H "Content-Type: application/json" \ + -d "$REQUEST_PAYLOAD") + +# Check for errors in response +if echo "$RESPONSE" | jq -e '.error' > /dev/null 2>&1; then + ERROR_MSG=$(echo "$RESPONSE" | jq -r '.error.message // .error // "Unknown error"') + log_error "Image generation failed: $ERROR_MSG" + echo "Full response: $RESPONSE" + exit 1 +fi + +# Extract image data from response +# Gemini returns base64 image in the response +IMAGE_DATA=$(echo "$RESPONSE" | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' 2>/dev/null) + +if [ -z "$IMAGE_DATA" ] || [ "$IMAGE_DATA" = "null" ]; then + log_error "Failed to extract image data from response" + echo "Response: $RESPONSE" + exit 1 +fi + +log_info "Image generated successfully!" + +# Save image temporarily and upload to a hosting service +# (In production, you might want to use a proper image hosting service) +TEMP_OUTPUT="/tmp/clawra_output_$$.png" +echo "$IMAGE_DATA" | base64 -d > "$TEMP_OUTPUT" + +log_info "Image saved to: $TEMP_OUTPUT" + +# For this example, we'll need to upload the image somewhere +# Let's use a simple approach with fal.ai storage or imgur +log_step "Uploading image..." 
+ +# Try to upload to fal.ai storage if FAL_KEY is available +if [ -n "${FAL_KEY:-}" ]; then + log_info "Uploading to fal.ai storage..." + + UPLOAD_RESPONSE=$(curl -s -X POST "https://fal.ai/api/files/upload" \ + -H "Authorization: Key $FAL_KEY" \ + -F "file=@$TEMP_OUTPUT") + + IMAGE_URL=$(echo "$UPLOAD_RESPONSE" | jq -r '.url // empty') + + if [ -z "$IMAGE_URL" ]; then + log_warn "Failed to upload to fal.ai, trying alternative..." + fi +fi + +# Fallback: try imgur anonymous upload +if [ -z "${IMAGE_URL:-}" ]; then + log_info "Uploading to imgur..." + + IMGUR_RESPONSE=$(curl -s -X POST "https://api.imgur.com/3/image" \ + -H "Authorization: Client-ID 546c25a59c58ad7" \ + -F "image=@$TEMP_OUTPUT") + + IMAGE_URL=$(echo "$IMGUR_RESPONSE" | jq -r '.data.link // empty') + + if [ -z "$IMAGE_URL" ]; then + log_error "Failed to upload image to hosting service" + log_info "Image saved locally at: $TEMP_OUTPUT" + echo "You can manually upload and use it." + exit 1 + fi +fi + +log_info "Image URL: $IMAGE_URL" + +# Clean up temp file +rm -f "$TEMP_OUTPUT" + +# Send via OpenClaw +log_step "Sending to channel: $CHANNEL" + +if [ "$USE_CLI" = true ]; then + # Use OpenClaw CLI + openclaw message send \ + --action send \ + --channel "$CHANNEL" \ + --message "$CAPTION" \ + --media "$IMAGE_URL" +else + # Direct API call to local gateway + GATEWAY_URL="${OPENCLAW_GATEWAY_URL:-http://localhost:18789}" + GATEWAY_TOKEN="${OPENCLAW_GATEWAY_TOKEN:-}" + + curl -s -X POST "$GATEWAY_URL/message" \ + -H "Content-Type: application/json" \ + ${GATEWAY_TOKEN:+-H "Authorization: Bearer $GATEWAY_TOKEN"} \ + -d "{ + \"action\": \"send\", + \"channel\": \"$CHANNEL\", + \"message\": \"$CAPTION\", + \"media\": \"$IMAGE_URL\" + }" +fi + +log_info "Done! 
Image sent to $CHANNEL" + +# Output JSON for programmatic use +echo "" +echo "--- Result ---" +jq -n \ + --arg url "$IMAGE_URL" \ + --arg channel "$CHANNEL" \ + --arg prompt "$FULL_PROMPT" \ + --arg mode "$MODE" \ + '{ + success: true, + image_url: $url, + channel: $channel, + prompt: $prompt, + mode: $mode, + model: "nano-banana-pro" + }'
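
The keyword-based auto-detection in the script above runs inline, which makes it hard to exercise without calling the Gemini API. A minimal sketch of the same logic factored into a function follows; the name `detect_mode` is hypothetical and not part of the original script, while the two keyword lists and the `mirror` default are copied from it.

```bash
#!/bin/bash
# Sketch only: detect_mode is a hypothetical helper, not part of the original script.
# It reuses the script's keyword lists: outfit-style words select "mirror",
# location/portrait words select "direct", and anything else defaults to "mirror".
detect_mode() {
  local prompt="$1"
  if printf '%s\n' "$prompt" | grep -qiE "outfit|wearing|clothes|dress|suit|fashion|full-body|mirror"; then
    echo "mirror"
  elif printf '%s\n' "$prompt" | grep -qiE "cafe|restaurant|beach|park|city|close-up|portrait|face|eyes|smile"; then
    echo "direct"
  else
    echo "mirror"
  fi
}
```

Matching is case-insensitive and substring-based, so a prompt containing "surface" would match the `face` keyword and select `direct`. If that matters, adding grep's `-w` flag to match whole words is one possible tightening, at the cost of missing inflected forms such as "dresses".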