data-scraper-agent
OfficialBuild a fully automated AI-powered data collection agent for any public source — job boards, prices, news, GitHub, sports, anything. Scrapes on a schedule, enriches data with a free LLM (Gemini Flash), stores results in Notion/Sheets/Supabase, and learns from user feedback. Runs 100% free on GitHub Actions. Use when the user wants to monitor, collect, or track any public data automatically.
What this skill does
When applied, it prepends a system prompt before your request is sent — no extra calls and no change to how you are billed beyond the added tokens.
---
name: data-scraper-agent
description: Build a fully automated AI-powered data collection agent for any public source — job boards, prices, news, GitHub, sports, anything. Scrapes on a schedule, enriches data with a free LLM (Gemini Flash), stores results in Notion/Sheets/Supabase, and learns from user feedback. Runs 100% free on GitHub Actions. Use when the user wants to monitor, collect, or track any public data automatically.
origin: community
---
# Data Scraper Agent
Build a production-ready, AI-powered data collection agent for any public data source.
Runs on a schedule, enriches results with a free LLM, stores to a database, and improves over time.
**Stack: Python · Gemini Flash (free) · GitHub Actions (free) · Notion / Sheets / Supabase**
## When to Activate
- User wants to scrape or monitor any public website or API
- User says "build a bot that checks...", "monitor X for me", "collect data from..."
- User wants to track jobs, prices, news, repos, sports scores, events, listings
- User asks how to automate data collection without paying for hosting
- User wants an agent that gets smarter over time based on their decisions
## Core Concepts
### The Three Layers
Every data scraper agent has three layers:
```
COLLECT → ENRICH → STORE
│ │ │
Scraper AI (LLM) Database
runs on scores/ Notion /
schedule summarises Sheets /
& classifies Supabase
```
### Free Stack
| Layer | Tool | Why |
|---|---|---|
| **Scraping** | `requests` + `BeautifulSoup` | No cost, covers 80% of public sites |
| **JS-rendered sites** | `playwright` (free) | When HTML scraping fails |
| **AI enrichment** | Gemini Flash via REST API | 500 req/day, 1M tokens/day — free |
| **Storage** | Notion API | Free tier, great UI for review |
| **Schedule** | GitHub Actions cron | Free for public repos |
| **Learning** | JSON feedback file in repo | Zero infra, persists in git |
### AI Model Fallback Chain
Build agents to auto-fallback across Gemini mUse this skill
Add a "skill" field with the skill’s ID to your chat completion request. It is applied server-side before your prompt is sent — no extra calls.
{
"model": "gpt-4o-mini",
"skill": "imp-2388bb56-2855-4dcd-b05a-c37fe08867c3",
"messages": [{ "role": "user", "content": "…" }]
}Install the skill, enable it in your dashboard and (optionally) limit it to specific models. It then applies automatically to every matching request — with no "skill" field to send each time.
Set it up in your dashboardMore skills
Set up and use 1Password CLI for sign-in, desktop integration, and reading or injecting secrets.
Create, view, edit, delete, search, move, or export Apple Notes via the memo CLI on macOS.
List, add, edit, complete, or delete Apple Reminders and reminder lists via remindctl.
Create, search, and manage Bear notes via grizzly CLI.
Monitor blogs and RSS/Atom feeds for updates using the blogwatcher CLI.
BluOS CLI (blu) for discovery, playback, grouping, and volume.
Capture frames or clips from RTSP/ONVIF cameras.
Search, install, update, sync, or publish agent skills with the ClawHub CLI and registry.