trafilatura

Here are 16 public repositories matching this topic...

opendatalab / MinerU-HTML

MinerU-HTML: An SLM-powered HTML main content extractor that outputs clean HTML bodies. Perfect for Deep Research Agents, RAG applications, and training data generation.

nlp scraping text-extraction web-scraping corpus-tools article-extractor rag trafilatura webagent

Updated Dec 25, 2025
HTML

michaelsoftmd / pebkac-chrome

pebkac Chrome Nonautomation - A Local LLM-Driven Web Co-Browser using Smolagents, Zendriver, Trafilatura.

chrome automation ai openai webscraping atlas claude llms claude-ai trafilatura nodriver ai-browser-automation smolagents ai-browser-control aibrowser zendriver ai-browser pebkac

Updated Jan 7, 2026
Python

Gdi87 / Webscrapper

Web scraper in Python

scraper web pandas python3 scrapping scrapping-python scrapper-script trafilatura

Updated Sep 6, 2023
Python

brzvsk / longreader

Telegram Mini App that saves internet articles to read them later

telegram nextjs scraping telegrambot fastapi readitlater trafilatura telegramminiapp

Updated Dec 11, 2025
Python

mazzasaverio / url2md4ai

Lean Python tool for extracting clean, LLM-optimized markdown from web pages. Handles dynamic content with Playwright + Trafilatura for maximum information extraction efficiency.

html-to-markdown text-extraction openai playwright html-to-markdown-converter trafilatura

Updated Jul 6, 2025
HTML

vedantvisoliya / Flutter-ChatGPT-Clone

ChatGPT AI Clone

python dart websockets gemini web-scraping flutter fastapi sentence-transformers flutter-web-app trafilatura tavily ai-clones

Updated Jul 4, 2025
Dart

MUGISHA-Pascal / Flutter-Perplexity-FastAPI

Real-time AI search and chat backend with WebSocket streaming, powered by Tavily web search and Google Gemini for Flutter apps.

gemini-api fastapi trafilatura tavily-api

Updated Aug 2, 2025
Python

10kseok / BlogToBook

A service that turns blog posts into e-books

pdf ebook calibre fastapi trafilatura

Updated Aug 12, 2025
Python

fvanevski / trafilatura_mcp

Trafilatura MCP Server

fetch web-scraping trafilatura mcp-server

Updated Oct 1, 2025
Python

Pookie-n-Rookie / Crawlr

A web scraper with an LLM-powered document suggestion system that combines web crawling, data extraction, and advanced AI capabilities to recommend relevant documents.

multiagent llm langchain trafilatura crewai tavily agentic-rag

Updated May 10, 2025
Python

Senavictors / WebFetchMailer

A simple agent that fetches news from RSS feeds, extracts the content of the pages, generates an HTML summary using `Gemini`, and sends it by email via SMTP. Ideal for a daily technology newsletter.

python python3 artificial-intelligence trafilatura google-generativeai

Updated Dec 4, 2025
Python

fa12hovo / Web_scrapping

This project is a Python-based web scraping tool that uses the Trafilatura library to extract and save text content from a list of specified websites. The program is designed to process multiple URLs, extract their main content, and save each website's content to a separate .txt file.

html xml trafilatura

Updated Nov 1, 2024
Jupyter Notebook
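
The workflow described for this project (take a list of URLs, extract each page's main content, write one .txt file per site) might look roughly like the sketch below. It is not taken from the repository. The helper names `fetch_main_text`, `filename_for`, and `save_sites`, as well as the file-naming scheme, are hypothetical; only `trafilatura.fetch_url` and `trafilatura.extract` are calls from the library itself.

```python
from pathlib import Path
from urllib.parse import urlparse


def fetch_main_text(url):
    """Download a page and return its main text, or None on failure."""
    # Third-party dependency, imported lazily so the helpers below
    # can be used (and tested) without it.
    import trafilatura

    downloaded = trafilatura.fetch_url(url)
    if downloaded is None:
        return None
    return trafilatura.extract(downloaded)


def filename_for(url, index):
    """Build a filesystem-safe .txt name from the URL's host (assumed scheme)."""
    host = urlparse(url).netloc or "page"
    safe = "".join(c if c.isalnum() else "_" for c in host)
    return f"{index:02d}_{safe}.txt"


def save_sites(urls, out_dir, extractor=fetch_main_text):
    """Write one .txt file per URL whose extraction succeeds; return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for index, url in enumerate(urls, start=1):
        text = extractor(url)
        if not text:
            continue  # skip pages that could not be fetched or yielded no text
        path = out / filename_for(url, index)
        path.write_text(text, encoding="utf-8")
        written.append(path)
    return written
```

Passing the extractor as a parameter keeps the file-writing logic independent of the network, so a stub can stand in for Trafilatura when trying the sketch offline.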

elvismdev / trafilatura-api

Trafilatura API for extracting content information from HTML

python nlp docker flask rest-api text-extraction web-scraping content-extraction metadata-extraction news-scraper article-extraction trafilatura

Updated Dec 2, 2025
Python

maximilianromer / Tor-Search-MCP

Tools for LLMs to anonymously search and browse the web

mcp selenium tor duckduckgo selenium-webdriver tor-network geckodriver tor-hidden-services trafilatura mcp-server fastmcp ddgs tbselenium

Updated Dec 31, 2025
Python

gokhaneraslan / multi-agent-systems

🤖 Collection of AI agents for web search, RAG, and multi-agent collaboration. Features phi-agent + Groq integration, Ollama support, DuckDuckGo/Google search, web scraping, and local knowledge base querying with vector embeddings.

duckduckgo web-scraping knowledge-base semantic-search google-search multi-agent-systems ai-agents conversational-ai rag groq vector-database sentence-transformers llm retrieval-augmented-generation lancedb trafilatura ollama crawl4ai phi-agent

Updated Jun 7, 2025
Python

rajan-bhateja / Article_Summarizer_and_Sentiment_Analyzer

Summarize articles using NLTK, Gemini and Trafilatura

sentiment-analysis nltk summarization gemini-api trafilatura

Updated Apr 9, 2025
Python
