A Python-based scraper to extract posts from X (formerly Twitter) based on specific search criteria, using Selenium with an undetected ChromeDriver for web automation.
Originally built for personal research (e.g., #Terimakasihjokowi during Prabowo's inauguration in late 2024). If you're interested in collaborating or have questions, feel free to reach out!
Note: X's interface and policies may change, potentially affecting the scraper's functionality.
- Filtered search with full query composition (keywords, accounts, hashtags, min counts, replies/links) (see X Advanced Search)
- Date-range crawling with automatic day stepback when no posts are found
- Duplicate protection across resumed sessions
- Auto-save and resume from savepoints
- CSV and JSON export options
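Internally, these options are composed into a single X advanced-search query string. A minimal sketch of how that composition could work, assuming a hypothetical `build_query` helper (the actual logic lives in `src.py`):

```python
import shlex

def build_query(filters: dict, since: str, until: str) -> str:
    """Hypothetical sketch: compose an X advanced-search query string."""
    parts = []
    if filters.get("all_these_words"):
        parts.append(filters["all_these_words"])
    if filters.get("this_exact_phrase"):
        parts.append(f'"{filters["this_exact_phrase"]}"')
    if filters.get("any_of_these_words"):
        # shlex keeps \'quoted phrases\' together as single terms
        terms = shlex.split(filters["any_of_these_words"])
        parts.append("(" + " OR ".join(f'"{t}"' if " " in t else t for t in terms) + ")")
    if filters.get("from_accounts"):
        parts.append(f'from:{filters["from_accounts"]}')
    if not filters.get("links", True):
        parts.append("-filter:links")  # exclude posts with links
    parts.append(f"since:{since} until:{until}")  # date-range operators
    return " ".join(parts)

print(build_query({"any_of_these_words": "'Makan Bergizi Gratis' 'MBG'"}, "2026-01-01", "2026-01-16"))
# ("Makan Bergizi Gratis" OR MBG) since:2026-01-01 until:2026-01-16
```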
Example data can be found in `Process/jokowi_twitterACC` and `Process/MBG`; data from the legacy code is in `Legacy/terimaKasihJokowi.csv`.
| User | Date | post_text | quotedPost_text | Reply_count | Repost_count | Like_count | View_count |
|---|---|---|---|---|---|---|---|
| @PolitisiTidurr | 2026-01-15-19:30:18 | Ini bukan lagi program "Makan Bergizi Gratis" (MBG), tapi "Malapetaka Beracun Gratis". Angka 21.254 korban bukan sekadar statistik, itu adalah nyawa anak-anak yang dijadikan kelinci percobaan kebijakan populis yang dipaksakan tanpa kesiapan sanitasi dan pengawasan logistik. | | 0 | 0 | 0 | 86 |
| @diydiydi | 2026-01-15-20:15:26 | Anak kicikku dapat makan berGIZI gratis Tapi slalu ada produk ultra proses food entah itu satu atau bahkan makanan utamanya, alhasil mubazir Aturannya kan gaboleh makek upf 😭 | | 0 | 0 | 0 | 6 |
| @venusdocxx | 2026-01-15-18:55:38 | Wapres Gibran tinjau langsung ke Wamena Papua pastikan program makan bergizi gratis berjalan optimal di daerah terpencil wujud komitmen dan Kerja Nyata untuk anak bangsa #LanjutkanMBG Apink JAEMIN FOR LEE JEANS #IamPOLCASAN_MV PERTHSANTA PRESENTER DRPONG | | 0 | 0 | 0 | 198 |
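To explore an export like the ones above, the CSV can be loaded with pandas. A minimal sketch, assuming the column names match the table headers (the actual files may differ):

```python
import pandas as pd

# Load the legacy example export (path from the section above)
df = pd.read_csv("Legacy/terimaKasihJokowi.csv")

# Counts may be stored as text, so coerce to numbers before sorting
df["View_count"] = pd.to_numeric(df["View_count"], errors="coerce")
print(df.sort_values("View_count", ascending=False)[["User", "Date", "View_count"]].head())
```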
- Python 3.9+
- Google Chrome installed, preferably version 144.
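To sanity-check that undetected-chromedriver can drive your Chrome install, here is a quick standalone snippet (the scraper manages its own driver internally; `version_main` pins the driver to your Chrome major version and can be omitted to auto-detect):

```python
import undetected_chromedriver as uc

driver = uc.Chrome(version_main=144)  # match your installed Chrome major version
driver.get("https://x.com")
driver.quit()
```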
- Clone the repository:

  ```bash
  git clone
  ```

- Edit credentials at `Credentials/twitter.json` with the format:

  ```json
  { "username": "your_username", "password": "your_password", "email": "your_email" }
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
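If you prefer, the credentials file can also be generated from Python instead of edited by hand; a one-off sketch producing the format shown above:

```python
import json
from pathlib import Path

creds = {"username": "your_username", "password": "your_password", "email": "your_email"}
Path("Credentials").mkdir(exist_ok=True)
Path("Credentials", "twitter.json").write_text(json.dumps(creds, indent=2))
```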
You can use the scraper in two ways:
The main notebook is `Notebook.IPYNB`. It contains:
- The `twitterScrapper` class (imported from `src.py`)
- Examples of filter configuration
- Example `session.start(...)` calls

Quick flow:
- Install dependencies: `pip install -r requirements.txt`
- Open the notebook
- Update filters
- Run the scraping cell (see the sketch below)
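A rough outline of the scraping cell; `twitterScrapper` and `session.start(...)` come from the notebook, but their exact arguments are defined in `src.py`, so treat this as a placeholder rather than the real signature:

```python
from src import twitterScrapper

# Placeholder outline: consult Notebook.IPYNB for the actual
# constructor arguments and start(...) parameters.
session = twitterScrapper()
session.start()
```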
Download the latest app from the Releases page and run it directly.
Quick flow:
- Download the latest release
- Launch the app
- Select credentials, configure filters, and start scraping
Refer to X Advanced Search for filter explanations. Example filter configuration:
```python
FILTERS = {
    # Basic filters
    "all_these_words": "",      # Example: what’s happening · contains both “what’s” and “happening”
    "this_exact_phrase": "",    # Example: what’s happening · contains the exact phrase “what’s happening”
    "any_of_these_words": "",   # Example: what’s happening · contains either “what’s” or “happening”
    "none_of_these_words": "",  # Example: what’s happening · does not contain the words “what’s” or “happening”
    "these_hashtags": "",       # Example: #whatshappening · contains the hashtag #whatshappening
    # Account filters
    "from_accounts": "",        # Example: from:Twitter · Tweets sent from the account Twitter
    "to_accounts": "",          # Example: to:Twitter · Tweets sent in reply to the account Twitter
    "mentioning_accounts": "",  # Example: @Twitter · Tweets that mention the account Twitter
    "language": "",             # Example: lang:en · Tweets in English | Use "" for all languages
    # Additional filters
    "Minimum_replies": "",      # Example: min_replies:100 · Tweets with at least 100 replies
    "Minimum_likes": "",        # Example: min_faves:100 · Tweets with at least 100 likes
    "Minimum_retweets": "",     # Example: min_retweets:100 · Tweets with at least 100 retweets
    "links": True,              # Example: filter:links · Include posts with links | If disabled, only posts without links
    "replies": True,            # Example: filter:replies · Include replies and original posts | If disabled, only original posts
}
```

For `any_of_these_words`, the string is normally split on spaces. If you don't want part of it to be split, wrap that part in `\'` at the start and end. For example, `"any_of_these_words": "\'Makan Bergizi Gratis\' \'MBG\'"` searches for posts that contain either "Makan Bergizi Gratis" or "MBG".
The following languages are supported for filtering:
'Arabic'
'Arabic (Feminine)'
'Bangla'
'Basque'
'Bulgarian'
'Catalan'
'Croatian'
'Czech'
'Danish'
'Dutch'
'English'
'Finnish'
'French'
'German'
'Greek'
'Gujarati'
'Hebrew'
'Hindi'
'Hungarian'
'Indonesian'
'Italian'
'Japanese'
'Kannada'
'Korean'
'Marathi'
'Norwegian'
'Persian'
'Polish'
'Portuguese'
'Romanian'
'Russian'
'Serbian'
'Simplified Chinese'
'Slovak'
'Spanish'
'Swedish'
'Tamil'
'Thai'
'Traditional Chinese'
'Turkish'
'Ukrainian'
'Urdu'
'Vietnamese'
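These display names presumably map to ISO 639-1 codes for the `lang:` operator; a few standard pairs for reference (an assumption, since the authoritative mapping lives in `src.py`):

```python
# Assumed name-to-code pairs for the lang: operator (standard ISO 639-1)
LANG_CODES = {
    "English": "en",
    "Indonesian": "id",
    "Japanese": "ja",
    "Arabic": "ar",
    "Spanish": "es",
    "French": "fr",
}
```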
By default, outputs are stored in:
- Process for current runs
- Savepoints under the selected process directory
- Final CSV/JSON on completion
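For post-processing, a small sketch that mirrors every CSV a run produced as JSON, matching the dual export option (paths assume the default `Process` layout described above):

```python
import pandas as pd
from pathlib import Path

# Walk the Process tree and write a JSON twin next to each CSV
for csv_path in Path("Process").rglob("*.csv"):
    df = pd.read_csv(csv_path)
    df.to_json(csv_path.with_suffix(".json"), orient="records", force_ascii=False)
```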
- `src.py`: main implementation
- `Notebook.IPYNB`: main notebook for running the scraper
- `requirements.txt`: dependencies
- `Credentials/`: credentials storage
- `Process/`: runtime outputs and savepoints
- `Legacy/`: old versions (deprecated)
- X may trigger “suspicious login attempt” and require email verification.
- If scraping detection occurs, the scraper can auto-save and wait before continuing.
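The wait-before-continuing behavior could resemble a simple exponential backoff; a hedged sketch of the idea (the scraper's actual auto-save and wait logic lives in `src.py`):

```python
import random
import time

def wait_with_backoff(attempt: int, base: float = 60.0, cap: float = 900.0) -> None:
    """Sleep longer after each suspected detection, with jitter."""
    delay = min(cap, base * (2 ** attempt)) + random.uniform(0, 30)
    print(f"Possible detection; sleeping {delay:.0f}s before resuming")
    time.sleep(delay)
```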
This tool is intended for educational and research purposes only. Ensure compliance with X’s terms of service and privacy policies when using this scraper. The author is not responsible for any misuse of this tool.
