A Python client to fetch data from the GDELT 2.0 API.
This client supports both the DOC API for article search and timelines, as well as direct access to GDELT's raw event data files (events, mentions, and GKG). This allows for simpler, small-scale analysis of news coverage and events data without having to deal with the complexities of downloading and managing the raw files from S3, or working with the BigQuery export.
The implementation has been forked from gdeltdoc.
gdelt-client is on PyPI and can be installed with pip:
```
pip install gdelt-client
```

Search for news articles and get timeline data via the GDELT DOC API.
```python
from gdelt_client import GdeltClient, Filters

f = Filters(
    keyword="climate change",
    start_date="2020-05-10",
    end_date="2020-05-11",
)

gd = GdeltClient()

# Search for articles matching the filters
articles = gd.article_search(f)

# Get a timeline of coverage volume
timeline = gd.timeline_search("timelinevol", f)
```

Async example:
```python
import asyncio
from gdelt_client import GdeltClient, Filters

async def main():
    f = Filters(keyword="climate change", start_date="2020-05-10", end_date="2020-05-11")

    # Use the async context manager so resources are cleaned up properly
    async with GdeltClient() as gd:
        # Async article search
        articles = await gd.aarticle_search(f)
        # Async timeline search
        timeline = await gd.atimeline_search("timelinevol", f)

asyncio.run(main())
```

Download and parse GDELT's raw data files directly. Event data is returned with CAMEO code descriptions included.
```python
from gdelt_client import GdeltClient, GdeltTable, OutputFormat

gd = GdeltClient()

# Download events for a single date
events = gd.search(
    date="2020-05-10",
    table=GdeltTable.EVENTS,
    output=OutputFormat.DATAFRAME,
)

# Download mentions for a date range with full 15-minute coverage
mentions = gd.search(
    date=["2020-05-10", "2020-05-11"],
    table=GdeltTable.MENTIONS,
    coverage=True,  # Download all 15-minute intervals
)

# Get a GeoDataFrame with geometry for mapping
geo_events = gd.search(
    date="2020-05-10",
    table=GdeltTable.EVENTS,
    output=OutputFormat.GEODATAFRAME,
)

# View the table schema
schema = gd.schema(GdeltTable.EVENTS)
```

Async example (downloads files concurrently for better performance):
```python
import asyncio
from gdelt_client import GdeltClient, GdeltTable

async def main():
    # Use the async context manager so resources are cleaned up properly
    async with GdeltClient() as gd:
        # Async search with concurrent file downloads
        events = await gd.asearch(
            date=["2020-05-10", "2020-05-11"],
            table=GdeltTable.EVENTS,
            coverage=True,
        )
        print(events[:5])
        print(f"Total records: {len(events)}")

asyncio.run(main())
```

Available tables: EVENTS, MENTIONS, GKG
Available output formats: DATAFRAME, JSON, CSV, GEODATAFRAME
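With coverage=True the client downloads every 15-minute interval for the requested dates. The sketch below only illustrates what that means in terms of files: one day yields 96 intervals, each named by its UTC timestamp. The base URL, the file suffixes and the interval_urls() helper are assumptions based on GDELT 2.0's public file layout, not taken from gdelt-client's source.

```python
from datetime import date, datetime, timedelta

# Assumption: GDELT 2.0 publishes one file per table every 15 minutes,
# named by the UTC timestamp of the interval (YYYYMMDDHHMMSS).
BASE_URL = "http://data.gdeltproject.org/gdeltv2"  # assumed file location
SUFFIXES = {  # assumed per-table file suffixes
    "EVENTS": "export.CSV.zip",
    "MENTIONS": "mentions.CSV.zip",
    "GKG": "gkg.csv.zip",
}

def interval_urls(day: date, table: str) -> list:
    """Hypothetical helper: the 96 assumed 15-minute file URLs for one UTC day."""
    start = datetime(day.year, day.month, day.day)
    suffix = SUFFIXES[table]
    urls = []
    for i in range(96):  # 24 hours * 4 intervals per hour
        stamp = (start + timedelta(minutes=15 * i)).strftime("%Y%m%d%H%M%S")
        urls.append(f"{BASE_URL}/{stamp}.{suffix}")
    return urls

urls = interval_urls(date(2020, 5, 10), "EVENTS")
print(len(urls))  # 96
print(urls[0])
```

Without coverage, only a subset of these intervals is needed, which is why full coverage is noticeably slower and why the async client downloads files concurrently.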
The article_search() method (and its async counterpart aarticle_search()) returns the news articles that match the filters as a pandas DataFrame with columns: url, url_mobile, title, seendate, socialimage, domain, language, sourcecountry.
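The seendate column is usually worth converting to a real timestamp before analysis. This is a minimal sketch, assuming seendate holds GDELT's compact UTC form such as "20200510T123000Z"; parse_seendate() is a hypothetical helper, and you should check the format against your own results first.

```python
from datetime import datetime, timezone

def parse_seendate(value: str) -> datetime:
    """Parse a DOC API seendate string into a timezone-aware UTC datetime.

    Assumes the compact form "YYYYMMDDTHHMMSSZ", e.g. "20200510T123000Z".
    """
    return datetime.strptime(value, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)

print(parse_seendate("20200510T123000Z"))
# 2020-05-10 12:30:00+00:00
```

On the DataFrame returned by article_search(), the same format string can be passed to pandas.to_datetime(articles["seendate"], format="%Y%m%dT%H%M%SZ", utc=True) to convert the whole column at once.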
The timeline_search() method (and async atimeline_search()) supports 5 modes:
- timelinevol - Timeline of coverage volume as a percentage of all monitored articles
- timelinevolraw - Timeline with actual article counts instead of percentages
- timelinelang - Coverage broken down by language (each language as a column)
- timelinesourcecountry - Coverage broken down by source country (each country as a column)
- timelinetone - Average tone of articles over time (see the GDELT docs for details of the tone metric)
All modes return a pandas DataFrame with a datetime column and data columns.
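Under the hood, each timeline mode is a different mode value on the same DOC API request. The sketch below builds such a request URL by hand to show the idea; the endpoint, the query, mode, format and timespan parameter names, and the timeline_url() helper are assumptions based on GDELT's DOC 2.0 API documentation, not gdelt-client's internals.

```python
from urllib.parse import urlencode

# Assumption: the public DOC 2.0 endpoint and its query-string parameters.
DOC_API = "https://api.gdeltproject.org/api/v2/doc/doc"

# The five timeline modes supported by timeline_search()
TIMELINE_MODES = {
    "timelinevol",
    "timelinevolraw",
    "timelinelang",
    "timelinesourcecountry",
    "timelinetone",
}

def timeline_url(query: str, mode: str, timespan: str = "1d") -> str:
    """Hypothetical helper: build a raw DOC API timeline request URL."""
    if mode not in TIMELINE_MODES:
        raise ValueError(f"unsupported timeline mode: {mode!r}")
    params = {"query": query, "mode": mode, "format": "json", "timespan": timespan}
    # urlencode percent-encodes quotes and turns spaces into '+'
    return f"{DOC_API}?{urlencode(params)}"

print(timeline_url('"climate change"', "timelinevol"))
```

Validating the mode locally fails fast on a typo instead of sending a request the API will reject.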
The search query passed to the API is constructed from a gdelt_client.Filters object.
```python
from gdelt_client import Filters, near, repeat

f = Filters(
    start_date="2020-05-01",
    end_date="2020-05-02",
    num_records=250,
    keyword="climate change",
    domain=["bbc.co.uk", "nytimes.com"],
    country=["UK", "US"],
    theme="GENERAL_HEALTH",
    near=near(10, "airline", "carbon"),
    repeat=repeat(5, "planet"),
)
```

Filters for keyword, domain, domain_exact, country, language and theme can be passed either as a single string or as a list of strings. If a list is passed, the values in the list are wrapped in a boolean OR.
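To make the OR-wrapping concrete, here is a minimal sketch of how a single filter value or a list of values could be rendered into a DOC API query fragment. or_clause() is a hypothetical helper, not gdelt-client's implementation. The operator names (domain, domainis, sourcecountry, sourcelang, theme) and the rule that OR groups are wrapped in parentheses are our reading of GDELT's DOC 2.0 API documentation.

```python
from typing import List, Union

def or_clause(operator: str, values: Union[str, List[str]]) -> str:
    """Hypothetical helper: render one filter as a DOC API query fragment.

    A single value becomes e.g. 'domain:bbc.co.uk'; several values are
    joined with OR and wrapped in parentheses (assumed API requirement).
    """
    if isinstance(values, str):
        return f"{operator}:{values}"
    if not values:
        raise ValueError("at least one value is required")
    terms = [f"{operator}:{v}" for v in values]
    if len(terms) == 1:
        return terms[0]
    return "(" + " OR ".join(terms) + ")"

print(or_clause("domain", ["bbc.co.uk", "nytimes.com"]))
# (domain:bbc.co.uk OR domain:nytimes.com)
print(or_clause("sourcecountry", "UK"))
# sourcecountry:UK
```

Separate filters are then combined with spaces, which the API treats as AND, so a domain list and a country list both have to match.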
You must pass either start_date and end_date, or timespan.
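The rule above, together with the accepted date types, can be sketched as a small validation step. This is an illustration under stated assumptions: date_params() and to_api_datetime() are hypothetical helpers, and the startdatetime/enddatetime parameter names and their 14-digit YYYYMMDDHHMMSS format are assumptions based on GDELT's DOC 2.0 API documentation.

```python
from datetime import date, datetime, timezone
from typing import Dict, Optional, Union

DateLike = Union[str, date, datetime]

def to_api_datetime(value: DateLike) -> str:
    """Convert 'YYYY-MM-DD', a date, or a datetime to YYYYMMDDHHMMSS (UTC)."""
    if isinstance(value, str):
        value = datetime.strptime(value, "%Y-%m-%d")  # midnight of that day
    elif not isinstance(value, datetime):  # plain date -> midnight
        value = datetime(value.year, value.month, value.day)
    elif value.tzinfo is not None:  # aware datetime -> convert to UTC
        value = value.astimezone(timezone.utc)
    # Naive datetimes are assumed to already be in UTC
    return value.strftime("%Y%m%d%H%M%S")

def date_params(
    start_date: Optional[DateLike] = None,
    end_date: Optional[DateLike] = None,
    timespan: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper enforcing 'start_date and end_date, or timespan'."""
    if timespan is not None:
        if start_date is not None or end_date is not None:
            raise ValueError("pass either start_date/end_date or timespan, not both")
        return {"timespan": timespan}
    if start_date is None or end_date is None:
        raise ValueError("pass both start_date and end_date, or a timespan")
    return {
        "startdatetime": to_api_datetime(start_date),
        "enddatetime": to_api_datetime(end_date),
    }

print(date_params("2020-05-10", "2020-05-11"))
# {'startdatetime': '20200510000000', 'enddatetime': '20200511000000'}
```

Passing a datetime such as datetime(2020, 5, 10, 14, 30, 5) gives "20200510143005", which is how second-level granularity reaches the API.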
- start_date - The start date for the filter, either in YYYY-MM-DD format or as a datetime object in UTC. Passing a datetime lets you specify the time down to the second. The API officially supports only the most recent 3 months of articles. Requests for an earlier date range may still return data, but this isn't guaranteed.
- end_date - The end date for the filter, either in YYYY-MM-DD format or as a datetime object in UTC.
- timespan - A timespan to search, relative to the time of the request. Must match one of the API's timespan formats: https://blog.gdeltproject.org/gdelt-doc-2-0-api-debuts/
- num_records - The number of records to return. Only used in article list mode, and can be up to 250.
- keyword - Return articles containing the exact phrase keyword within the article text.
- domain - Return articles from the specified domain. Does not require an exact match, so passing "cnn.com" will match articles from cnn.com, subdomain.cnn.com and notactuallycnn.com.
- domain_exact - Similar to domain, but requires an exact match.
- country - Return articles published in a country or list of countries, formatted as FIPS 2-letter country codes.
- language - Return articles published in the given language, formatted as the ISO 639 language code.
- theme - Return articles that cover one of GDELT's GKG Themes. A full list of themes can be found here.
- near - Return articles containing words close to each other in the text. Use near() to construct, e.g. near = near(5, "airline", "climate"), or multi_near() if you want to use multiple restrictions, e.g. multi_near([(5, "airline", "crisis"), (10, "airline", "climate", "change")], method="AND") finds "airline" and "crisis" within 5 words, and "airline", "climate", and "change" within 10 words.
- repeat - Return articles containing a single word repeated at least a number of times. Use repeat() to construct, e.g. repeat = repeat(3, "environment"), or multi_repeat() if you want to use multiple restrictions, e.g. repeat = multi_repeat([(2, "airline"), (3, "airport")], "AND").
- tone - Return articles above or below a particular tone score (i.e. more positive or more negative than a certain threshold). To use, specify either a greater-than or less-than sign and a positive or negative number (either an integer or a floating point number). To find fairly positive articles, use tone=">5"; to search for fairly negative articles, use tone="<-5".
- tone_absolute - The same as tone, but ignores the positive/negative sign and lets you search for high-emotion or low-emotion articles, regardless of whether they were happy or sad in tone.
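The near, repeat and tone filters correspond to dedicated DOC API operators. The sketch below shows roughly what those query fragments could look like and how a tone threshold string might be validated. near_clause(), repeat_clause() and tone_clause() are hypothetical helpers, not the library's near()/repeat() functions, and the near10:"...", repeat5:"...", tone and toneabs syntax is an assumption based on GDELT's DOC 2.0 API documentation.

```python
import re

# Hypothetical helpers sketching the DOC API operators that the near,
# repeat, tone and tone_absolute filters appear to map to (assumed syntax).

def near_clause(distance: int, *words: str) -> str:
    """Words within `distance` words of each other, e.g. near10:"airline carbon"."""
    if len(words) < 2:
        raise ValueError("near needs at least two words")
    return f'near{distance}:"{" ".join(words)}"'

def repeat_clause(count: int, word: str) -> str:
    """A single word repeated at least `count` times, e.g. repeat5:"planet"."""
    word = word.strip()
    if " " in word:
        raise ValueError("repeat only supports a single word")
    return f'repeat{count}:"{word}"'

def tone_clause(threshold: str, absolute: bool = False) -> str:
    """Validate '>5' / '<-5' style thresholds and prefix tone or toneabs."""
    if not re.fullmatch(r"[<>]-?\d+(?:\.\d+)?", threshold):
        raise ValueError(f"invalid tone threshold: {threshold!r}")
    return ("toneabs" if absolute else "tone") + threshold

print(near_clause(10, "airline", "carbon"))  # near10:"airline carbon"
print(repeat_clause(5, "planet"))            # repeat5:"planet"
print(tone_clause("<-5"))                    # tone<-5
```

Checking the threshold with a regular expression mirrors the rule described for tone: exactly one comparison sign followed by an integer or floating point number, optionally negative.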
The JSON schema data files in this package (src/gdelt_client/data/schemas/) are based on schemas from gdeltPyR, which is licensed under the GNU General Public License v3.0.
PRs & issues are very welcome!
It's recommended to use a virtual environment for development. Set one up with uv:

```
uv sync
```
Tests for this package use pytest. Run them with:

```
uv run pytest tests --cov=src/gdelt_client --cov-report=xml --cov-report=term-missing
```

If your PR adds a new feature or helper, please also add some tests.
There's a bit of automation set up to help publish a new version of the package to PyPI:
- Make sure the version string has been updated since the last release. This package follows semantic versioning.
- Create a new release in the GitHub UI, using the new version as the release name.
- Watch as the publish.yml GitHub action builds the package and pushes it to PyPI.