@JurekBauer

Hi, here is the PR. I relied heavily on LLMs, but it works fine for me. Open to a critical review:

Problem

Understat changed their website to load data dynamically via JavaScript, so the datesData and shotsData variables are no longer in the initial HTML. This caused understat-db ingest to fail with a ValueError when trying to extract JSON from script tags.
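For reviewers, the failure mode can be illustrated with a minimal sketch. It assumes the old pages embedded each variable as `var <name> = JSON.parse('<hex-escaped string>')` inside a script tag; this `extract_json` is a hypothetical stand-in, not the project's actual implementation. When the variable is absent from the HTML, the lookup has nothing to match and raises a `ValueError`.

```python
import json
import re


def extract_json(html: str, variable: str):
    """Pull a JSON value assigned to `variable` out of inline script text.

    Assumption: the page embeds data as
        var <variable> = JSON.parse('<hex-escaped JSON>')
    """
    pattern = re.compile(
        r"var\s+" + re.escape(variable) + r"\s*=\s*JSON\.parse\('(.*?)'\)",
        re.DOTALL,
    )
    match = pattern.search(html)
    if match is None:
        # This is the case the PR describes: the variable is no longer
        # present in the initial HTML, so there is nothing to parse.
        raise ValueError(
            f"variable {variable!r} not found in page HTML; "
            "the data may now be loaded dynamically"
        )
    # Turn sequences such as \x5B back into the characters they encode.
    raw = match.group(1).encode("utf-8").decode("unicode_escape")
    return json.loads(raw)
```

Calling `extract_json(html, "datesData")` on a page that no longer contains the variable raises the `ValueError` instead of returning data.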

Solution
Switched to direct API endpoints instead of parsing HTML:
Matches: getLeagueData/{league}/{season} — returns JSON with dates, teams, and players
Shots: getMatchData/{match_id} — returns JSON with shots, rosters, and tmpl
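The two endpoints above can be sketched as follows. The host in `BASE_URL` and the helper names are assumptions for illustration; the PR text only gives the relative paths and the top-level keys of each response.

```python
import json
from urllib.parse import quote

# Assumption: the PR does not state the host or any path prefix.
BASE_URL = "https://understat.com"

# Top-level keys each endpoint is described as returning.
LEAGUE_KEYS = {"dates", "teams", "players"}
MATCH_KEYS = {"shots", "rosters", "tmpl"}


def league_data_url(league: str, season: int, base_url: str = BASE_URL) -> str:
    """Build the URL for getLeagueData/{league}/{season}."""
    return f"{base_url}/getLeagueData/{quote(str(league))}/{int(season)}"


def match_data_url(match_id: int, base_url: str = BASE_URL) -> str:
    """Build the URL for getMatchData/{match_id}."""
    return f"{base_url}/getMatchData/{int(match_id)}"


def parse_payload(text: str, expected_keys: set) -> dict:
    """Decode a JSON response body and check the expected top-level keys."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    missing = expected_keys - data.keys()
    if missing:
        raise ValueError(f"response is missing keys: {sorted(missing)}")
    return data
```

With this shape, `matches()` would read `parse_payload(body, LEAGUE_KEYS)["dates"]` and `shots()` would read `parse_payload(body, MATCH_KEYS)["shots"]`, where `body` is the response text from the corresponding URL.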

Changes
Updated matches() to call the API endpoint directly
Updated shots() to call the API endpoint directly
Added required HTTP headers (Referer, User-Agent, Accept, X-Requested-With) for the API
Improved error handling in extract_json() with clearer messages
Kept fallback to the old HTML parsing method for backwards compatibility
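A minimal sketch of the header set and the fallback pattern listed above. The header names come from this PR; the header values, the helper names, and the choice of which exceptions trigger the fallback are assumptions for illustration, not the code in the diff.

```python
from typing import Callable, Dict, Tuple, Type, TypeVar

T = TypeVar("T")


def api_headers(referer: str, user_agent: str = "Mozilla/5.0") -> Dict[str, str]:
    """Return the four headers the PR adds; the values are assumed."""
    return {
        "Referer": referer,
        "User-Agent": user_agent,
        "Accept": "application/json, text/javascript, */*; q=0.01",
        # Conventional value sent by browser-side AJAX requests.
        "X-Requested-With": "XMLHttpRequest",
    }


def with_fallback(
    primary: Callable[[], T],
    fallback: Callable[[], T],
    errors: Tuple[Type[BaseException], ...] = (ValueError, OSError),
) -> T:
    """Try the API-based loader first; use the HTML parser if it fails."""
    try:
        return primary()
    except errors:
        # Only the listed error types trigger the fallback; anything else
        # propagates so genuine bugs are not silently hidden.
        return fallback()
```

The loaders are passed in as zero-argument callables, so the same wrapper could serve both `matches()` and `shots()`. Which error types should trigger the fallback is a design choice worth settling in review.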

Testing
✅ matches() works (tested with Bundesliga 2025)
✅ shots() works (tested with match ID 30224)
✅ understat-db ingest command now works correctly

Backwards Compatibility
The old HTML parsing method remains available as a fallback for cases where the API is not accessible.
