Description
I've run into a downstream problem in wkls, which lets users query the Overture divisions dataset in a user-friendly way using the ISO 3166-1 and 3166-2 codes for countries and their subdivisions: querying a valid ISO 3166 code can return zero results. The main cause appears to be that complex subdivision structures aren't represented correctly in the dataset.
For example, Puerto Rico has its own ISO 3166-1 country code PR, but as a territory of the United States it is also assigned the ISO 3166-2 subdivision code US-PR. The divisions dataset currently has country as PR and region as NULL.
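To make the failure concrete, here is a minimal sketch in plain Python. The rows are toy data that only mirror the country and region attributes described above (the Texas row and the `name` field are assumptions for illustration), and `find_by_region` and `find_by_country` are hypothetical helpers, not part of wkls or Overture:

```python
# Toy rows mirroring the two attributes discussed above; not real Overture records.
rows = [
    {"name": "Puerto Rico", "country": "PR", "region": None},
    {"name": "Texas", "country": "US", "region": "US-TX"},  # illustrative row
]


def find_by_region(rows, code):
    """Return rows whose region attribute equals the given ISO 3166-2 code."""
    return [r for r in rows if r["region"] == code]


def find_by_country(rows, code):
    """Return country-level rows (no region) for the given ISO 3166-1 code."""
    return [r for r in rows if r["country"] == code and r["region"] is None]


# US-PR is a valid ISO 3166-2 code, but nothing carries it as a region.
by_subdivision = find_by_region(rows, "US-PR")
# The same place is only reachable through its ISO 3166-1 code.
by_country = find_by_country(rows, "PR")

print(by_subdivision)
print([r["name"] for r in by_country])
```

With the data shaped this way, the lookup by `US-PR` comes back empty while the lookup by `PR` finds the row, which is the behavior I'm seeing downstream.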
This script compares the Overture dataset with the curated lists in the iso3166_2 package:
Overture has 272 countries and 3544 regions
iso3166_2 has 250 countries and 5049 subdivisions
2 countries in ISO 3166 not in Overture: ['EH', 'PS']
24 countries in Overture not in ISO 3166: ['CP', 'XA', 'XB', 'XC', 'XD', 'XE', 'XG', 'XH', 'XI', 'XJ', 'XL', 'XM', 'XN', 'XO', 'XP', 'XQ', 'XR', 'XS', 'XT', 'XU', 'XW', 'XX', 'XY', 'XZ']
1514 subdivisions in ISO 3166 not in Overture: ['AG-03', 'AG-04', 'AG-05', 'AG-06', 'AG-07', ...]
9 regions in Overture not in ISO 3166: ['ET-SE', 'LY-SU', 'NO-31', 'NO-32', 'NO-33', 'NO-39', 'NO-40', 'NO-55', 'NO-56']
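For anyone who wants to reproduce this kind of report, the comparison boils down to two set differences. The following is only a sketch of that idea, not the script itself: `compare_codes` is a hypothetical helper, and the inputs are a handful of codes (a few taken from the output above, plus `US-TX` as an assumed match) standing in for the full lists from Overture and iso3166_2:

```python
def compare_codes(overture_codes, iso_codes):
    """Compare two collections of codes and report what each side lacks."""
    overture, iso = set(overture_codes), set(iso_codes)
    return {
        # Codes defined by ISO 3166 that never appear in Overture.
        "missing_from_overture": sorted(iso - overture),
        # Codes used by Overture that ISO 3166 does not define.
        "not_in_iso": sorted(overture - iso),
    }


# Toy inputs; the real lists would be loaded from the two sources.
overture_regions = ["US-TX", "NO-31"]
iso_subdivisions = ["US-TX", "US-PR", "AG-03"]

result = compare_codes(overture_regions, iso_subdivisions)
print(result["missing_from_overture"])
print(result["not_in_iso"])
```

Running the same function once on country codes and once on region codes would give the four difference lists shown in the output.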
I'm not too worried about the couple dozen mismatched country codes and the nine extra regions in Overture -- these are likely edge cases that can be addressed if there's interest.
However, the 1,514 subdivision codes missing from the Overture dataset's region attribute need to be queryable somehow. This affects about 50 countries, and any downstream geocoding based on ISO 3166-2 codes won't work as expected.