Feature Request: Geospatial Indexing
Overview
Add full geospatial indexing to ArcadeDB using the native LSM-Tree engine as storage backend and a geo.* SQL function namespace, consistent with ArcadeDB's existing dot-namespace convention (e.g. vector.neighbors, vector.cosineSimilarity).
The design mirrors the existing LSMTreeFullTextIndex pattern: a thin wrapper that tokenizes geometry into GeoHash cells via Apache Lucene's lucene-spatial-extras library, stored in the LSM-Tree — inheriting ACID, WAL, HA, and compaction for free.
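To make the tokenization idea concrete, here is a minimal Python sketch of standard base-32 GeoHash encoding for a single point. It is illustrative only: the proposal delegates this work to `lucene-spatial-extras`, which also handles non-point shapes and may differ in detail. The names `geohash_encode` and `prefix_tokens` are hypothetical, and whether the index stores every prefix level is an assumption.

```python
# Standard GeoHash base-32 alphabet (digits plus letters, without a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=11):
    """Encode a latitude/longitude pair as a GeoHash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between axes, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def prefix_tokens(geohash):
    """All prefixes of a GeoHash: the enclosing cells from coarsest to finest."""
    return [geohash[:i] for i in range(1, len(geohash) + 1)]


print(geohash_encode(57.64911, 10.40744, 5))  # u4pru
```

Each additional character subdivides the enclosing cell, so a shorter hash is always a prefix of the longer one for the same point. That property is what lets cell membership be expressed as key or prefix lookups in a sorted store.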
Proposed API
New index type: GEOSPATIAL
CREATE DOCUMENT TYPE Location;
CREATE PROPERTY Location.coords STRING;
CREATE INDEX ON Location (coords) GEOSPATIAL;
Configurable GeoHash precision (default 11, ~2.4 m resolution; range 1–12), persisted in the schema JSON.
Automatic query optimizer integration
No search_index() call required. Any WHERE clause using a geo.* spatial predicate on an indexed field is automatically routed through the geospatial index:
-- Uses GEOSPATIAL index automatically
SELECT FROM Location
WHERE geo.within(coords, geo.geomFromText('POLYGON ((10 38, 16 38, 16 44, 10 44, 10 38))')) = true
-- Falls back to full scan transparently when no index exists
12 geo.* constructor / accessor functions
| Function | Description |
|---|---|
| `geo.geomFromText(wkt)` | Parse any WKT string → Shape |
| `geo.point(x, y)` | Returns POINT (x y) WKT |
| `geo.lineString(pts)` | Returns LINESTRING (...) WKT |
| `geo.polygon(pts)` | Returns POLYGON ((…)) WKT, auto-closes ring |
| `geo.buffer(geom, dist)` | OGC buffer via JTS Geometry.buffer() |
| `geo.envelope(geom)` | Bounding rectangle as WKT |
| `geo.distance(g1, g2 [,unit])` | Haversine; units: m (default), km, mi, nmi |
| `geo.area(geom)` | Area in square degrees via Spatial4j |
| `geo.asText(geom)` | Shape → WKT string |
| `geo.asGeoJson(geom)` | Shape → GeoJSON string |
| `geo.x(point)` | Extract longitude |
| `geo.y(point)` | Extract latitude |

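
For reference, the Haversine computation behind `geo.distance` can be sketched as follows. This is a minimal illustration, not the proposed implementation: the function name `haversine`, the Earth radius constant, and the error handling are assumptions. The argument order follows the table's convention that x is longitude and y is latitude.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres (assumed constant)

# Metres per unit for the unit codes listed in the table above.
_UNIT_IN_METRES = {"m": 1.0, "km": 1000.0, "mi": 1609.344, "nmi": 1852.0}


def haversine(lon1, lat1, lon2, lat2, unit="m"):
    """Great-circle distance between two points given as (lon, lat) in degrees."""
    if unit not in _UNIT_IN_METRES:
        raise ValueError(f"unsupported unit: {unit!r}")
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against rounding error near antipodal points.
    metres = 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))
    return metres / _UNIT_IN_METRES[unit]


# One degree of longitude along the equator, in kilometres.
print(round(haversine(0, 0, 1, 0, "km"), 2))  # 111.2
```

Haversine treats the Earth as a sphere, so results can differ slightly from ellipsoidal (geodesic) distance calculations.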
9 geo.* spatial predicate functions
All implement IndexableSQLFunction for automatic optimizer integration.
| Function | Indexed | Notes |
|---|---|---|
| `geo.within(g, shape)` | ✅ | g fully within shape |
| `geo.intersects(g, shape)` | ✅ | g and shape share any point |
| `geo.contains(g, shape)` | ❌ | containment direction flips index semantics |
| `geo.dWithin(g, shape, dist)` | ❌ | requires bounding-circle expansion (future work) |
| `geo.disjoint(g, shape)` | ❌ | disjoint records are absent from index result |
| `geo.equals(g, shape)` | ❌ | requires exact coordinate match |
| `geo.crosses(g, shape)` | ❌ | DE-9IM; full scan with JTS post-filter |
| `geo.overlaps(g, shape)` | ❌ | DE-9IM; full scan with JTS post-filter |
| `geo.touches(g, shape)` | ❌ | DE-9IM; full scan with JTS post-filter |

All predicates return null when either argument is null (three-valued SQL logic).
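The null rule can be illustrated with a small Python sketch in which `None` plays the role of SQL null. Everything here is hypothetical scaffolding: `null_propagating`, `bbox_intersects`, and `where_is_true` are illustrative names, and the bounding-box test merely stands in for a real geometry predicate.

```python
def null_propagating(predicate):
    """Wrap a two-argument predicate so it returns None if either argument is None."""
    def wrapper(g, shape):
        if g is None or shape is None:
            return None
        return predicate(g, shape)
    return wrapper


@null_propagating
def bbox_intersects(a, b):
    """Toy predicate on (min_x, min_y, max_x, max_y) boxes."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def where_is_true(rows, predicate, shape):
    """Keep only rows for which the predicate is exactly True, like a WHERE clause."""
    return [r for r in rows if predicate(r, shape) is True]


rows = [(0, 0, 1, 1), None, (5, 5, 6, 6)]
print(where_is_true(rows, bbox_intersects, (0.5, 0.5, 2, 2)))  # [(0, 0, 1, 1)]
```

A row whose geometry is null yields neither true nor false, so it is excluded both by a filter on the predicate and by a filter on its negation.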
Breaking changes
The old non-standard geo functions would be removed and replaced by geo.* equivalents:
| Removed | Replacement |
|---|---|
| `point(x, y)` | `geo.point(x, y)` |
| `distance(p1, p2)` | `geo.distance(p1, p2)` |
| `circle(c, r)` | `geo.buffer(geom, dist)` |
| `polygon(pts)` | `geo.polygon(pts)` |
| `lineString(pts)` | `geo.lineString(pts)` |
| `rectangle(pts)` | `geo.envelope(geom)` |

Cypher point(lat, lon) and distance(p1, p2) would be preserved via CypherFunctionFactory.
New dependency
org.apache.lucene:lucene-spatial-extras (Apache 2.0). Lucene core is already a transitive dependency; this is a sibling module.
Implementation notes
- Storage: GeoHash tokenization via `lucene-spatial-extras`; tokens stored in LSM-Tree (no new storage engine needed).
- Index semantics: The index returns a GeoHash-cell superset of candidates; exact JTS predicate applied as a post-filter (`shouldExecuteAfterSearch = true`).
- SQL parser: The ANTLR4 grammar is extended with a `FUNCTION_NAMESPACES` visitor-level rewrite so `geo.function(args)` parses correctly without breaking `field.method()` patterns.
- Geometry format: WKT strings stored in `STRING` properties; index is transparent to the application.
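
The candidate-superset-plus-post-filter behaviour described in the notes above can be sketched in Python. This is a simplified model under stated assumptions: a fixed square grid stands in for GeoHash cells, a point-in-box test stands in for the exact JTS predicate, and all names (`CELL`, `build_index`, `candidates`, `query_within`) are hypothetical.

```python
import math
from collections import defaultdict

CELL = 1.0  # cell size in degrees (hypothetical stand-in for GeoHash precision)


def cell_of(x, y):
    """Grid cell containing the point (x, y)."""
    return (math.floor(x / CELL), math.floor(y / CELL))


def build_index(points):
    """Map each grid cell to the ids of the points that fall in it."""
    index = defaultdict(set)
    for pid, (x, y) in points.items():
        index[cell_of(x, y)].add(pid)
    return index


def within_box(point, box):
    """Exact predicate: point inside (min_x, min_y, max_x, max_y), borders included."""
    x, y = point
    return box[0] <= x <= box[2] and box[1] <= y <= box[3]


def candidates(index, box):
    """Coarse step: ids in every cell the box touches (a superset of true matches)."""
    cx0, cy0 = cell_of(box[0], box[1])
    cx1, cy1 = cell_of(box[2], box[3])
    found = set()
    for cx in range(cx0, cx1 + 1):
        for cy in range(cy0, cy1 + 1):
            found |= index.get((cx, cy), set())
    return found


def query_within(points, box, index=None):
    """Use the index when present, otherwise scan all points; always post-filter."""
    ids = candidates(index, box) if index is not None else points.keys()
    return sorted(pid for pid in ids if within_box(points[pid], box))
```

In this model the indexed and full-scan paths return the same rows; the index only narrows which rows reach the exact test. Rows that lie in cells far from the query shape never appear among the candidates, which is consistent with `geo.disjoint` being listed as not indexed.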
See PR #3510 for a full implementation.