
feat: Geospatial indexing with geo.* SQL functions (LSM-Tree native storage) #3513

@robfrank


Feature Request: Geospatial Indexing

Overview

Add full geospatial indexing to ArcadeDB, using the native LSM-Tree engine as the storage backend and exposing a geo.* SQL function namespace consistent with ArcadeDB's existing dot-namespace convention (e.g. vector.neighbors, vector.cosineSimilarity).

The design mirrors the existing LSMTreeFullTextIndex pattern: a thin wrapper that tokenizes geometries into GeoHash cells using Apache Lucene's lucene-spatial-extras library and stores those cell tokens in the LSM-Tree. The index therefore inherits ACID transactions, the WAL, HA replication, and compaction without additional work.
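To make the tokenization step concrete, here is a minimal Java sketch of the standard GeoHash encoding for a single point. It is illustrative only: the proposal delegates this work to lucene-spatial-extras, and GeoHashSketch, encode, and cellTokens are hypothetical names that are not part of Lucene or ArcadeDB. The prefix list reflects the fact that every shorter GeoHash names the enclosing coarser cell; emitting one token per level, as assumed here, is a simplification of what a prefix-tree strategy may actually store.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of standard GeoHash encoding (not the Lucene implementation). */
public final class GeoHashSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Encodes (lat, lon) into a GeoHash string of the given length (1..12). */
    public static String encode(double lat, double lon, int precision) {
        if (precision < 1 || precision > 12)
            throw new IllegalArgumentException("precision must be in 1..12");
        double latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
        StringBuilder hash = new StringBuilder(precision);
        boolean lonBit = true; // bits alternate, starting with longitude
        int bits = 0, ch = 0;
        while (hash.length() < precision) {
            if (lonBit) {
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) { ch = (ch << 1) | 1; lonMin = mid; } else { ch <<= 1; lonMax = mid; }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { ch = (ch << 1) | 1; latMin = mid; } else { ch <<= 1; latMax = mid; }
            }
            lonBit = !lonBit;
            if (++bits == 5) { // every 5 bits become one base-32 character
                hash.append(BASE32.charAt(ch));
                bits = 0;
                ch = 0;
            }
        }
        return hash.toString();
    }

    /** Returns the cell tokens for one point, from the coarsest cell to the finest. */
    public static List<String> cellTokens(double lat, double lon, int precision) {
        String full = encode(lat, lon, precision);
        List<String> tokens = new ArrayList<>();
        for (int i = 1; i <= precision; i++) tokens.add(full.substring(0, i));
        return tokens;
    }
}
```

For example, encode(42.6, -5.6, 5) returns "ezs42", the well-known GeoHash test vector for that coordinate, and cellTokens(42.6, -5.6, 3) returns [e, ez, ezs]. Because nearby points share prefixes, range scans over sorted keys in the LSM-Tree can retrieve all points in a cell.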


Proposed API

New index type: GEOSPATIAL

CREATE DOCUMENT TYPE Location;
CREATE PROPERTY Location.coords STRING;
CREATE INDEX ON Location (coords) GEOSPATIAL;

Configurable GeoHash precision (default 11, which gives cells of roughly 15 cm × 15 cm at the equator; range 1–12), persisted in the schema JSON.
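Because each GeoHash character carries 5 bits, split alternately between longitude and latitude (longitude first), the cell size for a given precision can be computed directly. The Java sketch below, with hypothetical class and method names, is one way to sanity-check the resolution of a chosen precision. It converts degrees to meters with a fixed 111,320 m per degree, an assumption that only holds near the equator (east-west cell width shrinks with the cosine of latitude).

```java
/** Hypothetical helper: GeoHash cell dimensions for a given precision. */
public final class GeoHashCellSize {
    // Approximate length of one degree at the equator, in meters (assumption).
    private static final double METERS_PER_DEGREE = 111_320.0;

    /** Returns {latDegrees, lonDegrees} spanned by one cell of the given precision. */
    public static double[] cellDegrees(int precision) {
        if (precision < 1 || precision > 12)
            throw new IllegalArgumentException("precision must be in 1..12");
        int totalBits = 5 * precision;
        int lonBits = (totalBits + 1) / 2; // longitude gets the extra bit when the total is odd
        int latBits = totalBits / 2;
        return new double[] { 180.0 / Math.pow(2, latBits), 360.0 / Math.pow(2, lonBits) };
    }

    /** Returns the approximate {height, width} of one cell at the equator, in meters. */
    public static double[] cellMetersAtEquator(int precision) {
        double[] deg = cellDegrees(precision);
        return new double[] { deg[0] * METERS_PER_DEGREE, deg[1] * METERS_PER_DEGREE };
    }
}
```

For reference, the formula gives about 4.8 m × 4.8 m cells at precision 9 (a maximum error of roughly ±2.4 m from the cell centre) and about 0.15 m × 0.15 m at precision 11.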

Automatic query optimizer integration

No search_index() call is required. Any WHERE clause that applies an index-capable geo.* spatial predicate to an indexed field is routed through the geospatial index automatically:

-- Uses GEOSPATIAL index automatically
SELECT FROM Location
WHERE geo.within(coords, geo.geomFromText('POLYGON ((10 38, 16 38, 16 44, 10 44, 10 38))')) = true

-- Falls back to a full scan transparently when no index exists

12 geo.* constructor / accessor functions

| Function | Description |
|---|---|
| geo.geomFromText(wkt) | Parse any WKT string → Shape |
| geo.point(x, y) | Returns POINT (x y) WKT |
| geo.lineString(pts) | Returns LINESTRING (...) WKT |
| geo.polygon(pts) | Returns POLYGON ((…)) WKT, auto-closes ring |
| geo.buffer(geom, dist) | OGC buffer via JTS Geometry.buffer() |
| geo.envelope(geom) | Bounding rectangle as WKT |
| geo.distance(g1, g2 [,unit]) | Haversine; units: m (default), km, mi, nmi |
| geo.area(geom) | Area in square degrees via Spatial4j |
| geo.asText(geom) | Shape → WKT string |
| geo.asGeoJson(geom) | Shape → GeoJSON string |
| geo.x(point) | Extract longitude |
| geo.y(point) | Extract latitude |
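The table leaves the exact formulas implicit. Below is a minimal Java sketch of three of them under stated assumptions: geo.distance as a haversine distance on a spherical Earth with an assumed mean radius of 6,371,008.8 m, geo.point emitting x = longitude and y = latitude, and geo.polygon closing an open ring by repeating the first vertex. The class name and the raw-coordinate signatures (instead of Shape arguments) are hypothetical simplifications, not the proposed ArcadeDB API.

```java
import java.util.List;

/** Hypothetical sketch of geo.distance, geo.point and geo.polygon semantics. */
public final class GeoFunctionSketch {
    // Mean Earth radius in meters (assumption; the proposal does not name a radius).
    private static final double EARTH_RADIUS_M = 6_371_008.8;

    /** Haversine great-circle distance between (lon1, lat1) and (lon2, lat2). */
    public static double distance(double lon1, double lat1, double lon2, double lat2, String unit) {
        double phi1 = Math.toRadians(lat1), phi2 = Math.toRadians(lat2);
        double dPhi = Math.toRadians(lat2 - lat1), dLambda = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
                 + Math.cos(phi1) * Math.cos(phi2) * Math.sin(dLambda / 2) * Math.sin(dLambda / 2);
        double meters = 2 * EARTH_RADIUS_M * Math.asin(Math.min(1.0, Math.sqrt(a)));
        switch (unit == null ? "m" : unit) { // meters are the default unit
            case "m":   return meters;
            case "km":  return meters / 1_000.0;
            case "mi":  return meters / 1_609.344; // international mile
            case "nmi": return meters / 1_852.0;   // international nautical mile
            default: throw new IllegalArgumentException("unsupported unit: " + unit);
        }
    }

    /** Builds POINT WKT with x = longitude and y = latitude. */
    public static String point(double x, double y) {
        return "POINT (" + x + " " + y + ")";
    }

    /** Builds POLYGON WKT from [x, y] pairs, appending the first vertex if the ring is open. */
    public static String polygon(List<double[]> pts) {
        if (pts.size() < 3) throw new IllegalArgumentException("a polygon needs at least 3 vertices");
        StringBuilder sb = new StringBuilder("POLYGON ((");
        for (double[] p : pts) sb.append(p[0]).append(' ').append(p[1]).append(", ");
        double[] first = pts.get(0), last = pts.get(pts.size() - 1);
        if (first[0] != last[0] || first[1] != last[1]) {
            sb.append(first[0]).append(' ').append(first[1]); // close the ring
        } else {
            sb.setLength(sb.length() - 2); // ring already closed: drop the trailing ", "
        }
        return sb.append("))").toString();
    }
}
```

With these assumptions, one degree of latitude along a meridian comes out at about 111.2 km, and polygon() yields the same closed-ring WKT whether or not the caller repeats the first vertex.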

9 geo.* spatial predicate functions

All implement IndexableSQLFunction so the optimizer can consider them; predicates that cannot be answered from the index fall back to a full scan with a JTS post-filter.

| Function | Indexed | Notes |
|---|---|---|
| geo.within(g, shape) | | g fully within shape |
| geo.intersects(g, shape) | | g and shape share any point |
| geo.contains(g, shape) | | containment direction flips index semantics |
| geo.dWithin(g, shape, dist) | | requires bounding-circle expansion (future work) |
| geo.disjoint(g, shape) | | disjoint records are absent from index result |
| geo.equals(g, shape) | | requires exact coordinate match |
| geo.crosses(g, shape) | | DE-9IM; full scan with JTS post-filter |
| geo.overlaps(g, shape) | | DE-9IM; full scan with JTS post-filter |
| geo.touches(g, shape) | | DE-9IM; full scan with JTS post-filter |

All predicates return null when either argument is null (three-valued SQL logic).
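The null rule and the point-in-polygon case of geo.within can be illustrated without JTS. This Java sketch swaps in the classic even-odd ray-casting test (not the JTS DE-9IM evaluation the proposal relies on) and returns a boxed Boolean so that null can stand in for SQL NULL. The class name and the array-based signature are hypothetical; unlike JTS, points lying exactly on the boundary are not classified consistently.

```java
/** Hypothetical sketch: three-valued point-in-polygon test via ray casting. */
public final class PointInPolygonSketch {
    /**
     * Returns TRUE/FALSE for point {x, y} against a ring of {x, y} vertices,
     * or null when either argument is null (SQL three-valued logic).
     */
    public static Boolean within(double[] point, double[][] ring) {
        if (point == null || ring == null) return null; // NULL in, NULL out
        double px = point[0], py = point[1];
        boolean inside = false;
        for (int i = 0, j = ring.length - 1; i < ring.length; j = i++) {
            double xi = ring[i][0], yi = ring[i][1], xj = ring[j][0], yj = ring[j][1];
            // Does a horizontal ray from the point to +x cross edge (j, i)?
            boolean crosses = (yi > py) != (yj > py)
                    && px < (xj - xi) * (py - yi) / (yj - yi) + xi;
            if (crosses) inside = !inside; // each crossing toggles inside/outside
        }
        return inside;
    }
}
```

Since NULL = true evaluates to NULL rather than true, a row whose geometry is null is excluded by a filter such as WHERE geo.within(...) = true, which matches the behavior described above.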


Breaking changes

The old non-standard geo functions would be removed and replaced by geo.* equivalents:

| Removed | Replacement |
|---|---|
| point(x, y) | geo.point(x, y) |
| distance(p1, p2) | geo.distance(p1, p2) |
| circle(c, r) | geo.buffer(geom, dist) |
| polygon(pts) | geo.polygon(pts) |
| lineString(pts) | geo.lineString(pts) |
| rectangle(pts) | geo.envelope(geom) |

Cypher point(lat, lon) and distance(p1, p2) would be preserved via CypherFunctionFactory.


New dependency

org.apache.lucene:lucene-spatial-extras (Apache 2.0). Lucene core is already a transitive dependency; this is a sibling module.


Implementation notes

  • Storage: geometries are tokenized into GeoHash cells via lucene-spatial-extras and the tokens are stored in the LSM-Tree, so no new storage engine is needed.
  • Index semantics: the index returns a superset of candidates (every record in the matching GeoHash cells); the exact JTS predicate is then applied as a post-filter (shouldExecuteAfterSearch = true).
  • SQL parser: the ANTLR4 grammar is extended with a visitor-level FUNCTION_NAMESPACES rewrite so that geo.function(args) parses correctly without breaking existing field.method() patterns.
  • Geometry format: geometries are stored as WKT strings in STRING properties; the index is transparent to the application.
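The superset-then-post-filter contract can be sketched with a sorted map standing in for the LSM-Tree. In this hypothetical Java sketch, each point is indexed under its GeoHash key; a bounding-box query scans the single cell named by the common prefix of the box's south-west and north-east corner hashes (that cell contains both corners, hence the whole box), and an exact bounds check then removes false positives. Prefix-tree strategies in lucene-spatial-extras cover a query shape with many cells rather than one; the single-cell cover is a deliberate simplification that keeps the superset property while scanning more candidates than necessary.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: GeoHash candidates from a sorted map, then an exact post-filter. */
public final class GeoIndexSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";
    private final int precision;
    private final TreeMap<String, String> index = new TreeMap<>();         // "geohash#rid" -> rid
    private final Map<String, double[]> records = new LinkedHashMap<>(); // rid -> {lat, lon}

    public GeoIndexSketch(int precision) { this.precision = precision; }

    /** Standard GeoHash encoding, bits alternating and starting with longitude. */
    static String encode(double lat, double lon, int precision) {
        double latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
        StringBuilder hash = new StringBuilder();
        boolean lonBit = true;
        int bits = 0, ch = 0;
        while (hash.length() < precision) {
            if (lonBit) {
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) { ch = (ch << 1) | 1; lonMin = mid; } else { ch <<= 1; lonMax = mid; }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { ch = (ch << 1) | 1; latMin = mid; } else { ch <<= 1; latMax = mid; }
            }
            lonBit = !lonBit;
            if (++bits == 5) { hash.append(BASE32.charAt(ch)); bits = 0; ch = 0; }
        }
        return hash.toString();
    }

    /** Indexes one point record under its GeoHash key. */
    public void put(String rid, double lat, double lon) {
        records.put(rid, new double[] { lat, lon });
        index.put(encode(lat, lon, precision) + "#" + rid, rid);
    }

    /** Phase 1: all records in one covering cell, a superset of the true matches. */
    public List<String> candidates(double minLat, double minLon, double maxLat, double maxLon) {
        String sw = encode(minLat, minLon, precision), ne = encode(maxLat, maxLon, precision);
        int n = 0;
        while (n < precision && sw.charAt(n) == ne.charAt(n)) n++;
        String prefix = sw.substring(0, n); // an empty prefix means "scan everything"
        return new ArrayList<>(index.subMap(prefix, true, prefix + Character.MAX_VALUE, true).values());
    }

    /** Phase 2: exact bounds check on the stored coordinates removes false positives. */
    public List<String> within(double minLat, double minLon, double maxLat, double maxLon) {
        List<String> result = new ArrayList<>();
        for (String rid : candidates(minLat, minLon, maxLat, maxLon)) {
            double[] p = records.get(rid);
            if (p[0] >= minLat && p[0] <= maxLat && p[1] >= minLon && p[1] <= maxLon) result.add(rid);
        }
        return result;
    }
}
```

With Rome (41.9, 12.5), Naples (40.85, 14.27), Milan, Paris and Tunis (36.8, 10.18) indexed at precision 6, the box covering latitudes 38 to 44 and longitudes 10 to 16 (the polygon from the example query above) yields Rome, Naples and Tunis as candidates, and the post-filter drops Tunis.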

See PR #3510 for a full implementation.
