47 commits
42c63f7
docs: add geospatial indexing design document
robfrank Feb 22, 2026
3cd3763
docs: add geospatial indexing implementation plan
robfrank Feb 22, 2026
f6a4003
feat(geo): add lucene-spatial-extras dependency for geospatial indexing
robfrank Feb 22, 2026
1f96233
feat(geo): add GeoIndexMetadata for geospatial index configuration
robfrank Feb 22, 2026
71dfa4d
feat(geo): add LSMTreeGeoIndex with GeohashPrefixTree spatial token d…
robfrank Feb 22, 2026
d232ccb
fix(geo): honor limit parameter in LSMTreeGeoIndex.get()
robfrank Feb 22, 2026
a3d6afe
refactor(geo): address code quality issues in LSMTreeGeoIndex
robfrank Feb 22, 2026
c490567
feat(geo): register GEOSPATIAL index type in Schema enum and LocalSchema
robfrank Feb 22, 2026
549e1c7
fix(geo): persist precision in LSMTreeGeoIndex.toJSON()
robfrank Feb 22, 2026
0a23d27
fix(geo): correct toJSON() pattern and add persistence round-trip tests
robfrank Feb 22, 2026
4dbc71a
feat(geo): add ST_* constructor/accessor functions, remove legacy geo…
robfrank Feb 22, 2026
8b09fc7
feat(geo): add ST_* spatial predicate functions with IndexableSQLFunc…
robfrank Feb 22, 2026
8a8482d
feat(geo): add ST_* spatial predicate functions with IndexableSQLFunc…
robfrank Feb 22, 2026
12671e2
fix(geo): correct allowsIndexedExecution for ST_Contains/Equals/Cross…
robfrank Feb 22, 2026
4172951
chore(geo): add lucene-spatial-extras to ATTRIBUTIONS.md and fix Cyph…
robfrank Feb 22, 2026
5b43ff4
test(geo): add transaction replay test for looksLikeGeoHashToken path
robfrank Feb 22, 2026
b43fb31
docs: add design doc for geo.* function rename
robfrank Feb 23, 2026
3898664
docs: add geo.* rename implementation plan
robfrank Feb 23, 2026
7470007
refactor(geo): rename SQLFunctionST_Predicate to SQLFunctionGeoPredicate
robfrank Feb 23, 2026
ad1f1f3
refactor(geo): rename ST_* constructor/accessor classes to geo.* naming
robfrank Feb 23, 2026
5ffefea
refactor(geo): fix stale ST_* comments and import ordering in factory
robfrank Feb 23, 2026
f7dbf29
refactor(geo): rename ST_* predicate classes to geo.* naming
robfrank Feb 23, 2026
95f5a8e
refactor(geo): fix stale ST_* comments in GeoPredicate and sort regis…
robfrank Feb 23, 2026
410689c
test(geo): update SQL strings from ST_* to geo.* naming
robfrank Feb 23, 2026
049feaa
fix(geo): support geo.* function calls without breaking field.method(…
robfrank Feb 23, 2026
1054e54
test(geo): update ST_* references in comments to geo.* naming
robfrank Feb 23, 2026
78379e5
docs: update geo function references from ST_* to geo.* naming
robfrank Feb 23, 2026
3dccb16
test(geo): update ST_* section header comments to geo.* naming
robfrank Feb 23, 2026
c843792
refactor(geo): unify factory imports and document FUNCTION_NAMESPACES…
robfrank Feb 23, 2026
f969aa0
fix(geo): address code review issues — remove dev files, fix build() …
robfrank Feb 23, 2026
1bbeaa9
add geolocation index on photo to lead tests
robfrank Feb 23, 2026
d7080ac
reduce numbers
robfrank Feb 24, 2026
02cf697
chore: assertj and fqns
robfrank Feb 24, 2026
07f9574
test(geo): add GeoHashIndexTest with geohash index tests
robfrank Feb 24, 2026
046e560
test(geo): add GeoConstructionFunctionsTest with SQL, execute(), and …
robfrank Feb 24, 2026
39dddb5
test(geo): use specific IllegalArgumentException in geomFromText inva…
robfrank Feb 24, 2026
3681d94
test(geo): add GeoMeasurementFunctionsTest with SQL, execute(), and e…
robfrank Feb 24, 2026
c89e4d1
test(geo): use specific exception types and final vars in GeoMeasurem…
robfrank Feb 24, 2026
3ef6327
test(geo): add GeoConversionFunctionsTest with SQL, execute(), and er…
robfrank Feb 24, 2026
c143539
test(geo): add GeoPredicateFunctionsTest with SQL, execute(), and err…
robfrank Feb 24, 2026
ea65236
test(geo): add missing DWithin null-second-arg and Crosses SQL false …
robfrank Feb 24, 2026
8da5617
test(geo): remove SQLGeoFunctionsTest — superseded by 4 focused test …
robfrank Feb 24, 2026
3057d4f
docs: add geo test coverage design and implementation plan
robfrank Feb 24, 2026
87a37d3
fix(geo): add deprecated aliases for removed SQL functions and clarif…
robfrank Feb 24, 2026
9bd4033
fix(geo): address correctness and performance issues from code review
robfrank Feb 25, 2026
e7f08b7
docs(geo): clarify geo.dWithin distance unit in Javadoc
robfrank Feb 25, 2026
5753bff
chore(studio): regenerate function-reference.json with geo.* functions
robfrank Feb 25, 2026
1 change: 1 addition & 0 deletions .gitignore
@@ -22,6 +22,7 @@
hs_err_pid*

.idea/
.claude/settings.local.json
### Maven template
target/
gen/
1 change: 1 addition & 0 deletions ATTRIBUTIONS.md
@@ -113,6 +113,7 @@ The following table lists runtime dependencies bundled with ArcadeDB distributio
| org.apache.lucene | lucene-queries | 10.3.2 | Apache 2.0 | https://lucene.apache.org/ |
| org.apache.lucene | lucene-sandbox | 10.3.2 | Apache 2.0 | https://lucene.apache.org/ |
| org.apache.lucene | lucene-facet | 10.3.2 | Apache 2.0 | https://lucene.apache.org/ |
| org.apache.lucene | lucene-spatial-extras | 10.3.2 | Apache 2.0 | https://lucene.apache.org/ |

**Apache Lucene Notice:** Lucene is a registered trademark of The Apache Software Foundation. See the NOTICE file for Lucene's own third-party attributions.

5 changes: 3 additions & 2 deletions bolt/src/main/java/com/arcadedb/bolt/BoltNetworkExecutor.java
@@ -66,6 +66,7 @@
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.logging.Level;

import static com.arcadedb.query.opencypher.executor.steps.FinalProjectionStep.PROJECTION_NAME_METADATA;
@@ -1012,7 +1013,7 @@ private boolean handleSystemQuery(final String query) throws IOException {
syntheticResults.add(List.of((Object) relTypes));

// Property keys (from all non-composite types)
- final Set<String> allKeys = new java.util.TreeSet<>();
+ final Set<String> allKeys = new TreeSet<>();
for (final DocumentType type : database.getSchema().getTypes())
if (!type.getName().contains("~"))
allKeys.addAll(type.getPropertyNames());
@@ -1047,7 +1048,7 @@ private boolean handleSystemQuery(final String query) throws IOException {
currentFields = List.of("propertyKey");
syntheticResults = new ArrayList<>();
if (database != null) {
- final Set<String> allKeys = new java.util.TreeSet<>();
+ final Set<String> allKeys = new TreeSet<>();
for (final DocumentType type : database.getSchema().getTypes()) {
if (!type.getName().contains("~"))
allKeys.addAll(type.getPropertyNames());
237 changes: 237 additions & 0 deletions docs/plans/2026-02-22-geospatial-design.md
@@ -0,0 +1,237 @@
# Geospatial Indexing Design

**Date:** 2026-02-22
**Branch:** lsmtree-geospatial
**Status:** Approved

## Overview

Port OrientDB-style geospatial indexing to ArcadeDB, using the native LSM-Tree engine as storage (following the same pattern as `LSMTreeFullTextIndex`) and the `geo.*` SQL function namespace.

## Goals

- Support all OGC spatial predicate functions OrientDB supported: `geo.within`, `geo.intersects`, `geo.contains`, `geo.dWithin`, `geo.disjoint`, `geo.equals`, `geo.crosses`, `geo.overlaps`, `geo.touches`
- Replace existing non-standard geo functions (`point()`, `distance()`, `circle()`, etc.) with `geo.*` equivalents
- Automatic query optimizer integration — no explicit `search_index()` call needed
- WKT as the geometry storage format (consistent with existing partial support)
- LSM-Tree as storage backend (ACID, WAL, HA, compaction all inherited for free)

## Non-Goals

- GeoJSON storage format
- New native schema `Type` entries for geometry (geometries are stored as WKT strings in existing STRING properties)
- 3D geometry support
- Raster data

## Architecture

### Layers

```
SQL Query: WHERE geo.within(location, geo.geomFromText('POLYGON(...)')) = true
  SelectExecutionPlanner
    detects IndexableSQLFunction on geo.within
    calls allowsIndexedExecution()
  LSMTreeGeoIndex.get(shape)
    decomposes shape → GeoHash tokens via lucene-spatial-extras
    looks up each token in underlying LSMTreeIndex
    returns candidate RIDs
  geo.within.shouldExecuteAfterSearch() = true
    → exact Spatial4j predicate post-filters candidates
```

### Dependencies

- `lucene-spatial-extras` (version 10.3.2, Apache 2.0) — adds `GeohashPrefixTree` and `RecursivePrefixTreeStrategy` for geometry decomposition into GeoHash tokens. Lucene core is already a dependency; this is a sibling module.
- `spatial4j` 0.8 — already present
- `jts-core` 1.20.0 — already present

## Component 1: LSMTreeGeoIndex

**Package:** `com.arcadedb.index.geospatial`
**File:** `LSMTreeGeoIndex.java`

Wraps `LSMTreeIndex`, in the same way that `LSMTreeFullTextIndex` wraps it.

### Indexing (`put(keys, rid)`)

1. Parse the WKT string value using Spatial4j/JTS → `Shape`
2. Call `RecursivePrefixTreeStrategy.createIndexableFields(shape)` → Lucene `Field[]`
3. Extract string token values from the `TextField` among those fields
4. Store each token → RID in the underlying `LSMTreeIndex` (non-unique)

### Querying (`get(keys)`)

1. The key is a `Shape` (passed from the `geo.*` function)
2. Generate covering GeoHash cells via `SpatialArgs` + `RecursivePrefixTreeStrategy`
3. Extract cell token strings from the Lucene query
4. Look up each token in the LSM-Tree, union all matching RIDs
5. Return `TempIndexCursor` (candidates; exact post-filter happens in the `geo.*` function)

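The indexing and querying steps above can be pictured with a plain-JDK sketch. It is not the planned implementation: it does not use `RecursivePrefixTreeStrategy`, it handles points only, and it assumes every geohash prefix of a point is stored as a token. The class `GeoTokenIndexSketch` and its methods are hypothetical names, and a `TreeMap` stands in for the underlying non-unique `LSMTreeIndex`.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only: a plain-JDK stand-in for the token -> RID idea.
final class GeoTokenIndexSketch {
  private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

  // token -> record ids; the sorted map stands in for the non-unique LSMTreeIndex
  private final Map<String, Set<String>> tokens = new TreeMap<>();

  /** Standard geohash encoding: interleave longitude/latitude bits, 5 bits per base-32 character. */
  static String encode(final double lat, final double lon, final int length) {
    double latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
    final StringBuilder hash = new StringBuilder(length);
    boolean lonBit = true; // a geohash starts with a longitude bit
    int bits = 0;
    int value = 0;
    while (hash.length() < length) {
      if (lonBit) {
        final double mid = (lonMin + lonMax) / 2;
        if (lon >= mid) { value = (value << 1) | 1; lonMin = mid; } else { value <<= 1; lonMax = mid; }
      } else {
        final double mid = (latMin + latMax) / 2;
        if (lat >= mid) { value = (value << 1) | 1; latMin = mid; } else { value <<= 1; latMax = mid; }
      }
      lonBit = !lonBit;
      if (++bits == 5) { // five bits collected: emit one character
        hash.append(BASE32.charAt(value));
        bits = 0;
        value = 0;
      }
    }
    return hash.toString();
  }

  /** Index a point: one token per prefix length, each mapped to the record id (assumption of this sketch). */
  void put(final double lat, final double lon, final int precision, final String rid) {
    final String full = encode(lat, lon, precision);
    for (int len = 1; len <= precision; len++)
      tokens.computeIfAbsent(full.substring(0, len), k -> new TreeSet<>()).add(rid);
  }

  /** Look up one cell token; the result is a candidate set, not an exact answer. */
  Set<String> get(final String token) {
    return tokens.getOrDefault(token, Set.of());
  }
}
```

A query for a covering cell such as `ezs` would then return every record whose geohash starts with that prefix, which is why the result is only a candidate set.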
### Configuration

- **Precision level:** configurable at index creation (default: 11, ~2.4m resolution). Stored in index metadata JSON.
- **Metadata class:** `GeoIndexMetadata` (analogous to `FullTextIndexMetadata`)

### Schema Registration

- Add `GEOSPATIAL` to `Schema.INDEX_TYPE` enum
- Register `LSMTreeGeoIndex.GeoIndexFactoryHandler` in `LocalSchema` alongside `LSM_TREE`, `FULL_TEXT`, `LSM_VECTOR`

## Component 2: geo.* SQL Functions

**Package:** `com.arcadedb.function.sql.geo`
**Registered in:** `DefaultSQLFunctionFactory`

### Constructor / Accessor Functions (pure compute, no index)

| Function | Replaces | Notes |
|---|---|---|
| `geo.geomFromText(wkt)` | — | Parse any WKT string → Spatial4j `Shape` |
| `geo.point(x, y)` | `point(x,y)` | Returns Spatial4j `Point` as WKT |
| `geo.lineString(pts)` | `lineString(pts)` | |
| `geo.polygon(pts)` | `polygon(pts)` | |
| `geo.buffer(geom, dist)` | `circle(c,r)` | OGC buffer around any geometry |
| `geo.envelope(geom)` | `rectangle(pts)` | Bounding rectangle as WKT |
| `geo.distance(g1, g2 [,unit])` | `distance(...)` | Haversine; keeps SQL and Cypher-style params |
| `geo.area(geom)` | — | Area in square degrees via Spatial4j |
| `geo.asText(geom)` | — | Spatial4j `Shape` → WKT string |
| `geo.asGeoJson(geom)` | — | Shape → GeoJSON string via JTS |
| `geo.x(point)` | — | Extract X coordinate |
| `geo.y(point)` | — | Extract Y coordinate |

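The `geo.distance` row names Haversine. As a reminder of what that formula computes, here is a minimal sketch; the class name `HaversineSketch` and the mean Earth radius constant are assumptions of this example, not values taken from the ArcadeDB code base.

```java
// Hypothetical illustration, not ArcadeDB's implementation.
final class HaversineSketch {
  // Mean Earth radius in kilometers (assumed value for this sketch).
  static final double EARTH_RADIUS_KM = 6371.0088;

  /** Great-circle distance in kilometers between two (lat, lon) points given in degrees. */
  static double distanceKm(final double lat1, final double lon1, final double lat2, final double lon2) {
    final double phi1 = Math.toRadians(lat1);
    final double phi2 = Math.toRadians(lat2);
    final double dPhi = Math.toRadians(lat2 - lat1);
    final double dLambda = Math.toRadians(lon2 - lon1);
    final double a = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
        + Math.cos(phi1) * Math.cos(phi2) * Math.sin(dLambda / 2) * Math.sin(dLambda / 2);
    // clamp guards against rounding pushing the square root slightly above 1
    return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
  }

  public static void main(final String[] args) {
    // Rome to Milan, coordinates in degrees
    System.out.println(distanceKm(41.9028, 12.4964, 45.4642, 9.1900));
  }
}
```

Returning meters instead of kilometers is a multiplication at the function boundary, which is the choice raised under Open Questions.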
### Spatial Predicate Functions (implement `SQLFunction` + `IndexableSQLFunction`)

| Function | Semantics | Post-filter |
|---|---|---|
| `geo.within(g, shape)` | g is fully within shape | yes |
| `geo.intersects(g, shape)` | g and shape share any point | yes |
| `geo.contains(g, shape)` | g fully contains shape | yes |
| `geo.dWithin(g, shape, dist)` | g is within dist of shape | yes |
| `geo.disjoint(g, shape)` | g and shape share no points | yes |
| `geo.equals(g, shape)` | geometrically equal | yes |
| `geo.crosses(g, shape)` | g crosses shape | yes |
| `geo.overlaps(g, shape)` | g overlaps shape | yes |
| `geo.touches(g, shape)` | g touches shape boundary | yes |

All predicates return `null` when either argument is null (SQL three-valued logic).

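The null rule above could be factored into one guard shared by all nine predicates. The following is only a sketch of that idea; `NullSafePredicateSketch` and `evaluate` are hypothetical names.

```java
import java.util.function.BiPredicate;

// Hypothetical helper: SQL three-valued logic for a two-argument predicate.
final class NullSafePredicateSketch {
  /** Returns null (unknown) when either argument is null, otherwise the exact predicate result. */
  static <A, B> Boolean evaluate(final A left, final B right, final BiPredicate<A, B> exact) {
    if (left == null || right == null)
      return null;
    return exact.test(left, right);
  }
}
```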
**Implementation notes on `allowsIndexedExecution()`:**

- `geo.disjoint` — returns `false`. The GeoHash index stores records whose geometry intersects
the indexed cells. Disjoint records are precisely those *not* present in the intersection
result, so the index cannot produce a valid candidate superset. The predicate always falls
back to a full scan with inline evaluation.
- `geo.dWithin` — returns `false`. The current implementation evaluates proximity as a
straight-line distance between geometry centers. The GeoHash index returns cells that
intersect the query shape, which does not correspond to a distance radius. Correct indexed
proximity would require first expanding the search geometry into a bounding circle before
GeoHash querying; this is a planned future enhancement. The predicate always falls back to
full scan.

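The `geo.disjoint` argument can be made concrete with a one-dimensional model in which closed intervals stand in for geometries. This is a sketch under that simplification, with hypothetical names throughout. `indexCandidates` models what an intersection-oriented lookup returns, not how the GeoHash index computes it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical 1-D illustration: closed intervals stand in for geometries.
final class DisjointFallbackSketch {
  // record id -> {lo, hi}
  private final Map<String, double[]> records = new LinkedHashMap<>();

  void put(final String rid, final double lo, final double hi) {
    records.put(rid, new double[] { lo, hi });
  }

  private static boolean intersects(final double[] interval, final double qLo, final double qHi) {
    return interval[0] <= qHi && interval[1] >= qLo;
  }

  /** Models the result of an intersection-oriented index: only records that touch the query. */
  Set<String> indexCandidates(final double qLo, final double qHi) {
    final Set<String> out = new TreeSet<>();
    for (final Map.Entry<String, double[]> e : records.entrySet())
      if (intersects(e.getValue(), qLo, qHi))
        out.add(e.getKey());
    return out;
  }

  /** Disjoint via full scan: keeps exactly the records the index lookup would leave out. */
  Set<String> disjointByFullScan(final double qLo, final double qHi) {
    final Set<String> out = new TreeSet<>();
    for (final Map.Entry<String, double[]> e : records.entrySet())
      if (!intersects(e.getValue(), qLo, qHi))
        out.add(e.getKey());
    return out;
  }
}
```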
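Each predicate's `IndexableSQLFunction` implementation (except where the notes above override `allowsIndexedExecution()` for `geo.disjoint` and `geo.dWithin`):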
- `allowsIndexedExecution()` — returns `true` when first argument is a bare field reference AND a `GEOSPATIAL` index exists on that field in the target type
- `canExecuteInline()` — always `true` (falls back to full-scan with exact Spatial4j predicate if no index)
- `shouldExecuteAfterSearch()` — always `true` (index returns superset; exact predicate post-filters)
- `searchFromTarget()` — resolves the field's `LSMTreeGeoIndex`, evaluates the shape argument, calls `index.get(shape)`, returns `Iterable<Record>`

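The contract between `searchFromTarget()` and `shouldExecuteAfterSearch()` amounts to "coarse lookup first, exact predicate second". The sketch below illustrates that with a 1x1 degree grid in place of GeoHash cells and planar distance in place of Spatial4j. All names are hypothetical, and it requires a Java version with records.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of "index returns a superset, exact predicate post-filters".
final class CandidatePostFilterSketch {
  record Pt(String rid, double x, double y) {}

  // Coarse grid index: one bucket per 1x1 degree cell (a stand-in for GeoHash cells).
  private final Map<String, List<Pt>> cells = new HashMap<>();

  private static String cellKey(final long cx, final long cy) {
    return cx + ":" + cy;
  }

  void put(final Pt p) {
    final String key = cellKey((long) Math.floor(p.x()), (long) Math.floor(p.y()));
    cells.computeIfAbsent(key, k -> new ArrayList<>()).add(p);
  }

  /** Index step: every point in any cell touched by the circle's bounding box (may hold false positives). */
  List<Pt> candidates(final double cx, final double cy, final double r) {
    final List<Pt> out = new ArrayList<>();
    for (long x = (long) Math.floor(cx - r); x <= (long) Math.floor(cx + r); x++)
      for (long y = (long) Math.floor(cy - r); y <= (long) Math.floor(cy + r); y++)
        out.addAll(cells.getOrDefault(cellKey(x, y), List.of()));
    return out;
  }

  /** Post-filter step: the exact predicate is re-evaluated on each candidate. */
  List<Pt> withinCircle(final double cx, final double cy, final double r) {
    final List<Pt> out = new ArrayList<>();
    for (final Pt p : candidates(cx, cy, r))
      if (Math.hypot(p.x() - cx, p.y() - cy) <= r)
        out.add(p);
    return out;
  }
}
```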
## Component 3: Query Optimizer Integration

No changes to `SelectExecutionPlanner` required. The existing `indexedFunctionConditions` path fully supports this pattern:

1. `block.getIndexedFunctionConditions(typez, context)` collects conditions where the left `Expression` is a function call implementing `IndexableSQLFunction`
2. `geo.within.allowsIndexedExecution()` checks for a `GEOSPATIAL` index on the referenced field
3. `BinaryCondition.executeIndexedFunction()` → `geo.within.searchFromTarget()` executes the indexed search
4. `shouldExecuteAfterSearch() = true` → exact post-filter applied to all returned candidates

**Multi-bucket:** `searchFromTarget()` iterates all per-bucket `LSMTreeGeoIndex` instances via `TypeIndex.getIndexesOnBuckets()` and unions results, matching the full-text search pattern.

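The flow described in this component, including the multi-bucket union and the no-index fallback, can be modeled in a few lines. This is a heavily simplified sketch: `PlannerFlowSketch`, `BucketIndex`, and `execute` are hypothetical and do not mirror the signatures of `SelectExecutionPlanner`, `TypeIndex`, or `IndexableSQLFunction`.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical, heavily simplified model of the planner flow; none of these types are ArcadeDB's.
final class PlannerFlowSketch {
  /** Simplified stand-in for a per-bucket geospatial index. */
  interface BucketIndex {
    List<String> get(String queryShape);
  }

  /**
   * Runs the indexed path when per-bucket indexes exist, otherwise a full scan.
   * In both cases the exact predicate decides the final result.
   */
  static Set<String> execute(final List<BucketIndex> bucketIndexes, final List<String> allRecords,
      final String queryShape, final Predicate<String> exactPredicate) {
    final Set<String> candidates = new LinkedHashSet<>();
    if (!bucketIndexes.isEmpty()) {
      // indexed path: union the candidates of every per-bucket index
      for (final BucketIndex index : bucketIndexes)
        candidates.addAll(index.get(queryShape));
    } else {
      // fallback: every record is a candidate
      candidates.addAll(allRecords);
    }
    // post-filter, corresponding to shouldExecuteAfterSearch() = true in the design above
    candidates.removeIf(exactPredicate.negate());
    return candidates;
  }
}
```

As long as each index returns a superset of the true matches, both paths yield the same result set, which is the property the fallback test case is meant to check.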
## Error Handling

| Scenario | Behavior |
|---|---|
| Invalid WKT in `geo.geomFromText()` | `IllegalArgumentException` with clear message |
| Null geometry argument in predicate | returns `null` (three-valued SQL logic) |
| No geospatial index on field | falls back to full-scan; no error |
| Non-WKT value in indexed property | `put()` skips record, logs warning |
| Antimeridian / polar shapes | handled correctly by `GeohashPrefixTree` |
| Precision change after indexing | must rebuild index (same as full-text analyzer change) |

## Testing

All tests in `engine/src/test/java/com/arcadedb/`:

### `index/geospatial/LSMTreeGeoIndexTest`
- Index and query a point; verify RID returned
- Index and query a circle; verify candidates include nearby points
- Index and query a polygon; verify post-filter removes false positives
- Null / invalid WKT handling
- No-index fallback path

### `function/sql/geo/SQLGeoFunctionsTest` (extend existing)
- All `geo.*` constructor and accessor functions
- Verify old `point()`, `distance()`, etc. throw "unknown function"

### `function/sql/geo/SQLGeoIndexedQueryTest` (new)
- Create type with `GEOSPATIAL` index on WKT property
- Insert records with point WKT values at known coordinates
- `SELECT ... WHERE geo.within(...) = true` — verify correct results
- `SELECT ... WHERE geo.intersects(...) = true` — verify
- `SELECT ... WHERE geo.dWithin(..., dist) = true` — proximity radius query
- All nine predicate functions covered
- Query with no index (fallback) produces same results

All assertions use `assertThat(...).isTrue()` / `isFalse()` / `isEqualTo()` per project conventions.

## File Layout

```
engine/src/main/java/com/arcadedb/
index/geospatial/
LSMTreeGeoIndex.java
GeoIndexMetadata.java
function/sql/geo/
SQLFunctionGeoGeomFromText.java
SQLFunctionGeoPoint.java
SQLFunctionGeoLineString.java
SQLFunctionGeoPolygon.java
SQLFunctionGeoBuffer.java
SQLFunctionGeoEnvelope.java
SQLFunctionGeoDistance.java
SQLFunctionGeoArea.java
SQLFunctionGeoAsText.java
SQLFunctionGeoAsGeoJson.java
SQLFunctionGeoX.java
SQLFunctionGeoY.java
SQLFunctionGeoWithin.java ← implements IndexableSQLFunction
SQLFunctionGeoIntersects.java ← implements IndexableSQLFunction
SQLFunctionGeoContains.java ← implements IndexableSQLFunction
SQLFunctionGeoDWithin.java ← implements IndexableSQLFunction
SQLFunctionGeoDisjoint.java ← implements IndexableSQLFunction
SQLFunctionGeoEquals.java ← implements IndexableSQLFunction
SQLFunctionGeoCrosses.java ← implements IndexableSQLFunction
SQLFunctionGeoOverlaps.java ← implements IndexableSQLFunction
SQLFunctionGeoTouches.java ← implements IndexableSQLFunction
GeoUtils.java ← extend existing
LightweightPoint.java ← keep existing

engine/src/test/java/com/arcadedb/
index/geospatial/
LSMTreeGeoIndexTest.java
function/sql/geo/
SQLGeoFunctionsTest.java ← extend existing
SQLGeoIndexedQueryTest.java ← new

engine/pom.xml ← add lucene-spatial-extras dependency
```

## Open Questions

- Should `geo.distance` return meters by default (Neo4j/Cypher compat) or kilometers (current `distance()` SQL default)? Current implementation keeps both styles based on argument count — recommend preserving this.
- Should `geo.buffer` accept distance in meters, kilometers, or degrees? Spatial4j works in degrees, so a conversion at the function boundary is needed for user-facing meter/km inputs.