diff --git a/ATTRIBUTIONS.md b/ATTRIBUTIONS.md index 5f4108c124..48de6fe9fc 100644 --- a/ATTRIBUTIONS.md +++ b/ATTRIBUTIONS.md @@ -145,6 +145,12 @@ The following table lists runtime dependencies bundled with ArcadeDB distributio **License Note:** UPL = Universal Permissive License 1.0 +### Compression + +| Group ID | Artifact ID | Version | License | Homepage | +|----------|-------------|---------|---------|----------| +| org.xerial.snappy | snappy-java | 1.1.10.7 | Apache 2.0 | https://github.com/xerial/snappy-java | + ### Server and Networking | Group ID | Artifact ID | Version | License | Homepage | diff --git a/docs/timeseries.md b/docs/timeseries.md new file mode 100644 index 0000000000..aaf483faae --- /dev/null +++ b/docs/timeseries.md @@ -0,0 +1,2474 @@ +# ArcadeDB TimeSeries Module — Research Report & Implementation Plan + +## Implementation Progress (last updated: 2026-02-23) + +### Completed +- **Phase 1: Core Storage Engine** — TimeSeries type, columnar storage with Gorilla/Delta-of-Delta/Simple-8b/Dictionary codecs, sealed bucket compaction, shard-per-core parallelism, Line Protocol ingestion (HTTP handler), retention policies, `CREATE TIMESERIES TYPE` SQL statement, `FetchFromTimeSeriesStep` query executor, basic SQL queries (`SELECT`, `WHERE`, `GROUP BY`, `ORDER BY`) +- **Phase 2: Analytical Functions** — All 13 timeseries SQL functions (9 analytical + 4 window functions) implemented and tested: + - `ts.first(value, ts)` / `ts.last(value, ts)` — first/last value by timestamp + - `ts.rate(value, ts [, counterResetDetection])` — per-second rate of change with optional counter reset detection (3rd param = `true` enables Prometheus-style reset handling for monotonic counters) + - `ts.delta(value, ts)` — difference between first and last values + - `ts.movingAvg(value, window)` — moving average with configurable window + - `ts.interpolate(value, method [, timestamp])` — gap filling (zero/prev/linear/none methods; linear interpolation requires timestamp parameter) + - `ts.correlate(a, b)` — 
Pearson correlation coefficient + - `ts.timeBucket(interval, ts)` — time bucketing for GROUP BY aggregation + - `ts.percentile(value, percentile)` — approximate percentile calculation (0.0-1.0, e.g. 0.95 for p95, 0.99 for p99) with sorted exact computation and linear rank interpolation + - `ts.lag(value, offset, timestamp [, default])` — previous row value (window function) + - `ts.lead(value, offset, timestamp [, default])` — next row value (window function) + - `ts.rowNumber(timestamp)` — sequential 1-based row numbering (window function) + - `ts.rank(value, timestamp)` — rank with ties, gaps after ties (window function) + +- **Phase 3: Continuous Aggregates** — Watermark-based incremental refresh for pre-computed timeseries rollups: + - `ContinuousAggregate` interface and `ContinuousAggregateImpl` with watermark tracking, atomic refresh guard, JSON persistence, metrics + - `ContinuousAggregateRefresher` — incremental refresh: deletes stale buckets from watermark, re-runs filtered query, inserts results, advances watermark + - `ContinuousAggregateBuilder` — fluent API with validation (source must be TimeSeries type, query must contain `ts.timeBucket()`, must have GROUP BY) + - Schema integration: `LocalSchema` stores/persists CAs in JSON, protects source/backing types from drop, crash recovery (BUILDING→STALE on restart) + - Post-commit trigger via `SaveElementStep.saveToTimeSeries()` → `TransactionContext.addAfterCommitCallbackIfAbsent()` schedules incremental refresh + - SQL DDL: `CREATE CONTINUOUS AGGREGATE [IF NOT EXISTS] name AS select`, `DROP CONTINUOUS AGGREGATE [IF EXISTS] name`, `REFRESH CONTINUOUS AGGREGATE name` + - Schema metadata: `SELECT FROM schema:continuousAggregates` returns name, query, sourceType, bucketColumn, bucketIntervalMs, watermarkTs, status, metrics + - 19 tests (12 API + 7 SQL), all passing + +- **Streaming Query Pipeline** — Full OOM fix for large dataset queries: + - Lazy page-level iterators replacing materialized `List` throughout 
the query chain: `TimeSeriesBucket.iterateRange()` → `TimeSeriesSealedStore.iterateRange()` → `TimeSeriesShard.iterateRange()` → `TimeSeriesEngine.iterateQuery()` + - Memory usage reduced from O(totalRows) to O(shardCount × blockSize) — constant memory regardless of dataset size + - Merge-sort across shard iterators using `PriorityQueue` min-heap sorted by timestamp + - Binary search on sealed block directory for O(log B) block selection instead of linear scan + - Binary search within blocks using `lowerBound()`/`upperBound()` on sorted timestamp arrays + - Lazy column decompression: timestamps decoded first, value columns only if rows match time range + - Early termination: stops scanning blocks once `minTimestamp > toTs` + - Empty bucket short-circuit: `getSampleCount() == 0` skips scanning entirely (critical after compaction clears mutable pages) + - Chunked compaction: writes 65K-row sealed blocks instead of one giant block per shard (configurable via `SEALED_BLOCK_SIZE`) + - Sealed store directory persistence: inline block metadata (`BLOCK_MAGIC_VALUE = 0x5453424C`) enables cold queries after close/reopen without losing block index + - Profiling integration: `FetchFromTimeSeriesStep` uses `context.isProfiling()` pattern with `cost` and `rowCount` accumulation, visible via `PROFILE SELECT ...` + - `BitReader` optimization: byte-level batch reads instead of per-bit loop for faster codec decompression +- **Cold Open Persistence** — TimeSeries types and data survive database close/reopen: + - `.tstb` file extension registered in `SUPPORTED_FILE_EXT` (FileManager) and `ComponentFactory` (schema loader) + - `TimeSeriesBucket.PaginatedComponentFactoryHandler` creates stub buckets on load; columns set later via `setColumns()` + - `TimeSeriesShard` constructor detects already-loaded buckets via `LocalSchema.getFileByName()` to avoid duplicate creation + - `LocalSchema.readConfiguration()` calls `initEngine()` on `LocalTimeSeriesType` instances during deserialization + 
- Sealed store block directory reconstructed from inline metadata on cold open (`loadDirectory()`) + +- **Block-Level Aggregation Statistics** — Per-block min/max/sum statistics stored alongside compressed data: + - `BlockEntry` stores `columnMins[]`, `columnMaxs[]`, `columnSums[]` for numeric columns + - Fast path in `aggregateMultiBlocks()`: when entire block fits in a single time bucket, uses block stats directly — zero decompression + - Stats section persisted in sealed store block header and reconstructed on cold open + +- **Aggregation Performance Optimization** — 50M-row aggregation reduced from ~3,400ms to ~710ms (4.8x improvement): + - `AggregationMetrics` instrumentation: timing breakdown per phase (I/O, timestamp decompression, value decompression, accumulation) with block category counters (fast/slow/skipped). Displayed in `PROFILE` output via `AggregateFromTimeSeriesStep` + - Flat array accumulation in `MultiColumnAggregationResult`: pre-allocated `double[][]`/`long[][]` indexed by `(bucketTs - firstBucket) / interval`, eliminating 50M HashMap lookups and Long autoboxing. Data range computed from `getGlobalMinTimestamp()`/`getGlobalMaxTimestamp()` across shards + - SIMD vectorized accumulation: slow path uses `TimeSeriesVectorOps.sum()/min()/max()` on contiguous timestamp segments within each block, turning 65,536 per-element operations into ~2 vectorized segment calls per block via binary search on bucket boundaries + - Parallel shard processing: sealed stores processed concurrently via `CompletableFuture.supplyAsync()` per shard, results merged via `MultiColumnAggregationResult.mergeFrom()`. 
Mutable buckets processed sequentially on calling thread (requires database context) + - Coalesced I/O: single `pread()` per block via `readBlockData()` reads all column data contiguously, then `sliceColumn()` extracts individual columns — halves syscall count + - Reusable decode buffers: `long[65536]` and `double[65536]` allocated once per `aggregateMultiBlocks()` call, reused across all blocks. Buffer-reuse `decode()` overloads added to `DeltaOfDeltaCodec` and `GorillaXORCodec` + - `BitReader` sliding-window register: pre-loaded 64-bit MSB-aligned window with lazy refill — `readBits(n)` extracts top n bits via single shift, refill amortized every ~7-8 bytes consumed. Eliminates per-call byte-assembly loop (decompVal 1305ms → 1224ms, ~6% improvement — JIT already optimized the old loop effectively) + - Bucket-aligned compaction: `COMPACTION_INTERVAL` DDL option splits sealed blocks at time bucket boundaries during compaction, ensuring each block fits entirely within one bucket for 100% fast-path aggregation. SQL syntax: `CREATE TIMESERIES TYPE ... COMPACTION_INTERVAL 1 HOURS`. 
Config persisted in schema JSON and threaded through `TimeSeriesEngine` → `TimeSeriesShard` + - 213 timeseries tests passing, zero regressions + +- **Phase 4: Downsampling Policies** — Automatic resolution reduction for old data: + - `DownsamplingTier` record: `afterMs` (age threshold) + `granularityMs` (target resolution), with validation + - Schema persistence: `downsamplingTiers` field in `LocalTimeSeriesType` with JSON serialization/deserialization — backward-compatible with old schemas (null-safe `getJSONArray`) + - Builder API: `TimeSeriesTypeBuilder.withDownsamplingTiers(List)` for programmatic type creation + - DDL: `ALTER TIMESERIES TYPE ADD DOWNSAMPLING POLICY AFTER GRANULARITY [AFTER ...]` and `ALTER TIMESERIES TYPE DROP DOWNSAMPLING POLICY` + - Grammar: 3 new lexer tokens (`DOWNSAMPLING`, `POLICY`, `GRANULARITY`), parser rules (`alterTimeSeriesTypeBody`, `downsamplingTierClause`, `tsTimeUnit`), soft-keyword registration + - `AlterTimeSeriesTypeStatement` DDL statement with `SQLASTBuilder.visitAlterTimeSeriesTypeStmt()` visitor — time unit parsing reuses existing DAYS/HOURS/MINUTES tokens + - `TimeSeriesEngine.applyDownsampling(tiers, nowMs)`: iterates tiers sorted by afterMs, identifies timestamp/tag/numeric column roles, delegates to sealed store per shard + - `TimeSeriesSealedStore.downsampleBlocks()`: density-check idempotency (blocks already at target resolution are skipped), tag-grouped AVG aggregation per `(bucketTs, tagKey)`, atomic tmp-file rewrite with CRC32 + - Multi-tier behavior: tiers applied independently; density check naturally handles hierarchy (1min blocks pass 1min check but fail 1hr check when tier 2 cutoff reached) + - 7 new tests: DDL add/drop with persistence across close/reopen, single-tier accuracy (AVG=30.5 for 1..60), multi-tier, idempotency, multi-tag grouping, retention interaction, empty engine no-op + +- **Phase 5: HTTP API & Studio Integration** — Dedicated REST endpoints and web-based TimeSeries Explorer: + - **REST 
Endpoints**: 3 dedicated timeseries HTTP handlers registered in `HttpServer`: + - `POST /api/v1/ts/{database}/write` — InfluxDB Line Protocol ingestion with configurable precision (`ns`/`us`/`ms`/`s`), `requiresJsonPayload()=false` to avoid JSON parsing of plain-text body + - `POST /api/v1/ts/{database}/query` — JSON query endpoint supporting raw queries (time range, field projection, tag filtering, limit) and aggregated queries (`AVG`/`SUM`/`MIN`/`MAX`/`COUNT` with configurable bucket intervals via `aggregateMulti()`) + - `GET /api/v1/ts/{database}/latest?type=name&tag=key:value` — returns most recent data point with optional single-tag filter + - **Studio TimeSeries Tab**: Full-featured explorer accessible from the main navigation sidebar: + - Header bar: database selector (synced `.inputDatabase` class), TimeSeries type dropdown with live sample count, Create/Drop type buttons + - **Query sub-tab**: single-row controls (Time Range, Aggregation, Bucket Interval, Field checkboxes), Query/Latest/Auto-refresh buttons, ApexCharts line/area chart with datetime x-axis and zoom, DataTable with pagination, chart/table toggle switches with per-database localStorage persistence, query execution time display + - **Schema sub-tab**: detailed type introspection — TimeSeries Columns (with TIMESTAMP/TAG/FIELD role badges), Diagnostics cards (total samples, shards, time range), Configuration table, Downsampling Tiers, per-Shard Details (sealed/mutable block counts, timestamps) + - **Ingestion sub-tab**: comprehensive documentation with 4 ingestion methods (SQL CREATE TYPE, InfluxDB Line Protocol with curl/Python examples, SQL INSERT, Java embedded API), method comparison table + - **API Panel Integration**: TimeSeries section in HTTP API Reference with all 3 endpoints documented, including query parameter support, request/response examples, and working "Try It" playground (with `text/plain` content type handling for Line Protocol) + - **Query tab layout**: "Connected as" bar 
moved above Database Info sidebar for uniform positioning across Query/Database/TimeSeries tabs + - 10 integration tests in `TimeSeriesQueryHandlerIT`: raw query, aggregated query, tag filter, field projection, missing/invalid type errors, latest value, latest with tag, latest on empty type + +- **Competitive Gap Closure (P0/P1)** — Critical features identified from gap analysis against top 10 TSDBs (InfluxDB 3, TimescaleDB, Prometheus, QuestDB, TDengine, ClickHouse, Kdb+, Apache IoTDB, VictoriaMetrics, Grafana Mimir): + - **Counter Reset Detection in `ts.rate()`** — Optional 3rd parameter enables Prometheus-style counter reset handling. When `true`, detects value decreases and treats post-reset values as increments from 0. Default behavior (simple `(last-first)/timeDelta`) preserved for backward compatibility with gauge-type data + - **Time Range Operators Beyond BETWEEN** — `SelectExecutionPlanner.extractTimeRange()` now handles `>`, `>=`, `<`, `<=`, `=` operators on the timestamp column, pushing them down to the TimeSeries engine. Multiple range conditions ANDed together (tightest bounds win). Previously only `BETWEEN` was pushed down; other operators caused full scans + - **Multi-Tag Filtering** — `TagFilter` redesigned to support multiple tag conditions ANDed together via `and()` and `andIn()` chaining methods. HTTP query handler updated to iterate over all tags in the request JSON (previously only used the first tag). Backward compatible: existing `eq()` and `in()` factory methods still work + - **Automatic Retention/Downsampling Scheduler** — `TimeSeriesMaintenanceScheduler` runs as a daemon thread (60s interval), automatically applying retention and downsampling policies. Follows the same pattern as `MaterializedViewScheduler`. Integrated into `LocalSchema` (lazy init, shutdown on close) and `TimeSeriesTypeBuilder` (scheduled on type creation). 
Previously required explicit `applyRetention()` / `applyDownsampling()` calls from application code + - **Linear Interpolation** — `ts.interpolate(value, 'linear', timestamp)` added as a 4th method. Interpolates null values using linear interpolation between surrounding non-null values. Requires optional 3rd parameter for timestamps + - **Approximate Percentiles** — New `ts.percentile(value, percentile)` function (registered as `SQLFunctionTsPercentile`). Works with GROUP BY for per-bucket percentile calculation (e.g., `ts.percentile(latency, 0.99)` for p99) + - 19 new tests in `TimeSeriesGapAnalysisTest` + +- **Block-Level Tag Metadata for Sealed Blocks** — Per-block distinct tag values stored in sealed block headers, enabling three-way block decision during tag-filtered queries: + - **Block header tag metadata section**: after numeric stats, stores `tagColCount` (2 bytes) + per-TAG column `distinctCount` + UTF-8 encoded distinct values. Minimal overhead: a block with one tag column holding "TSLA" costs 10 bytes per block header + - **Sealed format version upgrade**: `CURRENT_VERSION` bumped from 0 to 1. `loadDirectory()` reads tag metadata for version ≥ 1. 
Auto-migration via `upgradeFileToVersion1()` rewrites version-0 blocks with empty tag metadata on first `appendBlock()` + - **Three-way block decision** (`BlockMatchResult` enum): + - **SKIP** — filtered tag value not in block's distinct set → skip entire block (zero decompression) + - **FAST_PATH** — block has exactly 1 distinct value for the filtered tag AND it matches → use block-level stats (min/max/sum/count) directly, zero decompression + - **SLOW_PATH** — block has multiple distinct values including the filtered one → decompress tag columns via `DictionaryCodec.decode()`, filter rows inline + - **Compaction integration**: `TimeSeriesShard.compact()` collects distinct tag values per chunk via `LinkedHashSet` and passes `String[][] tagDistinctValues` to `appendBlock()` + - **Aggregation path**: `aggregateMultiBlocks()` accepts `TagFilter` parameter, applies three-way decision per block. Removed the `aggregateMultiWithTagFilter()` row-by-row fallback from `TimeSeriesEngine` — sealed store handles block-level skipping internally. 
Mutable bucket tag filtering remains row-level + - **Query path**: `scanRange()` and `iterateRange()` accept `TagFilter` parameter, skip non-matching blocks entirely + - **Downsampling integration**: `downsampleBlocks()` computes tag metadata for newly created blocks + - CRC32 integrity covers tag metadata section + - 3 new tests in `TimeSeriesGapAnalysisTest`: `testTagFilterBlockSkipping`, `testTagFilterAggregationAfterCompaction`, `testTagFilterNonexistentTag` + - All 213 timeseries tests passing, zero regressions + +- **Grafana Integration** — Grafana DataFrame-compatible HTTP endpoints for visualization via the Grafana Infinity datasource plugin (no custom plugin needed): + - `GET /api/v1/ts/{database}/grafana/health` — datasource health check (verifies database exists) + - `GET /api/v1/ts/{database}/grafana/metadata` — discovers TimeSeries types, fields, tags, and available aggregation types + - `POST /api/v1/ts/{database}/grafana/query` — multi-target query returning Grafana DataFrame wire format (columnar arrays with schema metadata); supports raw queries, aggregated queries (SUM/AVG/MIN/MAX/COUNT), tag filtering, field projection, and automatic bucket interval calculation from `maxDataPoints` + - Shared `TimeSeriesHandlerUtils` utility class extracted from `PostTimeSeriesQueryHandler` (tag filter building, column index resolution) + - 8 integration tests in `GrafanaTimeSeriesHandlerIT`: health, metadata, raw query, aggregated query, multi-target, tag filter, auto maxDataPoints, missing targets + +- **Phase 6: PromQL Query Language** — Native PromQL support for Prometheus-compatible observability: + - **`PromQLParser`** — Hand-written recursive-descent parser covering: instant vector selectors (`metric{label=~"re"}`), range vector selectors (`metric[5m]`), binary expressions (`+`, `-`, `*`, `/`, `%`, `^`, comparison ops, logical `and`/`or`/`unless`), aggregations (`sum`, `min`, `max`, `avg`, `count`, `topk`, `bottomk`, `quantile` with `by`/`without` 
clauses), function calls (`rate`, `irate`, `increase`, `delta`, `idelta`, `avg_over_time`, `min_over_time`, `max_over_time`, `sum_over_time`, `count_over_time`, `stddev_over_time`, `absent`, `ceil`, `floor`, `round`, `abs`, `sqrt`, `exp`, `ln`, `log2`, `log10`, `clamp_min`, `clamp_max`, `histogram_quantile`, `label_replace`, `label_join`, `vector`, `scalar`, `time`, `pi`), unary negation, and scalar literals. Duration parsing (`5m`, `1h`, `30s`, `1d`, `1w`, `1y`). `PromQLParser.parseDuration(str)` also available as utility + - **`PromQLEvaluator`** — Two-phase evaluation: `evaluateInstant(expr, evalTimeMs)` for instant vector results, `evaluateRange(expr, startMs, endMs, stepMs)` for matrix results. Vector selector → 5-minute lookback window (configurable via `PromQLEvaluator(database, lookbackMs)` constructor). Binary op label matching, aggregation with group-by/without, range function window computation via `TimeSeriesEngine.iterateQuery()`. ReDoS-safe regex compilation with pattern cache (ConcurrentHashMap, 512-entry LRU-style eviction). Validates regex patterns against nested quantifier and alternation+quantifier patterns before compilation + - **`PromQLResult`** — Sealed interface with three implementations: `InstantVector(List)`, `MatrixResult(List)`, `ScalarResult(double)`. `VectorSample` holds labels, value, and timestamp. `MatrixSeries` holds labels and a list of `(timestamp, value)` pairs + - **`PromQLFunctions`** — All range functions implemented: `rate`/`irate`/`increase`/`delta`/`idelta` use first/last sample arithmetic; `avg_over_time` etc. compute window statistics. `absent()` inverts presence. 
Math functions delegate to `Math.*` + - **HTTP API — PromQL-compatible endpoints** (base: `/ts/{database}/prom`): + - `GET /ts/{database}/prom/api/v1/query?query=&time=` — instant query; `time` is Unix seconds (float); defaults to current time; optional `lookback_delta` override + - `GET /ts/{database}/prom/api/v1/query_range?query=&start=&end=&step=` — range query; all timestamps in Unix seconds; `step` can be a duration string or seconds float + - `GET /ts/{database}/prom/api/v1/labels` — lists all TimeSeries type names (metric names) and their tag column names as PromQL label names + - `GET /ts/{database}/prom/api/v1/label/{name}/values` — returns distinct values for a given label across all TimeSeries types + - `GET /ts/{database}/prom/api/v1/series?match[]=` — returns series matching the given selector(s); evaluates each selector against all TimeSeries types + - All endpoints return standard Prometheus JSON `{status: "success", data: {...}}` format via `PromQLResponseFormatter` + - **SQL function `promql(expr [, evalTimeMs])`** — Calls the PromQL evaluator from within SQL. 
Returns a result row per matched sample (each map's keys become row properties; `__value__` holds the numeric value): + ```sql + -- Instant query at explicit time + RETURN promql('cpu_usage{host="srv1"}', 1700000000000) + -- Scalar arithmetic + RETURN promql('2 + 3 * 4', 1000) + -- Current time (no second arg) + RETURN promql('rate(http_requests_total[5m])') + ``` + Registered as `promql` in `DefaultSQLFunctionFactory`; min 1 arg, max 2 args + - **Studio — PromQL Explorer tab**: new "PromQL" sub-tab alongside Query/Schema/Ingestion in the TimeSeries Explorer; expression input, instant / range toggle, time controls, executes via `/prom/api/v1/query` or `/prom/api/v1/query_range`, renders results in DataTable and ApexCharts line chart + - **14 integration tests in `PromQLHttpHandlerIT`**: instant query (scalar, instant vector, rate, sum by, binary op), range query, labels, label values, series, error cases (missing query param, invalid PromQL, zero step) + +- **Prometheus Remote Write / Read Protocol** — Drop-in Prometheus remote storage backend: + - **`POST /ts/{database}/prom/write`** (`PostPrometheusWriteHandler`) — accepts Prometheus `remote_write` Protobuf payload (snappy-compressed). Decodes `WriteRequest.timeseries[]` using hand-written `ProtobufDecoder`; each `TimeSeries` is mapped to an ArcadeDB TimeSeries type named after the `__name__` label. Tags become TimeSeries tag columns (via `CREATE TIMESERIES TYPE IF NOT EXISTS`). Samples bulk-appended via `DatabaseAsyncAppendSamples`. Returns HTTP 204 on success + - **`POST /ts/{database}/prom/read`** (`PostPrometheusReadHandler`) — accepts Prometheus `remote_read` Protobuf query payload (snappy-compressed). Each `Query` has matchers, start/end timestamps. Evaluates against the named TimeSeries type; encodes the response as a `ReadResponse.QueryResult` containing `TimeSeries[]` Protobuf objects, snappy-compressed. 
Supports `=`, `!=`, `=~`, `!~` label matchers + - **`ProtobufDecoder`** — Minimal hand-written protobuf wire-format decoder (no generated code, no protobuf library dependency): varint (wire type 0), fixed64 (wire type 1), length-delimited (wire type 2). Varint capped at 64 bits; length-delimited bounds-checked against remaining buffer. Used for both write and read request decoding + - **`ProtobufEncoder`** — Minimal protobuf encoder for building `ReadResponse` messages: `writeTag()`, `writeVarint()`, `writeLengthDelimited()`, `writeFixed64AsDouble()`, `writeSInt64()`. Outputs to `ByteArrayOutputStream` + - **`PrometheusTypes`** — Protobuf field constants for all message types in the `remote.proto` schema: `WriteRequest`, `TimeSeries`, `Label`, `Sample`, `ReadRequest`, `Query`, `LabelMatcher`, `QueryResult`, `ReadResponse` + - **snappy-java dependency** added to `server/pom.xml` (Apache 2.0) for frame-format decompression/compression of all remote_write/read payloads. Recorded in `ATTRIBUTIONS.md` + - **8 integration tests in `PrometheusRemoteWriteReadIT`**: write single series, write multi-series, read back written data, label matcher `=`/`!=`/`=~`/`!~`, empty result for unmatched matcher, round-trip write+read consistency + +### In Progress / Not Yet Started + +#### Competitive Gap Analysis — Prioritized Roadmap + +Gap analysis comparing ArcadeDB's TimeSeries against top 10 TSDBs: InfluxDB 3, TimescaleDB, Prometheus, QuestDB, TDengine, ClickHouse, Kdb+, Apache IoTDB, VictoriaMetrics, Grafana Mimir. 
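The Prometheus remote_write/remote_read support above rests on a hand-written protobuf wire-format decoder. As a sketch of the core rule that decoder implements, here is base-128 varint and tag decoding in Python (illustrative only; the function names are not taken from the Java `ProtobufDecoder`):

```python
def decode_varint(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode a protobuf base-128 varint starting at pos.

    Each byte carries 7 payload bits, least-significant group first;
    a set high bit means another byte follows. Capped at 64 bits,
    mirroring the bound described above. Returns (value, next_pos)."""
    result, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        if shift > 63:
            raise ValueError("varint exceeds 64 bits")
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if b & 0x80 == 0:
            return result, pos
        shift += 7


def decode_tag(buf: bytes, pos: int = 0) -> tuple[int, int, int]:
    """Split a field tag varint into (field_number, wire_type, next_pos)."""
    key, pos = decode_varint(buf, pos)
    return key >> 3, key & 0x7, pos
```

For example, the byte pair `0x96 0x01` decodes to 150, and a tag byte `0x0A` denotes field 1 with wire type 2 (length-delimited), which is the shape of the repeated `timeseries` field in `WriteRequest`.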
+ +**P0 — Table Stakes (standard in 7+/10 TSDBs):** +- ~~Counter reset handling in `ts.rate()`~~ — **DONE** (optional 3rd param) +- ~~Time range operators beyond BETWEEN~~ — **DONE** (`>`, `>=`, `<`, `<=`, `=` pushed down) +- ~~Multi-tag filtering~~ — **DONE** (ANDed multi-tag conditions) +- ~~Automatic retention/downsampling scheduler~~ — **DONE** (daemon thread, 60s interval) +- ~~PromQL / MetricsQL query language~~ — **DONE** (native `PromQLParser` + `PromQLEvaluator`; 5 HTTP endpoints under `/ts/{db}/prom/api/v1/`; `promql()` SQL function; Studio PromQL Explorer tab; 14 integration tests) +- ~~Grafana native datasource plugin~~ — **DONE** (Grafana DataFrame-compatible endpoints: `GET /ts/{db}/grafana/health`, `GET /ts/{db}/grafana/metadata`, `POST /ts/{db}/grafana/query` — works with Grafana Infinity datasource plugin, no custom plugin needed) +- ~~Prometheus `remote_write` / `remote_read` protocol~~ — **DONE** (`POST /ts/{db}/prom/write` + `POST /ts/{db}/prom/read`; hand-written protobuf decoder/encoder; snappy compression; 8 integration tests) +- **Alerting & recording rules** — Built-in alerting on metric thresholds with routing (email, Slack, PagerDuty). Recording rules pre-compute expensive queries (7+/10 TSDBs) + +**P1 — Core Analytics (present in 4-6/10 TSDBs):** +- ~~Approximate percentiles (p50/p95/p99)~~ — **DONE** (`ts.percentile` function) +- ~~Linear interpolation in gap filling~~ — **DONE** (`ts.interpolate` 'linear' method) +- **OpenTelemetry (OTLP) ingestion** — CNCF standard for observability. OTLP (gRPC + HTTP) becoming the universal ingest protocol (5/10 and growing) +- ~~Window functions for TimeSeries queries~~ — **DONE** (`ts.lag`, `ts.lead`, `ts.rowNumber`, `ts.rank`) +- **Cardinality management & monitoring** — Tools to explore, limit, and alert on cardinality growth. 
The `DictionaryCodec` has a hard 65,535 limit per block that throws at runtime with no warning +- **Streaming / real-time aggregation at ingestion** — Pre-aggregate data at ingestion time to reduce storage and query cost (e.g., reduce 1s samples to 1min before storage) +- **ASOF JOIN / temporal joins** — Find closest timestamp match between two time series without exact alignment. Critical for correlating data from sensors with different sampling rates (3/10 TSDBs) + +**P2 — Ecosystem & Advanced Features (present in 2-3/10 TSDBs):** +- **TimeSeries via PostgreSQL wire protocol** — ArcadeDB already has `postgresw` module. Enabling TS queries through it would unlock all PostgreSQL client libraries, BI tools (Tableau, Metabase, Superset), JDBC/ODBC connectivity +- **Native histogram support** — Modern way to capture latency distributions without pre-defined bucket boundaries (Prometheus 3.0, VictoriaMetrics 3.0) +- **Tiered storage (hot/warm/cold with object storage)** — Object storage (S3/GCS) backends reduce cost 10-100x for historical data (5/10 TSDBs) +- **Parquet/Arrow export/import** — Standard format for data lakes. Enables interop with Spark, Pandas, DuckDB +- **MQTT protocol support** — Dominant protocol for IoT devices. 
Native ingestion eliminates broker middleware (TDengine, Apache IoTDB) +- **Exemplars (trace-to-metrics linking)** — Attach trace IDs to metric samples for click-through from metric spike to distributed trace +- **Anomaly detection** — ML-based anomaly scoring on time series (emerging feature, VictoriaMetrics enterprise) + +**Competitive Advantages (what ArcadeDB has that others don't):** +- Multi-model in one engine: Graph + Document + K/V + TimeSeries in the same database with cross-model queries +- Embeddable: can run as a Java library inside an application (no separate process) +- InfluxDB Line Protocol ingestion: already implemented (most TSDBs except InfluxDB don't have this natively) +- Continuous aggregates with auto-refresh on commit: post-INSERT trigger-based refresh is more immediate than TimescaleDB's policy-based refresh +- SIMD-vectorized aggregation: Java Vector API usage for aggregation is cutting-edge + +- **Graph + TimeSeries Integration** — Cross-model queries (e.g., `MATCH {type: Device} -HAS_METRIC-> {type: Sensor} WHERE ts.rate(value, ts) > 100`) + +--- + +## High Availability (HA) and Multi-Node Behaviour + +### How TimeSeries data flows in an HA cluster + +ArcadeDB's HA layer replicates data at the page level through the `PaginatedComponent` infrastructure. TimeSeries storage uses two layers: + +| Layer | File extension | Storage mechanism | Replicated? | +|---|---|---|---| +| Mutable bucket | `.tstb` | `TimeSeriesBucket extends PaginatedComponent` | **Yes** — changes are page-replicated to all followers in real time | +| Sealed store | `.ts.sealed` | `TimeSeriesSealedStore` via `RandomAccessFile` / `FileChannel` | **No** — local-only files | + +### Why sealed stores are not replicated + +This is by design. The sealed store is a *derived* artefact: it is produced by compacting the mutable bucket. Since the mutable bucket data is already replicated, every HA node independently holds all the source data it needs to perform its own compaction. 
Replicating the sealed files as well would double the I/O cost for no benefit. + +The `TimeSeriesMaintenanceScheduler` runs as a daemon thread on each node and periodically compacts mutable data into the local sealed store. Each node therefore converges to the same sealed state independently. + +### Behaviour after a failover + +Immediately after a follower is promoted to leader (or a new follower joins), it may not yet have compacted all mutable data into its sealed store. In that case: + +- **Range queries** still return correct results: the engine queries both the sealed store and the mutable bucket for every time range. +- **Aggregation queries** also remain correct: `aggregateMulti()` processes sealed blocks (fast path) and then iterates the mutable bucket (slow path) for the same time window. +- **Compaction lag**: a follower that has not yet compacted may serve reads slightly more slowly until its maintenance scheduler runs the next compaction cycle (default interval: 60 seconds). + +### Summary + +| Concern | Answer | +|---|---| +| Is in-flight mutable data replicated? | Yes, via the normal page-replication protocol | +| Are sealed store files replicated? | No — each node compacts independently | +| Are reads consistent immediately after failover? | Yes — the mutable bucket covers the gap | +| Is there a performance impact after failover? | Queries may be slower until compaction catches up | + +--- + +## Context + +ArcadeDB users are requesting native TimeSeries support, with the key requirement being **fast range queries**. ArcadeDB is uniquely positioned as a multi-model database (Graph, Document, Key/Value, Search, Vector) to become the first production database that **natively unifies graph traversal with timeseries aggregation** in a single query engine — a gap confirmed by a January 2025 SIGMOD survey paper (arXiv:2601.00304). 
+ +This document presents: (1) a competitive landscape analysis, (2) the underlying technology that makes TSDBs fast, (3) how ArcadeDB's existing architecture compares, (4) graph+timeseries integration opportunities, (5) the query & ingestion interface (SQL, OpenCypher, HTTP Line Protocol, Java API), (6) the two-layer storage architecture with shard-per-core parallelism, and (7-8) a phased implementation plan. + +--- + +## Part 1: Competitive Landscape — Top TimeSeries Databases + +### 1.1 Open Source + +| Database | Storage Engine | Compression | Fast Range Query Technique | Query Language | License | +|---|---|---|---|---|---| +| **InfluxDB 3.0** | Apache Arrow + Parquet (Rust rewrite) | Parquet native (Delta, Dict, Snappy/ZSTD) | Time-partitioned Parquet files + DataFusion vectorized execution + predicate pushdown | SQL + InfluxQL | MIT (core) | +| **TimescaleDB** | PostgreSQL extension: row-based "hypertable" chunks → columnar compression | Gorilla (floats), Delta-of-delta (timestamps), Simple-8b RLE, Dictionary, LZ4 | Chunk exclusion (prune time ranges), B-tree on time column per chunk, continuous aggregates | Full PostgreSQL SQL | Apache 2.0 (core) | +| **QuestDB** | Custom columnar, one memory-mapped file per column per partition (Java+C++) | ZFS-level + Parquet for cold tier | SIMD-accelerated scans (SSE2/AVX2), time partitions, zero-copy mmap, parallel partition execution | SQL (PG wire protocol) | Apache 2.0 | +| **ClickHouse** | MergeTree — columnar parts sorted by primary key | Composable codecs: DoubleDelta + Gorilla + T64 + LZ4/ZSTD | Sparse primary index (1 entry per 8192-row granule), partition pruning, vectorized SIMD execution | Full SQL | Apache 2.0 | +| **TDengine** | LSM-tree with SkipList/Red-Black Tree memtables | Delta + Gorilla + LZ4/ZSTD (two-level encoding) | Time-based partitioning, one sub-table per device, SkipList in-memory | TDengine SQL | **AGPL 3.0** | +| **VictoriaMetrics** | MergeTree-inspired (Go), column-oriented 
parts | Gorilla + ZSTD → **0.4 bytes/sample** (best in class) | Monthly partitions, MergeSet label index, bitmap series filtering | PromQL / MetricsQL | Apache 2.0 | +| **Prometheus** | Block-based chunks with inverted label index (Go) | Gorilla encoding → ~1.37 bytes/sample | Block-level time metadata pruning, posting list intersection for label matching | PromQL | Apache 2.0 | +| **Apache IoTDB** | LSM-tree + custom TsFile columnar format | Delta, RLE, Gorilla, Snappy/LZ4/ZSTD per chunk | Chunk-level min/max stats, device-level data grouping, in-memory index | SQL-like | Apache 2.0 | + +### 1.2 Commercial / Cloud + +| Database | Architecture | Key Differentiator | +|---|---|---| +| **Kdb+ (KX)** | In-memory RDB → Intraday IDB → Historical HDB (columnar flat files, mmap'd) | 30+ years in finance; q language is inherently vectorized; sub-millisecond on tick data | +| **Amazon Timestream** | Serverless; memory store (row) → magnetic store (columnar); **now deprecated in favor of InfluxDB 3** | Auto lifecycle management; but closed to new customers as of June 2025 | +| **Azure Data Explorer (Kusto)** | Distributed columnar extents; EngineV3 | Built-in ML: seasonality detection, anomaly detection, forecasting; KQL language | +| **Datadog Monocle** | Rust, shard-per-core LSM (one LSM instance per CPU core, lock-free writes) | 60x ingestion improvement; tag-hash-based sharding; zero-contention architecture | + +--- + +## Part 2: Underlying Technology — What Makes TSDBs Fast + +### 2.1 The Three Pillars of Fast Range Queries + +**Pillar 1: Time-Based Partitioning (Eliminate I/O)** +Every top TSDB partitions by time. When you query `WHERE timestamp BETWEEN X AND Y`, partitions outside that range are **never touched** — no I/O at all. This is the single biggest speedup. Granularity varies: hours (Prometheus), days (QuestDB, Kdb+), weeks (TimescaleDB), months (VictoriaMetrics), or configurable. 
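To make Pillar 1 concrete, partition pruning reduces to an interval-overlap test over partition metadata. The sketch below is illustrative only (the `TimePartition` and `prune` names are hypothetical, not ArcadeDB classes), but it is essentially the check a planner runs before deciding which files to open:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of time-based partition pruning: keep only partitions
// whose [minTs, maxTs] window overlaps the query range; skip the rest entirely.
public final class PartitionPruning {
  record TimePartition(String file, long minTs, long maxTs) {}

  static List<TimePartition> prune(final List<TimePartition> parts, final long from, final long to) {
    final List<TimePartition> hit = new ArrayList<>();
    for (final TimePartition p : parts)
      if (p.maxTs() >= from && p.minTs() <= to)   // overlap test: non-overlapping partitions cost zero I/O
        hit.add(p);
    return hit;
  }

  public static void main(String[] args) {
    final List<TimePartition> days = List.of(
        new TimePartition("2026-02-19.col", 0L, 86_399_999L),
        new TimePartition("2026-02-20.col", 86_400_000L, 172_799_999L),
        new TimePartition("2026-02-21.col", 172_800_000L, 259_199_999L));
    // A query falling entirely inside the middle day touches exactly one partition
    System.out.println(prune(days, 90_000_000L, 100_000_000L).size()); // prints 1
  }
}
```

Partitions that fail the overlap test are never opened, which is why this one metadata check can turn a full scan into a handful of file reads.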
+ +**Pillar 2: Columnar Storage + Compression (Minimize I/O)** +Once you've narrowed to the right partitions, columnar storage ensures you only read the columns you need. `SELECT avg(temperature)` reads only the temperature column, not humidity, pressure, etc. This can reduce I/O by 10-100x for wide tables. Combined with timeseries-specific compression: + +| Algorithm | Target | How It Works | Compression | +|---|---|---|---| +| **Delta-of-delta** | Timestamps | Regular intervals → delta is constant → delta-of-delta is 0 → 1 bit | 96% → 1 bit | +| **Gorilla XOR** | Float values | XOR consecutive IEEE 754 floats → many leading/trailing zeros → store only middle bits | 51% → 1 bit, avg 1.37 B/pair | +| **Simple-8b RLE** | Integers | Pack multiple small ints into 64-bit words with run-length encoding | 4-8x | +| **Dictionary** | Tags/labels | Map low-cardinality strings to integer IDs | 10-100x for tags | +| **T64** | Integers | Find minimum bit-width needed | 2-4x | + +Combined result: **0.4 to 1.37 bytes per (timestamp, value) pair** vs. 16 bytes uncompressed. + +**Pillar 3: Vectorized Execution + SIMD (Maximize CPU)** +Once data is in memory in columnar format, process it in batches using CPU SIMD instructions: +- QuestDB: AVX2 for filtering and aggregation +- ClickHouse: Processes 65,505-row blocks with SIMD +- Kdb+: Language-level vectorization (all operations work on arrays) +- InfluxDB 3: DataFusion's vectorized Arrow-based execution + +### 2.2 Additional Key Techniques + +- **Memory-mapped I/O**: QuestDB and Kdb+ mmap column files for zero-copy access +- **Sparse indexing**: ClickHouse stores 1 index entry per 8192 rows (vs. 
per-row), saving memory +- **Inverted label indexes**: VictoriaMetrics and Prometheus use inverted indexes for tag/label matching +- **Out-of-order handling**: WAL-based sorting (QuestDB), SkipList memtables (TDengine), dedup indexes (InfluxDB 3) +- **Continuous aggregates**: Pre-compute common rollups (TimescaleDB, InfluxDB, ClickHouse materialized views) +- **Retention policies**: Auto-delete data older than X (every major TSDB) +- **Downsampling**: Reduce resolution of old data (5-second → 1-minute → 1-hour) + +--- + +## Part 3: ArcadeDB's LSM-Tree — Strengths & Gaps + +### 3.1 Current Architecture (from codebase analysis) + +ArcadeDB's LSM-Tree index (`com.arcadedb.index.lsm.*`): +- **Two-level structure**: Mutable (Level-0, append-only pages) → Compacted (Level-1, immutable merged pages) +- **Page size**: 256KB for indexes, 64KB for bucket data +- **Range queries**: `RangeIndex.range(ascending, beginKeys, beginInclusive, endKeys, endInclusive)` — fully supported +- **Compaction**: Multi-way merge with configurable RAM budget, deletion markers, root page with min-keys +- **Bucket system**: Multiple buckets per type, with pluggable `BucketSelectionStrategy` (RoundRobin, Partitioned, Thread-based) +- **Date types**: DATETIME, DATETIME_MICROS, DATETIME_NANOS, DATETIME_SECOND — all present +- **Aggregations**: COUNT, SUM, AVG, MIN, MAX with GROUP BY — present but row-at-a-time + +### 3.2 Where ArcadeDB's LSM-Tree Is Competitive + +- **Write throughput**: LSM-trees excel at append-only workloads (proven by InfluxDB v1/v2, TDengine, IoTDB, VictoriaMetrics, Datadog all using LSM) +- **Sequential I/O**: Flush and compaction produce sequential writes +- **Existing range query support**: The `RangeIndex` interface already handles ordered scans +- **Multi-model flexibility**: No dedicated TSDB offers graph + timeseries natively + +### 3.3 Gaps vs. 
Dedicated TSDBs + +| Gap | Impact | Dedicated TSDB Approach | +|---|---|---| +| **Row-oriented storage** | Reads ALL columns even if query needs one | Columnar files (1 file per column per partition) | +| **No timeseries compression** | 10-40x more disk/memory than needed | Gorilla, Delta-of-delta, Dictionary encoding | +| **No time-based partitioning** | Range queries scan all data | Automatic time-windowed partitions | +| **Row-at-a-time execution** | CPU underutilized | Vectorized batch execution (Arrow-style) | +| **No SIMD** | 4-8x slower aggregation | AVX2/SSE2 for SUM, AVG, MIN, MAX | +| **~~No continuous aggregates~~** | ~~Repeated expensive queries~~ | ~~Pre-computed rollup tables~~ ✅ **Implemented** (watermark-based incremental refresh) | +| **No retention/downsampling** | Manual data lifecycle | Automatic TTL + resolution reduction | +| **No out-of-order optimization** | Late data may cause performance issues | WAL sorting, SkipList memtables | + +--- + +## Part 4: Graph + TimeSeries — The Killer Multi-Model Feature + +### 4.1 The Opportunity + +A SIGMOD 2025 survey confirms: **no existing production database natively unifies graph traversal with timeseries aggregation**. The HyGraph research project (EDBT 2025, University of Leipzig) proposes this theoretically but has no production implementation. + +ArcadeDB can be first-to-market here. + +### 4.2 High-Value Use Cases + +| Market | Graph Model | TimeSeries Data | Combined Query Example | +|---|---|---|---| +| **Industrial IoT** | Device topology (sensors → machines → lines → plants) | Sensor telemetry (temp, vibration, pressure) | "Average temperature of all sensors downstream of HVAC unit #3 in the last hour" | +| **Observability** | Service dependency graph | Latency, error rate, CPU metrics | "When payment-gateway latency > P99, what's the blast radius on all downstream services?" 
| +| **FinTech / AML** | Account/entity transaction network | Transaction velocity, amounts over time | "Find accounts receiving from 5+ distinct sources within 10 minutes with no prior history" | +| **Cybersecurity** | Network topology (hosts, services, firewalls) | Security events, traffic volume | "Show hosts that communicated with compromised server + their traffic anomaly patterns" | +| **Digital Twins** | Physical structure (building → floor → room → device) | Live telemetry | "If Pump #3 fails, which downstream components are affected? Show their current operating margins" | +| **Energy / Utilities** | Grid topology | Load, generation, frequency | "Hierarchical energy consumption rollup: campus → building → floor → meter" | +| **Supply Chain** | Supplier → manufacturer → distributor → retailer | Throughput, lead times, inventory levels | "Find bottlenecks where throughput dropped 20% while supplier count stayed constant" | + +### 4.3 Proposed Query Patterns + +**Pattern 1: Graph Traversal + TimeSeries Aggregation** +```sql +SELECT sensor.name, avg(ts.value) AS avg_temp +FROM ( + TRAVERSE out('InstalledIn') FROM (SELECT FROM Building WHERE name = 'Building X') + WHILE $depth <= 3 +) AS sensor +TIMESERIES sensor.temperature AS ts + FROM '2026-02-19' TO '2026-02-20' +WHERE sensor.@type = 'Sensor' +GROUP BY sensor.name +``` + +**Pattern 2: Blast Radius Analysis** +```sql +SELECT service.name, $depth AS hops, + avg(ts.value) AS avg_latency, max(ts.value) AS peak_latency +FROM ( + TRAVERSE out('DependsOn') FROM (SELECT FROM Service WHERE name = 'payment-gateway') + MAXDEPTH 5 +) AS service +TIMESERIES service.latency_p99 AS ts + FROM '2026-02-20T10:00:00Z' TO '2026-02-20T11:00:00Z' + GRANULARITY '1m' +GROUP BY service.name, $depth +ORDER BY $depth, peak_latency DESC +``` + +**Pattern 3: Anomaly Detection with Graph Context** +```sql +SELECT sensor.name, last(ts.value) AS current, + avg(neighbor_ts.value) AS neighbor_avg +FROM Sensor AS sensor +LET neighbors = 
(SELECT expand(both('ConnectedTo')) FROM $parent.sensor) +TIMESERIES sensor.temperature AS ts LAST '1h' +TIMESERIES neighbors.temperature AS neighbor_ts LAST '1h' +WHERE abs(current - neighbor_avg) > 3 * stdev(neighbor_ts.value) +``` + +**Pattern 4: Correlation Across Connected Entities** +```sql +SELECT e.in.name AS sensor_a, e.out.name AS sensor_b, + correlate(ts_a, ts_b) AS correlation +FROM ConnectedTo AS e +TIMESERIES e.in.vibration AS ts_a LAST '1h' +TIMESERIES e.out.vibration AS ts_b LAST '1h' +WHERE correlation > 0.85 +ORDER BY correlation DESC +``` + +--- + +## Part 5: Query & Ingestion Interface — SQL, Cypher, HTTP, Java API + +### 5.1 SQL DDL — Schema Definition + +New `CREATE TIMESERIES TYPE` statement extending `CreateTypeAbstractStatement` (same pattern as `CreateDocumentTypeStatement` and `CreateVertexTypeStatement`): + +```sql +-- Full syntax +CREATE TIMESERIES TYPE SensorReading [IF NOT EXISTS] + TIMESTAMP ts PRECISION NANOSECOND -- mandatory: designated timestamp column + TAGS (sensor_id STRING, location STRING) -- indexed, low-cardinality + FIELDS ( -- value columns, high-cardinality + temperature DOUBLE, + humidity DOUBLE, + pressure DOUBLE + ) + [SHARDS 8] -- default = availableProcessors() + [PARTITION BY (sensor_id)] -- tag-hash sharding (default: thread affinity) + [RETENTION 90 DAYS] -- auto-delete old data + [COMPACTION INTERVAL 30s] -- how often mutable → sealed + [BLOCK SIZE 50000] -- samples per sealed block + +-- Minimal syntax (defaults for everything optional) +CREATE TIMESERIES TYPE SensorReading + TIMESTAMP ts + TAGS (sensor_id STRING) + FIELDS (temperature DOUBLE) + +-- ALTER: add fields, change retention, adjust shards +ALTER TIMESERIES TYPE SensorReading + ADD FIELD wind_speed DOUBLE + +ALTER TIMESERIES TYPE SensorReading + RETENTION 180 DAYS + +-- DROP +DROP TIMESERIES TYPE SensorReading [IF EXISTS] +``` + +**Timestamp precision options**: `SECOND`, `MILLISECOND`, `MICROSECOND`, `NANOSECOND` (default). 
Maps to ArcadeDB's `DATETIME_SECOND`, `DATETIME`, `DATETIME_MICROS`, `DATETIME_NANOS` types. + +**Implementation**: New `CreateTimeSeriesTypeStatement extends CreateTypeAbstractStatement`. Overrides `createType(Schema schema)` to call a new `schema.buildTimeSeriesType()` builder that creates the `TimeSeriesType`, N `TimeSeriesShard` instances, and configures the `BucketSelectionStrategy`. + +### 5.2 SQL DML — Ingestion + +#### Single-Row INSERT (Compatible with Existing Syntax) + +```sql +-- Standard ArcadeDB INSERT syntax works +INSERT INTO SensorReading + SET ts = '2026-02-20T10:00:00.000Z', + sensor_id = 'sensor-A', + location = 'building-1', + temperature = 22.5, + humidity = 65.0, + pressure = 1013.25 + +-- Content syntax also works +INSERT INTO SensorReading + CONTENT { + "ts": "2026-02-20T10:00:00.000Z", + "sensor_id": "sensor-A", + "location": "building-1", + "temperature": 22.5, + "humidity": 65.0, + "pressure": 1013.25 + } +``` + +This goes through the standard SQL parser → `InsertExecutionPlanner` → routes to `TimeSeriesEngine.appendSamples()` instead of `LocalBucket.createRecord()`. Works but is slower than batch APIs due to per-row SQL parsing overhead. + +#### Batch INSERT (New Syntax for High-Throughput) + +```sql +-- Batch insert: multiple rows in one statement +INSERT INTO SensorReading + (ts, sensor_id, location, temperature, humidity, pressure) + VALUES + ('2026-02-20T10:00:00Z', 'sensor-A', 'building-1', 22.5, 65.0, 1013.25), + ('2026-02-20T10:00:01Z', 'sensor-A', 'building-1', 22.6, 64.8, 1013.20), + ('2026-02-20T10:00:02Z', 'sensor-A', 'building-1', 22.4, 65.2, 1013.30), + ('2026-02-20T10:00:00Z', 'sensor-B', 'building-2', 19.1, 70.0, 1012.50) + +-- Batch with subquery (import from another type) +INSERT INTO SensorReading + SELECT ts, sensor_id, location, temperature, humidity, pressure + FROM RawImportBuffer + WHERE ts > '2026-02-20' +``` + +Batch inserts are parsed once, then all rows are appended in a single transaction. 
Shard routing happens per row (different rows may go to different shards based on `BucketSelectionStrategy`). + +### 5.3 SQL Query — TimeSeries Functions + +#### time_bucket() — The Core Aggregation Primitive + +Equivalent to TimescaleDB's `time_bucket()` and QuestDB's `SAMPLE BY`. Implemented as a `SQLFunction` registered via `SQLFunctionFactoryTemplate`. + +```sql +-- Basic time bucketing: 1-hour averages +SELECT time_bucket('1h', ts) AS hour, + sensor_id, + avg(temperature) AS avg_temp, + max(temperature) AS max_temp, + min(temperature) AS min_temp, + count(*) AS sample_count +FROM SensorReading +WHERE ts BETWEEN '2026-02-19' AND '2026-02-20' + AND sensor_id = 'sensor-A' +GROUP BY hour, sensor_id +ORDER BY hour + +-- Supported intervals: 's' (seconds), 'm' (minutes), 'h' (hours), +-- 'd' (days), 'w' (weeks), 'M' (months) +-- Also numeric: '5m', '15m', '30s', '4h', '1d', '1w', '1M' + +-- Gap filling: fill missing time buckets +SELECT time_bucket('1h', ts) AS hour, + sensor_id, + coalesce(avg(temperature), prev(avg(temperature))) AS avg_temp +FROM SensorReading +WHERE ts BETWEEN '2026-02-19' AND '2026-02-20' +GROUP BY hour, sensor_id +ORDER BY hour +``` + +**How it works**: `time_bucket('1h', ts)` truncates the timestamp to the nearest hour boundary: `floor(ts / interval) * interval`. The `AggregateProjectionCalculationStep` uses the returned value as a GROUP BY key. 
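A minimal sketch of that truncation, assuming epoch-millisecond timestamps; the hypothetical `TimeBucket.bucket` helper below is not the actual `SQLFunction` implementation, which would also parse interval strings like `'5m'` and `'1h'`:

```java
// Illustrative sketch of the time_bucket() truncation: floor(ts / interval) * interval
// yields the start of the bucket containing ts.
public final class TimeBucket {
  static long bucket(final long epochMillis, final long intervalMillis) {
    // floorDiv (not plain '/') so pre-epoch timestamps still round downward
    return Math.floorDiv(epochMillis, intervalMillis) * intervalMillis;
  }

  public static void main(String[] args) {
    final long oneHour = 3_600_000L;
    System.out.println(bucket(7_512_345L, oneHour)); // prints 7200000 (start of the third hour of the epoch)
    System.out.println(bucket(-1L, oneHour));        // prints -3600000, not 0
  }
}
```

`Math.floorDiv` matters here: plain integer division truncates toward zero, which would assign pre-1970 timestamps to the wrong bucket.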
+ +#### TimeSeries-Specific Aggregate Functions + +New `SQLFunction` implementations, registered alongside existing functions: + +```sql +-- first/last: value at earliest/latest timestamp in window +SELECT time_bucket('1h', ts) AS hour, + first(temperature) AS open_temp, -- first value in the hour + last(temperature) AS close_temp, -- last value in the hour + max(temperature) AS high_temp, + min(temperature) AS low_temp +FROM SensorReading +GROUP BY hour + +-- rate: per-second rate of change (for monotonic counters) +SELECT time_bucket('5m', ts) AS window, + sensor_id, + rate(request_count) AS requests_per_sec +FROM ServiceMetrics +WHERE ts > now() - INTERVAL '1h' +GROUP BY window, sensor_id + +-- delta: difference between last and first value in window +SELECT time_bucket('1h', ts) AS hour, + delta(energy_kwh) AS energy_consumed +FROM MeterReading +GROUP BY hour + +-- moving_avg: sliding window average +SELECT ts, temperature, + moving_avg(temperature, 10) AS smoothed -- 10-sample window +FROM SensorReading +WHERE sensor_id = 'sensor-A' +ORDER BY ts + +-- percentile: approximate percentile (t-digest) +SELECT time_bucket('1h', ts) AS hour, + percentile(latency_ms, 0.99) AS p99_latency, + percentile(latency_ms, 0.50) AS median_latency +FROM ServiceMetrics +GROUP BY hour + +-- interpolate: fill gaps with interpolated values +SELECT time_bucket('1m', ts) AS minute, + interpolate(temperature, 'linear') AS temp_interpolated +FROM SensorReading +WHERE ts BETWEEN '2026-02-20T10:00:00Z' AND '2026-02-20T11:00:00Z' +GROUP BY minute + +-- downsample: reduce resolution (convenience wrapper) +SELECT downsample(temperature, '1h', 'avg') AS hourly_avg_temp +FROM SensorReading +WHERE ts BETWEEN '2026-02-01' AND '2026-02-20' +``` + +**Complete list of new SQL functions** (Phase 1 = MVP, Phase 2 = later): + +| Function | Phase | Description | +|---|---|---| +| `time_bucket(interval, timestamp)` | 1 | Truncate timestamp to interval boundary | +| `first(value)` | 1 | First value by 
timestamp in group | +| `last(value)` | 1 | Last value by timestamp in group | +| `rate(value)` | 2 | Per-second rate of change | +| `delta(value)` | 2 | Difference between last and first in group | +| `moving_avg(value, window)` | 2 | Sliding window average | +| `percentile(value, p)` | 2 | Approximate percentile (t-digest) | +| `interpolate(value, method)` | 2 | Fill gaps: 'linear', 'prev', 'next', 'none' | +| `downsample(value, interval, agg)` | 2 | Convenience: resample at lower frequency | +| `correlate(series_a, series_b)` | 2 | Pearson correlation between two series | + +### 5.4 SQL Query — Graph + TimeSeries Integration + +These patterns combine ArcadeDB's existing graph traversal with timeseries range queries (see Part 4 for use cases): + +```sql +-- Pattern 1: Traverse graph, then aggregate timeseries for found vertices +SELECT sensor.name, avg(ts.temperature) AS avg_temp +FROM ( + TRAVERSE out('InstalledIn') FROM (SELECT FROM Building WHERE name = 'HQ') + WHILE $depth <= 3 +) AS sensor +WHERE sensor.@type = 'Sensor' + AND ts.ts BETWEEN '2026-02-19' AND '2026-02-20' +TIMESERIES sensor -> SensorReading AS ts -- link vertex to its timeseries type +GROUP BY sensor.name + +-- Pattern 2: Blast radius with timeseries context +SELECT service.name, $depth AS hops, + avg(ts.latency_ms) AS avg_latency +FROM ( + TRAVERSE out('DependsOn') FROM #12:0 MAXDEPTH 5 +) AS service +TIMESERIES service -> ServiceMetrics AS ts + LAST '1h' + GRANULARITY '1m' +GROUP BY service.name, $depth +ORDER BY avg_latency DESC +``` + +**`TIMESERIES ... AS` clause**: New SQL clause that links a graph vertex to its timeseries type. The query planner: +1. First resolves the graph traversal → set of vertex RIDs +2. For each RID, looks up the linked timeseries data in `TimeSeriesEngine` +3. Applies time range filter and aggregation +4. Joins results back with vertex properties + +This is parsed by extending the `SelectStatement` grammar in `SQLParser.g4`. 
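The four planner steps can be mocked end-to-end with plain collections to show the data flow. Every name below (`TimeseriesJoinSketch`, `Sample`, `avgPerVertex`) is an illustrative stand-in for the traversal result set and `TimeSeriesEngine`, not real ArcadeDB API:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy walk-through of the four planner steps behind the proposed TIMESERIES clause.
public final class TimeseriesJoinSketch {
  record Sample(long ts, double value) {}

  static Map<String, Double> avgPerVertex(final List<String> rids,                 // 1. traversal output (vertex RIDs)
                                          final Map<String, List<Sample>> tsByRid, // stand-in for the TS lookup
                                          final long from, final long to) {
    final Map<String, Double> out = new LinkedHashMap<>();
    for (final String rid : rids) {                                                // 2. per-RID timeseries lookup
      double sum = 0; long n = 0;
      for (final Sample s : tsByRid.getOrDefault(rid, List.of()))
        if (s.ts() >= from && s.ts() <= to) { sum += s.value(); n++; }             // 3. time filter + aggregation
      if (n > 0) out.put(rid, sum / n);                                            // 4. join back onto the vertex
    }
    return out;
  }
}
```

A real executor would push step 3 down into the storage engine as a block-level aggregation rather than filtering sample by sample.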
+ +### 5.5 OpenCypher — TimeSeries Extensions + +ArcadeDB has a **native OpenCypher engine** (`com.arcadedb.query.opencypher`) — a full implementation with its own ANTLR4 Cypher25 grammar, AST builder, execution planner, cost-based optimizer, and 50+ execution steps. It is NOT transpiled to Gremlin. + +TimeSeries support integrates through **two mechanisms**: + +#### 1. Namespaced Functions (registered in `CypherFunctionRegistry`) + +The existing `CypherFunctionRegistry` supports namespaced functions (e.g., `text.split`, `math.sigmoid`, `date.format`). TimeSeries functions follow the same `ts.*` namespace pattern: + +```cypher +// Query timeseries data for a specific vertex +MATCH (s:Sensor {name: 'sensor-A'}) +RETURN s.name, + ts.avg(s, 'SensorReading', 'temperature', '2026-02-19', '2026-02-20') AS avg_temp + +// Traverse graph + aggregate timeseries +MATCH (b:Building {name: 'HQ'})<-[:InstalledIn*1..3]-(s:Sensor) +WITH s +RETURN s.name, + ts.avg(s, 'SensorReading', 'temperature', '2026-02-19', '2026-02-20') AS avg_temp, + ts.max(s, 'SensorReading', 'temperature', '2026-02-19', '2026-02-20') AS max_temp +ORDER BY avg_temp DESC + +// Latest value per sensor +MATCH (s:Sensor)-[:InstalledIn]->(r:Room) +RETURN r.name, s.name, + ts.last(s, 'SensorReading', 'temperature') AS current_temp + +// Time-bucketed aggregation +MATCH (s:Sensor {name: 'sensor-A'}) +WITH s, ts.query(s, 'SensorReading', 'temperature', '2026-02-19', '2026-02-20', '1h') AS buckets +UNWIND buckets AS bucket +RETURN bucket.time, bucket.avg, bucket.min, bucket.max + +// Rate of change (counter metrics) +MATCH (svc:Service {name: 'api-gateway'}) +RETURN ts.rate(svc, 'ServiceMetrics', 'request_count', '2026-02-20T10:00:00Z', '2026-02-20T11:00:00Z') AS rps +``` + +**Function signatures** (registered in `CypherFunctionRegistry` under `ts` namespace): + +| Function | Arguments | Returns | Description | +|---|---|---|---| +| `ts.avg(vertex, type, field, from, to)` | Vertex, String, String, String, String | 
Double | Average value in time range |
+| `ts.sum(vertex, type, field, from, to)` | Same | Double | Sum of values |
+| `ts.min(vertex, type, field, from, to)` | Same | Double | Minimum value |
+| `ts.max(vertex, type, field, from, to)` | Same | Double | Maximum value |
+| `ts.count(vertex, type, field, from, to)` | Same | Long | Sample count |
+| `ts.first(vertex, type, field)` | Vertex, String, String | Object | Earliest value |
+| `ts.last(vertex, type, field)` | Vertex, String, String | Object | Latest value |
+| `ts.rate(vertex, type, field, from, to)` | Same as avg | Double | Per-second rate of change |
+| `ts.query(vertex, type, field, from, to, granularity)` | Same as avg + String | List\<Map\> | Time-bucketed results |
+
+Each function internally resolves the vertex → linked `TimeSeriesType` → `TimeSeriesEngine.aggregate()`, returning scalar or structured results.
+
+#### 2. Procedures (registered in `CypherProcedureRegistry`)
+
+For more complex operations that return tabular results (multiple rows), use procedures via `CALL`:
+
+```cypher
+// Range query returning raw samples
+CALL ts.range('SensorReading', 'sensor-A', '2026-02-19', '2026-02-20', ['temperature', 'humidity'])
+YIELD time, temperature, humidity
+RETURN time, temperature, humidity
+ORDER BY time
+
+// Time-bucketed aggregation as procedure (returns rows)
+CALL ts.aggregate('SensorReading', {
+  from: '2026-02-19',
+  to: '2026-02-20',
+  field: 'temperature',
+  granularity: '1h',
+  filter: {sensor_id: 'sensor-A'}
+})
+YIELD bucket_time, avg_value, min_value, max_value, count
+RETURN bucket_time, avg_value, count
+
+// Combined: traverse graph, then fetch timeseries for each vertex
+MATCH (b:Building {name: 'HQ'})<-[:InstalledIn*1..3]-(s:Sensor)
+CALL ts.range('SensorReading', s.sensor_id, '2026-02-20T10:00:00Z', '2026-02-20T11:00:00Z', ['temperature'])
+YIELD time, temperature
+RETURN s.name, time, temperature
+ORDER BY s.name, time
+```
+
+**Implementation**:
+- Register `ts.*` functions in `CypherFunctionRegistry` (same as `text.*`, `math.*`, `date.*`)
+- Register `ts.range`, `ts.aggregate` procedures in `CypherProcedureRegistry` (same as `algo.dijkstra`, `path.expand`)
+- Functions are evaluated by `ExpressionEvaluator` via `CypherFunctionFactory`, which already supports namespaced function resolution
+- Procedures are executed by `CallStep`, which already handles YIELD clauses
+- No grammar changes needed — the Cypher25 grammar already supports namespaced functions and CALL procedures
+
+### 5.6 HTTP Ingestion Endpoint — InfluxDB Line Protocol Compatible
+
+#### Why InfluxDB Line Protocol?
+
+ILP is the **de-facto standard** for timeseries ingestion. It is natively supported by:
+- InfluxDB (v1, v2, v3) — the originator
+- QuestDB — recommended ingestion path
+- VictoriaMetrics — multiple endpoints
+- GreptimeDB, openGemini, M3DB, Amazon Timestream for InfluxDB
+
+Supporting ILP means **instant compatibility** with:
+- **Telegraf** (300+ input plugins: system metrics, SNMP, MQTT, Kafka, etc.)
+- **Grafana Agent** / Grafana Alloy
+- **Vector** (Datadog's collection agent)
+- Any IoT device or application that speaks ILP
+
+#### Line Protocol Format
+
+```
+<measurement>[,<tag_key>=<tag_value>[,...]] <field_key>=<field_value>[,...] [<timestamp>]
+```
+
+Examples:
+```
+SensorReading,sensor_id=sensor-A,location=building-1 temperature=22.5,humidity=65.0,pressure=1013.25 1708430400000000000
+SensorReading,sensor_id=sensor-B,location=building-2 temperature=19.1,humidity=70.0 1708430400000000000
+```
+
+Rules:
+- Measurement name = timeseries type name (auto-created if it doesn't exist — configurable)
+- Tags = comma-separated key=value after measurement name (no spaces around `=`)
+- Fields = space-separated from tags, comma-separated key=value (floats default, `i` suffix for integers, quoted for strings)
+- Timestamp = optional, nanosecond Unix epoch (precision configurable via query param)
+- Multiple lines = multiple samples, newline-separated
+- Batch = one HTTP POST with thousands of lines
+
+#### HTTP Endpoint
+
+```
+POST /api/v1/ts/{database}/write?precision=<ns|us|ms|s>
+Authorization: Bearer <token> (or Basic auth)
+Content-Type: text/plain; charset=utf-8
+Content-Encoding: gzip (optional, for compressed batches)
+
+<line protocol payload>
+```
+
+**Response codes:**
+- `204 No Content` — success (all lines written)
+- `400 Bad Request` — parse error (line protocol syntax invalid)
+- `401 Unauthorized` — authentication failed
+- `404 Not Found` — database not found
+- `422 Unprocessable Entity` — valid syntax but semantic error (e.g., type mismatch)
+- `500 Internal Server Error`
+
+**Compatibility endpoint** (for existing Telegraf configurations):
+```
+POST /api/v2/write?org=default&bucket={database}
+```
+Maps directly to the same handler. Telegraf users just point their `output.influxdb_v2` config at ArcadeDB.
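The parsing rules above are simple enough to sketch. The class below is a deliberately stripped-down stand-in for the `LineProtocolParser` this section assumes: it handles one line with no escape sequences (`\,`, `\ `, `\=`) and no boolean fields, both of which a production parser must support:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified single-line ILP parser: measurement + tags, then fields, then optional timestamp.
// Field typing follows the rules above: bare number = double, 'i' suffix = integer, quoted = string.
public final class LineProtocolSketch {
  record ParsedLine(String measurement, Map<String, String> tags,
                    Map<String, Object> fields, Long timestamp) {}

  static ParsedLine parse(final String line) {
    final String[] parts = line.split(" ");              // [measurement+tags, fields, optional timestamp]
    final String[] head = parts[0].split(",");
    final Map<String, String> tags = new LinkedHashMap<>();
    for (int i = 1; i < head.length; i++) {
      final String[] kv = head[i].split("=", 2);
      tags.put(kv[0], kv[1]);
    }
    final Map<String, Object> fields = new LinkedHashMap<>();
    for (final String f : parts[1].split(",")) {
      final String[] kv = f.split("=", 2);
      final String v = kv[1];
      if (v.startsWith("\""))                            // quoted → string
        fields.put(kv[0], v.substring(1, v.length() - 1));
      else if (v.endsWith("i"))                          // 'i' suffix → integer
        fields.put(kv[0], Long.parseLong(v.substring(0, v.length() - 1)));
      else                                               // default → double
        fields.put(kv[0], Double.parseDouble(v));
    }
    final Long ts = parts.length > 2 ? Long.parseLong(parts[2]) : null;
    return new ParsedLine(head[0], tags, fields, ts);
  }
}
```

Because the format splits into at most three space-separated sections, the hot path is plain string scanning with no object graph, which is what makes ILP ingestion so much cheaper than SQL INSERT.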
+ +#### Implementation + +New `PostTimeSeriesWriteHandler extends AbstractServerHttpHandler`: + +```java +public class PostTimeSeriesWriteHandler extends AbstractServerHttpHandler { + + @Override + protected ExecutionResponse execute(final HttpServerExchange exchange, + final ServerSecurityUser user, final JSONObject payload) { + + final String databaseName = exchange.getQueryParameters().get("database"); + final String precision = exchange.getQueryParameters().getOrDefault("precision", "ns"); + final Database database = httpServer.getServer().getDatabase(databaseName); + + // 1. Read raw body (line protocol text, possibly gzip-compressed) + final String body = readBody(exchange); + + // 2. Parse line protocol → batch of (type, tags, fields, timestamp) + final List samples = LineProtocolParser.parse(body, precision); + + // 3. Group by type + shard, then append in parallel + database.transaction(() -> { + for (final LineProtocolSample sample : samples) { + final TimeSeriesEngine engine = database.getSchema() + .getTimeSeriesType(sample.measurement).getEngine(); + engine.appendSample(sample); + } + }); + + return new ExecutionResponse(204, ""); // No Content = success + } +} +``` + +**Registered in `HttpServer.setupRoutes()`:** +```java +routes.addPrefixPath("/api/v1", basicRoutes + // ... existing routes ... 
+ .post("/ts/{database}/write", new PostTimeSeriesWriteHandler(this)) +); +// Compatibility alias +routes.addPrefixPath("/api/v2", basicRoutes + .post("/write", new PostTimeSeriesWriteHandler(this)) // InfluxDB v2 compat +); +``` + +#### Auto-Schema Creation (Configurable) + +When ILP sends data for a type that doesn't exist: +- **Default (strict mode)**: Return 404, require explicit `CREATE TIMESERIES TYPE` first +- **Auto-create mode** (opt-in via server config `arcadedb.tsAutoCreateType=true`): + - First line defines the schema: measurement → type, tags → TAG columns, fields → FIELD columns + - Field types inferred: no suffix = DOUBLE, `i` = LONG, quoted = STRING, true/false = BOOLEAN + - Subsequent lines with new fields → auto-alter to add columns (same as QuestDB behavior) + +#### Performance: Why a Dedicated Endpoint Beats SQL + +| Path | Operations per sample | Overhead | +|---|---|---| +| SQL INSERT | Parse SQL → plan → create Document → route → append | ~50-100μs/sample | +| HTTP Line Protocol | Parse text line → route → append (no SQL, no Document object) | ~1-5μs/sample | +| Java API (direct) | Route → append | ~0.5-1μs/sample | + +The dedicated endpoint **skips SQL parsing, query planning, and Document object creation**. It parses the lightweight line protocol text directly into primitive arrays and calls `TimeSeriesEngine.appendSamples()`. For 1M samples/sec ingestion, this difference is critical. + +### 5.7 Java API — Programmatic Access (Fastest Path) + +The Java API bypasses all protocol overhead. Use it for embedded applications or custom ingestion pipelines: + +```java +// Get the timeseries engine for a type +final TimeSeriesEngine engine = database.getSchema() + .getTimeSeriesType("SensorReading").getEngine(); + +// Batch append — fastest path (primitive arrays, no object creation) +final long[] timestamps = { 1708430400000000000L, 1708430401000000000L, ... }; +final String[] sensorIds = { "sensor-A", "sensor-A", ... 
}; +final String[] locations = { "building-1", "building-1", ... }; +final double[] temperatures = { 22.5, 22.6, ... }; +final double[] humidities = { 65.0, 64.8, ... }; + +database.transaction(() -> { + engine.appendSamples(timestamps, + new Object[] { sensorIds, locations, temperatures, humidities }); +}); + +// Async batch append — zero-contention, shard-per-core +database.async().timeseriesAppend("SensorReading", + timestamps, new Object[] { sensorIds, locations, temperatures, humidities }, + successCallback, errorCallback); + +// Query — range scan with column projection +try (TimeSeriesCursor cursor = engine.query( + fromTimestamp, toTimestamp, + new int[] { 0, 2 }, // columns: timestamp + temperature only + TagFilter.eq("sensor_id", "sensor-A"))) { + + while (cursor.hasNext()) { + final TimeSeriesRecord record = cursor.next(); + final long ts = record.getTimestamp(); + final double temp = record.getDouble(2); + } +} + +// Aggregation push-down — computed inside the engine, not row-by-row +final AggregationResult result = engine.aggregate( + fromTimestamp, toTimestamp, + 2, // column index: temperature + AggregationType.AVG, + Duration.ofHours(1).toNanos(), // 1-hour buckets + TagFilter.eq("sensor_id", "sensor-A")); + +for (final TimeBucket bucket : result.getBuckets()) { + System.out.println(bucket.getTimestamp() + " → " + bucket.getValue()); +} +``` + +### 5.8 HTTP Query Endpoint + +TimeSeries queries can use the existing ArcadeDB query endpoint (SQL goes through the standard parser): + +```bash +# Via existing /api/v1/query endpoint (SQL) +curl -X POST "http://localhost:2480/api/v1/query/mydb" \ + -H "Content-Type: application/json" \ + -d '{ + "language": "sql", + "command": "SELECT time_bucket('"'"'1h'"'"', ts) AS hour, avg(temperature) AS avg_temp FROM SensorReading WHERE ts BETWEEN '"'"'2026-02-19'"'"' AND '"'"'2026-02-20'"'"' GROUP BY hour" + }' +``` + +Optionally, a dedicated timeseries query endpoint with a simpler JSON request format: + +``` +POST 
/api/v1/ts/{database}/query +Content-Type: application/json + +{ + "type": "SensorReading", + "from": "2026-02-19T00:00:00Z", + "to": "2026-02-20T00:00:00Z", + "columns": ["temperature", "humidity"], + "filter": { "sensor_id": "sensor-A" }, + "aggregation": "avg", + "granularity": "1h" +} +``` + +Response: +```json +{ + "result": [ + { "time": "2026-02-19T00:00:00Z", "temperature": 22.3, "humidity": 64.5 }, + { "time": "2026-02-19T01:00:00Z", "temperature": 21.8, "humidity": 65.1 }, + ... + ] +} +``` + +This simplified endpoint is **Grafana-friendly** — it can power a Grafana JSON data source plugin with minimal configuration. + +### 5.9 Protocol Compatibility Matrix + +| Client / Tool | Protocol | ArcadeDB Endpoint | Notes | +|---|---|---|---| +| **Telegraf** | InfluxDB Line Protocol v2 | `POST /api/v2/write` | Point `output.influxdb_v2` at ArcadeDB | +| **Grafana Agent** | ILP or Prometheus remote write | `POST /api/v1/ts/{db}/write` | Via InfluxDB output | +| **curl / scripts** | ILP text | `POST /api/v1/ts/{db}/write` | Simplest integration | +| **PostgreSQL clients** | SQL (PG wire) | Port 5432 (postgresw module) | Full SQL, `time_bucket()` works | +| **Any SQL client** | SQL (HTTP) | `POST /api/v1/query/{db}` | Standard ArcadeDB SQL | +| **Java embedded** | Java API (direct) | `TimeSeriesEngine` class | Fastest: ~0.5-1μs/sample | +| **Grafana dashboards** | JSON query | `POST /api/v1/ts/{db}/query` | Simplified JSON request/response | +| **Cypher clients** | OpenCypher | `POST /api/v1/query/{db}` | `ts.*` functions for graph+TS | +| **IoT devices** | ILP over TCP (future) | Raw TCP socket | Like QuestDB's port 9009 | + +### 5.10 Summary: What's New vs. 
What's Reused + +| Component | New or Reused | Details | +|---|---|---| +| `CREATE TIMESERIES TYPE` parser | **New** | Extends `CreateTypeAbstractStatement`, adds TIMESTAMP/TAGS/FIELDS/SHARDS | +| `INSERT INTO` for timeseries | **Reused** | Existing `InsertStatement`, routes to `TimeSeriesEngine` instead of `LocalBucket` | +| `time_bucket()` function | **New** | `SQLFunctionTimeBucket extends SQLFunctionAbstract`, registered in `SQLFunctionFactoryTemplate` | +| `first()`, `last()` functions | **New** | `SQLFunctionFirst`, `SQLFunctionLast` — track min/max timestamp during aggregation | +| `GROUP BY` execution | **Reused** | Existing `AggregateProjectionCalculationStep` — `time_bucket()` returns a key, standard grouping | +| `TIMESERIES ... AS` clause | **New** | Extends `SelectStatement` grammar in `SQLParser.g4` for graph+TS joins | +| `ts.*` Cypher functions | **New** | Registered in native `CypherFunctionRegistry` (same as `text.*`, `math.*`), evaluated by `ExpressionEvaluator` | +| `ts.*` Cypher procedures | **New** | Registered in `CypherProcedureRegistry` (same as `algo.*`, `path.*`), executed by `CallStep` | +| HTTP ingestion endpoint | **New** | `PostTimeSeriesWriteHandler extends AbstractServerHttpHandler`, ILP parser | +| HTTP query endpoint | **New** | `PostTimeSeriesQueryHandler`, simplified JSON format | +| HTTP routing | **Reused** | Existing `HttpServer.setupRoutes()` — just add new routes | +| Authentication | **Reused** | Existing `AbstractServerHttpHandler` handles Basic/Bearer auth | + +--- + +## Part 6: Storage Architecture — Two-Layer Design + +### 6.1 Core Insight: Mutable Data Needs Pages, Immutable Data Does Not + +ArcadeDB's WAL logs changes at the **page level**: `(fileId, pageNumber, deltaFrom, deltaTo, content)`. Replication sends pages. Transactions track modified pages via MVCC. These guarantees are essential for **mutable** data — data being written by concurrent transactions. 
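As a toy illustration of the page-delta concept (the record shape below is hypothetical, not ArcadeDB's actual `WALFile` format), replaying an entry patches only the changed byte range of a page:

```java
import java.util.Arrays;

// Toy shape of a page-level WAL entry — (fileId, pageNumber, deltaFrom,
// deltaTo, content) — and how replaying it patches ONLY the changed byte
// range. Hypothetical sketch, not ArcadeDB's on-disk WAL format.
public class WalDeltaSketch {
  record PageDelta(int fileId, int pageNumber, int deltaFrom, int deltaTo, byte[] content) {}

  static byte[] replay(final byte[] page, final PageDelta d) {
    final byte[] out = Arrays.copyOf(page, page.length);
    // Copy only bytes [deltaFrom, deltaTo) — the rest of the page is untouched.
    System.arraycopy(d.content(), 0, out, d.deltaFrom(), d.deltaTo() - d.deltaFrom());
    return out;
  }

  public static void main(String[] args) {
    final byte[] page = new byte[16]; // pristine page
    final PageDelta d = new PageDelta(7, 3, 4, 8, new byte[] { 1, 2, 3, 4 });
    System.out.println(Arrays.toString(replay(page, d)));
    // → [0, 0, 0, 0, 1, 2, 3, 4, 0, 0, 0, 0, 0, 0, 0, 0]
  }
}
```

Because only the delta travels through the WAL and replication, an append-heavy workload ships a few dozen bytes per commit rather than a full 64KB page.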
+ +However, once timeseries data is **sealed** (compacted), it is never modified again. Sealed data has already been WAL-logged and replicated when it was mutable. Therefore: + +- **Mutable data** → MUST be paginated (`PaginatedComponent`) for WAL, MVCC, transactions, replication +- **Sealed data** → does NOT need pages. It is immutable, so no WAL, no MVCC, no transactions. Each server can compact independently. + +This leads to a **two-layer architecture** that separates the hot write path from the cold read-optimized storage: + +``` +MUTABLE LAYER (.tsbucket) SEALED LAYER (per-column files) +───────────────────────── ──────────────────────────────── +PaginatedComponent (64KB pages) Plain binary files +WAL-logged, MVCC, replicated NOT in WAL, NOT replicated +Row-oriented (append-friendly) Columnar (one file per column) +Holds last seconds/minutes of data Holds 99%+ of all historical data +Concurrent transactions write here Never modified after creation +Fixed 64KB page size Variable-size blocks, ZERO waste + Each server compacts independently +``` + +### 6.2 Shard-Per-Core Parallelism — Zero-Contention Ingestion + +#### The Problem with a Single Mutable File + +If all threads write to a single `TimeSeriesBucket`, MVCC conflicts serialize writes: thread 1 commits, thread 2 retries, thread 3 waits. On an 8-core machine, 7 cores are idle most of the time during ingestion bursts. 
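The contention can be seen in a toy optimistic-versioning model (illustrative only — not ArcadeDB's `TransactionContext` or MVCC classes): writers sharing one page-version counter keep retrying, while per-thread counters never conflict:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Toy model of optimistic page versioning: a "commit" succeeds only if the
// page version is unchanged since it was read; losers retry. NOT ArcadeDB's
// actual MVCC code — just a sketch of why one shared page serializes writers.
public class MvccContentionDemo {
  static long runWriters(final int threads, final int commitsPerThread,
                         final AtomicInteger[] pageVersions) throws InterruptedException {
    final AtomicLong retries = new AtomicLong();
    final Thread[] pool = new Thread[threads];
    for (int t = 0; t < threads; t++) {
      final AtomicInteger page = pageVersions[t % pageVersions.length];
      pool[t] = new Thread(() -> {
        for (int i = 0; i < commitsPerThread; i++) {
          while (true) {
            final int seen = page.get();              // read page at version V
            // ... append sample rows to the page copy here ...
            if (page.compareAndSet(seen, seen + 1))   // commit: V -> V+1
              break;
            retries.incrementAndGet();                // conflict: someone committed first
          }
        }
      });
      pool[t].start();
    }
    for (final Thread thread : pool) thread.join();
    return retries.get();
  }

  public static void main(String[] args) throws InterruptedException {
    final int cores = 8, commits = 10_000;
    // One shared page: every writer collides on the same version counter.
    final long shared = runWriters(cores, commits, new AtomicInteger[] { new AtomicInteger() });
    // One page per thread (shard-per-core): nobody ever conflicts.
    final AtomicInteger[] perThread = new AtomicInteger[cores];
    for (int i = 0; i < cores; i++) perThread[i] = new AtomicInteger();
    final long sharded = runWriters(cores, commits, perThread);
    System.out.println("shared-page retries: " + shared + ", shard-per-core retries: " + sharded);
  }
}
```

With one page per thread the retry count is exactly zero by construction — which is the property the shard-per-core design below exploits.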
+ +#### ArcadeDB's Existing Solution: N Buckets Per Type + +ArcadeDB already solves this for regular document/graph types: +- A type has **N buckets** (default = number of cores), each a separate `LocalBucket` file +- `ThreadBucketSelectionStrategy`: `Thread.currentThread().threadId() % N` → deterministic, lock-free +- Each bucket has its **own LSM index partition** (via `TypeIndex` → `List`) +- The async API (`DatabaseAsyncExecutorImpl`) routes tasks to thread slots via `getSlot(bucket.getFileId())` +- WAL files are also per-thread: `activeWALFilePool[threadId % poolSize]` +- Result: **zero contention** — each core writes to its own bucket, its own index, its own WAL file + +#### TimeSeries Shard-Per-Core: Same Principle + +A `TimeSeriesType` with N shards creates N independent write/compact/read units: + +``` +SHARD-PER-CORE ARCHITECTURE (8-core example): + + Thread 0 ──→ Shard 0: mutable_0.tsbucket + sealed_0.ts.* (own files, own compaction) + Thread 1 ──→ Shard 1: mutable_1.tsbucket + sealed_1.ts.* (own files, own compaction) + Thread 2 ──→ Shard 2: mutable_2.tsbucket + sealed_2.ts.* (own files, own compaction) + ... + Thread 7 ──→ Shard 7: mutable_7.tsbucket + sealed_7.ts.* (own files, own compaction) + + No locks. No MVCC conflicts. No shared state during writes. + Each shard is a fully independent timeseries storage unit. +``` + +**What is a shard?** Each shard consists of: +- One `TimeSeriesBucket` (mutable, paginated, `PaginatedComponent`) +- One `TimeSeriesSealedStore` (sealed column files + index) +- Its own compaction thread/schedule +- Its own free page list, compaction watermark, checkpoint state + +**Shard assignment:** Uses `BucketSelectionStrategy`, same as regular types: +- **`ThreadBucketSelectionStrategy`** (default for TimeSeries): `threadId % N` → maximum write parallelism, zero contention. Best for high-throughput ingestion from many sources. 
+- **`PartitionedBucketSelectionStrategy`**: hash(tag_values) % N → all data for a specific series (e.g., `sensor_id='A'`) lands in the same shard. Best for single-series query performance (no cross-shard merge needed for point queries). + +**Async API integration:** The existing `DatabaseAsyncExecutorImpl` routes TimeSeries writes exactly like document writes: +```java +// Thread-affine routing (existing infrastructure) +TimeSeriesBucket shard = type.getShardByRecord(record, async); // threadId % N +int slot = asyncExecutor.getSlot(shard.getFileId()); +asyncExecutor.scheduleTask(slot, new AsyncTimeSeriesAppend(shard, samples, ...)); +``` + +**WAL parallelism:** Each async thread already writes to its own WAL file (`activeWALFilePool[threadId % poolSize]`). Since each shard's mutable pages are only modified by one thread, WAL writes are lock-free. + +#### Why This Achieves Datadog Monocle-Level Performance + +| Aspect | Datadog Monocle | ArcadeDB TimeSeries | +|---|---|---| +| Architecture | Shard-per-core LSM (Rust) | Shard-per-core two-layer (Java) | +| Write contention | Zero (one LSM per core) | Zero (one mutable file per core) | +| Thread model | Lock-free, core-pinned | Thread-affine via `BucketSelectionStrategy` | +| Compaction | Per-shard | Per-shard | +| WAL | Per-core | Per-thread (`WALFilePool`) | +| Tag routing | Tag-hash sharding | Configurable: thread or tag-hash | + +#### Read Path with Shards + +Queries transparently merge across all shards: + +``` +Query: SELECT avg(temperature) FROM SensorReading + WHERE timestamp BETWEEN T1 AND T2 + +For each shard (0..N-1) IN PARALLEL: + 1. Query shard's sealed store (binary search its index) + 2. Query shard's mutable bucket (scan active pages) + 3. 
Produce partial aggregation (sum, count) + +Final merge: + Combine partial aggregations from all shards → final result + (SUM = sum of sums, COUNT = sum of counts, AVG = total_sum / total_count) +``` + +**Key optimization**: Shard queries run **in parallel** (one per core). A range query on an 8-shard type uses all 8 cores for both sealed and mutable reads. This is the same parallel-scan pattern ArcadeDB already uses for `database.scanType()` across buckets. + +**Single-series queries with `PartitionedBucketSelectionStrategy`**: If the type uses tag-hash partitioning (e.g., partition by `sensor_id`), a query like `WHERE sensor_id = 'A'` can determine the exact shard: `hash('A') % N`. Only one shard is queried — zero cross-shard overhead. + +#### Shard Count Configuration + +```sql +-- Default: one shard per available core (maximum ingestion parallelism) +CREATE TIMESERIES TYPE SensorReading + TIMESTAMP ts PRECISION NANOSECOND + TAGS (sensor_id STRING, location STRING) + FIELDS (temperature DOUBLE, humidity DOUBLE, pressure DOUBLE) + +-- Explicit shard count +CREATE TIMESERIES TYPE SensorReading + SHARDS 16 + ... + +-- Tag-hash partitioning (data locality for single-series queries) +CREATE TIMESERIES TYPE SensorReading + PARTITION BY (sensor_id) + ... +``` + +Default shard count = `Runtime.getRuntime().availableProcessors()` (same convention as `ASYNC_WORKER_THREADS`). 
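The cross-shard read merge described above boils down to a k-way merge over per-shard sorted iterators. A minimal sketch (hypothetical shapes, not the actual `TimeSeriesEngine` iterator API):

```java
import java.util.*;

// Sketch of the cross-shard merge: each shard yields timestamps in ascending
// order; a min-heap advances ONLY the shard holding the smallest head value.
// Hypothetical code — the real engine merges full rows, not bare longs.
public class ShardMergeSketch {
  public static List<Long> mergeShards(final List<Iterator<Long>> shards) {
    // Heap entries: [timestamp, shardIndex], ordered by timestamp.
    final PriorityQueue<long[]> heap = new PriorityQueue<>(Comparator.comparingLong((long[] e) -> e[0]));
    for (int i = 0; i < shards.size(); i++)
      if (shards.get(i).hasNext())
        heap.add(new long[] { shards.get(i).next(), i });
    final List<Long> out = new ArrayList<>();
    while (!heap.isEmpty()) {
      final long[] head = heap.poll();
      out.add(head[0]);
      final Iterator<Long> shard = shards.get((int) head[1]);
      if (shard.hasNext())
        heap.add(new long[] { shard.next(), (int) head[1] }); // advance only this shard
    }
    return out;
  }

  public static void main(String[] args) {
    final List<Iterator<Long>> shards = List.of(
        List.of(1L, 4L, 9L).iterator(),
        List.of(2L, 3L, 8L).iterator(),
        List.of(5L, 6L, 7L).iterator());
    System.out.println(mergeShards(shards)); // → [1, 2, 3, 4, 5, 6, 7, 8, 9]
  }
}
```

Note that aggregations skip this merge entirely: partial `(sum, count)` pairs per shard are combined directly, with no ordering requirement.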
+ +### 6.3 File Layout Per TimeSeries Type + +For a type `SensorReading` with 5 columns and 4 shards (4-core machine): + +``` +SHARD 0: + MUTABLE (paginated — WAL, MVCC, replication) + SensorReading_0.tsbucket + SEALED (immutable — per-column files, no page overhead) + SensorReading_0.ts.index ← block directory (in memory) + SensorReading_0.ts.col.0.timestamp ← delta-of-delta compressed + SensorReading_0.ts.col.1.sensor_id ← dictionary + RLE compressed + SensorReading_0.ts.col.2.temperature ← Gorilla XOR compressed + SensorReading_0.ts.col.3.humidity ← Gorilla XOR compressed + SensorReading_0.ts.col.4.pressure ← Gorilla XOR compressed + +SHARD 1: + SensorReading_1.tsbucket + SensorReading_1.ts.index + SensorReading_1.ts.col.0.timestamp + ... (same column files) + +SHARD 2: + SensorReading_2.tsbucket + SensorReading_2.ts.index + SensorReading_2.ts.col.0.timestamp + ... + +SHARD 3: + SensorReading_3.tsbucket + SensorReading_3.ts.index + SensorReading_3.ts.col.0.timestamp + ... +``` + +Each shard is **completely independent**: its own mutable file, its own sealed files, its own compaction watermark, its own free page list. No shared state between shards during writes or compaction. + +### 6.4 Mutable File (.tsbucket) — The Transactional Write Buffer + +`TimeSeriesBucket extends PaginatedComponent` — uses ArcadeDB's standard page infrastructure for full ACID compliance. + +#### Page Types in the Mutable File + +**Header Page (Page 0):** + +``` +[Standard page header: version(4B) + contentSize(4B)] + magic_number (4B) "TSBC" + format_version (2B) + column_count (2B) total columns (1 timestamp + N tags + M fields) + column_definitions[] (variable) - for each column: + name_length (2B) + name (UTF-8 bytes) + data_type (1B) LONG/DOUBLE/STRING/INTEGER/etc. 
(maps to Type enum) + column_role (1B) TIMESTAMP=0, TAG=1, FIELD=2 + compression_hint (1B) DELTA_OF_DELTA=0, GORILLA_XOR=1, DICTIONARY=2, SIMPLE8B=3, NONE=4 + total_sample_count (8B) total samples in mutable file (not yet compacted) + min_timestamp (8B) global min across active pages + max_timestamp (8B) global max across active pages + active_data_page_count (4B) number of data pages with uncompacted data + compaction_watermark (8B) max timestamp of data confirmed in sealed files + (used for crash recovery — see section 6.8) + + --- Free Page List (for page reuse after compaction) --- + free_page_count (4B) number of reusable pages + free_page_list[] (4B each) page numbers available for reuse + + --- Pre-Compaction Checkpoint (crash safety for sealed files) --- + compaction_in_progress (1B) 0 = idle, 1 = compaction active + sealed_col_offsets[] (8B each, one per column) byte offset of each sealed column + file BEFORE compaction started + sealed_index_size (8B) byte size of .ts.index file BEFORE compaction started +``` + +These checkpoint fields enable crash recovery of sealed files (see section 6.8). + +**Directory Pages (Page 1..D) — Mutable Data Page Index:** + +The directory is a paginated list of active data pages inside the mutable file. It is **not sorted** — entries are appended when new data pages are created and removed when pages are compacted. Reads require a **linear scan**, which is efficient because the directory is tiny (typically ~100-200 entries covering the last seconds/minutes of data). + +The directory is paginated (WAL-protected) because it is modified by transactions: compaction cleanup removes entries, new page creation adds entries. Both operations go through `TransactionContext`. 
+ +``` +[Standard page header] + entry_count (4B) + next_directory_page (4B) pointer to next directory page (0 = last) +[Entries] - unsorted, appended on page creation, removed on compaction: + data_page_number (4B) + min_timestamp (8B) + max_timestamp (8B) + sample_count (4B) + series_count (2B) + is_sorted (1B) 0 = timestamps in arrival order, 1 = sorted by timestamp +``` + +The `is_sorted` flag is set to 0 on creation and flipped to 1 if compaction discovers the page is already in order (optimization: skip sort step for in-order data). + +**Active Data Pages (Page D+1..N) — Row-Oriented, MVCC-Safe:** + +The active data pages use a **row-oriented layout** so that concurrent transactions can append samples via MVCC. This is the key difference from the sealed files. + +``` +[Standard page header: version(4B) + contentSize(4B)] + sample_count (4B) + min_timestamp (8B) + max_timestamp (8B) + row_size (2B) fixed bytes per sample row (computed from schema) +[Sample rows — appended sequentially, fixed-size:] + row 0: [timestamp(8B)][tag0_dictIndex(2B)][field0(8B)][field1(8B)]... + row 1: [timestamp(8B)][tag0_dictIndex(2B)][field0(8B)][field1(8B)]... + ... +[Tag Dictionary — at tail of page, grows backwards:] + dict_count (2B) + entry 0: [length(2B)][string bytes] + entry 1: [length(2B)][string bytes] +``` + +Fixed-size sample rows make appending trivial: write at `headerSize + sampleCount * rowSize`, increment count. The tag dictionary at the page tail maps string tags to small integer indices used in the sample rows. 
+ +#### Concurrent Transaction Handling (MVCC) + +Multiple transactions can write to the same active page using standard ArcadeDB MVCC — the same mechanism `LocalBucket` uses: + +``` +tx1: begin + → reads active page (version V, sample_count=200) + → appends 100 samples at rows 200..299, sample_count becomes 300 + → commits → page version becomes V+1 + → WAL logs only the delta (the new bytes appended) + +tx2: begin (concurrent with tx1) + → reads active page (version V, sample_count=200) + → appends 50 samples at rows 200..249, sample_count becomes 250 + → tries to commit → MVCC conflict! page is now V+1 + → automatic retry: reads page V+1 (sample_count=300, includes tx1's data) + → appends 50 samples at rows 300..349, sample_count becomes 350 + → commits → page version V+2 + → WAL logs only the new delta (rows 300..349) +``` + +This works because: +- `TransactionContext` checks page versions at commit time (existing MVCC logic) +- `ConcurrentModificationException` triggers automatic retry (existing behavior) +- WAL logs only the changed byte range, not the full page (efficient) +- Replication propagates the page delta — identical to existing bucket replication + +When the active page fills up (~2,400 samples at 26 bytes/row for a 3-column schema), a new empty active page is created and the full page awaits compaction. + +#### Page Reuse — Free Page List + +When compaction moves data from mutable pages to sealed files, those pages become empty. Rather than growing the mutable file indefinitely, compacted pages are returned to a **free page list** stored in the header page: + +``` +Lifecycle of a mutable page: + 1. ALLOCATE: Need a new data page + → If free_page_list is non-empty: pop the last entry, reuse that page number + → If free_page_list is empty: extend the file (append new page at the end) + 2. FILL: Transaction appends samples to the page via MVCC + 3. COMPACT: Background compaction reads all samples, writes to sealed files + 4. 
FREE: Compaction cleanup (in transaction): + → Remove directory entry for the page + → Push page number onto free_page_list in header + → Increment free_page_count + → Page is now available for step 1 +``` + +**Steady-state behavior**: After initial ramp-up, the mutable file reaches a stable size. If ingestion rate is R samples/sec and compaction runs every T seconds, the mutable file holds ~R×T samples worth of pages. Compaction frees pages at the same rate new ones are allocated, so the free list stays near-empty and the file doesn't grow. + +**Backpressure**: If compaction falls behind (ingestion spike), the file grows temporarily. Once compaction catches up, the excess pages join the free list. A configuration setting `max_mutable_pages` can optionally trigger throttling of writes if the mutable file exceeds a threshold, giving compaction time to drain. + +#### Out-of-Order Data Handling + +TimeSeries data frequently arrives out of order: sensors may have network delays, batch uploads may contain historical data, or distributed collectors may deliver data at different rates. The mutable file handles this at three levels: + +**Level 1 — Within a single page (free, always works):** +Active data pages are row-oriented with no ordering requirement. Samples are appended in arrival order regardless of their timestamp value. When the page is later compacted, samples are sorted by timestamp at that point. Cost: zero at write time, negligible sort cost at compaction time (page fits in L1 cache). + +**Level 2 — Across pages, before compaction (free, always works):** +Different pages in the mutable file may have overlapping timestamp ranges. For example: +- Page 5: timestamps [10:00:01 .. 10:00:05] — some early, some late arrivals +- Page 6: timestamps [10:00:03 .. 10:00:08] — overlapping range + +Compaction reads ALL pages being compacted, collects all samples, sorts globally by timestamp, then writes sorted blocks to sealed files. 
The directory's `min_timestamp`/`max_timestamp` per page are used to select which pages to include in a compaction run. + +**Level 3 — After compaction (late-arriving data older than compaction_watermark):** +This is the hard case: data arrives with a timestamp that falls within a range already compacted into sealed files. + +**Strategy A (MVP — Overlapping Sealed Blocks):** +- Accept the late data into the mutable file normally (no rejection) +- When compacting, write the new sealed blocks even though they overlap existing sealed blocks +- The sealed index file records overlapping blocks: the `is_overlapping` flag is set +- At query time, if overlapping blocks exist in the requested range, merge-sort across all overlapping blocks (same as merging mutable + sealed) +- Periodic **major compaction** rewrites overlapping sealed blocks into a single sorted sequence (runs less frequently, e.g., daily) + +``` +Minor compaction (frequent, fast): + Mutable pages → NEW sealed blocks (may overlap existing sealed blocks) + +Major compaction (infrequent, more I/O): + Overlapping sealed blocks → single sorted sequence (no more overlaps) + Only touches the affected time range, not the entire sealed file +``` + +**Strategy B (Future — Configurable out-of-order tolerance window):** +- Configure a time window (e.g., 5 minutes) during which out-of-order data is expected +- Compaction only seals data older than `now - tolerance_window` +- Data within the tolerance window stays in the mutable file, even if the page is "full" +- This eliminates overlapping sealed blocks entirely for well-behaved data sources + +### 6.5 Sealed Files — Per-Column Immutable Storage + +The sealed layer is **not paginated**. It consists of plain binary files read via `java.nio.channels.FileChannel` positioned reads. This means: + +- **Variable-size blocks**: No 64KB page boundary. A block of 50,000 compressed samples using 12,847 bytes occupies exactly 12,847 bytes. Zero waste. 
+- **No WAL overhead**: Sealed files are derived data — the mutable file was the WAL-protected source of truth. +- **No MVCC**: Sealed files are never modified by transactions. Compaction appends new blocks; retention rewrites the file. +- **No replication**: Each server compacts independently. The WAL-replicated mutable file ensures all servers have the same logical data. +- **Per-column I/O**: `SELECT avg(temperature)` reads only `.col.0.timestamp` and `.col.2.temperature`. Files for humidity, pressure, sensor_id are never opened. + +#### Shared Index File (.ts.index) — NOT Paginated, Loaded In Memory + +All column files share the same block boundaries — block N in every column file covers the same set of samples. A single shared index file provides the block directory. + +**Key design decision**: The sealed index is a **plain file, NOT paginated**. It does not use `PaginatedComponent`, WAL, or MVCC. It is: +- **Loaded entirely into memory** at database open (trivially small — see size analysis below) +- **Sorted by `min_timestamp`** for binary search during range queries +- **Rewritten entirely** on each compaction (append new blocks, regenerate file) +- **Never modified by transactions** — only by the compaction background thread + +This is safe because the sealed index is derived data: it can always be rebuilt from the sealed column files themselves. Crash safety is handled by the pre-compaction checkpoint in the mutable file header (see section 6.8). 
+ +``` +FILE HEADER + magic (4B) "TSIX" + format_version (2B) + column_count (2B) + block_count (4B) + total_sample_count (8B) + min_timestamp (8B) + max_timestamp (8B) + +BLOCK DIRECTORY — one entry per block, sorted by min_timestamp: + min_timestamp (8B) + max_timestamp (8B) + sample_count (4B) + is_overlapping (1B) 0 = no overlap with other blocks, 1 = overlapping range + (set when late-arriving data creates blocks that overlap + existing sealed blocks — see Out-of-Order Handling) + column_offsets[] (8B each) byte offset in each column file where this block starts + column_sizes[] (4B each) compressed size in each column file for this block + +FOOTER + directory_offset (8B) byte position where the directory starts in this file + magic (4B) "TSIX" (repeated for validation) +``` + +**Size**: For 5 columns, each directory entry is 21 + (5 x 8) + (5 x 4) = 81 bytes. +A dataset of 1 billion samples with 50,000 samples/block = 20,000 blocks → directory = **~1.6 MB**. Trivially fits in memory and is cached on first read. + +**Why the directory is at the end** (like a Parquet footer): The file is append-only. New blocks are appended, then the directory is rewritten at the new end. A reader opens the file, reads the footer to find the directory offset, then reads the directory. This avoids reserving space at the beginning. + +**Contrast with the mutable directory**: The mutable file's directory pages (section 6.4) ARE paginated because they are modified by transactions (compaction cleanup, new page creation). The sealed index is not — it is a standalone file managed exclusively by the compaction thread. 
+ +#### Per-Column Files (.ts.col.N.*) + +Each column file is pure compressed data with a minimal header: + +``` +FILE HEADER + magic (4B) "TSCL" + column_index (2B) which column this file stores + compression_type (1B) default codec for this column + block_count (4B) + +BLOCK 0 (variable size — tightly packed, zero padding) + base_value (8B) first raw value (for delta/XOR encoding) + compressed_data (N bytes) + +BLOCK 1 (starts IMMEDIATELY after block 0) + base_value (8B) + compressed_data (M bytes) + +... blocks continue with zero gaps ... +``` + +No per-block headers are needed inside the column file — the shared index file already knows each block's offset and size. The column file is essentially a concatenation of compressed byte arrays. + +#### Compression Strategy Per Column Type + +| Column Type | Codec | Typical Ratio | Notes | +|---|---|---|---| +| DATETIME/LONG (timestamp) | Delta-of-delta | 96% → 1 bit/sample | Regular intervals compress best | +| DOUBLE (field values) | Gorilla XOR | avg 1.37 bytes/sample | Slowly changing values compress best | +| INTEGER/LONG (counters) | Simple-8b RLE | 4-8x | Monotonic counters compress extremely well | +| STRING TAG (low cardinality) | Dictionary + Simple-8b RLE | 10-100x | Dictionary is per-block | +| STRING TAG (high cardinality) | Dictionary (block-local) | 2-5x | Each block builds its own dictionary | + +#### I/O Strategy: FileChannel Positioned Reads + +Sealed files are read via standard `java.nio.channels.FileChannel`: + +```java +FileChannel channel = FileChannel.open(columnFilePath, StandardOpenOption.READ); +ByteBuffer buf = ByteBuffer.allocateDirect(blockSize); // direct buffer, no extra copy +channel.read(buf, blockOffset); // positioned read at exact offset +``` + +Why `FileChannel` over `mmap`: +- **No TLB pressure**: mmap competes with JVM heap for translation lookaside buffer entries. Many large sealed files could degrade JVM performance. 
+- **No SIGBUS risk**: mmap throws SIGBUS (crashes JVM) on I/O errors. FileChannel throws a catchable `IOException`. +- **Controlled memory**: FileChannel reads into explicitly sized buffers. mmap lets the OS decide what stays in memory. +- **Sequential scan friendly**: Range queries read blocks sequentially. FileChannel with OS readahead is as fast as mmap for this pattern. +- **Java 21+ optimization**: `FileChannel.read(ByteBuffer.allocateDirect(...), position)` with direct buffers avoids the user-space copy. + +The OS page cache still caches sealed file contents automatically — hot column files stay in memory without explicit management. + +### 6.6 Write Path + +#### Ingestion (Transactional, Shard-Per-Core) + +``` +1. Application calls appendSamples(timestamps[], tags[], values[]...) + ↓ +2. Shard selection (lock-free): + → ThreadBucketSelectionStrategy: shardIdx = threadId % N (default) + → PartitionedBucketSelectionStrategy: shardIdx = hash(tag_values) % N + → Async API: task routed to slot = getSlot(shard.mutableBucket.getFileId()) + ↓ +3. TransactionContext writes sample rows into the shard's active page + → Standard MVCC: if concurrent tx committed first, retry on new page version + → With ThreadBucketSelectionStrategy: ZERO conflicts (each thread owns its shard) + → WAL logs only the appended byte range (delta) + → WAL write is lock-free: activeWALFilePool[threadId % poolSize] + → Page fills up → new active page created, old page awaits compaction + ↓ +4. Transaction commits → WAL + replication propagate the page changes + ↓ +5. Shard's mutable file now holds recent uncompacted data (seconds to minutes) + Other shards are completely unaffected (no shared state). +``` + +**Throughput scaling**: With N shards and `ThreadBucketSelectionStrategy`, ingestion throughput scales linearly with cores. On an 8-core machine, 8 threads write to 8 independent shards with zero MVCC conflicts, zero WAL contention, and zero lock overhead. 
This matches Datadog Monocle's shard-per-core architecture. + +#### Compaction (Background, Per-Shard, Crash-Safe) + +Compaction moves data from a shard's mutable file to its sealed files. Each shard compacts independently — N shards means N concurrent compaction threads with zero contention. The algorithm is designed so that a **JVM crash at any point** leaves the system in a consistent state. + +``` +COMPACTION ALGORITHM (crash-safe): + +PHASE 1 — PRE-COMPACTION CHECKPOINT (in transaction, WAL-protected): + a. Record current state of sealed files in the mutable header page: + → sealed_col_offsets[i] = current byte size of each column file + → sealed_index_size = current byte size of .ts.index + → compaction_in_progress = 1 + b. Commit this transaction + → WAL logs the header page change → replicated + → This is the "rollback point" for crash recovery + + *** If JVM crashes here: checkpoint is committed, but no sealed writes yet. + Recovery sees compaction_in_progress=1, truncates sealed files to + checkpointed offsets (which are the current sizes — no-op). Safe. *** + +PHASE 2 — READ & TRANSFORM (no locks, no transactions): + a. Read all full data pages from mutable file directory + (only pages marked as full / not the current active page) + b. Collect all samples from those pages into memory + c. Sort by timestamp (global sort across all pages) + d. Split into columns + e. Chunk into SEALED_BLOCK_SIZE rows (default 65,536) — avoids one giant block per shard + f. Compress each column chunk independently using the configured codec + +PHASE 3 — WRITE SEALED FILES (append-only, no WAL): + a. For each chunk: write inline block metadata (magic 0x5453424C + minTs + maxTs + + sampleCount + per-column compressed sizes), then append compressed column data + → Block metadata enables directory reconstruction on cold open (loadDirectory()) + b. 
fsync ALL sealed files
     → After fsync, sealed data is durable on disk

  *** If JVM crashes here (mid-write): sealed files have partial data
      beyond the checkpointed offsets. Recovery truncates back to
      checkpointed offsets. Mutable pages still intact. Will re-compact. ***

PHASE 4 — COMMIT CLEANUP (in transaction, WAL-protected):
  a. In a NEW TRANSACTION on the mutable file:
     → Remove compacted pages from the directory
     → Push freed page numbers onto the free_page_list in header
     → Update free_page_count
     → Update compaction_watermark = max timestamp of compacted data
     → Update min_timestamp, max_timestamp, total_sample_count
     → Set compaction_in_progress = 0
     → Clear sealed_col_offsets[] and sealed_index_size
  b. Commit this transaction
     → WAL logs the cleanup → replicated

  *** If JVM crashes here (before commit): cleanup tx didn't commit.
      Recovery sees compaction_in_progress=1, truncates sealed files to
      checkpointed offsets. But the sealed data IS valid (it was fsync'd).
      However, the mutable pages weren't freed, so they'll be re-compacted.
      Result: duplicate data in sealed files after recovery? NO — because
      we truncated back to checkpoint offsets. The re-compaction produces
      the same sealed blocks. Safe and idempotent. ***

PHASE 5 — DONE
  Mutable file: only holds recent, uncompacted data (seconds to minutes)
  Sealed files: hold all historical data (days to years)
  Free pages: available for new ingestion
```

**Key invariant**: The `compaction_watermark` in the mutable header is ONLY advanced (step 4a) AFTER sealed files are fsync'd (step 3b). This guarantees that any data below the watermark is durably stored in sealed files. Data above the watermark is in the mutable file (WAL-protected). No data is ever lost.

### 6.7 Read Path (Range Query)

Queries use a **pull-based streaming iterator pipeline** that never materializes all rows in memory.
The SQL execution engine calls `syncPull(context, nRecords)` which returns at most `nRecords` rows per call — aggregation steps pull batches in a loop until exhausted. + +#### Iterator Chain + +``` +FetchFromTimeSeriesStep.syncPull(ctx, N) + └→ TimeSeriesEngine.iterateQuery(fromTs, toTs, columnIndices, tagFilter) + └→ PriorityQueue — merge-sort across shards by timestamp + ├→ TimeSeriesShard[0].iterateRange(fromTs, toTs, columnIndices, tagFilter) + │ ├→ TimeSeriesSealedStore.iterateRange() — sealed blocks first + │ └→ TimeSeriesBucket.iterateRange() — mutable pages second + ├→ TimeSeriesShard[1].iterateRange(...) + └→ ... +``` + +Each `next()` call on the engine iterator advances only the shard with the smallest current timestamp (min-heap). Memory usage is O(shardCount × blockSize) — constant regardless of total dataset size. + +#### Full Query Flow + +``` +Query: SELECT avg(temperature) FROM SensorReading + WHERE timestamp BETWEEN '2026-02-19' AND '2026-02-20' + AND sensor_id = 'A' + +FOR EACH SHARD (0..N-1): + + STEP 1: SEALED FILES (99%+ of shard's data, columnar, fast) + a. Binary search block directory → blocks overlapping time range O(log B) + b. Per block: decompress timestamps → binary search for exact range O(log N) + c. Lazy column decompression: only decode value columns if rows match + d. Early termination: stop when minTimestamp > toTs + e. Files NOT touched: .col.3.humidity, .col.4.pressure (zero I/O) + + STEP 2: MUTABLE FILE (last few seconds/minutes, small) + a. Short-circuit if empty (getSampleCount() == 0) + b. 
Scan pages lazily → filter by time range → yield matching rows + + STEP 3: CHAIN sealed iterator → mutable iterator (sealed first, mutable second) + Apply tag filter inline during iteration + +MERGE across shards: + PriorityQueue — min-heap by timestamp + Each next() advances only the shard with smallest current timestamp + For aggregations: AggregateProjectionCalculationStep pulls all rows via syncPull() +``` + +**Optimization — PartitionedBucketSelectionStrategy**: If the type partitions by `sensor_id` and the query filters on `sensor_id = 'A'`, the engine computes `hash('A') % N` to identify the single shard containing all data for sensor A. Only that one shard is queried — zero cross-shard overhead. + +**Performance characteristics:** +- Streaming: O(shardCount × blockSize) memory — never materializes all rows +- Block selection: O(log B) binary search per shard (B = blocks in shard) +- Within-block search: O(log N) binary search on sorted timestamps +- Column I/O: reads ONLY the column files needed by the query +- Lazy decompression: value columns decoded only when timestamps match +- Tag filtering: dictionary-decoded bitmask, applied inline during iteration +- Early termination: stops scanning blocks once `minTimestamp > toTs` +- Empty bucket short-circuit: zero cost for mutable layer after compaction +- Cold queries: sealed block directory persisted inline (survives close/reopen) +- Profiling: `PROFILE SELECT ...` shows per-step cost and row counts via `FetchFromTimeSeriesStep` +- Cross-shard merge: min-heap merge-sort for raw scans, trivial for aggregations + +### 6.8 Crash Recovery + +Sealed files have no WAL. A JVM crash during compaction could leave them in an inconsistent state (partially written blocks). The **pre-compaction checkpoint** protocol in section 6.6 ensures crash safety. 
Here is the full recovery algorithm: + +#### Recovery Algorithm (runs at database open) + +``` +On startup: + +STEP 1 — Recover mutable file from WAL (standard ArcadeDB recovery) + → All WAL-protected fields are now reliable: + - compaction_watermark + - compaction_in_progress flag + - sealed_col_offsets[] (checkpoint of sealed file sizes before compaction) + - sealed_index_size (checkpoint of index file size before compaction) + - free_page_list + - directory entries + +STEP 2 — Check if compaction was interrupted + IF compaction_in_progress == 1: + → A compaction was running when the JVM crashed. + → Sealed files may have partial/corrupt data beyond the checkpoint. + + a. For each sealed column file i: + → Truncate to sealed_col_offsets[i] bytes + → This removes any partially written blocks from the failed compaction + + b. Truncate .ts.index to sealed_index_size bytes + → This removes any partially written index entries + + c. In a NEW TRANSACTION on the mutable file: + → Set compaction_in_progress = 0 + → Clear sealed_col_offsets[] and sealed_index_size + → Commit (WAL-logged) + + d. Log: "TimeSeries recovery: truncated sealed files to pre-compaction state. + Mutable pages preserved, will be re-compacted." + + IF compaction_in_progress == 0: + → No compaction was running, OR the compaction completed cleanly. + → Sealed files are consistent. No truncation needed. + +STEP 3 — Validate compaction_watermark consistency + a. Read .ts.index → find the max timestamp across all sealed blocks + b. Verify: sealed_max_timestamp <= compaction_watermark + (If not, something is wrong — log error and truncate sealed files + to match the watermark, then re-compact) + +STEP 4 — Load sealed index into memory + a. Read .ts.index into memory (sorted block directory) + b. 
Ready for queries + +STEP 5 — Resume normal operation + → Mutable pages with data > compaction_watermark are valid, will be compacted + → Mutable pages with data <= compaction_watermark may exist if cleanup + didn't commit — safe to free (compaction will handle this) + → Background compaction resumes on schedule +``` + +#### Crash Scenarios Matrix + +| Crash Point | Mutable State | Sealed State | Recovery Action | +|---|---|---|---| +| Before Phase 1 commit | Unchanged | Unchanged | Nothing to do | +| After Phase 1, before Phase 3 | Has checkpoint | No new data written | Truncate to checkpoint (no-op) | +| During Phase 3 (mid-write) | Has checkpoint | Partially written | Truncate to checkpoint, discard partial blocks | +| After Phase 3 fsync, before Phase 4 | Has checkpoint | Fully written + fsync'd | Truncate to checkpoint. Data re-compacted (safe, idempotent) | +| After Phase 4 commit | Clean (pages freed) | Fully written | compaction_in_progress=0, nothing to do | + +**Key invariant**: The mutable file is the **source of truth**. Sealed files are derived data and can always be rebuilt from mutable pages that haven't been cleaned up. The `compaction_watermark` is only advanced AFTER sealed files are fsync'd AND the cleanup transaction commits. This guarantees zero data loss in all crash scenarios. 
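The core of recovery STEP 2 is nothing more than truncating each sealed file back to its WAL-checkpointed length. A minimal sketch in plain NIO — class, method, and file names here are hypothetical, not the actual module API:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch of the per-file truncation in recovery STEP 2. */
public final class SealedFileRecovery {

  /**
   * Truncates a sealed column/index file back to its pre-compaction checkpoint
   * length, discarding any partially written blocks. Returns the resulting length.
   */
  static long truncateToCheckpoint(final Path sealedFile, final long checkpointBytes) throws IOException {
    try (FileChannel ch = FileChannel.open(sealedFile, StandardOpenOption.WRITE)) {
      if (ch.size() > checkpointBytes)
        ch.truncate(checkpointBytes); // drop partial compaction output beyond the checkpoint
      return ch.size();
    }
  }

  public static void main(final String[] args) throws IOException {
    final Path file = Files.createTempFile("sealed-col", ".tmp");
    Files.write(file, new byte[1000]);                   // pretend a crashed compaction left 360 extra bytes
    System.out.println(truncateToCheckpoint(file, 640)); // prints 640
    Files.delete(file);
  }
}
```

Because the checkpoint offsets are themselves WAL-protected fields of the mutable file, this step is idempotent: re-running it after a second crash is harmless.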
+ +### 6.9 Replication + +The two-layer, sharded design has elegant replication properties: + +``` +Leader: tx writes → shard K's mutable file → WAL → replicates to followers + compaction (local, per-shard) → shard K's sealed files + +Follower: receives WAL → applies to shard K's mutable file (identical mutable state) + compaction (local, per-shard) → shard K's sealed files +``` + +- **WAL replication covers only the mutable files** (N small files, only recent data per shard) +- **Sealed files are NOT replicated** — each server compacts each shard independently +- Sealed files on leader and followers are **logically equivalent** (same data) but may differ in block boundaries. This is perfectly fine — same model as Cassandra's per-node compaction. +- **Zero replication overhead** for historical data (the vast majority of storage) +- **Leader failover**: the new leader's sealed files are already up to date (derived from the same WAL-replicated mutable data) +- **Shard count is the same** on leader and followers (it's part of the type schema) + +### 6.10 Retention + +**Strategy 1: Sealed file truncation (default)** +For each shard independently: +1. Read shard's `.ts.index` → find blocks where `max_timestamp < now - retention_period` +2. Rewrite shard's column files without those old blocks +3. Rewrite shard's `.ts.index` without old entries +4. Update shard's mutable file header's retention watermark (in transaction) + +**Strategy 2: Time-partitioned sealed files (for instant retention)** +```sql +CREATE TIMESERIES TYPE SensorReading + PARTITION BY INTERVAL 1 MONTH + RETENTION 12 MONTHS +``` +Creates a separate set of sealed files per time window: +``` +SensorReading_202602.ts.index +SensorReading_202602.ts.col.0.timestamp +SensorReading_202602.ts.col.1.sensor_id +... +``` +Retention = delete the entire set of files for expired months. Instant, zero I/O. 
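With Strategy 2, the retention check reduces to a date comparison on partition suffixes: a monthly partition is expired when its whole month falls before `now - retention`. A sketch under assumed naming (the `yyyyMM` suffix from the example above; `expiredPartitions` is illustrative, not a real API):

```java
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.List;

/** Illustrative sketch of Strategy 2's retention check on monthly partition suffixes. */
public final class PartitionRetention {

  // Month suffix used in the sealed file names, e.g. "202602"
  private static final DateTimeFormatter SUFFIX = DateTimeFormatter.ofPattern("uuuuMM");

  /** Returns the suffixes whose entire month lies before (now - retentionMonths). */
  static List<String> expiredPartitions(final YearMonth now, final int retentionMonths, final List<String> suffixes) {
    final YearMonth cutoff = now.minusMonths(retentionMonths);
    return suffixes.stream()
        .filter(s -> YearMonth.parse(s, SUFFIX).isBefore(cutoff))
        .toList();
  }

  public static void main(final String[] args) {
    // RETENTION 12 MONTHS at 2026-02: everything before 2025-02 can be deleted as whole file sets.
    System.out.println(expiredPartitions(YearMonth.of(2026, 2), 12,
        List.of("202501", "202502", "202602"))); // prints [202501]
  }
}
```

Deleting the returned partitions is then a plain `Files.delete()` per file in the set — the "instant, zero I/O" path described above.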
+ +### 6.11 Major Compaction (Sealed File Defragmentation) + +Minor compaction (described in 6.6) runs frequently and may produce overlapping sealed blocks when out-of-order data arrives after previous compaction. **Major compaction** consolidates overlapping blocks: + +``` +MAJOR COMPACTION (infrequent, e.g., daily or on-demand): + +1. Scan .ts.index → identify time ranges with overlapping blocks + (blocks where is_overlapping=1, or multiple blocks covering the same range) + +2. For each overlapping region: + a. Read all overlapping blocks from column files + b. Decompress → merge-sort by timestamp → deduplicate + c. Re-compress into new non-overlapping blocks + d. Write replacement blocks to NEW temporary column files + e. fsync temporary files + +3. Rewrite sealed column files: + a. Copy non-affected blocks from old files + b. Insert replacement blocks in the correct position + c. fsync new files + +4. Atomically swap: rename new files over old files + (POSIX rename is atomic on the same filesystem) + +5. Rewrite .ts.index with all blocks now non-overlapping +``` + +Major compaction only touches the affected time ranges, not the entire dataset. For well-behaved data sources (no out-of-order after compaction), major compaction is rarely needed. 
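Step 2b above (decompress → merge-sort by timestamp → deduplicate) can be sketched with a `TreeMap`, which yields sorted, duplicate-free output by construction. Names are illustrative, and the dedup policy assumed here (later-written block wins) is one possible choice, not necessarily the module's:

```java
import java.util.TreeMap;

/** Illustrative sketch of major-compaction step 2b: merge overlapping blocks, dedup. */
public final class BlockMerger {

  /** Merges decompressed (timestamp, value) blocks; later blocks overwrite duplicates. */
  static TreeMap<Long, Double> mergeDedup(final long[][] timestamps, final double[][] values) {
    final TreeMap<Long, Double> merged = new TreeMap<>(); // keeps keys sorted by timestamp
    for (int b = 0; b < timestamps.length; b++)
      for (int i = 0; i < timestamps[b].length; i++)
        merged.put(timestamps[b][i], values[b][i]);       // duplicate timestamp: later block wins
    return merged;
  }

  public static void main(final String[] args) {
    // Two overlapping blocks: timestamps {1,3,5} and {3,4}. The result is
    // non-overlapping and sorted; the second block's value for ts=3 wins.
    final TreeMap<Long, Double> m = mergeDedup(
        new long[][] { { 1, 3, 5 }, { 3, 4 } },
        new double[][] { { 1.0, 3.0, 5.0 }, { 3.5, 4.0 } });
    System.out.println(m.keySet()); // prints [1, 3, 4, 5]
    System.out.println(m.get(3L));  // prints 3.5
  }
}
```

The merged map is then re-chunked into fixed-size blocks and re-compressed column by column before the atomic file swap in steps 3-4.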
+ +### 6.12 Series Filtering Optimization + +**Default: In-block dictionary filtering** +- Each sealed block's tag column uses dictionary encoding +- To check if `sensor_id = 'A'` exists: scan the block dictionary (<100 entries typically) +- Build a bitmask from dictionary indices to select matching samples +- Fast enough for analytical queries (scan-oriented) + +**Optional: LSM-Tree tag index** +- For high-cardinality point lookups, create an LSM-Tree index on `(tag_values, timestamp)` +- Maps `(sensor_id='A', timestamp=X)` → block number in sealed files +- Uses existing `LSMTreeIndex` infrastructure — no new index type needed +- Useful for: "get the latest value for sensor A" (point lookup, not range scan) + +### 6.13 SIMD-Accelerated Aggregation (Project Panama) + +TimeSeries aggregation (SUM, AVG, MIN, MAX, COUNT) over large decompressed arrays is the hottest path in range queries. SIMD (Single Instruction, Multiple Data) can process 4-8 doubles per CPU cycle instead of one. + +ArcadeDB already uses SIMD for vector similarity via JVector's `VectorizationProvider`. The TimeSeries module follows the **same pattern**: an interface with two implementations (pure Java + SIMD), auto-detected at runtime. + +#### Interface Design + +```java +package com.arcadedb.engine.timeseries.simd; + +/** + * Vectorized operations for timeseries aggregation. + * Two implementations: ScalarOps (pure Java) and SimdOps (Project Panama Vector API). + * The provider auto-detects SIMD availability and returns the best implementation. 
+ */ +public interface TimeSeriesVectorOps { + + // === Aggregation over double arrays (field values) === + double sum(double[] values, int offset, int length); + double min(double[] values, int offset, int length); + double max(double[] values, int offset, int length); + // AVG = sum / count (no separate method needed) + + // === Aggregation over long arrays (timestamps, counters) === + long sumLong(long[] values, int offset, int length); + long minLong(long[] values, int offset, int length); + long maxLong(long[] values, int offset, int length); + + // === Filtered aggregation (apply bitmask from tag filtering) === + double sumFiltered(double[] values, long[] bitmask, int offset, int length); + int countFiltered(long[] bitmask, int offset, int length); // popcount + + // === Comparison / filtering (produce bitmask) === + void greaterThan(double[] values, double threshold, long[] bitmaskOut, int offset, int length); + void lessThan(double[] values, double threshold, long[] bitmaskOut, int offset, int length); + void between(double[] values, double low, double high, long[] bitmaskOut, int offset, int length); + + // === Bitmask logic (combine tag filters) === + void bitmaskAnd(long[] a, long[] b, long[] out, int length); + void bitmaskOr(long[] a, long[] b, long[] out, int length); +} +``` + +#### Pure Java Implementation (Always Available) + +```java +package com.arcadedb.engine.timeseries.simd; + +/** + * Scalar (pure Java) implementation. Works on any JDK 21+. + * No dependencies on incubator modules. 
+ */ +public class ScalarTimeSeriesVectorOps implements TimeSeriesVectorOps { + + @Override + public double sum(final double[] values, final int offset, final int length) { + double result = 0.0; + for (int i = offset; i < offset + length; i++) + result += values[i]; + return result; + } + + @Override + public double min(final double[] values, final int offset, final int length) { + double result = Double.MAX_VALUE; + for (int i = offset; i < offset + length; i++) + if (values[i] < result) + result = values[i]; + return result; + } + + // ... analogous for max, sumLong, minLong, maxLong, filtered variants, bitmask ops +} +``` + +#### SIMD Implementation (Auto-Detected via Project Panama) + +```java +package com.arcadedb.engine.timeseries.simd; + +import jdk.incubator.vector.*; + +/** + * SIMD-accelerated implementation using Java Vector API (Project Panama). + * Processes 4 doubles (AVX2/256-bit) or 8 doubles (AVX-512) per cycle. + * Only instantiated if jdk.incubator.vector module is available. 
+ */ +public class SimdTimeSeriesVectorOps implements TimeSeriesVectorOps { + + private static final VectorSpecies SPECIES = DoubleVector.SPECIES_PREFERRED; + // SPECIES_PREFERRED auto-selects: 256-bit (4 lanes) on AVX2, 512-bit (8 lanes) on AVX-512 + + @Override + public double sum(final double[] values, final int offset, final int length) { + DoubleVector acc = DoubleVector.zero(SPECIES); + final int bound = SPECIES.loopBound(length); + int i = offset; + for (; i < offset + bound; i += SPECIES.length()) + acc = acc.add(DoubleVector.fromArray(SPECIES, values, i)); + double result = acc.reduceLanes(VectorOperators.ADD); + for (; i < offset + length; i++) // tail + result += values[i]; + return result; + } + + @Override + public double min(final double[] values, final int offset, final int length) { + DoubleVector acc = DoubleVector.broadcast(SPECIES, Double.MAX_VALUE); + final int bound = SPECIES.loopBound(length); + int i = offset; + for (; i < offset + bound; i += SPECIES.length()) + acc = acc.min(DoubleVector.fromArray(SPECIES, values, i)); + double result = acc.reduceLanes(VectorOperators.MIN); + for (; i < offset + length; i++) + if (values[i] < result) + result = values[i]; + return result; + } + + @Override + public int countFiltered(final long[] bitmask, final int offset, final int length) { + // SIMD popcount: count bits set in bitmask (number of matching samples) + int count = 0; + for (int i = offset; i < offset + length; i++) + count += Long.bitCount(bitmask[i]); // intrinsic → POPCNT instruction + return count; + } + + // ... analogous for max, sumFiltered, greaterThan, between, bitmask ops +} +``` + +#### Provider (Runtime Auto-Detection) + +```java +package com.arcadedb.engine.timeseries.simd; + +/** + * Singleton provider that detects SIMD availability at startup. + * Same pattern as JVector's VectorizationProvider.getInstance(). 
+ */ +public final class TimeSeriesVectorOpsProvider { + + private static final TimeSeriesVectorOps INSTANCE; + + static { + TimeSeriesVectorOps ops; + try { + // Try to load SIMD implementation — will fail if jdk.incubator.vector is absent + Class.forName("jdk.incubator.vector.DoubleVector"); + ops = new SimdTimeSeriesVectorOps(); + LogManager.instance().log(TimeSeriesVectorOpsProvider.class, Level.INFO, + "TimeSeries SIMD acceleration enabled (Vector API, %d-bit lanes)", + jdk.incubator.vector.DoubleVector.SPECIES_PREFERRED.vectorBitSize()); + } catch (final Throwable e) { + ops = new ScalarTimeSeriesVectorOps(); + LogManager.instance().log(TimeSeriesVectorOpsProvider.class, Level.INFO, + "TimeSeries SIMD acceleration not available, using scalar fallback"); + } + INSTANCE = ops; + } + + public static TimeSeriesVectorOps getInstance() { + return INSTANCE; + } +} +``` + +#### Where SIMD Is Used in the Query Path + +``` +Sealed block read → decompress column → double[] array (in heap) + ↓ + TimeSeriesVectorOpsProvider.getInstance() + ↓ + ┌─── SimdTimeSeriesVectorOps (if available) + │ → 4-8 doubles per cycle (AVX2/AVX-512) + │ + └─── ScalarTimeSeriesVectorOps (fallback) + → 1 double per cycle (standard loop) + ↓ + partial aggregation result (per block, per shard) +``` + +**Operations that benefit most from SIMD:** + +| Operation | SIMD Speedup | Notes | +|---|---|---| +| SUM / AVG over double[] | 4-8x | Process 4 (AVX2) or 8 (AVX-512) doubles per cycle | +| MIN / MAX over double[] | 4-8x | Lane-wise min/max with reduce | +| Bitmask AND/OR (tag filter combine) | 4-8x | 256/512-bit bitwise ops | +| COUNT (popcount on bitmask) | HW intrinsic | Maps to POPCNT instruction | +| Threshold filtering (WHERE temp > 30) | 4-8x | SIMD compare → bitmask | +| SUM with bitmask (filtered agg) | 3-6x | Masked lane operations | + +**Operations where SIMD helps less:** +- Delta-of-delta decoding: sequential dependency (each value depends on previous). 
Can be partially vectorized with prefix-sum techniques but not in Phase 1. +- Gorilla XOR decoding: bit-level sequential. Pure Java is fine — decoding is not the bottleneck (I/O dominates). +- Dictionary lookup: indirect indexing, not SIMD-friendly. But dictionaries are tiny. + +#### Runtime Requirements + +- **JDK 21+**: `--add-modules jdk.incubator.vector` (already in ArcadeDB's server.sh and test argLine) +- **No additional dependency**: The Vector API is part of the JDK, not an external library +- **Automatic fallback**: If the module is not available (e.g., GraalVM native image), `ScalarTimeSeriesVectorOps` is used transparently +- **Future-proof**: When `jdk.incubator.vector` graduates to a stable module (expected in a future JDK LTS), simply update the import — the API is the same + +### 6.14 Java API + +```java +/** + * Mutable transactional storage for timeseries data. + * Extends PaginatedComponent for WAL, MVCC, and replication support. + * Holds recent data in row-oriented pages. Compaction moves data to sealed files. + */ +public class TimeSeriesBucket extends PaginatedComponent { + + // === Schema === + List getColumns(); + int getTimestampColumnIndex(); + + // === Write (transactional, MVCC-safe) === + void appendSamples(long[] timestamps, Object[]... columnValues); + + // === Read from mutable pages only === + List scanRange(long fromTs, long toTs, int[] columnIndices); // materialized + Iterator iterateRange(long fromTs, long toTs, int[] columnIndices); // streaming (lazy, page-at-a-time) + + // === Metadata === + long getCompactionWatermark(); + long getSampleCount(); // uncompacted samples only + int getActiveDataPageCount(); + + // === Compaction === + void compact(TimeSeriesSealedStore sealedStore); // move full pages → sealed files (chunked, 65K rows/block) +} + +/** + * Immutable columnar storage for timeseries data. + * NOT a PaginatedComponent — uses plain FileChannel I/O. 
+ * One instance manages the index file + all per-column files for a type. + */ +public class TimeSeriesSealedStore { + + // === Read (the primary query path for historical data) === + List scanRange(long fromTs, long toTs, int[] columnIndices); // materialized + Iterator iterateRange(long fromTs, long toTs, int[] columnIndices); // streaming + // Streaming iterator uses: binary search on block directory → lazy column decompression + // → binary search within blocks (lowerBound/upperBound) → early termination + + // === Metadata === + long getMinTimestamp(); + long getMaxTimestamp(); + long getSampleCount(); + int getBlockCount(); + + // === Write (called by compaction only, NOT by user transactions) === + void appendBlock(int sampleCount, long minTs, long maxTs, byte[][] compressedColumns); + // Writes inline block metadata (magic 0x5453424C + minTs + maxTs + sampleCount + colSizes) + // before column data, enabling directory reconstruction on cold open + + // === Directory persistence === + void loadDirectory(); // reconstructs block directory by scanning inline metadata records + + // === Maintenance === + void truncateBefore(long timestamp); // retention: remove old blocks +} + +/** + * A shard is a paired mutable bucket + sealed store. + * One shard per core for zero-contention writes. + * Compaction runs independently per shard. + */ +public class TimeSeriesShard { + final TimeSeriesBucket mutableBucket; // PaginatedComponent — WAL, MVCC + final TimeSeriesSealedStore sealedStore; // plain files — immutable, per-column + + static final int SEALED_BLOCK_SIZE = 65_536; // rows per sealed block (chunked compaction) + + void appendSamples(long[] timestamps, Object[]... 
columnValues); + List scanRange(long fromTs, long toTs, int[] columns, TagFilter filter); // materialized + Iterator iterateRange(long fromTs, long toTs, int[] columns, TagFilter filter); // streaming + // Streaming: chains sealed iterator → mutable iterator, applies tag filter inline + void compact(); // move full mutable pages → sealed files (chunked, shard-local) +} + +/** + * Coordinates reads across ALL shards (mutable + sealed layers). + * This is what the SQL query engine interacts with. + * Routes writes to the correct shard via BucketSelectionStrategy. + */ +public class TimeSeriesEngine { + + final TimeSeriesShard[] shards; // one per core (default) + final BucketSelectionStrategy strategy; // Thread or Partitioned + + // Write: routes to correct shard (lock-free, zero contention) + void appendSamples(Document record, long[] timestamps, Object[]... columnValues) { + int shardIdx = strategy.getBucketIdByRecord(record, async); + shards[shardIdx].appendSamples(timestamps, columnValues); + } + + // Read (materialized): queries all shards, merges results into List + List query(long fromTs, long toTs, int[] columns, TagFilter filter); + + // Read (streaming): lazy merge-sort across shard iterators via PriorityQueue + Iterator iterateQuery(long fromTs, long toTs, int[] columns, TagFilter filter) { + // 1. Create per-shard iterators (each chains sealed → mutable) + // 2. Min-heap merge-sort by timestamp: each next() advances only the + // shard with smallest current timestamp + // 3. Memory: O(shardCount × blockSize) — constant regardless of dataset size + // 4. 
Used by FetchFromTimeSeriesStep for SQL queries (prevents OOM on full scans) + } +} +``` + +### 6.15 Integration with ArcadeDB Schema + +``` +TimeSeriesType extends DocumentType + ├── owns N TimeSeriesShards (one per core, default = availableProcessors()) + │ └── each shard: + │ ├── TimeSeriesBucket (PaginatedComponent — mutable, transactional) + │ └── TimeSeriesSealedStore (plain files — immutable, per-column) + ├── owns a TimeSeriesEngine (coordinates reads/writes across all shards) + ├── uses BucketSelectionStrategy (ThreadBucket default, or PartitionedBucket) + ├── optional LSM-Tree index on (tag_columns, timestamp) per shard + │ for high-cardinality point lookups + ├── TimeSeriesType knows: + │ - which column is the designated timestamp + │ - which columns are tags vs. fields + │ - partition interval, retention policy, compression settings + └── SQL DDL: + CREATE TIMESERIES TYPE SensorReading + TIMESTAMP ts PRECISION NANOSECOND + PARTITION BY INTERVAL 1 DAY + RETENTION 90 DAYS + TAGS (sensor_id STRING, location STRING) + FIELDS (temperature DOUBLE, humidity DOUBLE, pressure DOUBLE) +``` + +### 6.16 Compression Savings + +Example schema: 1 timestamp + 1 string tag + 2 double fields. 
+ +**Mutable file** (row-oriented, uncompressed within pages): +- ~26 bytes/sample → ~2,400 samples per 64KB page +- Only holds recent data (seconds to minutes), so total size is small + +**Sealed files** (columnar, compressed, per-column): +- ~3-4 bytes/sample across all columns combined +- Zero wasted space (variable-size blocks, no page padding) +- Per-column I/O: a query touching 2 of 5 columns reads only 40% of the data + +| Layer | Bytes/Sample | Samples per 64KB equivalent | Notes | +|---|---|---|---| +| Mutable (row, uncompressed) | ~26 B | ~2,400 | Small dataset, fast MVCC append | +| Sealed (columnar, compressed) | ~3-4 B | ~16,000-21,000 | 99%+ of data, zero waste | +| Sealed (best case, slow values) | ~1.5 B | ~40,000+ | Regular intervals, stable values | + +At 1M total samples with 5 columns: +- Mutable: holds last ~2,400 samples = 1 page = 64KB +- Sealed: ~3.5 MB across all column files (vs. ~25 MB uncompressed) — **7x compression** +- Query reading 2 of 5 columns: reads ~1.4 MB — **18x less I/O than uncompressed row storage** + +--- + +## Part 7: Implementation Plan — Making ArcadeDB a Leading TSDB + +### Phase 1: Foundation — Two-Layer Storage + Schema (Core) + +**Goal**: Store and retrieve timeseries data efficiently with fast range queries using the two-layer mutable/sealed architecture. + +#### 1a. 
Compression Codecs +- Implement timeseries-specific compression codecs as standalone classes (no storage dependency): + - `DeltaOfDeltaCodec` — for timestamps (based on Facebook Gorilla paper) + - `GorillaXORCodec` — for double values + - `DictionaryCodec` — for low-cardinality string tags (dictionary + Simple-8b RLE indices) + - `Simple8bCodec` — for integer packing with RLE +- Each codec: `byte[] encode(primitive_array, count)` and `primitive_array decode(byte[], count)` +- **Key package**: `com.arcadedb.engine.timeseries.codec` +- **Tests first**: Unit test each codec independently with known inputs/outputs, edge cases (all-same values, all-different, empty, single value, max precision, out-of-range) + +#### 1b. TimeSeriesBucket (Mutable Layer) +- New `TimeSeriesBucket extends PaginatedComponent` with header page, directory pages, and row-oriented active data pages +- Concurrent transaction support via standard ArcadeDB MVCC (same as `LocalBucket`) +- Active page: row-oriented, fixed-size sample rows, tag dictionary at page tail +- Directory pages: sorted entries with min/max timestamp per data page for binary search +- `appendSamples()` appends to active page within a transaction +- `scanMutableRange()` reads uncompacted data pages +- **Key package**: `com.arcadedb.engine.timeseries` +- **Reuses**: `PaginatedComponent`, `PageManager`, `TransactionContext`, `WALFile` + +#### 1c. TimeSeriesSealedStore (Sealed Layer) +- Per-column files (`.ts.col.N.*`) with variable-size compressed blocks, zero padding +- Shared index file (`.ts.index`) with block directory (min/max timestamp, column offsets/sizes) +- I/O via `java.nio.channels.FileChannel` positioned reads with direct ByteBuffers +- `scanRange()` reads index → binary search → reads only needed column files +- `appendBlock()` called by compaction to add new sealed blocks +- `truncateBefore()` for retention +- **Key package**: `com.arcadedb.engine.timeseries` + +#### 1d. 
TimeSeriesShard (Paired Unit) +- Pairs a `TimeSeriesBucket` (mutable) with a `TimeSeriesSealedStore` (sealed) +- Each shard is an independent write/compact/read unit — no shared state +- Compaction runs per-shard: background thread reads shard's full mutable pages → sorts → compresses → appends to shard's sealed files → cleans shard's mutable directory (in transaction, crash-safe) +- Compaction watermark per shard for crash recovery +- Configurable compaction interval (default: 30 seconds or when N mutable pages are full) +- **Reuses**: Existing background task infrastructure + +#### 1e. TimeSeriesEngine (Query Coordinator + Shard Router) +- Routes writes to the correct shard via `BucketSelectionStrategy` (lock-free) +- Coordinates reads across **all shards** in parallel (N shards = N parallel scans) +- Merges partial aggregations from shards → final result +- For `PartitionedBucketSelectionStrategy` + tag filter: routes to single shard (zero cross-shard overhead) +- **Reuses**: `BucketSelectionStrategy`, `DatabaseAsyncExecutorImpl` for parallel reads + +#### 1f. Schema: TimeSeriesType +- New `TimeSeriesType` extending `DocumentType` with: + - N `TimeSeriesShard` instances (default = `availableProcessors()`, configurable via `SHARDS`) + - `BucketSelectionStrategy` (default `ThreadBucketSelectionStrategy`, or `PartitionedBucketSelectionStrategy` via `PARTITION BY`) + - Mandatory designated timestamp column (DATETIME_NANOS default, configurable precision) + - Tag columns (indexed, low-cardinality) vs. field columns (values, high-cardinality) + - Configurable partition interval and retention policy +- SQL DDL support (CREATE/ALTER/DROP TIMESERIES TYPE) +- **Reuses**: `LocalDocumentType`, `LocalSchema`, `Type` enum, `BucketSelectionStrategy` + +#### 1g. 
Basic Query Support +- Time-windowed aggregation: `GROUP BY time(interval)` +- Sealed: block pruning via index binary search + per-column I/O (only read needed columns) +- Mutable: scan active pages, filter by time range +- Tag filtering: dictionary-decoded bitmask in sealed blocks, direct comparison in mutable pages +- Streaming aggregation: one block/page at a time, constant memory +- **Reuses**: `AggregationContext`, SQL execution framework + +#### 1h. Retention +- Sealed: `truncateBefore(timestamp)` rewrites column files and index without old blocks +- Mutable: remove compacted pages from directory (in transaction) +- Optional: time-partitioned sealed files (one set of column files per time window) for instant retention by file deletion + +#### 1i. SIMD-Accelerated Aggregation +- `TimeSeriesVectorOps` interface with `sum`, `min`, `max`, `sumFiltered`, `countFiltered`, bitmask ops +- `ScalarTimeSeriesVectorOps`: pure Java loops (always works, no dependencies) +- `SimdTimeSeriesVectorOps`: Java Vector API (`jdk.incubator.vector`), processes 4-8 doubles per cycle +- `TimeSeriesVectorOpsProvider`: singleton auto-detection at startup (same pattern as JVector's `VectorizationProvider`) +- Used by sealed block reader and aggregation engine from day one — not a later optimization +- **Key package**: `com.arcadedb.engine.timeseries.simd` +- **Tests**: Benchmark both implementations, verify identical results, test edge cases (empty arrays, single element, non-aligned lengths) +- **No new dependency**: Vector API is part of the JDK; `--add-modules jdk.incubator.vector` already in server.sh + +#### 1j. 
SQL DDL & DML +- `CreateTimeSeriesTypeStatement extends CreateTypeAbstractStatement` — parse TIMESTAMP/TAGS/FIELDS/SHARDS/RETENTION +- `time_bucket(interval, timestamp)` function: `SQLFunctionTimeBucket extends SQLFunctionAbstract` +- `first(value)` / `last(value)` aggregate functions: track min/max timestamp during `AggregateProjectionCalculationStep` +- Route `INSERT INTO` for timeseries types to `TimeSeriesEngine.appendSamples()` instead of `LocalBucket` +- **Reuses**: `CreateTypeAbstractStatement`, `SQLFunctionFactoryTemplate`, `InsertExecutionPlanner`, `AggregateProjectionCalculationStep` + +#### 1k. HTTP Ingestion Endpoint (InfluxDB Line Protocol) +- `PostTimeSeriesWriteHandler extends AbstractServerHttpHandler` +- `LineProtocolParser`: parse ILP text → batch of (measurement, tags, fields, timestamp) +- Endpoints: `POST /api/v1/ts/{database}/write` + `POST /api/v2/write` (InfluxDB v2 compat) +- Auto-schema creation (opt-in): first line defines type schema, subsequent lines auto-alter +- Gzip decompression support for large batches +- **Reuses**: `AbstractServerHttpHandler`, `HttpServer.setupRoutes()`, existing auth +- **Tests**: Parse correctness (edge cases, escaping, type suffixes), batch throughput, error handling + +### Phase 2: Query Engine — TimeSeries Functions & Aggregations + +#### 2a. 
TimeSeries-Specific Functions — **COMPLETED**
- ✅ `ts.first(value, timestamp)` / `ts.last(value, timestamp)` — first/last value in time window
- ✅ `ts.rate(value, ts [, counterResetDetection])` — per-second rate of change with optional counter reset detection
- ✅ `ts.delta(value, ts)` — difference between first and last in window
- ✅ `ts.movingAvg(value, window)` — sliding window average
- ✅ `ts.percentile(value, percentile)` — percentile (p50/p95/p99), computed exactly via sort with linear rank interpolation
- ✅ `ts.interpolate(value, method [, timestamp])` — fill missing values (linear, prev, zero, none)
- ✅ `ts.correlate(series_a, series_b)` — Pearson correlation between two series
- ✅ `ts.timeBucket(interval, ts)` — time bucketing for GROUP BY aggregation
- **Reuses**: Existing `SQLFunction` registration framework

#### 2b. Continuous Aggregates (**IMPLEMENTED**)
- Watermark-based incremental aggregation — separate from MaterializedView to keep timeseries-specific logic clean:
  ```sql
  -- Create a continuous aggregate (initial full refresh runs automatically)
  CREATE CONTINUOUS AGGREGATE hourly_temps AS
    SELECT sensor_id, ts.timeBucket('1h', ts) AS hour,
           avg(temperature) AS avg_temp, max(temperature) AS max_temp
    FROM SensorReading
    GROUP BY sensor_id, hour

  -- Idempotent creation
  CREATE CONTINUOUS AGGREGATE IF NOT EXISTS hourly_temps AS ...

  -- Manual refresh
  REFRESH CONTINUOUS AGGREGATE hourly_temps

  -- Drop (removes backing type too)
  DROP CONTINUOUS AGGREGATE hourly_temps
  DROP CONTINUOUS AGGREGATE IF EXISTS hourly_temps

  -- Query metadata
  SELECT FROM schema:continuousAggregates
  ```
- **Automatic incremental refresh**: After each transaction that inserts into a TimeSeries type, a post-commit callback triggers incremental refresh of all continuous aggregates sourced from that type. Only data from the watermark forward is reprocessed — stale buckets are deleted and recomputed.
+- **Watermark tracking**: Tracks the start of the last fully computed time bucket. On refresh, deletes rows where `bucketColumn >= watermark`, re-runs the query filtered by `WHERE ts >= watermark`, inserts results, advances watermark to `max(bucketColumn)`. +- **Query validation at creation**: Source must be a TimeSeries type, query must include `ts.timeBucket(interval, ts)` with an alias in projections and GROUP BY, only aggregate functions allowed in non-GROUP-BY projections. +- **Schema persistence**: Stored in `LocalSchema.toJSON()` under `"continuousAggregates"` section. Survives database close/reopen. Crash recovery marks BUILDING→STALE on restart. +- **Concurrency**: Atomic `tryBeginRefresh()` / `endRefresh()` guard prevents concurrent refresh of the same aggregate. +- **Java API**: `schema.buildContinuousAggregate().withName("...").withQuery("...").withIgnoreIfExists(true).create()` +- **Metrics**: refreshCount, refreshTotalTimeMs, refreshMinTimeMs, refreshMaxTimeMs, lastRefreshDurationMs, errorCount + +#### 2c. Downsampling Policies +- Automatically reduce resolution of old data: + ```sql + ALTER TIMESERIES TYPE SensorReading + ADD DOWNSAMPLING POLICY + AFTER 7 DAYS GRANULARITY 1 MINUTE + AFTER 30 DAYS GRANULARITY 1 HOUR + ``` + +### Phase 3: Graph + TimeSeries Integration (The Differentiator) + +#### 3a. TimeSeries-on-Vertex / TimeSeries-on-Edge +- Any vertex or edge can have associated timeseries data +- Schema declaration: + ```sql + CREATE VERTEX TYPE Sensor + PROPERTIES (name STRING, location STRING) + TIMESERIES temperature (DOUBLE, PARTITION 1 DAY, RETENTION 90 DAYS) + TIMESERIES humidity (DOUBLE, PARTITION 1 DAY, RETENTION 30 DAYS) + ``` +- Under the hood: each timeseries property creates a linked `TimeSeriesType` with a foreign key back to the vertex RID +- The vertex stores a lightweight pointer (bucket + latest partition) for fast access + +#### 3b. 
TIMESERIES Clause in SQL +- New SQL clause to access timeseries data from graph traversals: + ```sql + -- Access timeseries of vertices found by traversal + SELECT v.name, avg(ts.value) + FROM (TRAVERSE out('InstalledIn') FROM #12:0 MAXDEPTH 3) AS v + TIMESERIES v.temperature AS ts FROM '2026-02-19' TO '2026-02-20' + GROUP BY v.name + ``` +- Query planner optimizes: first resolve graph traversal to RID set, then batch-fetch timeseries data for all RIDs in parallel + +#### 3c. Graph-Aware Aggregation Functions +- `ROLLUP ALONG path`: Aggregate timeseries following graph hierarchy + ```sql + SELECT node.name, node.@type, + sum_along_children(node, 'ContainedIn', 'energy_kwh', + FROM '2026-02-01' TO '2026-02-20', GRANULARITY '1h') AS total_energy + FROM (SELECT FROM V WHERE @type IN ['Campus', 'Building', 'Floor']) + ``` + +#### 3d. OpenCypher TimeSeries Functions & Procedures +- Register `ts.*` functions in native `CypherFunctionRegistry`: `ts.avg`, `ts.sum`, `ts.min`, `ts.max`, `ts.count`, `ts.first`, `ts.last`, `ts.rate`, `ts.query` +- Register `ts.range`, `ts.aggregate` procedures in `CypherProcedureRegistry` for tabular results via `CALL ... YIELD` +- Functions evaluated by existing `ExpressionEvaluator` via `CypherFunctionFactory` (already supports namespaced functions) +- Procedures executed by existing `CallStep` (already handles YIELD) +- No Cypher grammar changes needed — Cypher25 grammar already supports namespaced functions and CALL +- **Reuses**: `CypherFunctionRegistry`, `CypherProcedureRegistry`, `ExpressionEvaluator`, `CallStep` + +#### 3e. Temporal Graph Snapshots (Future) +- Query the graph as it existed at a specific point in time +- Track edge creation/deletion timestamps +- `AT TIMESTAMP '2025-06-01'` clause for historical graph state + +### Phase 4: Advanced Performance Optimizations + +> **Note**: SIMD aggregation and shard-per-core parallelism are already in Phase 1 (core design). 
+> This phase focuses on additional optimizations beyond the foundation. + +#### 4a. Advanced SIMD: Vectorized Decompression +- SIMD-accelerated delta-of-delta decoding using prefix-sum vectorization +- SIMD-accelerated Gorilla XOR decoding (batch bit-manipulation) +- Benchmark against scalar decompression to validate speedup + +#### 4b. Write Path Optimization +- Batch ingestion API: `INSERT INTO SensorReading BATCH [...]` accepting arrays of values +- Configurable flush interval and buffer size +- Out-of-order tolerance window: buffer and sort before commit + +#### 4c. Adaptive Block Sizing +- Dynamically size sealed blocks based on data characteristics +- Smaller blocks for high-cardinality data (faster filtering) +- Larger blocks for uniform data (better compression ratios) + +### Phase 5: HTTP API & Studio Integration — **COMPLETED** + +#### 5a. REST API for TimeSeries +- ✅ `POST /api/v1/ts/{database}/write?precision=ns|us|ms|s` — InfluxDB Line Protocol batch ingestion +- ✅ `POST /api/v1/ts/{database}/query` — JSON query with raw/aggregated response, time range, field projection, tag filtering +- ✅ `GET /api/v1/ts/{database}/latest?type=name&tag=key:value` — latest value per series with optional tag filter +- ✅ Grafana DataFrame-compatible endpoints (`GET .../grafana/health`, `GET .../grafana/metadata`, `POST .../grafana/query`) — works with Grafana Infinity datasource plugin +- Prometheus remote-write/remote-read compatibility endpoints (future — requires protobuf dependency) + +#### 5b. 
Studio TimeSeries Explorer +- ✅ Full TimeSeries tab in Studio navigation with Query, Schema, and Ingestion sub-tabs +- ✅ Time-range picker (5min/1h/24h/7d/All/Custom) with configurable aggregation and bucket intervals +- ✅ ApexCharts line/area charts with datetime x-axis, zoom, dark mode support +- ✅ DataTable with pagination for raw and aggregated results +- ✅ Chart/table toggle switches with per-database localStorage persistence +- ✅ Schema introspection: columns, diagnostics, configuration, downsampling tiers, shard details +- ✅ Ingestion documentation: Line Protocol, SQL INSERT, Java API with examples and comparison table +- ✅ HTTP API Reference panel with 3 TimeSeries endpoints and interactive playground +- Combined graph + timeseries view (future — Phase 3 integration) + +--- + +## Part 8: Prioritized Roadmap + +### MVP (Phase 1 — "Two-Layer Storage + Fast Range Queries") +**Goal**: Users can store and query timeseries data with the sharded, two-layer mutable/sealed architecture. +- Compression codecs: DeltaOfDelta, GorillaXOR, Dictionary, Simple8b +- `TimeSeriesShard` = `TimeSeriesBucket` (mutable, paginated, MVCC) + `TimeSeriesSealedStore` (immutable, per-column files) +- **Shard-per-core**: N shards per type (default = availableProcessors()), zero-contention parallel writes +- `BucketSelectionStrategy` integration: `ThreadBucketSelectionStrategy` (default) or `PartitionedBucketSelectionStrategy` +- `TimeSeriesEngine` routing writes to shards + coordinating parallel reads across all shards +- Background compaction per shard (mutable → sealed), crash-safe via pre-compaction checkpoint +- Free page list for mutable file page reuse, out-of-order data handling (3 levels) +- `CREATE TIMESERIES TYPE` DDL with `TimeSeriesType` (configurable SHARDS, PARTITION BY) +- Range queries: parallel shard scans → index binary search → per-column I/O (sealed) + page scan (mutable) +- `GROUP BY time(interval)` aggregation with parallel partial aggregation per shard +- 
**SIMD-accelerated aggregation**: `TimeSeriesVectorOps` interface with auto-detected SIMD (Project Panama) or scalar fallback — 4-8x faster SUM/AVG/MIN/MAX from day one +- **SQL**: `CREATE TIMESERIES TYPE` DDL, `time_bucket()` function, `first()`/`last()` aggregates, standard INSERT +- **HTTP ingestion**: InfluxDB Line Protocol compatible endpoint (`POST /api/v1/ts/{db}/write` + `/api/v2/write`), Telegraf/Grafana Agent ready +- **Java API**: Direct `TimeSeriesEngine.appendSamples()` for maximum throughput (~0.5-1μs/sample) +- Retention policies (per-shard sealed file truncation + optional time-partitioned file sets) + +### v2 (Phase 2 — "Rich Query Functions") — **COMPLETED** +**Goal**: Competitive query capabilities for analytics. +- ✅ TimeSeries-specific functions (rate, delta, moving_avg, interpolate, correlate, timeBucket, first, last, percentile) +- ✅ Counter reset detection in `ts.rate()` (optional 3rd param for Prometheus-style monotonic counters) +- ✅ Linear interpolation in `ts.interpolate()` (4th fill method) +- ✅ Approximate percentiles via `ts.percentile()` (p50/p95/p99) +- ✅ Continuous aggregates (watermark-based incremental refresh, automatic post-commit trigger, SQL DDL, schema metadata) +- ✅ Downsampling policies with automatic scheduler +- ✅ Window functions: `ts.lag()`, `ts.lead()`, `ts.rowNumber()`, `ts.rank()` — compare current/previous values, sequential numbering, ranking with ties + +### v3 (Phase 3 — "Graph + TimeSeries, The Differentiator") +**Goal**: World's first native graph + timeseries integration. +- `TIMESERIES ... AS` clause in SQL (graph traversal + timeseries aggregation in one query) +- `ts.*` Cypher functions (`ts.avg`, `ts.max`, `ts.last`, etc.) 
for OpenCypher graph+TS queries +- TimeSeries-on-Vertex/Edge (vertex owns timeseries data) +- Graph-aware aggregation (`ROLLUP ALONG` graph hierarchy) +- Combined graph + timeseries Studio visualization + +### v4 (Phase 4+5 — "Performance & Ecosystem") — **COMPLETED** +**Goal**: Advanced optimizations + full ecosystem integration. +- ✅ SIMD-accelerated aggregation: `TimeSeriesVectorOps` wired into `aggregateMultiBlocks()` slow path with segment-based vectorized `sum()/min()/max()` +- ✅ Parallel shard aggregation: `CompletableFuture`-based concurrent sealed store processing with flat-array merge +- ✅ Coalesced I/O: single pread per block, reusable decode buffers, flat array accumulation (no HashMap) +- ✅ BitReader sliding-window register: pre-loaded 64-bit window, lazy refill every ~7-8 bytes (decompVal 1305ms → 1224ms) +- ✅ Bucket-aligned compaction: `COMPACTION_INTERVAL` DDL splits blocks at bucket boundaries for 100% fast-path aggregation +- ✅ Dedicated timeseries JSON query endpoint (`POST /api/v1/ts/{db}/query`) with raw + aggregated responses +- ✅ Dedicated latest-value endpoint (`GET /api/v1/ts/{db}/latest`) with tag filtering +- ✅ Studio TimeSeries Explorer: Query (charts + tables), Schema (introspection), Ingestion (docs + examples) +- ✅ HTTP API Reference panel with TimeSeries section and interactive playground +- ✅ Multi-tag filtering in query engine and HTTP API +- ✅ Time range operator push-down (`>`, `>=`, `<`, `<=`, `=` — not just `BETWEEN`) +- ✅ Automatic retention/downsampling scheduler (`TimeSeriesMaintenanceScheduler` daemon thread) +- Advanced decompression: Gorilla XOR decode is inherently sequential (each value XORs with previous) — further gains require fused decode+aggregate or alternative encoding schemes + +### v5 (Phase 6+7 — "Observability Ecosystem Integration") +**Goal**: Drop-in compatibility with the Prometheus/Grafana/OpenTelemetry ecosystem. 
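The counter semantics this Prometheus-compatibility work depends on are the same ones `ts.rate()` already implements (v2): when a monotonic counter's value drops, the producing process has restarted and the counter began again from zero, so the drop must be counted as growth from zero rather than as a negative delta. A minimal standalone sketch of that reset-aware rate calculation — the class and method names here are illustrative, not ArcadeDB API, and this is not the engine's actual implementation:

```java
// Standalone illustration of Prometheus-style counter-reset handling.
// CounterRate and rate() are hypothetical names for this sketch only.
public final class CounterRate {

  /**
   * Per-second rate over [first, last] sample, treating any counter drop as a reset.
   * Assumes timestampsMs and values are parallel arrays of equal length, sorted by time.
   */
  public static double rate(final long[] timestampsMs, final double[] values) {
    if (values.length < 2)
      return 0.0;

    double increase = 0.0;
    for (int i = 1; i < values.length; i++) {
      final double delta = values[i] - values[i - 1];
      // A monotonic counter never decreases; a drop means the process restarted
      // and the counter began again from zero, so the whole new value is growth.
      increase += delta >= 0 ? delta : values[i];
    }

    final double seconds = (timestampsMs[timestampsMs.length - 1] - timestampsMs[0]) / 1000.0;
    return seconds > 0 ? increase / seconds : 0.0;
  }
}
```

For samples 10, 20, 5, 15 at one-second intervals, the reset at the third sample contributes 5 rather than -15, so the total increase is 25 over 3 seconds, a rate of about 8.33/s.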
+
+| Priority | Feature | Who has it | Effort |
+|----------|---------|-----------|--------|
+| P0 | PromQL / MetricsQL query language | Prometheus, VictoriaMetrics, Grafana Mimir | High |
+| P0 | Grafana native datasource plugin | All 10 TSDBs | Medium |
+| P0 | Prometheus `remote_write` / `remote_read` | 6/10 TSDBs | Medium |
+| P0 | Alerting & recording rules | 7+/10 TSDBs | High |
+| P1 | OpenTelemetry OTLP ingestion (gRPC + HTTP) | 5/10 TSDBs (growing) | Medium |
+| P1 | Cardinality management & monitoring | VictoriaMetrics, Grafana Mimir, Prometheus | Medium |
+
+### v6 (Phase 8 — "Advanced Analytics & Data Platform")
+**Goal**: Feature parity with analytics-focused TSDBs.
+
+| Priority | Feature | Who has it | Effort |
+|----------|---------|-----------|--------|
+| ~~P1~~ | ~~SQL window functions (LAG, LEAD, RANK, etc.)~~ | ~~TimescaleDB, QuestDB, ClickHouse, TDengine, Kdb+~~ | **DONE** |
+| P1 | ASOF JOIN / temporal joins | QuestDB, ClickHouse, Kdb+ | High |
+| P1 | Streaming aggregation at ingestion | TDengine, VictoriaMetrics, QuestDB | High |
+| P2 | TimeSeries via PostgreSQL wire protocol | TimescaleDB, QuestDB | Medium |
+| P2 | Native histogram support | Prometheus 3.0, VictoriaMetrics 3.0 | High |
+| P2 | Tiered storage (S3/GCS/Azure Blob) | InfluxDB 3, Grafana Mimir, ClickHouse | High |
+| P2 | Parquet/Arrow export/import | InfluxDB 3, QuestDB, ClickHouse, Kdb+ | Medium |
+| P3 | MQTT protocol support | TDengine, Apache IoTDB | Medium |
+| P3 | Exemplars (trace-to-metrics linking) | Prometheus, Grafana Mimir, VictoriaMetrics | Medium |
+| P3 | Anomaly detection | VictoriaMetrics (enterprise) | High |
+| P3 | TCP ingestion socket (raw ILP over TCP) | QuestDB | Low |
+
+---
+
+## Key Sources
+
+- Facebook Gorilla paper (VLDB 2015) — Compression algorithms
+- HyGraph (EDBT 2025, University of Leipzig) — Graph + TimeSeries unification theory
+- "Combining Time-Series and Graph Data: A Survey" (arXiv:2601.00304, Jan 2026) — Confirms no production system
unifies both +- InfluxDB 3.0 FDAP architecture — Modern Arrow/Parquet approach +- TimescaleDB compression docs — 7 compression algorithms reference +- QuestDB architecture — Columnar + SIMD reference implementation +- ClickHouse MergeTree — Sparse indexing + composable codecs +- Datadog Monocle — Shard-per-core LSM design +- Competitive gap analysis (Feb 2026) — Feature comparison against InfluxDB 3, TimescaleDB, Prometheus, QuestDB, TDengine, ClickHouse, Kdb+, Apache IoTDB, VictoriaMetrics, Grafana Mimir diff --git a/engine/src/main/antlr4/com/arcadedb/query/sql/grammar/SQLLexer.g4 b/engine/src/main/antlr4/com/arcadedb/query/sql/grammar/SQLLexer.g4 index 9e23209bf7..35470cea19 100644 --- a/engine/src/main/antlr4/com/arcadedb/query/sql/grammar/SQLLexer.g4 +++ b/engine/src/main/antlr4/com/arcadedb/query/sql/grammar/SQLLexer.g4 @@ -231,6 +231,20 @@ MINUTE: M I N U T E; HOUR: H O U R; MANUAL: M A N U A L; INCREMENTAL: I N C R E M E N T A L; +TIMESERIES: T I M E S E R I E S; +TAGS: T A G S; +FIELDS: F I E L D S; +RETENTION: R E T E N T I O N; +COMPACTION_INTERVAL: C O M P A C T I O N UNDERSCORE I N T E R V A L; +SHARDS: S H A R D S; +DAYS: D A Y S; +HOURS: H O U R S; +MINUTES: M I N U T E S; +DOWNSAMPLING: D O W N S A M P L I N G; +POLICY: P O L I C Y; +GRANULARITY: G R A N U L A R I T Y; +CONTINUOUS: C O N T I N U O U S; +AGGREGATE: A G G R E G A T E; // ============================================================================ // COMPARISON OPERATORS diff --git a/engine/src/main/antlr4/com/arcadedb/query/sql/grammar/SQLParser.g4 b/engine/src/main/antlr4/com/arcadedb/query/sql/grammar/SQLParser.g4 index 9f08822900..cbd09ca13c 100644 --- a/engine/src/main/antlr4/com/arcadedb/query/sql/grammar/SQLParser.g4 +++ b/engine/src/main/antlr4/com/arcadedb/query/sql/grammar/SQLParser.g4 @@ -95,6 +95,8 @@ statement | CREATE EDGE createEdgeBody # createEdgeStmt | CREATE TRIGGER createTriggerBody # createTriggerStmt | CREATE MATERIALIZED VIEW createMaterializedViewBody # 
createMaterializedViewStmt + | CREATE TIMESERIES TYPE createTimeSeriesTypeBody # createTimeSeriesTypeStmt + | CREATE CONTINUOUS AGGREGATE createContinuousAggregateBody # createContinuousAggregateStmt // DDL Statements - ALTER variants | ALTER TYPE alterTypeBody # alterTypeStmt @@ -102,6 +104,7 @@ statement | ALTER BUCKET alterBucketBody # alterBucketStmt | ALTER DATABASE alterDatabaseBody # alterDatabaseStmt | ALTER MATERIALIZED VIEW alterMaterializedViewBody # alterMaterializedViewStmt + | ALTER TIMESERIES TYPE alterTimeSeriesTypeBody # alterTimeSeriesTypeStmt // DDL Statements - DROP variants | DROP TYPE dropTypeBody # dropTypeStmt @@ -110,6 +113,7 @@ statement | DROP BUCKET dropBucketBody # dropBucketStmt | DROP TRIGGER dropTriggerBody # dropTriggerStmt | DROP MATERIALIZED VIEW dropMaterializedViewBody # dropMaterializedViewStmt + | DROP CONTINUOUS AGGREGATE dropContinuousAggregateBody # dropContinuousAggregateStmt // DDL Statements - TRUNCATE variants | TRUNCATE TYPE truncateTypeBody # truncateTypeStmt @@ -119,6 +123,9 @@ statement // Materialized View Refresh | REFRESH MATERIALIZED VIEW refreshMaterializedViewBody # refreshMaterializedViewStmt + // Continuous Aggregate Refresh + | REFRESH CONTINUOUS AGGREGATE refreshContinuousAggregateBody # refreshContinuousAggregateStmt + // Index Management | rebuildIndexStatement # rebuildIndexStmt @@ -425,6 +432,51 @@ createTypeBody (PAGESIZE INTEGER_LITERAL)? ; +/** + * CREATE TIMESERIES TYPE body + * Example: CREATE TIMESERIES TYPE SensorData TIMESTAMP ts TAGS (sensor_id STRING) FIELDS (temperature DOUBLE, humidity DOUBLE) SHARDS 4 RETENTION 90 DAYS COMPACTION_INTERVAL 1 HOURS + */ +createTimeSeriesTypeBody + : identifier + (IF NOT EXISTS)? + (TIMESTAMP identifier)? + (TAGS LPAREN tsTagColumnDef (COMMA tsTagColumnDef)* RPAREN)? + (FIELDS LPAREN tsFieldColumnDef (COMMA tsFieldColumnDef)* RPAREN)? + (SHARDS INTEGER_LITERAL)? + (RETENTION INTEGER_LITERAL (DAYS | HOURS | MINUTES)?)? 
+ (COMPACTION_INTERVAL INTEGER_LITERAL (DAYS | HOURS | MINUTES)?)? + ; + +tsTagColumnDef + : identifier identifier + ; + +tsFieldColumnDef + : identifier identifier + ; + +/** + * ALTER TIMESERIES TYPE body - add or drop downsampling policy + * Example: ALTER TIMESERIES TYPE SensorData ADD DOWNSAMPLING POLICY AFTER 7 DAYS GRANULARITY 1 HOURS AFTER 30 DAYS GRANULARITY 1 DAYS + * Example: ALTER TIMESERIES TYPE SensorData DROP DOWNSAMPLING POLICY + */ +alterTimeSeriesTypeBody + : identifier ADD DOWNSAMPLING POLICY downsamplingTierClause+ + | identifier DROP DOWNSAMPLING POLICY + ; + +downsamplingTierClause + : AFTER INTEGER_LITERAL tsTimeUnit GRANULARITY INTEGER_LITERAL tsTimeUnit + ; + +tsTimeUnit + : DAYS + | HOURS + | MINUTES + | HOUR + | MINUTE + ; + /** * CREATE EDGE TYPE body (supports UNIDIRECTIONAL) */ @@ -685,6 +737,35 @@ alterMaterializedViewBody : identifier materializedViewRefreshClause ; +// ============================================================================ +// DDL STATEMENTS - CONTINUOUS AGGREGATE +// ============================================================================ + +/** + * CREATE CONTINUOUS AGGREGATE statement + * Syntax: CREATE CONTINUOUS AGGREGATE [IF NOT EXISTS] name AS selectStatement + */ +createContinuousAggregateBody + : (IF NOT EXISTS)? identifier + AS selectStatement + ; + +/** + * DROP CONTINUOUS AGGREGATE statement + * Syntax: DROP CONTINUOUS AGGREGATE [IF EXISTS] name + */ +dropContinuousAggregateBody + : (IF EXISTS)? 
identifier + ; + +/** + * REFRESH CONTINUOUS AGGREGATE statement + * Syntax: REFRESH CONTINUOUS AGGREGATE name + */ +refreshContinuousAggregateBody + : identifier + ; + // ============================================================================ // DDL STATEMENTS - TRUNCATE // ============================================================================ @@ -1315,6 +1396,20 @@ identifier | MANUAL | INCREMENTAL | MATERIALIZED + | CONTINUOUS + | AGGREGATE + | TIMESERIES + | TAGS + | FIELDS + | RETENTION + | COMPACTION_INTERVAL + | SHARDS + | DAYS + | HOURS + | MINUTES + | DOWNSAMPLING + | POLICY + | GRANULARITY // Additional keywords allowed as identifiers (matching JavaCC parser) | PROPERTY | BUCKETS diff --git a/engine/src/main/java/com/arcadedb/database/LocalDatabase.java b/engine/src/main/java/com/arcadedb/database/LocalDatabase.java index e6a0c5d892..891c201a82 100644 --- a/engine/src/main/java/com/arcadedb/database/LocalDatabase.java +++ b/engine/src/main/java/com/arcadedb/database/LocalDatabase.java @@ -36,6 +36,7 @@ import com.arcadedb.engine.WALFile; import com.arcadedb.engine.WALFileFactory; import com.arcadedb.engine.WALFileFactoryEmbedded; +import com.arcadedb.engine.timeseries.TimeSeriesBucket; import com.arcadedb.exception.ArcadeDBException; import com.arcadedb.exception.CommandExecutionException; import com.arcadedb.exception.DatabaseIsClosedException; @@ -74,6 +75,7 @@ import com.arcadedb.schema.EdgeType; import com.arcadedb.schema.LocalDocumentType; import com.arcadedb.schema.LocalSchema; +import com.arcadedb.schema.LocalTimeSeriesType; import com.arcadedb.schema.LocalVertexType; import com.arcadedb.schema.Property; import com.arcadedb.schema.Schema; @@ -136,7 +138,8 @@ public class LocalDatabase extends RWLockContext implements DatabaseInternal { LSMTreeIndexCompacted.NOTUNIQUE_INDEX_EXT, LSMTreeIndexCompacted.UNIQUE_INDEX_EXT, LSMVectorIndex.FILE_EXT, - LSMVectorIndexGraphFile.FILE_EXT); + LSMVectorIndexGraphFile.FILE_EXT, + 
TimeSeriesBucket.BUCKET_EXT); public final AtomicLong indexCompactions = new AtomicLong(); protected final String name; @@ -553,6 +556,15 @@ public long countType(final String typeName, final boolean polymorphic) { return (Long) executeInReadLock((Callable) () -> { final DocumentType type = schema.getType(typeName); + // TimeSeries types store data in their own engine, not in regular buckets + if (type instanceof LocalTimeSeriesType tsType) { + try { + return tsType.getEngine().countSamples(); + } catch (final IOException e) { + throw new DatabaseOperationException("Error counting TimeSeries samples for type '" + typeName + "'", e); + } + } + long total = 0; for (final Bucket b : type.getBuckets(polymorphic)) total += b.count(); @@ -1248,7 +1260,7 @@ public MutableDocument newDocument(final String typeName) { throw new IllegalArgumentException("Type is null"); final LocalDocumentType type = schema.getType(typeName); - if (!type.getClass().equals(LocalDocumentType.class)) + if (!type.getClass().equals(LocalDocumentType.class) && !(type instanceof com.arcadedb.schema.LocalTimeSeriesType)) throw new IllegalArgumentException("Cannot create a document of type '" + typeName + "' because is not a " + "document type"); diff --git a/engine/src/main/java/com/arcadedb/database/async/DatabaseAsyncAppendSamples.java b/engine/src/main/java/com/arcadedb/database/async/DatabaseAsyncAppendSamples.java new file mode 100644 index 0000000000..3eb575c99a --- /dev/null +++ b/engine/src/main/java/com/arcadedb/database/async/DatabaseAsyncAppendSamples.java @@ -0,0 +1,60 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.database.async; + +import com.arcadedb.database.DatabaseInternal; +import com.arcadedb.engine.timeseries.TimeSeriesEngine; +import com.arcadedb.exception.DatabaseOperationException; +import com.arcadedb.log.LogManager; + +import java.util.logging.Level; + +public class DatabaseAsyncAppendSamples implements DatabaseAsyncTask { + private final TimeSeriesEngine engine; + private final int shardIndex; + private final long[] timestamps; + private final Object[][] columnValues; + + public DatabaseAsyncAppendSamples(final TimeSeriesEngine engine, final int shardIndex, final long[] timestamps, + final Object[][] columnValues) { + this.engine = engine; + this.shardIndex = shardIndex; + this.timestamps = timestamps.clone(); + this.columnValues = new Object[columnValues.length][]; + for (int i = 0; i < columnValues.length; i++) + this.columnValues[i] = columnValues[i] != null ? 
columnValues[i].clone() : null; + } + + @Override + public void execute(final DatabaseAsyncExecutorImpl.AsyncThread async, final DatabaseInternal database) { + try { + engine.getShard(shardIndex).appendSamples(timestamps, columnValues); + } catch (final Exception e) { + LogManager.instance().log(this, Level.SEVERE, + "Error appending timeseries samples to shard %d of type '%s' (%d points)", + e, shardIndex, engine.getTypeName(), timestamps.length); + throw new DatabaseOperationException("Error appending timeseries samples to shard " + shardIndex, e); + } + } + + @Override + public String toString() { + return "AppendSamples(type=" + engine.getTypeName() + " shard=" + shardIndex + " points=" + timestamps.length + ")"; + } +} diff --git a/engine/src/main/java/com/arcadedb/database/async/DatabaseAsyncExecutor.java b/engine/src/main/java/com/arcadedb/database/async/DatabaseAsyncExecutor.java index 07fecbf83a..b0f8429e90 100644 --- a/engine/src/main/java/com/arcadedb/database/async/DatabaseAsyncExecutor.java +++ b/engine/src/main/java/com/arcadedb/database/async/DatabaseAsyncExecutor.java @@ -304,6 +304,16 @@ void newEdgeByKeys(String sourceVertexType, String[] sourceVertexKeyNames, Objec boolean bidirectional, boolean light, NewEdgeCallback callback, Object... properties); + /** + * Schedules the asynchronous append of time-series samples. The samples are routed to shards in a round-robin + * fashion, with each shard pinned to a dedicated async slot for zero-contention parallel ingestion. + * + * @param typeName The name of the TimeSeries type + * @param timestamps Array of timestamps for each sample + * @param columnValues One array per column (tags + fields), each with the same length as timestamps + */ + void appendSamples(String typeName, long[] timestamps, Object[]... columnValues); + /** * Forces the shutdown of the asynchronous threads. 
*/ diff --git a/engine/src/main/java/com/arcadedb/database/async/DatabaseAsyncExecutorImpl.java b/engine/src/main/java/com/arcadedb/database/async/DatabaseAsyncExecutorImpl.java index d351367c7e..d21bf11f9d 100644 --- a/engine/src/main/java/com/arcadedb/database/async/DatabaseAsyncExecutorImpl.java +++ b/engine/src/main/java/com/arcadedb/database/async/DatabaseAsyncExecutorImpl.java @@ -31,12 +31,14 @@ import com.arcadedb.engine.Bucket; import com.arcadedb.engine.ErrorRecordCallback; import com.arcadedb.engine.WALFile; +import com.arcadedb.engine.timeseries.TimeSeriesEngine; import com.arcadedb.exception.DatabaseOperationException; import com.arcadedb.graph.Vertex; import com.arcadedb.index.IndexInternal; import com.arcadedb.log.LogManager; import com.arcadedb.schema.DocumentType; import com.arcadedb.schema.EdgeType; +import com.arcadedb.schema.LocalTimeSeriesType; import com.conversantmedia.util.concurrent.PushPullBlockingQueue; import java.util.Arrays; @@ -64,6 +66,7 @@ public class DatabaseAsyncExecutorImpl implements DatabaseAsyncExecutor { private long checkForStalledQueuesMaxDelay = 5_000; private final AtomicLong transactionCounter = new AtomicLong(); private final AtomicLong commandRoundRobinIndex = new AtomicLong(); + private final AtomicLong tsAppendCounter = new AtomicLong(); // SPECIAL TASKS public final static DatabaseAsyncTask FORCE_EXIT = new DatabaseAsyncTask() { @@ -643,6 +646,16 @@ public void newEdgeByKeys(final String sourceVertexType, final String[] sourceVe newEdge(sourceRID.asVertex(true), edgeType, destinationRID, lightWeight, callback, properties); } + @Override + public void appendSamples(final String typeName, final long[] timestamps, final Object[]... 
columnValues) { + final LocalTimeSeriesType tsType = (LocalTimeSeriesType) database.getSchema().getType(typeName); + final TimeSeriesEngine engine = tsType.getEngine(); + final int shardIdx = (int) (tsAppendCounter.getAndIncrement() % engine.getShardCount()); + final int slot = getSlot(shardIdx); + scheduleTask(slot, new DatabaseAsyncAppendSamples(engine, shardIdx, timestamps, columnValues), true, + backPressurePercentage); + } + /** * Test only API. */ diff --git a/engine/src/main/java/com/arcadedb/engine/timeseries/AggregationMetrics.java b/engine/src/main/java/com/arcadedb/engine/timeseries/AggregationMetrics.java new file mode 100644 index 0000000000..d654180d6f --- /dev/null +++ b/engine/src/main/java/com/arcadedb/engine/timeseries/AggregationMetrics.java @@ -0,0 +1,119 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +/** + * Mutable accumulator for aggregation timing breakdown. + *

+ * Thread-safety contract: each shard should use its own instance for accumulation + * (via {@code addIo()}, {@code addDecompTs()}, etc.). Only {@link #mergeFrom(AggregationMetrics)} + * is synchronized and safe to call from multiple threads to merge per-shard results + * into a shared instance after all futures have completed. + * + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +public final class AggregationMetrics { + + private long ioNanos; + private long decompTsNanos; + private long decompValNanos; + private long accumNanos; + private int fastPathBlocks; + private int slowPathBlocks; + private int skippedBlocks; + + public void addIo(final long nanos) { + ioNanos += nanos; + } + + public void addDecompTs(final long nanos) { + decompTsNanos += nanos; + } + + public void addDecompVal(final long nanos) { + decompValNanos += nanos; + } + + public void addAccum(final long nanos) { + accumNanos += nanos; + } + + public void addFastPathBlock() { + fastPathBlocks++; + } + + public void addSlowPathBlock() { + slowPathBlocks++; + } + + public void addSkippedBlock() { + skippedBlocks++; + } + + public long getIoNanos() { + return ioNanos; + } + + public long getDecompTsNanos() { + return decompTsNanos; + } + + public long getDecompValNanos() { + return decompValNanos; + } + + public long getAccumNanos() { + return accumNanos; + } + + public int getFastPathBlocks() { + return fastPathBlocks; + } + + public int getSlowPathBlocks() { + return slowPathBlocks; + } + + public int getSkippedBlocks() { + return skippedBlocks; + } + + /** + * Merges counters from another instance (used to aggregate across shards). 
+ */ + public synchronized void mergeFrom(final AggregationMetrics other) { + ioNanos += other.ioNanos; + decompTsNanos += other.decompTsNanos; + decompValNanos += other.decompValNanos; + accumNanos += other.accumNanos; + fastPathBlocks += other.fastPathBlocks; + slowPathBlocks += other.slowPathBlocks; + skippedBlocks += other.skippedBlocks; + } + + @Override + public String toString() { + final long totalNanos = ioNanos + decompTsNanos + decompValNanos + accumNanos; + return String.format( + "AggMetrics[io=%dms decompTs=%dms decompVal=%dms accum=%dms total=%dms | blocks: fast=%d slow=%d skipped=%d]", + ioNanos / 1_000_000, decompTsNanos / 1_000_000, decompValNanos / 1_000_000, + accumNanos / 1_000_000, totalNanos / 1_000_000, + fastPathBlocks, slowPathBlocks, skippedBlocks); + } +} diff --git a/engine/src/main/java/com/arcadedb/engine/timeseries/AggregationResult.java b/engine/src/main/java/com/arcadedb/engine/timeseries/AggregationResult.java new file mode 100644 index 0000000000..92ef7cdfa4 --- /dev/null +++ b/engine/src/main/java/com/arcadedb/engine/timeseries/AggregationResult.java @@ -0,0 +1,109 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ *
+ * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com)
+ * SPDX-License-Identifier: Apache-2.0
+ */
+package com.arcadedb.engine.timeseries;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Holds time-bucketed aggregation results.
+ *
+ * @author Luca Garulli (l.garulli@arcadedata.com)
+ */
+public final class AggregationResult {
+
+  private final List<Long> bucketTimestamps = new ArrayList<>();
+  private final List<Double> values = new ArrayList<>();
+  private final List<Long> counts = new ArrayList<>();
+  private final Map<Long, Integer> bucketIndex = new HashMap<>();
+
+  public void addBucket(final long timestamp, final double value, final long count) {
+    bucketIndex.put(timestamp, bucketTimestamps.size());
+    bucketTimestamps.add(timestamp);
+    values.add(value);
+    counts.add(count);
+  }
+
+  public int size() {
+    return bucketTimestamps.size();
+  }
+
+  public long getBucketTimestamp(final int index) {
+    return bucketTimestamps.get(index);
+  }
+
+  public double getValue(final int index) {
+    return values.get(index);
+  }
+
+  public long getCount(final int index) {
+    return counts.get(index);
+  }
+
+  public void updateValue(final int index, final double value) {
+    values.set(index, value);
+  }
+
+  public void updateCount(final int index, final long count) {
+    counts.set(index, count);
+  }
+
+  /**
+   * Finds the index of a bucket by timestamp. Returns -1 if not found.
+   */
+  public int findBucketIndex(final long timestamp) {
+    final Integer idx = bucketIndex.get(timestamp);
+    return idx != null ? idx : -1;
+  }
+
+  /**
+   * Merges another result into this one. Used for combining partial results from multiple shards.
+ */ + public void merge(final AggregationResult other, final AggregationType type) { + if (bucketTimestamps.isEmpty()) { + for (int i = 0; i < other.size(); i++) + addBucket(other.getBucketTimestamp(i), other.getValue(i), other.getCount(i)); + return; + } + + for (int i = 0; i < other.size(); i++) { + final long otherTs = other.getBucketTimestamp(i); + final int idx = findBucketIndex(otherTs); + if (idx >= 0) { + final double merged = mergeValue(values.get(idx), counts.get(idx), other.getValue(i), other.getCount(i), type); + values.set(idx, merged); + counts.set(idx, counts.get(idx) + other.getCount(i)); + } else { + addBucket(otherTs, other.getValue(i), other.getCount(i)); + } + } + } + + private static double mergeValue(final double v1, final long c1, final double v2, final long c2, + final AggregationType type) { + return switch (type) { + case SUM, COUNT -> v1 + v2; + case AVG -> (v1 * c1 + v2 * c2) / (c1 + c2); + case MIN -> Math.min(v1, v2); + case MAX -> Math.max(v1, v2); + }; + } +} diff --git a/engine/src/main/java/com/arcadedb/engine/timeseries/AggregationType.java b/engine/src/main/java/com/arcadedb/engine/timeseries/AggregationType.java new file mode 100644 index 0000000000..ceb1b3987b --- /dev/null +++ b/engine/src/main/java/com/arcadedb/engine/timeseries/AggregationType.java @@ -0,0 +1,28 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
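Editorial aside, not part of the patch: the AVG branch of `mergeValue` above combines two shard-local partial averages by re-weighting each with its sample count, rather than averaging the averages. A minimal standalone sketch of that arithmetic (the class name `AvgMergeSketch` is hypothetical, for illustration only):

```java
// Hypothetical sketch; mirrors the AVG case of AggregationResult.mergeValue above.
public class AvgMergeSketch {

  // Weighted mean of two partial averages, each carrying its own sample count.
  static double mergeAvg(final double v1, final long c1, final double v2, final long c2) {
    return (v1 * c1 + v2 * c2) / (c1 + c2);
  }

  public static void main(final String[] args) {
    // Shard A: 2 samples averaging 10.0; shard B: 6 samples averaging 20.0.
    // The merged average is (10*2 + 20*6) / 8 = 17.5, not the naive midpoint 15.0.
    System.out.println(mergeAvg(10.0, 2, 20.0, 6)); // 17.5
  }
}
```

This is why the map-mode merge also tracks per-bucket counts: without them, combining partial AVG results from multiple shards would weight each shard equally regardless of how many samples it saw.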
+ * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +/** + * Aggregation types for time-series push-down aggregation. + * + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +public enum AggregationType { + SUM, AVG, MIN, MAX, COUNT +} diff --git a/engine/src/main/java/com/arcadedb/engine/timeseries/ColumnDefinition.java b/engine/src/main/java/com/arcadedb/engine/timeseries/ColumnDefinition.java new file mode 100644 index 0000000000..5ca7f82ecd --- /dev/null +++ b/engine/src/main/java/com/arcadedb/engine/timeseries/ColumnDefinition.java @@ -0,0 +1,100 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +import com.arcadedb.engine.timeseries.codec.TimeSeriesCodec; +import com.arcadedb.schema.Type; + +/** + * Defines a column in a TimeSeries type. 
+ * + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +public final class ColumnDefinition { + + public enum ColumnRole { + TIMESTAMP, TAG, FIELD + } + + private final String name; + private final Type dataType; + private final ColumnRole role; + private final TimeSeriesCodec compressionHint; + + public ColumnDefinition(final String name, final Type dataType, final ColumnRole role) { + this(name, dataType, role, defaultCodecFor(dataType, role)); + } + + public ColumnDefinition(final String name, final Type dataType, final ColumnRole role, final TimeSeriesCodec compressionHint) { + this.name = name; + this.dataType = dataType; + this.role = role; + this.compressionHint = compressionHint; + } + + public String getName() { + return name; + } + + public Type getDataType() { + return dataType; + } + + public ColumnRole getRole() { + return role; + } + + public TimeSeriesCodec getCompressionHint() { + return compressionHint; + } + + /** + * Returns the fixed byte size for this column's data type in the mutable row format. + * Variable-length types (STRING) return -1; the caller must handle dictionary encoding. + */ + public int getFixedSize() { + return switch (dataType) { + case LONG, DATETIME -> 8; + case DOUBLE -> 8; + case INTEGER -> 4; + case FLOAT -> 4; + case SHORT -> 2; + case BYTE -> 1; + case BOOLEAN -> 1; + default -> -1; // Variable length (STRING etc.) 
+ }; + } + + @Override + public String toString() { + return name + " " + dataType + " (" + role + ")"; + } + + private static TimeSeriesCodec defaultCodecFor(final Type dataType, final ColumnRole role) { + if (role == ColumnRole.TIMESTAMP) + return TimeSeriesCodec.DELTA_OF_DELTA; + if (role == ColumnRole.TAG) + return TimeSeriesCodec.DICTIONARY; + return switch (dataType) { + case DOUBLE, FLOAT -> TimeSeriesCodec.GORILLA_XOR; + case LONG, INTEGER, SHORT, BYTE -> TimeSeriesCodec.SIMPLE8B; + default -> TimeSeriesCodec.DICTIONARY; + }; + } +} diff --git a/engine/src/main/java/com/arcadedb/engine/timeseries/DownsamplingTier.java b/engine/src/main/java/com/arcadedb/engine/timeseries/DownsamplingTier.java new file mode 100644 index 0000000000..4bae9e5a9a --- /dev/null +++ b/engine/src/main/java/com/arcadedb/engine/timeseries/DownsamplingTier.java @@ -0,0 +1,36 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +/** + * Defines a downsampling tier: data older than {@code afterMs} gets downsampled + * to {@code granularityMs} resolution (averaging numeric fields per time bucket). 
+ * + * @param afterMs age threshold in milliseconds (must be > 0) + * @param granularityMs target resolution in milliseconds (must be > 0) + */ +public record DownsamplingTier(long afterMs, long granularityMs) { + + public DownsamplingTier { + if (afterMs <= 0) + throw new IllegalArgumentException("afterMs must be > 0, got " + afterMs); + if (granularityMs <= 0) + throw new IllegalArgumentException("granularityMs must be > 0, got " + granularityMs); + } +} diff --git a/engine/src/main/java/com/arcadedb/engine/timeseries/LineProtocolParser.java b/engine/src/main/java/com/arcadedb/engine/timeseries/LineProtocolParser.java new file mode 100644 index 0000000000..813a645edf --- /dev/null +++ b/engine/src/main/java/com/arcadedb/engine/timeseries/LineProtocolParser.java @@ -0,0 +1,358 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +import com.arcadedb.log.LogManager; + +import java.util.ArrayList; +import java.util.LinkedHashMap; +import java.util.List; +import java.util.Map; +import java.util.logging.Level; + +/** + * Parser for InfluxDB Line Protocol. + * Format: {@code <measurement>[,<tag_key>=<tag_value>...] <field_key>=<field_value>[,<field_key>=<field_value>...] [<timestamp>]} + * <p>
+ * Type suffixes: no suffix = double, {@code i} = long, {@code u} = unsigned long (stored as raw bits), quoted = string, true/false = boolean. + * + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +public class LineProtocolParser { + + public enum Precision { + NANOSECONDS(1_000_000L, 1L), + MICROSECONDS(1_000L, 1L), + MILLISECONDS(1L, 1L), + SECONDS(1L, 1_000L); + + private final long divisor; + private final long multiplier; + + Precision(final long divisor, final long multiplier) { + this.divisor = divisor; + this.multiplier = multiplier; + } + + public long toMillis(final long value) { + return (value / divisor) * multiplier; + } + + public static Precision fromString(final String s) { + if (s == null || s.isEmpty()) + return NANOSECONDS; + return switch (s.toLowerCase()) { + case "ns" -> NANOSECONDS; + case "us", "u" -> MICROSECONDS; + case "ms" -> MILLISECONDS; + case "s" -> SECONDS; + default -> { + LogManager.instance().log(Precision.class, Level.WARNING, + "Unrecognized precision '%s'; defaulting to nanoseconds", null, s); + yield NANOSECONDS; + } + }; + } + } + + public static class Sample { + private final String measurement; + private final Map<String, String> tags; + private final Map<String, Object> fields; + private final long timestampMs; + + public Sample(final String measurement, final Map<String, String> tags, final Map<String, Object> fields, + final long timestampMs) { + this.measurement = measurement; + this.tags = tags; + this.fields = fields; + this.timestampMs = timestampMs; + } + + public String getMeasurement() { + return measurement; + } + + public Map<String, String> getTags() { + return tags; + } + + public Map<String, Object> getFields() { + return fields; + } + + public long getTimestampMs() { + return timestampMs; + } + } + + private record ParsedString(String value, int length) {} + + private record ParsedValue(Object value, int length) {} + + /** + * Parses one or more lines of InfluxDB Line Protocol.
+ */ + public static List<Sample> parse(final String text, final Precision precision) { + final List<Sample> samples = new ArrayList<>(); + if (text == null || text.isEmpty()) + return samples; + + // Use \R (any line terminator) to handle Unix (\n), Windows (\r\n), and classic Mac (\r) + final String[] lines = text.split("\\R"); + for (final String rawLine : lines) { + final String line = rawLine.trim(); + if (line.isEmpty() || line.startsWith("#")) + continue; + + final Sample sample = parseLine(line, precision); + if (sample != null) + samples.add(sample); + else + LogManager.instance().log(LineProtocolParser.class, Level.WARNING, + "Skipping malformed line protocol line: '%s'", null, + sanitizeForLog(line.length() > 120 ? line.substring(0, 120) + "..." : line)); + } + return samples; + } + + /** + * Parses a single line of InfluxDB Line Protocol. + * Returns {@code null} if the line is malformed (missing measurement, no fields, or unparseable numbers). + */ + static Sample parseLine(final String line, final Precision precision) { + // Split into: measurement+tags, fields, [timestamp] + // Space separates measurement+tags from fields, and fields from timestamp + // But commas and equals within the measurement+tags section are significant + + try { + int pos = 0; + final int len = line.length(); + + // Parse measurement name (up to first unescaped comma or space) + final StringBuilder measurement = new StringBuilder(); + while (pos < len) { + final char c = line.charAt(pos); + if (c == '\\' && pos + 1 < len) { + measurement.append(line.charAt(pos + 1)); + pos += 2; + continue; + } + if (c == ',' || c == ' ') + break; + measurement.append(c); + pos++; + } + + if (measurement.isEmpty()) + return null; + + // Parse tags (comma-separated key=value pairs) + final Map<String, String> tags = new LinkedHashMap<>(); + if (pos < len && line.charAt(pos) == ',') { + pos++; // skip comma + while (pos < len && line.charAt(pos) != ' ') { + final ParsedString keyResult = readKeyWithLength(line, pos, '=');
pos += keyResult.length() + 1; // +1 for '=' + final ParsedString valResult = readTagValueWithLength(line, pos); + pos += valResult.length(); + // InfluxDB spec mandates non-empty tag keys; skip silently to avoid polluting the schema + if (!keyResult.value().isEmpty()) + tags.put(keyResult.value(), valResult.value()); + if (pos < len && line.charAt(pos) == ',') + pos++; // skip comma separator + } + } + + // Skip space before fields + if (pos < len && line.charAt(pos) == ' ') + pos++; + + // Parse fields (comma-separated key=value pairs) + final Map<String, Object> fields = new LinkedHashMap<>(); + while (pos < len && line.charAt(pos) != ' ') { + final ParsedString keyResult = readKeyWithLength(line, pos, '='); + pos += keyResult.length() + 1; // +1 for '=' + final ParsedValue valueAndLen = readFieldValue(line, pos); + fields.put(keyResult.value(), valueAndLen.value()); + pos += valueAndLen.length(); + if (pos < len && line.charAt(pos) == ',') + pos++; // skip comma separator + } + + if (fields.isEmpty()) + return null; + + // Parse optional timestamp + long timestampMs; + if (pos < len && line.charAt(pos) == ' ') { + pos++; // skip space + final String tsStr = line.substring(pos).trim(); + if (!tsStr.isEmpty()) { + final long rawTs = Long.parseLong(tsStr); + timestampMs = precision.toMillis(rawTs); + } else { + timestampMs = System.currentTimeMillis(); + } + } else { + timestampMs = System.currentTimeMillis(); + } + + return new Sample(measurement.toString(), tags, fields, timestampMs); + } catch (final IllegalArgumentException e) { + // Malformed numeric value or timestamp (including unsigned integer overflow) — + // skip this line rather than halting batch parse + return null; + } + } + + /** + * Reads a key (tag key or field key) terminated by {@code stopChar}, handling backslash escapes. + * Returns the decoded string and the raw byte length consumed (not including the stop character).
+ */ + private static ParsedString readKeyWithLength(final String line, final int start, final char stopChar) { + final StringBuilder sb = new StringBuilder(); + int pos = start; + while (pos < line.length()) { + final char c = line.charAt(pos); + if (c == '\\' && pos + 1 < line.length()) { + sb.append(line.charAt(pos + 1)); + pos += 2; + continue; + } + if (c == stopChar) + break; + sb.append(c); + pos++; + } + return new ParsedString(sb.toString(), pos - start); + } + + /** + * Reads a tag value terminated by ',' or ' ', handling backslash escapes. + * Returns the decoded string and the raw byte length consumed. + */ + private static ParsedString readTagValueWithLength(final String line, final int start) { + final StringBuilder sb = new StringBuilder(); + int pos = start; + while (pos < line.length()) { + final char c = line.charAt(pos); + if (c == '\\' && pos + 1 < line.length()) { + sb.append(line.charAt(pos + 1)); + pos += 2; + continue; + } + if (c == ',' || c == ' ') + break; + sb.append(c); + pos++; + } + return new ParsedString(sb.toString(), pos - start); + } + + /** + * Strips control characters and newlines from user-controlled input before logging + * to prevent log injection attacks. + */ + private static String sanitizeForLog(final String s) { + if (s == null) + return null; + final StringBuilder sb = new StringBuilder(s.length()); + for (int i = 0; i < s.length(); i++) { + final char c = s.charAt(i); + if (c >= 0x20 && c != 0x7F) + sb.append(c); + else + sb.append('?'); + } + return sb.toString(); + } + + /** + * Reads a field value and returns the parsed value and the raw byte length consumed. 
+ */ + private static ParsedValue readFieldValue(final String line, final int start) { + if (start >= line.length()) + return new ParsedValue(0.0, 0); + + final char first = line.charAt(start); + + // Quoted string — enforce MAX_STRING_BYTES to prevent multi-megabyte allocations + if (first == '"') { + final StringBuilder sb = new StringBuilder(); + int pos = start + 1; + boolean closed = false; + while (pos < line.length()) { + final char c = line.charAt(pos); + if (c == '\\' && pos + 1 < line.length()) { + if (sb.length() >= TimeSeriesBucket.MAX_STRING_BYTES) + throw new IllegalArgumentException( + "Quoted field value exceeds maximum length of " + TimeSeriesBucket.MAX_STRING_BYTES + " bytes"); + sb.append(line.charAt(pos + 1)); + pos += 2; + continue; + } + if (c == '"') { + pos++; + closed = true; + break; + } + if (sb.length() >= TimeSeriesBucket.MAX_STRING_BYTES) + throw new IllegalArgumentException( + "Quoted field value exceeds maximum length of " + TimeSeriesBucket.MAX_STRING_BYTES + " bytes"); + sb.append(c); + pos++; + } + if (!closed) + throw new IllegalArgumentException("Unterminated quoted string in field value at position " + start); + return new ParsedValue(sb.toString(), pos - start); + } + + // Read until comma or space + int pos = start; + while (pos < line.length() && line.charAt(pos) != ',' && line.charAt(pos) != ' ') + pos++; + + final String raw = line.substring(start, pos); + final int rawLen = pos - start; + + // Boolean + if ("true".equalsIgnoreCase(raw) || "t".equalsIgnoreCase(raw)) + return new ParsedValue(true, rawLen); + if ("false".equalsIgnoreCase(raw) || "f".equalsIgnoreCase(raw)) + return new ParsedValue(false, rawLen); + + // Integer (suffix 'i') + if (raw.endsWith("i")) { + final long intVal = Long.parseLong(raw.substring(0, raw.length() - 1)); + return new ParsedValue(intVal, rawLen); + } + + // Unsigned integer (suffix 'u'): values in [0, 2^64-1] are stored as the bit-pattern + // in a signed long (values >= 2^63 appear negative 
but are valid uint64 encodings) + if (raw.endsWith("u")) { + final long uintVal = Long.parseUnsignedLong(raw.substring(0, raw.length() - 1)); + return new ParsedValue(uintVal, rawLen); + } + + // Default: double + return new ParsedValue(Double.parseDouble(raw), rawLen); + } + +} diff --git a/engine/src/main/java/com/arcadedb/engine/timeseries/MultiColumnAggregationRequest.java b/engine/src/main/java/com/arcadedb/engine/timeseries/MultiColumnAggregationRequest.java new file mode 100644 index 0000000000..962a95b078 --- /dev/null +++ b/engine/src/main/java/com/arcadedb/engine/timeseries/MultiColumnAggregationRequest.java @@ -0,0 +1,29 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +/** + * Describes a single aggregation request within a multi-column push-down aggregation. 
+ * + * @param columnIndex index into the row array (0 = timestamp, 1+ = value columns) + * @param type the aggregation type (AVG, MAX, MIN, SUM, COUNT) + * @param alias the output alias for this aggregation + */ +public record MultiColumnAggregationRequest(int columnIndex, AggregationType type, String alias) { +} diff --git a/engine/src/main/java/com/arcadedb/engine/timeseries/MultiColumnAggregationResult.java b/engine/src/main/java/com/arcadedb/engine/timeseries/MultiColumnAggregationResult.java new file mode 100644 index 0000000000..e282f7b886 --- /dev/null +++ b/engine/src/main/java/com/arcadedb/engine/timeseries/MultiColumnAggregationResult.java @@ -0,0 +1,499 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +/** + * Holds multi-column aggregation results bucketed by timestamp. + * Supports two modes: + *
+ * <ul>
+ * <li>Flat mode: pre-allocated arrays indexed by {@code (bucketTs - firstBucketTs) / bucketIntervalMs}.
+ * Zero HashMap overhead per sample. Used when bucket interval and data range are known.</li>
+ * <li>Map mode: HashMap-based fallback for unknown ranges or zero-interval queries.</li>
+ * </ul>
+ * + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +public final class MultiColumnAggregationResult { + + /** Maximum number of buckets allowed in flat mode before falling back to map mode. */ + static final int MAX_FLAT_BUCKETS = 10_000_000; + + private final int requestCount; + private final AggregationType[] types; + + // --- Map mode (fallback) --- + private final Map<Long, double[]> valuesByBucket; + private final Map<Long, long[]> countsByBucket; + private final List<Long> orderedBuckets; + + // --- Flat mode --- + private final boolean flatMode; + private final long firstBucketTs; + private final long bucketIntervalMs; + private final int maxBuckets; + private double[][] flatValues; // [bucketIdx][requestIdx] + private long[][] flatCounts; // [bucketIdx][requestIdx] + private boolean[] bucketUsed; // whether this bucket has been touched + private List<Long> cachedBucketTimestamps; // cached result for flat mode + + /** + * Map-mode constructor (original behavior). + */ + public MultiColumnAggregationResult(final List<MultiColumnAggregationRequest> requests) { + this.requestCount = requests.size(); + this.types = new AggregationType[requestCount]; + for (int i = 0; i < requestCount; i++) + types[i] = requests.get(i).type(); + this.valuesByBucket = new HashMap<>(); + this.countsByBucket = new HashMap<>(); + this.orderedBuckets = new ArrayList<>(); + this.flatMode = false; + this.firstBucketTs = 0; + this.bucketIntervalMs = 0; + this.maxBuckets = 0; + } + + /** + * Flat-mode constructor. Pre-allocates arrays for direct-index access. + * If {@code maxBuckets} exceeds {@link #MAX_FLAT_BUCKETS}, falls back to map mode + * to avoid excessive memory allocation.
+ * + * @param requests aggregation request definitions + * @param firstBucketTs timestamp of the first bucket (aligned to interval) + * @param bucketIntervalMs bucket width in ms (must be > 0) + * @param maxBuckets number of buckets to pre-allocate + */ + public MultiColumnAggregationResult(final List<MultiColumnAggregationRequest> requests, + final long firstBucketTs, final long bucketIntervalMs, final int maxBuckets) { + this.requestCount = requests.size(); + this.types = new AggregationType[requestCount]; + for (int i = 0; i < requestCount; i++) + types[i] = requests.get(i).type(); + + if (maxBuckets > MAX_FLAT_BUCKETS) { + // Fall back to map mode to avoid OOM + this.flatMode = false; + this.firstBucketTs = 0; + this.bucketIntervalMs = 0; + this.maxBuckets = 0; + this.valuesByBucket = new HashMap<>(); + this.countsByBucket = new HashMap<>(); + this.orderedBuckets = new ArrayList<>(); + } else { + this.flatMode = true; + this.firstBucketTs = firstBucketTs; + this.bucketIntervalMs = bucketIntervalMs; + this.maxBuckets = maxBuckets; + this.flatValues = new double[maxBuckets][]; + this.flatCounts = new long[maxBuckets][]; + this.bucketUsed = new boolean[maxBuckets]; + this.valuesByBucket = null; + this.countsByBucket = null; + this.orderedBuckets = null; + } + } + + /** + * Returns whether this result uses flat array mode. + */ + public boolean isFlatMode() { + return flatMode; + } + + // ---- Accumulation methods ---- + + /** + * Accumulates a value for the request at the given index into the given bucket.
+ */ + public void accumulate(final long bucketTs, final int requestIndex, final double value) { + if (flatMode) { + final int idx = flatIndex(bucketTs); + if (!ensureFlatBucket(idx)) + return; + accumulateInPlace(flatValues[idx], flatCounts[idx], requestIndex, value); + } else { + double[] vals = valuesByBucket.get(bucketTs); + if (vals == null) { + vals = newInitializedValues(); + final long[] counts = new long[requestCount]; + valuesByBucket.put(bucketTs, vals); + countsByBucket.put(bucketTs, counts); + orderedBuckets.add(bucketTs); + } + accumulateInPlace(vals, countsByBucket.get(bucketTs), requestIndex, value); + } + } + + /** + * Batch accumulate for all requests in a single row. + */ + public void accumulateRow(final long bucketTs, final double[] values) { + if (flatMode) { + final int idx = flatIndex(bucketTs); + if (!ensureFlatBucket(idx)) + return; + final double[] vals = flatValues[idx]; + final long[] counts = flatCounts[idx]; + for (int i = 0; i < requestCount; i++) + accumulateInPlace(vals, counts, i, values[i]); + } else { + double[] vals = valuesByBucket.get(bucketTs); + long[] counts; + if (vals == null) { + vals = newInitializedValues(); + counts = new long[requestCount]; + valuesByBucket.put(bucketTs, vals); + countsByBucket.put(bucketTs, counts); + orderedBuckets.add(bucketTs); + } else { + counts = countsByBucket.get(bucketTs); + } + for (int i = 0; i < requestCount; i++) + accumulateInPlace(vals, counts, i, values[i]); + } + } + + /** + * Accumulates block-level statistics for all requests in a single call. 
+ */ + public void accumulateBlockStats(final long bucketTs, final double[] values, final int sampleCount) { + if (flatMode) { + final int idx = flatIndex(bucketTs); + if (!ensureFlatBucket(idx)) + return; + accumulateBlockStatsInPlace(flatValues[idx], flatCounts[idx], values, sampleCount); + } else { + double[] vals = valuesByBucket.get(bucketTs); + long[] counts; + if (vals == null) { + vals = newInitializedValues(); + counts = new long[requestCount]; + valuesByBucket.put(bucketTs, vals); + countsByBucket.put(bucketTs, counts); + orderedBuckets.add(bucketTs); + } else { + counts = countsByBucket.get(bucketTs); + } + accumulateBlockStatsInPlace(vals, counts, values, sampleCount); + } + } + + /** + * Accumulates a single statistic result for one request at the given bucket. + * Used by vectorized (SIMD) segment accumulation where each aggregation type + * is computed separately per segment. + * + * @param bucketTs aligned bucket timestamp + * @param requestIndex which aggregation request this applies to + * @param value the aggregated value (sum, min, max, or count) + * @param count number of samples that produced this value + */ + public void accumulateSingleStat(final long bucketTs, final int requestIndex, + final double value, final long count) { + if (flatMode) { + final int idx = flatIndex(bucketTs); + if (!ensureFlatBucket(idx)) + return; + accumulateStatInPlace(flatValues[idx], flatCounts[idx], requestIndex, value, count); + } else { + double[] vals = valuesByBucket.get(bucketTs); + long[] counts; + if (vals == null) { + vals = newInitializedValues(); + counts = new long[requestCount]; + valuesByBucket.put(bucketTs, vals); + countsByBucket.put(bucketTs, counts); + orderedBuckets.add(bucketTs); + } else { + counts = countsByBucket.get(bucketTs); + } + accumulateStatInPlace(vals, counts, requestIndex, value, count); + } + } + + // ---- Finalize & query ---- + + /** + * Finalizes AVG accumulators by dividing accumulated sums by their counts. 
+ */ + public void finalizeAvg() { + if (flatMode) { + for (int i = 0; i < requestCount; i++) { + if (types[i] == AggregationType.AVG) { + for (int b = 0; b < maxBuckets; b++) { + if (bucketUsed[b] && flatCounts[b][i] > 0) + flatValues[b][i] = flatValues[b][i] / flatCounts[b][i]; + } + } + } + } else { + for (int i = 0; i < requestCount; i++) { + if (types[i] == AggregationType.AVG) { + for (final Map.Entry<Long, double[]> entry : valuesByBucket.entrySet()) { + final long[] counts = countsByBucket.get(entry.getKey()); + if (counts[i] > 0) + entry.getValue()[i] = entry.getValue()[i] / counts[i]; + } + } + } + } + } + + /** + * Returns bucket timestamps in order. + */ + public List<Long> getBucketTimestamps() { + if (flatMode) { + if (cachedBucketTimestamps == null) { + final List<Long> result = new ArrayList<>(); + for (int b = 0; b < maxBuckets; b++) + if (bucketUsed[b]) + result.add(firstBucketTs + (long) b * bucketIntervalMs); + cachedBucketTimestamps = result; + } + return cachedBucketTimestamps; + } + return orderedBuckets; + } + + public double getValue(final long bucketTs, final int requestIndex) { + if (flatMode) { + final int idx = flatIndex(bucketTs); + if (idx >= 0 && idx < maxBuckets && bucketUsed[idx]) + return flatValues[idx][requestIndex]; + return 0.0; + } + final double[] vals = valuesByBucket.get(bucketTs); + return vals != null ? vals[requestIndex] : 0.0; + } + + public long getCount(final long bucketTs, final int requestIndex) { + if (flatMode) { + final int idx = flatIndex(bucketTs); + if (idx >= 0 && idx < maxBuckets && bucketUsed[idx]) + return flatCounts[idx][requestIndex]; + return 0; + } + final long[] counts = countsByBucket.get(bucketTs); + return counts != null ? counts[requestIndex] : 0; + } + + public int size() { + if (flatMode) { + int count = 0; + for (int b = 0; b < maxBuckets; b++) + if (bucketUsed[b]) + count++; + return count; + } + return valuesByBucket.size(); + } + + /** + * Merges another result into this one.
Both must use flat mode with + * the same firstBucketTs, bucketIntervalMs, and maxBuckets. + */ + public void mergeFrom(final MultiColumnAggregationResult other) { + if (flatMode && other.flatMode) { + if (firstBucketTs != other.firstBucketTs || bucketIntervalMs != other.bucketIntervalMs || maxBuckets != other.maxBuckets) + throw new IllegalArgumentException( + "Cannot merge incompatible flat-mode results: firstBucketTs=" + firstBucketTs + "/" + other.firstBucketTs + + " bucketIntervalMs=" + bucketIntervalMs + "/" + other.bucketIntervalMs + + " maxBuckets=" + maxBuckets + "/" + other.maxBuckets); + for (int b = 0; b < other.maxBuckets; b++) { + if (!other.bucketUsed[b]) + continue; + ensureFlatBucket(b); + final double[] oVals = other.flatValues[b]; + final long[] oCounts = other.flatCounts[b]; + final double[] tVals = flatValues[b]; + final long[] tCounts = flatCounts[b]; + for (int i = 0; i < requestCount; i++) { + switch (types[i]) { + case MIN: + if (oVals[i] < tVals[i]) + tVals[i] = oVals[i]; + break; + case MAX: + if (oVals[i] > tVals[i]) + tVals[i] = oVals[i]; + break; + case SUM: + case AVG: + case COUNT: + tVals[i] += oVals[i]; + break; + } + tCounts[i] += oCounts[i]; + } + } + } else { + // Fallback: merge map-mode results + for (final long bucketTs : other.getBucketTimestamps()) { + for (int i = 0; i < requestCount; i++) { + final double oVal = other.getValue(bucketTs, i); + final long oCount = other.getCount(bucketTs, i); + accumulateStatInPlaceByTs(bucketTs, i, oVal, oCount); + } + } + } + } + + // ---- Internal helpers ---- + + int getRequestCount() { + return requestCount; + } + + AggregationType[] getTypes() { + return types; + } + + private int flatIndex(final long bucketTs) { + final long idx = (bucketTs - firstBucketTs) / bucketIntervalMs; + if (idx < 0 || idx >= maxBuckets) + return -1; + return (int) idx; // safe: idx < maxBuckets which is an int + } + + private boolean ensureFlatBucket(final int idx) { + if (idx < 0 || idx >= maxBuckets) + 
return false; + if (!bucketUsed[idx]) { + bucketUsed[idx] = true; + flatValues[idx] = newInitializedValues(); + flatCounts[idx] = new long[requestCount]; + cachedBucketTimestamps = null; // invalidate cache + } + return true; + } + + private double[] newInitializedValues() { + final double[] vals = new double[requestCount]; + for (int i = 0; i < requestCount; i++) { + switch (types[i]) { + case MIN: + vals[i] = Double.MAX_VALUE; + break; + case MAX: + vals[i] = -Double.MAX_VALUE; + break; + default: + vals[i] = 0.0; + break; + } + } + return vals; + } + + private void accumulateInPlace(final double[] vals, final long[] counts, final int idx, final double value) { + switch (types[idx]) { + case SUM: + case AVG: + vals[idx] += value; + break; + case COUNT: + vals[idx] += 1; + break; + case MIN: + if (value < vals[idx]) + vals[idx] = value; + break; + case MAX: + if (value > vals[idx]) + vals[idx] = value; + break; + } + counts[idx]++; + } + + private void accumulateBlockStatsInPlace(final double[] vals, final long[] counts, + final double[] values, final int sampleCount) { + for (int i = 0; i < requestCount; i++) { + switch (types[i]) { + case MIN: + if (values[i] < vals[i]) + vals[i] = values[i]; + break; + case MAX: + if (values[i] > vals[i]) + vals[i] = values[i]; + break; + case SUM: + case AVG: + vals[i] += values[i]; + break; + case COUNT: + vals[i] += values[i]; + break; + } + counts[i] += sampleCount; + } + } + + private void accumulateStatInPlace(final double[] vals, final long[] counts, + final int requestIndex, final double value, final long count) { + switch (types[requestIndex]) { + case MIN: + if (value < vals[requestIndex]) + vals[requestIndex] = value; + break; + case MAX: + if (value > vals[requestIndex]) + vals[requestIndex] = value; + break; + case SUM: + case AVG: + vals[requestIndex] += value; + break; + case COUNT: + vals[requestIndex] += value; + break; + } + counts[requestIndex] += count; + } + + private void accumulateStatInPlaceByTs(final 
long bucketTs, final int requestIndex, + final double value, final long count) { + if (flatMode) { + final int idx = flatIndex(bucketTs); + if (!ensureFlatBucket(idx)) + return; + accumulateStatInPlace(flatValues[idx], flatCounts[idx], requestIndex, value, count); + } else { + double[] vals = valuesByBucket.get(bucketTs); + long[] counts; + if (vals == null) { + vals = newInitializedValues(); + counts = new long[requestCount]; + valuesByBucket.put(bucketTs, vals); + countsByBucket.put(bucketTs, counts); + orderedBuckets.add(bucketTs); + } else { + counts = countsByBucket.get(bucketTs); + } + accumulateStatInPlace(vals, counts, requestIndex, value, count); + } + } +} diff --git a/engine/src/main/java/com/arcadedb/engine/timeseries/TagFilter.java b/engine/src/main/java/com/arcadedb/engine/timeseries/TagFilter.java new file mode 100644 index 0000000000..4705169f63 --- /dev/null +++ b/engine/src/main/java/com/arcadedb/engine/timeseries/TagFilter.java @@ -0,0 +1,166 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +import com.arcadedb.utility.CollectionUtils; + +import java.util.ArrayList; +import java.util.List; +import java.util.Set; + +/** + * Predicate for tag column filtering. 
Supports multiple tag conditions ANDed together.
+ *
+ * @author Luca Garulli (l.garulli@arcadedata.com)
+ */
+public final class TagFilter {
+
+  private final List<Condition> conditions;
+
+  private TagFilter(final List<Condition> conditions) {
+    this.conditions = conditions;
+  }
+
+  /**
+   * Creates a filter matching a single tag equality.
+   *
+   * @param nonTsColumnIndex zero-based column index excluding the timestamp column.
+   *                         In {@link #matches(Object[])}, this is offset by +1 to account
+   *                         for the timestamp at row[0].
+   * @param value            the value to match against
+   */
+  public static TagFilter eq(final int nonTsColumnIndex, final Object value) {
+    final List<Condition> conditions = new ArrayList<>(1);
+    conditions.add(new Condition(nonTsColumnIndex, CollectionUtils.singletonSet(value)));
+    return new TagFilter(conditions);
+  }
+
+  /**
+   * Creates a filter matching a single tag against a set of values (IN).
+   *
+   * @param nonTsColumnIndex zero-based column index excluding the timestamp column
+   * @param values           the set of values to match against
+   */
+  public static TagFilter in(final int nonTsColumnIndex, final Set<Object> values) {
+    final List<Condition> conditions = new ArrayList<>(1);
+    conditions.add(new Condition(nonTsColumnIndex, values));
+    return new TagFilter(conditions);
+  }
+
+  /**
+   * Returns a new TagFilter that ANDs this filter with an additional tag equality condition.
+   *
+   * @param nonTsColumnIndex zero-based column index excluding the timestamp column
+   * @param value            the value to match against
+   */
+  public TagFilter and(final int nonTsColumnIndex, final Object value) {
+    final List<Condition> newConditions = new ArrayList<>(conditions.size() + 1);
+    newConditions.addAll(conditions);
+    newConditions.add(new Condition(nonTsColumnIndex, CollectionUtils.singletonSet(value)));
+    return new TagFilter(newConditions);
+  }
+
+  /**
+   * Returns a new TagFilter that ANDs this filter with an additional IN condition.
+   *
+   * @param nonTsColumnIndex zero-based column index excluding the timestamp column
+   * @param values           the set of values to match against
+   */
+  public TagFilter andIn(final int nonTsColumnIndex, final Set<Object> values) {
+    final List<Condition> newConditions = new ArrayList<>(conditions.size() + 1);
+    newConditions.addAll(conditions);
+    newConditions.add(new Condition(nonTsColumnIndex, values));
+    return new TagFilter(newConditions);
+  }
+
+  /**
+   * Returns the column index of the first condition (for backward compatibility).
+   */
+  public int getColumnIndex() {
+    return conditions.isEmpty() ? -1 : conditions.getFirst().columnIndex;
+  }
+
+  /**
+   * Returns the number of conditions in this filter.
+   */
+  public int getConditionCount() {
+    return conditions.size();
+  }
+
+  /**
+   * Tests if a sample row matches all conditions in this filter.
+   * Assumes the row was built from all non-timestamp columns in schema order:
+   * {@code row[0] = timestamp, row[1] = non-ts col 0, row[2] = non-ts col 1, ...}
+   *
+   * @param row the sample row (index 0 = timestamp, index 1+ = columns in full schema order)
+   */
+  public boolean matches(final Object[] row) {
+    for (final Condition cond : conditions) {
+      if (cond.columnIndex + 1 >= row.length)
+        return false;
+      if (!cond.values.contains(row[cond.columnIndex + 1]))
+        return false;
+    }
+    return true;
+  }
+
+  /**
+   * Tests if a sample row matches all conditions in this filter, resolving column positions
+   * through the supplied {@code columnIndices} mapping.
+   *
<p>
+ * Use this overload when the row was built from a subset of columns (i.e. + * {@code columnIndices != null} was passed to {@code scanRange} / {@code iterateRange}). + * In that case {@code row[i+1]} holds the column whose non-timestamp schema index equals + * {@code columnIndices[i]}, so a direct {@code cond.columnIndex+1} offset would be wrong. + *
<p>
+   * Falls back to {@link #matches(Object[])} when {@code columnIndices} is {@code null}
+   * (all columns present in schema order).
+   *
+   * @param row           the sample row (index 0 = timestamp, index 1+ = selected columns)
+   * @param columnIndices the non-timestamp schema indices that were used to build the row,
+   *                      in ascending order; {@code null} means all columns in schema order
+   */
+  public boolean matchesMapped(final Object[] row, final int[] columnIndices) {
+    if (columnIndices == null)
+      return matches(row);
+    for (final Condition cond : conditions) {
+      int outPos = -1;
+      for (int i = 0; i < columnIndices.length; i++) {
+        if (columnIndices[i] == cond.columnIndex) {
+          outPos = i;
+          break;
+        }
+      }
+      if (outPos < 0)
+        return false; // tag column was not included in the requested subset
+      if (outPos + 1 >= row.length)
+        return false;
+      if (!cond.values.contains(row[outPos + 1]))
+        return false;
+    }
+    return true;
+  }
+
+  record Condition(int columnIndex, Set<Object> values) {
+  }
+
+  List<Condition> getConditions() {
+    return conditions;
+  }
+}
diff --git a/engine/src/main/java/com/arcadedb/engine/timeseries/TimeSeriesBucket.java b/engine/src/main/java/com/arcadedb/engine/timeseries/TimeSeriesBucket.java
new file mode 100644
index 0000000000..740c9dc46c
--- /dev/null
+++ b/engine/src/main/java/com/arcadedb/engine/timeseries/TimeSeriesBucket.java
@@ -0,0 +1,729 @@
+/*
+ * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com)
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +import com.arcadedb.database.DatabaseInternal; +import com.arcadedb.database.TransactionContext; +import com.arcadedb.engine.BasePage; +import com.arcadedb.engine.ComponentFactory; +import com.arcadedb.engine.ComponentFile; +import com.arcadedb.engine.MutablePage; +import com.arcadedb.engine.PageId; +import com.arcadedb.engine.PaginatedComponent; +import com.arcadedb.schema.Type; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Iterator; +import java.util.List; +import java.util.NoSuchElementException; + +/** + * Mutable TimeSeries bucket backed by paginated storage. + * Stores samples in row-oriented format within pages for ACID compliance. + *
<p>
+ * Header page (page 0) layout (offsets from PAGE_HEADER_SIZE) — 44 bytes: + * - [0..3] magic "TSBC" (4 bytes) + * - [4] formatVersion (1 byte) + * - [5..6] column count (short) + * - [7..14] total sample count (long) + * - [15..22] min timestamp (long) + * - [23..30] max timestamp (long) + * - [31] compaction in progress flag (byte) + * - [32..39] compaction watermark (long) — sealed store offset + * - [40..43] active data page count (int) + *
<p>
+ * Data pages layout (offsets from PAGE_HEADER_SIZE):
+ * - [0..1] sample count in page (short, read as unsigned with & 0xFFFF)
+ * - [2..9] min timestamp in page (long)
+ * - [10..17] max timestamp in page (long)
+ * - [18..] row data: fixed-size rows [timestamp(8)|col1|col2|...]
+ *   For STRING columns: 2-byte length prefix + up to MAX_STRING_BYTES payload
+ *
+ * @author Luca Garulli (l.garulli@arcadedata.com)
+ */
+public class TimeSeriesBucket extends PaginatedComponent {
+
+  public static final String BUCKET_EXT = "tstb";
+  public static final int MAX_STRING_BYTES = 256;
+  public static final int CURRENT_VERSION = 0;
+  private static final int MAGIC_VALUE = 0x54534243; // "TSBC"
+
+  // Header page offsets (from PAGE_HEADER_SIZE)
+  private static final int HEADER_MAGIC_OFFSET = 0;
+  private static final int HEADER_FORMAT_VERSION_OFFSET = 4;
+  private static final int HEADER_COLUMN_COUNT_OFFSET = 5;
+  private static final int HEADER_SAMPLE_COUNT_OFFSET = 7;
+  private static final int HEADER_MIN_TS_OFFSET = 15;
+  private static final int HEADER_MAX_TS_OFFSET = 23;
+  private static final int HEADER_COMPACTION_FLAG = 31;
+  private static final int HEADER_COMPACTION_WATERMARK = 32;
+  private static final int HEADER_DATA_PAGE_COUNT = 40;
+  private static final int HEADER_SIZE = 44;
+
+  // Data page offsets (from PAGE_HEADER_SIZE)
+  // Sample count stored as short (2 bytes), read with & 0xFFFF to treat as unsigned (0..65535).
+  // A page can never hold more than (pageSize - overhead) / rowSize samples, which is well under 65535.
+  private static final int DATA_SAMPLE_COUNT_OFFSET = 0;
+  private static final int DATA_MIN_TS_OFFSET = 2;
+  private static final int DATA_MAX_TS_OFFSET = 10;
+  private static final int DATA_ROWS_OFFSET = 18;
+
+  private List<ColumnDefinition> columns;
+  private int rowSize; // fixed row size in bytes
+
+  /**
+   * Factory handler for loading existing .tstb files during schema load.
+   * Columns are set later via {@link #setColumns(List)} when the TimeSeries type is initialized.
+   */
+  public static class PaginatedComponentFactoryHandler implements ComponentFactory.PaginatedComponentFactoryHandler {
+    @Override
+    public PaginatedComponent createOnLoad(final DatabaseInternal database, final String name, final String filePath,
+        final int id, final ComponentFile.MODE mode, final int pageSize, final int version) throws IOException {
+      return new TimeSeriesBucket(database, name, filePath, id, new ArrayList<>());
+    }
+  }
+
+  /**
+   * Creates a new TimeSeries bucket.
+   */
+  public TimeSeriesBucket(final DatabaseInternal database, final String name, final String filePath,
+      final List<ColumnDefinition> columns) throws IOException {
+    super(database, name, filePath, BUCKET_EXT, ComponentFile.MODE.READ_WRITE,
+        database.getConfiguration().getValueAsInteger(com.arcadedb.GlobalConfiguration.BUCKET_DEFAULT_PAGE_SIZE), CURRENT_VERSION);
+    this.columns = columns;
+    this.rowSize = calculateRowSize(columns);
+    // Note: initHeaderPage() is NOT called here.
+    // TimeSeriesShard calls it in a self-contained nested transaction after registering the
+    // bucket with the schema, so the nested TX commit can resolve the file by its ID.
+  }
+
+  /**
+   * Opens an existing TimeSeries bucket.
+   */
+  public TimeSeriesBucket(final DatabaseInternal database, final String name, final String filePath, final int id,
+      final List<ColumnDefinition> columns) throws IOException {
+    super(database, name, filePath, id, ComponentFile.MODE.READ_WRITE,
+        database.getConfiguration().getValueAsInteger(com.arcadedb.GlobalConfiguration.BUCKET_DEFAULT_PAGE_SIZE), CURRENT_VERSION);
+    this.columns = columns;
+    this.rowSize = calculateRowSize(columns);
+  }
+
+  /**
+   * Sets column definitions (called during cold open after the factory handler creates a stub bucket).
+   */
+  public void setColumns(final List<ColumnDefinition> columns) {
+    this.columns = columns;
+    this.rowSize = calculateRowSize(columns);
+  }
+
+  /**
+   * Appends samples to the mutable bucket within the current transaction.
+   *
+   * @param timestamps   array of timestamps (millisecond epoch)
+   * @param columnValues array of column value arrays, one per non-timestamp column
+   */
+  public void appendSamples(final long[] timestamps, final Object[]... columnValues) throws IOException {
+    final TransactionContext tx = database.getTransaction();
+
+    for (int i = 0; i < timestamps.length; i++) {
+      final MutablePage dataPage = getOrCreateActiveDataPage(tx);
+
+      final int sampleCountInPage = dataPage.readShort(DATA_SAMPLE_COUNT_OFFSET) & 0xFFFF;
+      final int rowOffset = DATA_ROWS_OFFSET + sampleCountInPage * rowSize;
+
+      // Write timestamp
+      dataPage.writeLong(rowOffset, timestamps[i]);
+
+      // Write each non-timestamp column value
+      int colOffset = rowOffset + 8;
+      int colIdx = 0;
+      for (int c = 0; c < columns.size(); c++) {
+        if (columns.get(c).getRole() == ColumnDefinition.ColumnRole.TIMESTAMP)
+          continue;
+
+        final Object value = columnValues[colIdx][i];
+        colOffset += writeColumnValue(dataPage, colOffset, columns.get(c), value);
+        colIdx++;
+      }
+
+      // Update page sample count and min/max timestamps
+      dataPage.writeShort(DATA_SAMPLE_COUNT_OFFSET, (short) (sampleCountInPage + 1));
+
+      final long currentMinTs = dataPage.readLong(DATA_MIN_TS_OFFSET);
+      final long currentMaxTs = dataPage.readLong(DATA_MAX_TS_OFFSET);
+
+      if (sampleCountInPage == 0 || timestamps[i] < currentMinTs)
+        dataPage.writeLong(DATA_MIN_TS_OFFSET, timestamps[i]);
+      if (sampleCountInPage == 0 || timestamps[i] > currentMaxTs)
+        dataPage.writeLong(DATA_MAX_TS_OFFSET, timestamps[i]);
+
+      // Update header page stats
+      updateHeaderStats(tx, timestamps[i]);
+    }
+  }
+
+  /**
+   * Scans the mutable bucket for samples in the given time range.
+   *
+   * @param fromTs        start timestamp (inclusive)
+   * @param toTs          end timestamp (inclusive)
+   * @param columnIndices which columns to return (null = all)
+   *
+   * @return list of sample rows: each row is Object[] { timestamp, col1, col2, ... }
+   */
+  public List<Object[]> scanRange(final long fromTs, final long toTs, final int[] columnIndices) throws IOException {
+    final List<Object[]> results = new ArrayList<>();
+    final int dataPageCount = getDataPageCount();
+
+    for (int pageNum = 1; pageNum <= dataPageCount; pageNum++) {
+      final BasePage page = database.getTransaction().getPage(new PageId(database, fileId, pageNum), pageSize);
+
+      final int sampleCount = page.readShort(DATA_SAMPLE_COUNT_OFFSET) & 0xFFFF;
+      if (sampleCount == 0)
+        continue;
+
+      final long pageMinTs = page.readLong(DATA_MIN_TS_OFFSET);
+      final long pageMaxTs = page.readLong(DATA_MAX_TS_OFFSET);
+
+      // Skip pages outside range
+      if (pageMaxTs < fromTs || pageMinTs > toTs)
+        continue;
+
+      for (int row = 0; row < sampleCount; row++) {
+        final int rowOffset = DATA_ROWS_OFFSET + row * rowSize;
+        final long ts = page.readLong(rowOffset);
+
+        if (ts < fromTs || ts > toTs)
+          continue;
+
+        final Object[] sample = readRow(page, rowOffset, columnIndices);
+        results.add(sample);
+      }
+    }
+    return results;
+  }
+
+  /**
+   * Returns a lazy iterator over samples in the given time range.
+   * Only one page is loaded at a time, keeping memory usage O(pageSize).
+   *
+   * @param fromTs        start timestamp (inclusive)
+   * @param toTs          end timestamp (inclusive)
+   * @param columnIndices which columns to return (null = all)
+   *
+   * @return iterator yielding Object[] { timestamp, col1, col2, ...
}
+   */
+  public Iterator<Object[]> iterateRange(final long fromTs, final long toTs, final int[] columnIndices) throws IOException {
+    if (getSampleCount() == 0)
+      return java.util.Collections.emptyIterator();
+
+    final int dataPageCount = getDataPageCount();
+
+    return new Iterator<>() {
+      private int pageNum = 1;
+      private int rowIdx = 0;
+      private BasePage currentPage = null;
+      private int currentSampleCount = 0;
+      private Object[] nextRow = null;
+
+      {
+        advance();
+      }
+
+      private void advance() {
+        nextRow = null;
+        try {
+          while (pageNum <= dataPageCount) {
+            if (currentPage == null) {
+              currentPage = database.getTransaction().getPage(new PageId(database, fileId, pageNum), pageSize);
+              currentSampleCount = currentPage.readShort(DATA_SAMPLE_COUNT_OFFSET) & 0xFFFF;
+              rowIdx = 0;
+
+              if (currentSampleCount == 0) {
+                currentPage = null;
+                pageNum++;
+                continue;
+              }
+
+              final long pageMinTs = currentPage.readLong(DATA_MIN_TS_OFFSET);
+              final long pageMaxTs = currentPage.readLong(DATA_MAX_TS_OFFSET);
+              if (pageMaxTs < fromTs || pageMinTs > toTs) {
+                currentPage = null;
+                pageNum++;
+                continue;
+              }
+            }
+
+            while (rowIdx < currentSampleCount) {
+              final int rowOffset = DATA_ROWS_OFFSET + rowIdx * rowSize;
+              final long ts = currentPage.readLong(rowOffset);
+              rowIdx++;
+
+              if (ts >= fromTs && ts <= toTs) {
+                nextRow = readRow(currentPage, rowOffset, columnIndices);
+                return;
+              }
+            }
+
+            currentPage = null;
+            pageNum++;
+          }
+        } catch (final IOException e) {
+          throw new com.arcadedb.exception.DatabaseOperationException("Error iterating TimeSeries bucket pages", e);
+        }
+      }
+
+      @Override
+      public boolean hasNext() {
+        return nextRow != null;
+      }
+
+      @Override
+      public Object[] next() {
+        if (nextRow == null)
+          throw new NoSuchElementException();
+        final Object[] result = nextRow;
+        advance();
+        return result;
+      }
+    };
+  }
+
+  /**
+   * Returns the total sample count stored in this bucket.
+ */ + public long getSampleCount() throws IOException { + if (getTotalPages() == 0) + return 0; + final BasePage headerPage = database.getTransaction().getPage(new PageId(database, fileId, 0), pageSize); + return headerPage.readLong(HEADER_SAMPLE_COUNT_OFFSET); + } + + /** + * Returns the minimum timestamp across all samples. + */ + public long getMinTimestamp() throws IOException { + final BasePage headerPage = database.getTransaction().getPage(new PageId(database, fileId, 0), pageSize); + return headerPage.readLong(HEADER_MIN_TS_OFFSET); + } + + /** + * Returns the maximum timestamp across all samples. + */ + public long getMaxTimestamp() throws IOException { + final BasePage headerPage = database.getTransaction().getPage(new PageId(database, fileId, 0), pageSize); + return headerPage.readLong(HEADER_MAX_TS_OFFSET); + } + + /** + * Returns the number of data pages (excluding header page). + */ + public int getDataPageCount() throws IOException { + if (getTotalPages() == 0) + return 0; + final BasePage headerPage = database.getTransaction().getPage(new PageId(database, fileId, 0), pageSize); + return headerPage.readInt(HEADER_DATA_PAGE_COUNT); + } + + /** + * Sets the compaction-in-progress flag. Used for crash-safe compaction. + */ + public void setCompactionInProgress(final boolean inProgress) throws IOException { + final TransactionContext tx = database.getTransaction(); + final MutablePage headerPage = tx.getPageToModify(new PageId(database, fileId, 0), pageSize, false); + headerPage.writeByte(HEADER_COMPACTION_FLAG, (byte) (inProgress ? 1 : 0)); + } + + /** + * Returns true if a compaction was in progress (crash recovery check). + */ + public boolean isCompactionInProgress() throws IOException { + final BasePage headerPage = database.getTransaction().getPage(new PageId(database, fileId, 0), pageSize); + return headerPage.readByte(HEADER_COMPACTION_FLAG) == 1; + } + + /** + * Gets the compaction watermark (sealed store file offset). 
+   */
+  public long getCompactionWatermark() throws IOException {
+    final BasePage headerPage = database.getTransaction().getPage(new PageId(database, fileId, 0), pageSize);
+    return headerPage.readLong(HEADER_COMPACTION_WATERMARK);
+  }
+
+  /**
+   * Sets the compaction watermark.
+   */
+  public void setCompactionWatermark(final long watermark) throws IOException {
+    final TransactionContext tx = database.getTransaction();
+    final MutablePage headerPage = tx.getPageToModify(new PageId(database, fileId, 0), pageSize, false);
+    headerPage.writeLong(HEADER_COMPACTION_WATERMARK, watermark);
+  }
+
+  /**
+   * Returns all data from the bucket as parallel arrays for compaction.
+   * First array is timestamps (long[]), rest are column values.
+   */
+  public Object[] readAllForCompaction() throws IOException {
+    final List<Object[]> allRows = scanRange(Long.MIN_VALUE, Long.MAX_VALUE, null);
+    return allRows.isEmpty() ? null : rowsToCompactionArrays(allRows);
+  }
+
+  /**
+   * Reads samples from data pages 1..toPage using the current transaction.
+   *
<p>
+   * Pages 1..toPage must be FULL (immutable): once a data page is full it is never
+   * modified by {@link #appendSamples}, which always writes to the LAST page. This
+   * makes it safe to read them inside a short read-only transaction that is rolled
+   * back immediately after, with no MVCC conflict with concurrent writers.
+   *
+   * @param toPage last data page to read (inclusive); must be ≥ 1
+   *
+   * @return parallel arrays [long[] timestamps, Object[] col1, ...], or null if empty
+   */
+  public Object[] readFullPagesForCompaction(final int toPage) throws IOException {
+    return readPagesRangeForCompaction(1, toPage);
+  }
+
+  /**
+   * Reads samples from data pages fromPage..toPage using the current transaction.
+   * Used by Phase 4 of lock-free compaction (under write lock) to read the partial
+   * last page(s) that arrived after the Phase 0 snapshot.
+   *
+   * @param fromPage first data page to read (inclusive, ≥ 1)
+   * @param toPage   last data page to read (inclusive, ≥ fromPage)
+   *
+   * @return parallel arrays [long[] timestamps, Object[] col1, ...], or null if empty
+   */
+  public Object[] readPagesRangeForCompaction(final int fromPage, final int toPage) throws IOException {
+    final List<Object[]> allRows = new ArrayList<>();
+    for (int pageNum = fromPage; pageNum <= toPage; pageNum++) {
+      final BasePage page = database.getTransaction().getPage(new PageId(database, fileId, pageNum), pageSize);
+      final int sampleCount = page.readShort(DATA_SAMPLE_COUNT_OFFSET) & 0xFFFF;
+      if (sampleCount == 0)
+        continue;
+      for (int row = 0; row < sampleCount; row++)
+        allRows.add(readRow(page, DATA_ROWS_OFFSET + row * rowSize, null));
+    }
+    return allRows.isEmpty() ? null : rowsToCompactionArrays(allRows);
+  }
+
+  /**
+   * Clears data pages 1..upToPage and recomputes header stats from the remaining pages.
+   * Pages are physically kept for reuse but have their sample counts reset to 0.
+ * Called by lock-free compaction to clear only the pages that were compacted, + * leaving newer pages (upToPage+1..dataPageCount) intact. + * + * @param upToPage last page number to clear (inclusive); must be ≥ 1 + */ + public void clearDataPagesUpTo(final int upToPage) throws IOException { + final TransactionContext tx = database.getTransaction(); + final MutablePage headerPage = tx.getPageToModify(new PageId(database, fileId, 0), pageSize, false); + + // Clear pages 1..upToPage + for (int p = 1; p <= upToPage; p++) { + final MutablePage dataPage = tx.getPageToModify(new PageId(database, fileId, p), pageSize, false); + dataPage.writeShort(DATA_SAMPLE_COUNT_OFFSET, (short) 0); + dataPage.writeLong(DATA_MIN_TS_OFFSET, Long.MAX_VALUE); + dataPage.writeLong(DATA_MAX_TS_OFFSET, Long.MIN_VALUE); + } + + // Recompute header stats from the remaining pages (upToPage+1..totalDataPages) + final int totalDataPages = headerPage.readInt(HEADER_DATA_PAGE_COUNT); + long sampleCount = 0; + long minTs = Long.MAX_VALUE; + long maxTs = Long.MIN_VALUE; + for (int p = upToPage + 1; p <= totalDataPages; p++) { + final BasePage page = tx.getPage(new PageId(database, fileId, p), pageSize); + final int count = page.readShort(DATA_SAMPLE_COUNT_OFFSET) & 0xFFFF; + if (count > 0) { + sampleCount += count; + final long pMin = page.readLong(DATA_MIN_TS_OFFSET); + final long pMax = page.readLong(DATA_MAX_TS_OFFSET); + if (pMin < minTs) + minTs = pMin; + if (pMax > maxTs) + maxTs = pMax; + } + } + headerPage.writeLong(HEADER_SAMPLE_COUNT_OFFSET, sampleCount); + headerPage.writeLong(HEADER_MIN_TS_OFFSET, minTs); + headerPage.writeLong(HEADER_MAX_TS_OFFSET, maxTs); + // Keep HEADER_DATA_PAGE_COUNT unchanged so cleared pages can be reused by new inserts + } + + /** + * Clears all data pages after compaction. + * O(1): only the header page is touched; physical data pages remain allocated on disk + * and will be transparently reused as new samples arrive. 
+   * {@link #getOrCreateActiveDataPage} uses {@code HEADER_DATA_PAGE_COUNT} (not the physical
+   * page count) to locate the current write position, so after this reset it starts from
+   * page 1 again, reinitialising its sample-count field on the first write.
+   */
+  public void clearDataPages() throws IOException {
+    final TransactionContext tx = database.getTransaction();
+    final MutablePage headerPage = tx.getPageToModify(new PageId(database, fileId, 0), pageSize, false);
+    headerPage.writeLong(HEADER_SAMPLE_COUNT_OFFSET, 0L);
+    headerPage.writeLong(HEADER_MIN_TS_OFFSET, Long.MAX_VALUE);
+    headerPage.writeLong(HEADER_MAX_TS_OFFSET, Long.MIN_VALUE);
+    headerPage.writeInt(HEADER_DATA_PAGE_COUNT, 0);
+    // Physical pages are not touched: committing a single header page is O(1) regardless
+    // of how many data pages were previously allocated, preventing OOM on large datasets.
+  }
+
+  public List<ColumnDefinition> getColumns() {
+    return columns;
+  }
+
+  /**
+   * Returns the maximum number of samples that fit in one data page.
+ */ + public int getMaxSamplesPerPage() { + return (pageSize - BasePage.PAGE_HEADER_SIZE - DATA_ROWS_OFFSET) / rowSize; + } + + // --- Private helpers --- + + void initHeaderPage() throws IOException { + final TransactionContext tx = database.getTransaction(); + final MutablePage headerPage = tx.addPage(new PageId(database, fileId, 0), pageSize); + headerPage.writeInt(HEADER_MAGIC_OFFSET, MAGIC_VALUE); + headerPage.writeByte(HEADER_FORMAT_VERSION_OFFSET, (byte) CURRENT_VERSION); + headerPage.writeShort(HEADER_COLUMN_COUNT_OFFSET, (short) columns.size()); + headerPage.writeLong(HEADER_SAMPLE_COUNT_OFFSET, 0L); + headerPage.writeLong(HEADER_MIN_TS_OFFSET, Long.MAX_VALUE); + headerPage.writeLong(HEADER_MAX_TS_OFFSET, Long.MIN_VALUE); + headerPage.writeByte(HEADER_COMPACTION_FLAG, (byte) 0); + headerPage.writeLong(HEADER_COMPACTION_WATERMARK, 0L); + headerPage.writeInt(HEADER_DATA_PAGE_COUNT, 0); + pageCount.set(1); + } + + private MutablePage getOrCreateActiveDataPage(final TransactionContext tx) throws IOException { + // Use the logical page count from the header, NOT getTotalPages() (physical). + // After clearDataPages() resets HEADER_DATA_PAGE_COUNT to 0, the physical pages + // still exist on disk; we transparently reuse them starting from page 1, avoiding + // allocating new pages and avoiding wasted space. 
+ final MutablePage headerPage = tx.getPageToModify(new PageId(database, fileId, 0), pageSize, false); + final int dataPageCount = headerPage.readInt(HEADER_DATA_PAGE_COUNT); + + if (dataPageCount > 0) { + // Check if the last logical data page has room + final MutablePage lastPage = tx.getPageToModify(new PageId(database, fileId, dataPageCount), pageSize, false); + final int sampleCount = lastPage.readShort(DATA_SAMPLE_COUNT_OFFSET) & 0xFFFF; + if (sampleCount < getMaxSamplesPerPage()) + return lastPage; + } + + // Need a new (or reused) data page + final int newPageNum = dataPageCount + 1; + final MutablePage newPage; + if (newPageNum < getTotalPages()) { + // Physical page already exists — reuse it (typical after compaction clears the header) + newPage = tx.getPageToModify(new PageId(database, fileId, newPageNum), pageSize, false); + } else { + // Physical page does not yet exist — allocate it + newPage = tx.addPage(new PageId(database, fileId, newPageNum), pageSize); + pageCount.incrementAndGet(); + } + // Initialise the page (old data bytes beyond sample-count are ignored by readers) + newPage.writeShort(DATA_SAMPLE_COUNT_OFFSET, (short) 0); + newPage.writeLong(DATA_MIN_TS_OFFSET, Long.MAX_VALUE); + newPage.writeLong(DATA_MAX_TS_OFFSET, Long.MIN_VALUE); + + headerPage.writeInt(HEADER_DATA_PAGE_COUNT, newPageNum); + return newPage; + } + + private void updateHeaderStats(final TransactionContext tx, final long timestamp) throws IOException { + final MutablePage headerPage = tx.getPageToModify(new PageId(database, fileId, 0), pageSize, false); + final long count = headerPage.readLong(HEADER_SAMPLE_COUNT_OFFSET); + headerPage.writeLong(HEADER_SAMPLE_COUNT_OFFSET, count + 1); + + final long currentMin = headerPage.readLong(HEADER_MIN_TS_OFFSET); + final long currentMax = headerPage.readLong(HEADER_MAX_TS_OFFSET); + if (timestamp < currentMin) + headerPage.writeLong(HEADER_MIN_TS_OFFSET, timestamp); + if (timestamp > currentMax) + 
headerPage.writeLong(HEADER_MAX_TS_OFFSET, timestamp); + } + + private int writeColumnValue(final MutablePage page, final int offset, final ColumnDefinition col, final Object value) { + return switch (col.getDataType()) { + case DOUBLE -> { + page.writeLong(offset, Double.doubleToRawLongBits(value != null ? ((Number) value).doubleValue() : 0.0)); + yield 8; + } + case LONG, DATETIME -> { + page.writeLong(offset, value != null ? ((Number) value).longValue() : 0L); + yield 8; + } + case INTEGER -> { + page.writeInt(offset, value != null ? ((Number) value).intValue() : 0); + yield 4; + } + case FLOAT -> { + page.writeInt(offset, Float.floatToRawIntBits(value != null ? ((Number) value).floatValue() : 0f)); + yield 4; + } + case SHORT -> { + page.writeShort(offset, value != null ? ((Number) value).shortValue() : (short) 0); + yield 2; + } + case BOOLEAN -> { + page.writeByte(offset, (byte) (Boolean.TRUE.equals(value) ? 1 : 0)); + yield 1; + } + case STRING -> { + // For strings in mutable layer, store length-prefixed UTF-8 + final byte[] bytes = value != null ? ((String) value).getBytes(java.nio.charset.StandardCharsets.UTF_8) : new byte[0]; + if (bytes.length > MAX_STRING_BYTES) + throw new IllegalArgumentException( + "String value exceeds max length of " + MAX_STRING_BYTES + " bytes for column '" + col.getName() + "'"); + page.writeShort(offset, (short) bytes.length); + if (bytes.length > 0) + page.writeByteArray(offset + 2, bytes); + yield 2 + bytes.length; + } + default -> { + page.writeLong(offset, 0L); + yield 8; + } + }; + } + + private Object[] readRow(final BasePage page, final int rowOffset, final int[] columnIndices) { + // First element is always the timestamp + final int resultSize = columnIndices != null ? 
columnIndices.length + 1 : columns.size(); + final Object[] result = new Object[resultSize]; + result[0] = page.readLong(rowOffset); + + if (columnIndices == null) { + // Read all columns + int colOffset = rowOffset + 8; + int colIdx = 0; + for (int c = 0; c < columns.size(); c++) { + if (columns.get(c).getRole() == ColumnDefinition.ColumnRole.TIMESTAMP) + continue; + result[colIdx + 1] = readColumnValue(page, colOffset, columns.get(c)); + colOffset += getColumnStorageSize(page, colOffset, columns.get(c)); + colIdx++; + } + } else { + // Read specific columns by index + int colOffset = rowOffset + 8; + int colIdx = 0; + int resultIdx = 1; + for (int c = 0; c < columns.size(); c++) { + if (columns.get(c).getRole() == ColumnDefinition.ColumnRole.TIMESTAMP) + continue; + if (isInArray(colIdx, columnIndices)) { + result[resultIdx++] = readColumnValue(page, colOffset, columns.get(c)); + } + colOffset += getColumnStorageSize(page, colOffset, columns.get(c)); + colIdx++; + } + } + return result; + } + + private Object readColumnValue(final BasePage page, final int offset, final ColumnDefinition col) { + return switch (col.getDataType()) { + case DOUBLE -> Double.longBitsToDouble(page.readLong(offset)); + case LONG, DATETIME -> page.readLong(offset); + case INTEGER -> page.readInt(offset); + case FLOAT -> Float.intBitsToFloat(page.readInt(offset)); + case SHORT -> page.readShort(offset); + case BOOLEAN -> page.readByte(offset) == 1; + case STRING -> { + final int len = page.readShort(offset) & 0xFFFF; + if (len == 0) + yield ""; + final byte[] bytes = new byte[len]; + for (int i = 0; i < len; i++) + bytes[i] = (byte) page.readByte(offset + 2 + i); + yield new String(bytes, java.nio.charset.StandardCharsets.UTF_8); + } + default -> null; + }; + } + + private int getColumnStorageSize(final BasePage page, final int offset, final ColumnDefinition col) { + final int fixed = col.getFixedSize(); + if (fixed > 0) + return fixed; + // STRING: 2-byte length prefix + data + return 2 
+ (page.readShort(offset) & 0xFFFF); + } + + private static int calculateRowSize(final List columns) { + int size = 8; // timestamp (always 8 bytes) + for (final ColumnDefinition col : columns) { + if (col.getRole() == ColumnDefinition.ColumnRole.TIMESTAMP) + continue; + final int fixed = col.getFixedSize(); + if (fixed > 0) + size += fixed; + else + size += 2 + MAX_STRING_BYTES; // max STRING: 2-byte length prefix + max payload + } + return size; + } + + private static boolean isInArray(final int value, final int[] array) { + for (final int v : array) + if (v == value) + return true; + return false; + } + + /** + * Converts a list of sample rows into the parallel-array format expected by compaction. + * First element of the returned array is long[] timestamps; subsequent elements are + * Object[] column value arrays, one per non-timestamp column. + */ + private Object[] rowsToCompactionArrays(final List allRows) { + final int size = allRows.size(); + final int totalCols = columns.size(); + final long[] timestamps = new long[size]; + final Object[][] colArrays = new Object[totalCols - 1][]; + + int colIdx = 0; + for (int c = 0; c < totalCols; c++) { + if (columns.get(c).getRole() == ColumnDefinition.ColumnRole.TIMESTAMP) + continue; + colArrays[colIdx] = new Object[size]; + colIdx++; + } + + for (int i = 0; i < size; i++) { + final Object[] row = allRows.get(i); + timestamps[i] = (long) row[0]; + for (int c = 1; c < row.length; c++) + colArrays[c - 1][i] = row[c]; + } + + final Object[] result = new Object[totalCols]; + result[0] = timestamps; + int idx = 1; + for (final Object[] colArray : colArrays) + result[idx++] = colArray; + return result; + } +} diff --git a/engine/src/main/java/com/arcadedb/engine/timeseries/TimeSeriesCursor.java b/engine/src/main/java/com/arcadedb/engine/timeseries/TimeSeriesCursor.java new file mode 100644 index 0000000000..d131d3d025 --- /dev/null +++ b/engine/src/main/java/com/arcadedb/engine/timeseries/TimeSeriesCursor.java @@ -0,0 
+1,61 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +import java.util.Iterator; +import java.util.List; + +/** + * Iterator over timeseries samples. Each element is an Object[] where + * index 0 is the timestamp (long) and subsequent indices are column values. 
+ *
+ * @author Luca Garulli (l.garulli@arcadedata.com)
+ */
+public final class TimeSeriesCursor implements Iterator<Object[]>, AutoCloseable {
+
+  private final List<Object[]> data;
+  private int position = 0;
+
+  public TimeSeriesCursor(final List<Object[]> data) {
+    this.data = data;
+  }
+
+  @Override
+  public boolean hasNext() {
+    return position < data.size();
+  }
+
+  @Override
+  public Object[] next() {
+    return data.get(position++);
+  }
+
+  public int size() {
+    return data.size();
+  }
+
+  public void reset() {
+    position = 0;
+  }
+
+  @Override
+  public void close() {
+    // Nothing to close for in-memory cursor
+  }
+}
diff --git a/engine/src/main/java/com/arcadedb/engine/timeseries/TimeSeriesEngine.java b/engine/src/main/java/com/arcadedb/engine/timeseries/TimeSeriesEngine.java
new file mode 100644
index 0000000000..c5bcd05130
--- /dev/null
+++ b/engine/src/main/java/com/arcadedb/engine/timeseries/TimeSeriesEngine.java
@@ -0,0 +1,540 @@
+/*
+ * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com)
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +import com.arcadedb.database.DatabaseInternal; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Comparator; +import java.util.Iterator; +import java.util.List; +import java.util.NoSuchElementException; +import java.util.PriorityQueue; +import java.util.concurrent.CompletableFuture; +import java.util.concurrent.CompletionException; +import java.util.concurrent.ExecutorService; +import java.util.concurrent.Executors; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.atomic.AtomicInteger; +import java.util.concurrent.atomic.AtomicLong; + +/** + * Coordinates N shards for a TimeSeries type. Routes sync writes to shards + * using round-robin selection, merges reads from all shards. + *

+ * Shard count defaults to the number of async worker threads so that when
+ * using the async API each slot owns exactly one shard (1:1 affinity).
+ * When running on a machine with fewer or more cores than the one where
+ * the type was created, the async executor's {@code getSlot(shardIdx)}
+ * mapping still guarantees contention-free writes: each shard always
+ * maps to the same slot.
+ *
+ * @author Luca Garulli (l.garulli@arcadedata.com)
+ */
+public class TimeSeriesEngine implements AutoCloseable {
+
+  private final DatabaseInternal database;
+  private final String typeName;
+  private final List<ColumnDefinition> columns;
+  private final TimeSeriesShard[] shards;
+  private final int shardCount;
+  private final long compactionBucketIntervalMs;
+  private final ExecutorService shardExecutor;
+  private final AtomicLong appendCounter = new AtomicLong();
+
+  public TimeSeriesEngine(final DatabaseInternal database, final String typeName,
+      final List<ColumnDefinition> columns, final int shardCount) throws IOException {
+    this(database, typeName, columns, shardCount, 0);
+  }
+
+  public TimeSeriesEngine(final DatabaseInternal database, final String typeName,
+      final List<ColumnDefinition> columns, final int shardCount,
+      final long compactionBucketIntervalMs) throws IOException {
+    this.database = database;
+    this.typeName = typeName;
+    this.columns = columns;
+    this.shardCount = shardCount;
+    this.compactionBucketIntervalMs = compactionBucketIntervalMs;
+    this.shards = new TimeSeriesShard[shardCount];
+    final AtomicInteger threadCounter = new AtomicInteger(0);
+    this.shardExecutor = Executors.newFixedThreadPool(shardCount, r -> {
+      final Thread t = new Thread(r, "ArcadeDB-TS-Shard-" + typeName + "-" + threadCounter.getAndIncrement());
+      t.setDaemon(true);
+      return t;
+    });
+
+    try {
+      for (int i = 0; i < shardCount; i++)
+        shards[i] = new TimeSeriesShard(database, typeName, i, columns, compactionBucketIntervalMs);
+    } catch (final Exception e) {
+      shardExecutor.shutdownNow();
+      // Close any shards that were
successfully created + for (final TimeSeriesShard shard : shards) { + if (shard != null) { + try { + shard.close(); + } catch (final IOException ignored) { + } + } + } + throw e instanceof IOException ? (IOException) e : new IOException("Failed to initialize shards for " + typeName, e); + } + } + + /** + * Appends samples, routing to a shard using round-robin distribution. + *

+ * Note: this method is not synchronized. When multiple threads call it concurrently, + * they may be routed to the same shard. For contention-free writes, use the async API + * which provides 1:1 slot-to-shard affinity. + *

+   * Dictionary column constraint: columns using {@code DICTIONARY} compression
+   * (typically TAG columns) must not exceed {@link com.arcadedb.engine.timeseries.codec.DictionaryCodec#MAX_DICTIONARY_SIZE}
+   * distinct values per sealed block. This is validated at compaction time; data that violates
+   * the limit will cause compaction to fail. Plan tag cardinality accordingly.
+   */
+  public void appendSamples(final long[] timestamps, final Object[]... columnValues) throws IOException {
+    final int shardIdx = (int) Math.floorMod(appendCounter.getAndIncrement(), (long) shardCount);
+    shards[shardIdx].appendSamples(timestamps, columnValues);
+  }
+
+  /**
+   * Queries all shards and merge-sorts results by timestamp.
+   */
+  public List<Object[]> query(final long fromTs, final long toTs, final int[] columnIndices,
+      final TagFilter tagFilter) throws IOException {
+    final List<Object[]> merged = new ArrayList<>();
+    for (final TimeSeriesShard shard : shards)
+      merged.addAll(shard.scanRange(fromTs, toTs, columnIndices, tagFilter));
+
+    merged.sort(Comparator.comparingLong(row -> (long) row[0]));
+    return merged;
+  }
+
+  /**
+   * Returns a lazy merge-sorted iterator across all shards.
+   * Uses a min-heap to merge shard iterators by timestamp.
+   * Memory usage: O(shardCount * max(blockSize, pageSize)) instead of O(totalRows).
+   */
+  public Iterator<Object[]> iterateQuery(final long fromTs, final long toTs, final int[] columnIndices,
+      final TagFilter tagFilter) throws IOException {
+    final PriorityQueue<PeekableIterator> heap = new PriorityQueue<>(
+        Math.max(1, shardCount), Comparator.comparingLong(it -> (long) it.peek()[0]));
+
+    for (final TimeSeriesShard shard : shards) {
+      final Iterator<Object[]> it = shard.iterateRange(fromTs, toTs, columnIndices, tagFilter);
+      if (it.hasNext())
+        heap.add(new PeekableIterator(it));
+    }
+
+    return new Iterator<>() {
+      @Override
+      public boolean hasNext() {
+        return !heap.isEmpty();
+      }
+
+      @Override
+      public Object[] next() {
+        if (heap.isEmpty())
+          throw new NoSuchElementException();
+        final PeekableIterator min = heap.poll();
+        final Object[] row = min.next();
+        if (min.hasNext())
+          heap.add(min);
+        return row;
+      }
+    };
+  }
+
+  /**
+   * Aggregates across all shards.
+   *
+   * @param columnIndex 0-based index among non-timestamp columns (i.e. column 0 = first non-ts column).
+   *                    This differs from {@link MultiColumnAggregationRequest#columnIndex()} which uses
+   *                    the full schema index (including the timestamp column).
+   */
+  public AggregationResult aggregate(final long fromTs, final long toTs, final int columnIndex,
+      final AggregationType aggType, final long bucketIntervalMs, final TagFilter tagFilter) throws IOException {
+    // Use lazy iteration to avoid loading all data into memory
+    final Iterator<Object[]> iter = iterateQuery(fromTs, toTs, null, tagFilter);
+    final AggregationResult result = new AggregationResult();
+
+    while (iter.hasNext()) {
+      final Object[] row = iter.next();
+      final long ts = (long) row[0];
+      final long bucketTs = bucketIntervalMs > 0 ?
(ts / bucketIntervalMs) * bucketIntervalMs : fromTs; + final double value; + + if (columnIndex + 1 < row.length && row[columnIndex + 1] instanceof Number) + value = ((Number) row[columnIndex + 1]).doubleValue(); + else + value = 0.0; + + accumulateToBucket(result, bucketTs, value, aggType); + } + + // Finalize AVG: divide accumulated sums by counts + if (aggType == AggregationType.AVG) { + for (int i = 0; i < result.size(); i++) + result.updateValue(i, result.getValue(i) / result.getCount(i)); + } + + return result; + } + + /** + * Aggregates multiple columns in a single pass, bucketed by time interval. + * Returns only the aggregated buckets instead of all raw rows. + * Uses block-level aggregation on sealed stores (decompresses arrays directly, no Object[] boxing). + * Falls back to row iteration only for the small mutable bucket. + */ + public MultiColumnAggregationResult aggregateMulti(final long fromTs, final long toTs, + final List requests, final long bucketIntervalMs, + final TagFilter tagFilter) throws IOException { + return aggregateMulti(fromTs, toTs, requests, bucketIntervalMs, tagFilter, null); + } + + /** + * Aggregates multiple columns in a single pass, bucketed by time interval. + * Optionally populates an {@link AggregationMetrics} with timing breakdown. 
+ */ + public MultiColumnAggregationResult aggregateMulti(final long fromTs, final long toTs, + final List requests, final long bucketIntervalMs, + final TagFilter tagFilter, final AggregationMetrics metrics) throws IOException { + final int reqCount = requests.size(); + + // Determine actual data range to size flat arrays correctly + long actualMin = Long.MAX_VALUE; + long actualMax = Long.MIN_VALUE; + final boolean useFlatMode = bucketIntervalMs > 0; + if (useFlatMode) { + for (final TimeSeriesShard shard : shards) { + final TimeSeriesSealedStore ss = shard.getSealedStore(); + if (ss.getBlockCount() > 0) { + if (ss.getGlobalMinTimestamp() < actualMin) + actualMin = ss.getGlobalMinTimestamp(); + if (ss.getGlobalMaxTimestamp() > actualMax) + actualMax = ss.getGlobalMaxTimestamp(); + } + } + // Clamp to query range + if (fromTs != Long.MIN_VALUE && fromTs > actualMin) + actualMin = fromTs; + if (toTs != Long.MAX_VALUE && toTs < actualMax) + actualMax = toTs; + } + + final long firstBucket; + final int maxBuckets; + if (useFlatMode && actualMin <= actualMax) { + firstBucket = (actualMin / bucketIntervalMs) * bucketIntervalMs; + final long computedBuckets = (actualMax - firstBucket) / bucketIntervalMs + 2; + if (computedBuckets > MultiColumnAggregationResult.MAX_FLAT_BUCKETS) + // Will trigger map-mode fallback in MultiColumnAggregationResult constructor + maxBuckets = MultiColumnAggregationResult.MAX_FLAT_BUCKETS + 1; + else + maxBuckets = (int) computedBuckets; + } else { + firstBucket = 0; + maxBuckets = 0; + } + + // Pre-extract column indices and types for mutable bucket iteration + final int[] columnIndices = new int[reqCount]; + final boolean[] isCount = new boolean[reqCount]; + for (int r = 0; r < reqCount; r++) { + columnIndices[r] = requests.get(r).columnIndex(); + isCount[r] = requests.get(r).type() == AggregationType.COUNT; + } + + // Process sealed stores in parallel when there are multiple shards with data + if (shardCount > 1 && maxBuckets > 0) { + 
@SuppressWarnings("unchecked") + final CompletableFuture[] futures = new CompletableFuture[shardCount]; + + // Acquire all shard compaction read locks on the calling thread (which has the database + // transaction context) before dispatching futures. This prevents compaction from + // completing between the sealed reads (in futures) and the mutable reads (on the calling + // thread), which would cause data loss: sealed would see the old state and mutable would + // be empty after compaction cleared it. + // Worker threads only read sealed stores (no transaction required); mutable reads happen + // on the calling thread after futures complete. + for (int s = 0; s < shardCount; s++) + shards[s].getCompactionLock().readLock().lock(); + try { + final AggregationMetrics[] shardMetricsArr = metrics != null ? new AggregationMetrics[shardCount] : null; + for (int s = 0; s < shardCount; s++) { + final TimeSeriesShard shard = shards[s]; + final AggregationMetrics shardMetrics = metrics != null ? new AggregationMetrics() : null; + if (shardMetricsArr != null) + shardMetricsArr[s] = shardMetrics; + futures[s] = CompletableFuture.supplyAsync(() -> { + try { + final MultiColumnAggregationResult shardResult = + new MultiColumnAggregationResult(requests, firstBucket, bucketIntervalMs, maxBuckets); + shard.getSealedStore().aggregateMultiBlocks(fromTs, toTs, requests, bucketIntervalMs, shardResult, shardMetrics, tagFilter); + return shardResult; + } catch (final IOException e) { + throw new CompletionException(e); + } + }, shardExecutor); + } + + // Wait for all sealed reads to complete + try { + CompletableFuture.allOf(futures).join(); + } catch (final CompletionException e) { + if (e.getCause() instanceof IOException ioe) + throw ioe; + throw new IOException("Parallel shard aggregation failed", e.getCause()); + } + + // Merge metrics after all futures have completed (avoids race condition) + if (shardMetricsArr != null) + for (final AggregationMetrics sm : shardMetricsArr) + 
metrics.mergeFrom(sm); + + final MultiColumnAggregationResult result = futures[0].join(); + for (int s = 1; s < shardCount; s++) + result.mergeFrom(futures[s].join()); + + // Process mutable buckets on the calling thread (has both transaction context and the + // compaction read locks acquired above, so compaction cannot clear mutable data now) + final double[] rowValues = new double[reqCount]; + for (final TimeSeriesShard shard : shards) { + final Iterator mutableIter = shard.getMutableBucket().iterateRange(fromTs, toTs, null); + while (mutableIter.hasNext()) { + final Object[] row = mutableIter.next(); + if (tagFilter != null && !tagFilter.matches(row)) + continue; + final long ts = (long) row[0]; + final long bucketTs = (ts / bucketIntervalMs) * bucketIntervalMs; + for (int r = 0; r < reqCount; r++) { + if (isCount[r]) + rowValues[r] = 1.0; + else if (columnIndices[r] < row.length && row[columnIndices[r]] instanceof Number n) + rowValues[r] = n.doubleValue(); + else + rowValues[r] = 0.0; + } + result.accumulateRow(bucketTs, rowValues); + } + } + + result.finalizeAvg(); + return result; + } finally { + for (int s = 0; s < shardCount; s++) + shards[s].getCompactionLock().readLock().unlock(); + } + + } else { + // Sequential path: single shard or no flat mode + final MultiColumnAggregationResult result = maxBuckets > 0 + ? new MultiColumnAggregationResult(requests, firstBucket, bucketIntervalMs, maxBuckets) + : new MultiColumnAggregationResult(requests); + + final double[] rowValues = new double[reqCount]; + + for (final TimeSeriesShard shard : shards) { + // Hold the compaction read lock for sealed+mutable reads to prevent data loss + // if compaction completes between reading the two layers. 
+ shard.getCompactionLock().readLock().lock(); + try { + shard.getSealedStore().aggregateMultiBlocks(fromTs, toTs, requests, bucketIntervalMs, result, metrics, tagFilter); + + final Iterator mutableIter = shard.getMutableBucket().iterateRange(fromTs, toTs, null); + while (mutableIter.hasNext()) { + final Object[] row = mutableIter.next(); + if (tagFilter != null && !tagFilter.matches(row)) + continue; + final long ts = (long) row[0]; + final long bucketTs = bucketIntervalMs > 0 ? (ts / bucketIntervalMs) * bucketIntervalMs : fromTs; + + for (int r = 0; r < reqCount; r++) { + if (isCount[r]) + rowValues[r] = 1.0; + else if (columnIndices[r] < row.length && row[columnIndices[r]] instanceof Number n) + rowValues[r] = n.doubleValue(); + else + rowValues[r] = 0.0; + } + + result.accumulateRow(bucketTs, rowValues); + } + } finally { + shard.getCompactionLock().readLock().unlock(); + } + } + + result.finalizeAvg(); + return result; + } + } + + /** + * Triggers compaction on all shards. + */ + public void compactAll() throws IOException { + for (final TimeSeriesShard shard : shards) + shard.compact(); + } + + /** + * Applies retention policy: removes sealed blocks older than the given timestamp. + * Note: this only truncates sealed stores. To ensure mutable bucket data is also + * covered, call {@link #compactAll()} before this method. + */ + public void applyRetention(final long cutoffTimestamp) throws IOException { + for (final TimeSeriesShard shard : shards) + shard.getSealedStore().truncateBefore(cutoffTimestamp); + } + + /** + * Applies downsampling tiers to sealed data. For each tier (sorted by afterMs ascending), + * blocks older than (nowMs - tier.afterMs) are reduced to tier.granularityMs resolution + * by averaging numeric fields per time bucket. Tag columns are preserved as group keys. + * The density check provides idempotency: blocks already at or coarser than the target + * resolution are left untouched. 
+ */ + public void applyDownsampling(final List tiers, final long nowMs) throws IOException { + if (tiers == null || tiers.isEmpty()) + return; + + // Identify column roles + final int tsColIdx = findTimestampColumnIndex(); + final List tagColIndices = new ArrayList<>(); + final List numericColIndices = new ArrayList<>(); + for (int c = 0; c < columns.size(); c++) { + if (c == tsColIdx) + continue; + if (columns.get(c).getRole() == ColumnDefinition.ColumnRole.TAG) + tagColIndices.add(c); + else + numericColIndices.add(c); + } + + for (final DownsamplingTier tier : tiers) { + final long cutoffTs = nowMs - tier.afterMs(); + for (final TimeSeriesShard shard : shards) + shard.getSealedStore().downsampleBlocks(cutoffTs, tier.granularityMs(), tsColIdx, tagColIndices, numericColIndices); + } + } + + private int findTimestampColumnIndex() { + for (int i = 0; i < columns.size(); i++) + if (columns.get(i).getRole() == ColumnDefinition.ColumnRole.TIMESTAMP) + return i; + return 0; + } + + /** + * Returns the total number of samples across all shards (sealed + mutable). + * O(shardCount * blockCount), all data already in memory. 
+   */
+  public long countSamples() throws IOException {
+    long total = 0;
+    for (final TimeSeriesShard shard : shards) {
+      total += shard.getSealedStore().getTotalSampleCount();
+      total += shard.getMutableBucket().getSampleCount();
+    }
+    return total;
+  }
+
+  public int getShardCount() {
+    return shardCount;
+  }
+
+  public TimeSeriesShard getShard(final int index) {
+    return shards[index];
+  }
+
+  public List<ColumnDefinition> getColumns() {
+    return columns;
+  }
+
+  public String getTypeName() {
+    return typeName;
+  }
+
+  @Override
+  public void close() throws IOException {
+    shardExecutor.shutdown();
+    try {
+      if (!shardExecutor.awaitTermination(30, TimeUnit.SECONDS))
+        shardExecutor.shutdownNow();
+    } catch (final InterruptedException e) {
+      shardExecutor.shutdownNow();
+      Thread.currentThread().interrupt();
+    }
+    for (final TimeSeriesShard shard : shards)
+      shard.close();
+  }
+
+  // --- Private helpers ---
+
+  private static final class PeekableIterator implements Iterator<Object[]> {
+    private final Iterator<Object[]> delegate;
+    private Object[] peeked;
+
+    PeekableIterator(final Iterator<Object[]> delegate) {
+      this.delegate = delegate;
+      this.peeked = delegate.hasNext() ? delegate.next() : null;
+    }
+
+    Object[] peek() {
+      return peeked;
+    }
+
+    @Override
+    public boolean hasNext() {
+      return peeked != null;
+    }
+
+    @Override
+    public Object[] next() {
+      if (peeked == null)
+        throw new NoSuchElementException();
+      final Object[] result = peeked;
+      peeked = delegate.hasNext() ?
delegate.next() : null; + return result; + } + } + + private void accumulateToBucket(final AggregationResult result, final long bucketTs, final double value, + final AggregationType type) { + final int idx = result.findBucketIndex(bucketTs); + if (idx >= 0) { + final double existing = result.getValue(idx); + final long count = result.getCount(idx); + final double merged = switch (type) { + case SUM -> existing + value; + case COUNT -> existing + 1; + case AVG -> existing + value; // accumulate sum, divide by count later + case MIN -> Math.min(existing, value); + case MAX -> Math.max(existing, value); + }; + result.updateValue(idx, merged); + result.updateCount(idx, count + 1); + } else { + result.addBucket(bucketTs, type == AggregationType.COUNT ? 1.0 : value, 1); + } + } +} diff --git a/engine/src/main/java/com/arcadedb/engine/timeseries/TimeSeriesMaintenanceScheduler.java b/engine/src/main/java/com/arcadedb/engine/timeseries/TimeSeriesMaintenanceScheduler.java new file mode 100644 index 0000000000..3fa371a1e7 --- /dev/null +++ b/engine/src/main/java/com/arcadedb/engine/timeseries/TimeSeriesMaintenanceScheduler.java @@ -0,0 +1,143 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ *
+ * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com)
+ * SPDX-License-Identifier: Apache-2.0
+ */
+package com.arcadedb.engine.timeseries;
+
+import com.arcadedb.database.Database;
+import com.arcadedb.database.DatabaseContext;
+import com.arcadedb.database.DatabaseInternal;
+import com.arcadedb.log.LogManager;
+import com.arcadedb.schema.LocalTimeSeriesType;
+
+import java.lang.ref.WeakReference;
+import java.util.Map;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.ScheduledFuture;
+import java.util.concurrent.TimeUnit;
+import java.util.logging.Level;
+
+/**
+ * Background scheduler that automatically applies retention and downsampling
+ * policies for TimeSeries types. Runs as a daemon thread and checks each
+ * registered type at a configurable interval (default: 60 seconds).
+ *
+ * @author Luca Garulli (l.garulli@arcadedata.com)
+ */
+public class TimeSeriesMaintenanceScheduler {
+
+  private static final long DEFAULT_CHECK_INTERVAL_MS = 60_000; // 1 minute
+
+  private final ScheduledExecutorService executor;
+  private final Map<String, ScheduledFuture<?>> tasks = new ConcurrentHashMap<>();
+
+  /** Maximum number of concurrent maintenance tasks (compaction + retention per type). */
+  private static final int MAX_THREADS = 4;
+
+  private static final AtomicInteger THREAD_COUNTER = new AtomicInteger();
+
+  public TimeSeriesMaintenanceScheduler() {
+    this.executor = Executors.newScheduledThreadPool(MAX_THREADS, r -> {
+      final Thread t = new Thread(r, "ArcadeDB-TS-Maintenance-" + THREAD_COUNTER.incrementAndGet());
+      t.setDaemon(true);
+      return t;
+    });
+  }
+
+  /**
+   * Schedules automatic compaction, retention, and downsampling for a TimeSeries type.
+   * Compaction is always scheduled to prevent unbounded mutable-bucket growth.
+   * Retention and downsampling steps are only executed when policies are configured.
+   */
+  public void schedule(final Database database, final LocalTimeSeriesType tsType) {
+    final String typeName = tsType.getName();
+
+    final WeakReference<Database> dbRef = new WeakReference<>(database);
+    final WeakReference<LocalTimeSeriesType> typeRef = new WeakReference<>(tsType);
+
+    // Cancel any existing task for this type (e.g., if retention policy was changed via ALTER)
+    final ScheduledFuture<?> existing = tasks.remove(typeName);
+    if (existing != null)
+      existing.cancel(false);
+
+    tasks.put(typeName, executor.scheduleAtFixedRate(() -> {
+      final Database db = dbRef.get();
+      final LocalTimeSeriesType type = typeRef.get();
+      if (db == null || !db.isOpen() || type == null) {
+        cancel(typeName);
+        return;
+      }
+
+      // The maintenance thread is created by a ScheduledExecutorService and does not
+      // have a DatabaseContext initialized. We must initialize it before calling any
+      // database operation (begin/commit) or we get "Transaction context not found".
+      DatabaseContext.INSTANCE.init((DatabaseInternal) db);
+      try {
+        final TimeSeriesEngine engine = type.getEngine();
+        if (engine == null)
+          return;
+
+        final long nowMs = System.currentTimeMillis();
+
+        // Compact mutable data before retention/downsampling so that
+        // all samples are in the sealed store and subject to truncation.
+        engine.compactAll();
+
+        // Apply retention policy
+        if (type.getRetentionMs() > 0) {
+          final long cutoff = nowMs - type.getRetentionMs();
+          engine.applyRetention(cutoff);
+        }
+
+        // Apply downsampling tiers
+        if (!type.getDownsamplingTiers().isEmpty())
+          engine.applyDownsampling(type.getDownsamplingTiers(), nowMs);
+
+      } catch (final Throwable e) {
+        LogManager.instance().log(this, Level.WARNING,
+            "Error in TimeSeries maintenance for type '%s'", e, typeName);
+      }
+    }, 5_000, DEFAULT_CHECK_INTERVAL_MS, TimeUnit.MILLISECONDS));
+  }
+
+  /**
+   * Cancels the maintenance task for a specific type.
+   */
+  public void cancel(final String typeName) {
+    final ScheduledFuture<?> future = tasks.remove(typeName);
+    if (future != null)
+      future.cancel(false);
+  }
+
+  /**
+   * Shuts down the scheduler and cancels all tasks.
+   */
+  public void shutdown() {
+    executor.shutdown();
+    try {
+      if (!executor.awaitTermination(10, TimeUnit.SECONDS))
+        executor.shutdownNow();
+    } catch (final InterruptedException e) {
+      executor.shutdownNow();
+      Thread.currentThread().interrupt();
+    }
+    tasks.clear();
+  }
+}
diff --git a/engine/src/main/java/com/arcadedb/engine/timeseries/TimeSeriesSealedStore.java b/engine/src/main/java/com/arcadedb/engine/timeseries/TimeSeriesSealedStore.java
new file mode 100644
index 0000000000..82e48a5926
--- /dev/null
+++ b/engine/src/main/java/com/arcadedb/engine/timeseries/TimeSeriesSealedStore.java
@@ -0,0 +1,2013 @@
+/*
+ * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com)
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +import com.arcadedb.log.LogManager; +import com.arcadedb.engine.timeseries.codec.DeltaOfDeltaCodec; +import com.arcadedb.engine.timeseries.codec.DictionaryCodec; +import com.arcadedb.engine.timeseries.codec.GorillaXORCodec; +import com.arcadedb.engine.timeseries.codec.Simple8bCodec; +import com.arcadedb.engine.timeseries.codec.TimeSeriesCodec; +import com.arcadedb.engine.timeseries.simd.TimeSeriesVectorOps; +import com.arcadedb.engine.timeseries.simd.TimeSeriesVectorOpsProvider; +import com.arcadedb.schema.Type; + +import java.io.File; +import java.io.IOException; +import java.io.RandomAccessFile; +import java.nio.ByteBuffer; +import java.nio.channels.FileChannel; +import java.nio.charset.StandardCharsets; +import java.nio.file.Files; +import java.nio.file.StandardCopyOption; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.BitSet; +import java.util.Comparator; +import java.util.HashMap; +import java.util.Iterator; +import java.util.LinkedHashSet; +import java.util.List; +import java.util.Map; +import java.util.concurrent.locks.ReadWriteLock; +import java.util.concurrent.locks.ReentrantReadWriteLock; +import java.util.logging.Level; +import java.util.zip.CRC32; + +/** + * Immutable columnar storage for compacted TimeSeries data. + * Uses FileChannel positioned reads for zero-overhead access. + *

+ * Index file (.ts.sealed) layout — 27-byte header: + * - [0..3] magic "TSIX" (4 bytes) + * - [4] format version byte (always {@value #CURRENT_VERSION}) + * - [5..6] column count (short) + * - [7..10] block count (int) + * - [11..18] global min timestamp (long) + * - [19..26] global max timestamp (long) + * - [27..] block entries (inline metadata + compressed column data) + *
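+ *
+ * Example: decoding the fixed 27-byte header above with a {@link ByteBuffer} — an illustrative
+ * sketch only; {@code first27Bytes} is a hypothetical local holding the first 27 bytes of the file:
+ * <pre>{@code
+ * final ByteBuffer h = ByteBuffer.wrap(first27Bytes);
+ * if (h.getInt() != 0x54534958)            // magic "TSIX"
+ *   throw new IOException("not a sealed store index file");
+ * final byte version   = h.get();          // format version
+ * final short colCount = h.getShort();     // column count
+ * final int blockCount = h.getInt();       // block count
+ * final long globalMin = h.getLong();      // global min timestamp
+ * final long globalMax = h.getLong();      // global max timestamp
+ * }</pre>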

+ * Block entry layout: + * - magic "TSBL" (4), minTs (8), maxTs (8), sampleCount (4), colSizes (4*colCount) + * - numericColCount (4), [min (8) + max (8) + sum (8)] * numericColCount (schema order, no colIdx) + * - tag metadata: tagColCount (2), per TAG column: distinctCount (2), per value: len (2) + UTF-8 bytes + * - compressed column data bytes + * - blockCRC32 (4) — CRC32 of everything from blockMagic to end of compressed data + *

+ * High-Availability / Replication note: + * Sealed store files ({@code .ts.sealed}) are written via {@link RandomAccessFile} and + * {@link FileChannel} directly to the local filesystem, bypassing ArcadeDB's + * page-level replication infrastructure. This is by design: compacted time-series data + * is derived (it is produced by compacting the replicated mutable {@link TimeSeriesBucket} + * pages) and therefore does not need to be replicated separately. Each HA node independently + * performs its own compaction from its own replicated mutable buckets, eventually reaching + * an equivalent sealed store. In-flight mutable data (the {@code .tstb} bucket files) is + * fully replicated through the normal {@link com.arcadedb.engine.PaginatedComponent} path. + * The consequence is that, immediately after a failover, a follower that has not yet + * compacted may serve queries from the mutable bucket only until its maintenance scheduler + * runs the next compaction cycle. + * + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +public class TimeSeriesSealedStore implements AutoCloseable { + + public static final int CURRENT_VERSION = 0; + private static final int MAGIC_VALUE = 0x54534958; // "TSIX" + private static final int BLOCK_MAGIC_VALUE = 0x5453424C; // "TSBL" + private static final int HEADER_SIZE = 27; + // Shared with DeltaOfDeltaCodec and GorillaXORCodec: all three validate/use the same limit + private static final int MAX_BLOCK_SIZE = DeltaOfDeltaCodec.MAX_BLOCK_SIZE; + + private final String basePath; + private final List columns; + private RandomAccessFile indexFile; + private FileChannel indexChannel; + + enum BlockMatchResult { SKIP, FAST_PATH, SLOW_PATH } + + // In-memory block directory (loaded at open) — protected by directoryLock + private final List blockDirectory = new ArrayList<>(); + private final ReadWriteLock directoryLock = new ReentrantReadWriteLock(); + private volatile long globalMinTs = Long.MAX_VALUE; // volatile: read without write 
lock + private volatile long globalMaxTs = Long.MIN_VALUE; // volatile: read without write lock + private boolean headerDirty; + + static final class BlockEntry { + final long minTimestamp; + final long maxTimestamp; + final int sampleCount; + final long[] columnOffsets; + final int[] columnSizes; + final double[] columnMins; // per-column min (NaN for non-numeric) + final double[] columnMaxs; // per-column max + final double[] columnSums; // per-column sum + String[][] tagDistinctValues; // indexed by schema column index, null for non-TAG columns + long blockStartOffset; // file offset where block meta begins (for lazy CRC) + int storedCRC; // CRC32 stored on disk (-1 if written inline, not yet flushed) + volatile boolean crcValidated; // true after first successful CRC check (volatile: read without lock) + + BlockEntry(final long minTs, final long maxTs, final int sampleCount, final int columnCount, + final double[] mins, final double[] maxs, final double[] sums) { + this.minTimestamp = minTs; + this.maxTimestamp = maxTs; + this.sampleCount = sampleCount; + this.columnOffsets = new long[columnCount]; + this.columnSizes = new int[columnCount]; + this.columnMins = mins; + this.columnMaxs = maxs; + this.columnSums = sums; + this.crcValidated = true; // newly created blocks don't need validation + } + } + + public TimeSeriesSealedStore(final String basePath, final List columns) throws IOException { + this.basePath = basePath; + this.columns = columns; + + // Clean up stale .tmp files left by interrupted shutdown or maintenance + final File tmpFile = new File(basePath + ".ts.sealed.tmp"); + if (tmpFile.exists() && !tmpFile.delete()) + throw new IOException("Failed to delete stale temporary file: " + tmpFile.getAbsolutePath()); + + final File f = new File(basePath + ".ts.sealed"); + final boolean exists = f.exists(); + this.indexFile = new RandomAccessFile(f, "rw"); + this.indexChannel = indexFile.getChannel(); + + try { + if (exists && indexFile.length() >= 
HEADER_SIZE) + loadDirectory(); + else + writeEmptyHeader(); + } catch (final IOException e) { + // Close handles to avoid leaking them if initialization fails + try { indexChannel.close(); } catch (final IOException ignored) {} + try { indexFile.close(); } catch (final IOException ignored) {} + throw e; + } + } + + /** + * Appends a block of compressed column data with per-column statistics. + * Stats enable block-level aggregation without decompression. + * + * @param sampleCount number of samples in the block + * @param minTs minimum timestamp + * @param maxTs maximum timestamp + * @param compressedColumns compressed byte arrays, one per column + * @param columnMins per-column min (NaN for non-numeric columns) + * @param columnMaxs per-column max (NaN for non-numeric columns) + * @param columnSums per-column sum (NaN for non-numeric columns) + */ + public void appendBlock(final int sampleCount, final long minTs, final long maxTs, + final byte[][] compressedColumns, + final double[] columnMins, final double[] columnMaxs, final double[] columnSums, + final String[][] tagDistinctValues) throws IOException { + directoryLock.writeLock().lock(); + try { + final int colCount = columns.size(); + + // Count numeric columns (those with non-NaN stats) + int numericColCount = 0; + for (int c = 0; c < colCount; c++) + if (!Double.isNaN(columnMins[c])) + numericColCount++; + + // Build tag metadata section + final byte[] tagMeta = buildTagMetadata(tagDistinctValues, colCount); + + // Block header: magic(4) + minTs(8) + maxTs(8) + sampleCount(4) + colSizes(4*colCount) + // + numericColCount(4) + [min(8) + max(8) + sum(8)] * numericColCount + // + tag metadata + final int statsSize = 4 + (8 + 8 + 8) * numericColCount; + final int metaSize = 4 + 8 + 8 + 4 + 4 * colCount + statsSize + tagMeta.length; + final ByteBuffer metaBuf = ByteBuffer.allocate(metaSize); + metaBuf.putInt(BLOCK_MAGIC_VALUE); + metaBuf.putLong(minTs); + metaBuf.putLong(maxTs); + metaBuf.putInt(sampleCount); + 
for (final byte[] col : compressedColumns) + metaBuf.putInt(col.length); + + // Write stats section (schema order, no colIdx — iterate columns, skip non-numeric) + metaBuf.putInt(numericColCount); + for (int c = 0; c < colCount; c++) { + if (!Double.isNaN(columnMins[c])) { + metaBuf.putDouble(columnMins[c]); + metaBuf.putDouble(columnMaxs[c]); + metaBuf.putDouble(columnSums[c]); + } + } + + // Write tag metadata + metaBuf.put(tagMeta); + metaBuf.flip(); + + // Compute CRC32 over meta + compressed data + // Use metaBuf.limit() (not .array().length) since the backing array may be larger after flip + final CRC32 crc = new CRC32(); + crc.update(metaBuf.array(), 0, metaBuf.limit()); + + long offset = indexFile.length(); + indexFile.seek(offset); + indexFile.write(metaBuf.array(), 0, metaBuf.limit()); + offset += metaSize; + + final BlockEntry entry = new BlockEntry(minTs, maxTs, sampleCount, colCount, columnMins, columnMaxs, columnSums); + entry.tagDistinctValues = tagDistinctValues; + // Write compressed column data + for (int c = 0; c < colCount; c++) { + entry.columnOffsets[c] = offset; + entry.columnSizes[c] = compressedColumns[c].length; + crc.update(compressedColumns[c]); + indexFile.write(compressedColumns[c]); + offset += compressedColumns[c].length; + } + + // Write block CRC32 + final ByteBuffer crcBuf = ByteBuffer.allocate(4); + crcBuf.putInt((int) crc.getValue()); + crcBuf.flip(); + indexFile.write(crcBuf.array()); + + blockDirectory.add(entry); + + if (minTs < globalMinTs) + globalMinTs = minTs; + if (maxTs > globalMaxTs) + globalMaxTs = maxTs; + + headerDirty = true; + } finally { + directoryLock.writeLock().unlock(); + } + } + + /** + * Flushes the header to disk if any blocks have been appended since the last flush. + * Called automatically by {@link #close()}. 
+ */ + public void flushHeader() throws IOException { + directoryLock.writeLock().lock(); + try { + if (headerDirty) { + rewriteHeader(); + headerDirty = false; + } + } finally { + directoryLock.writeLock().unlock(); + } + } + + /** + * Scans blocks overlapping the given time range and returns decompressed data. + */ + public List scanRange(final long fromTs, final long toTs, final int[] columnIndices, + final TagFilter tagFilter) throws IOException { + // Hold the read lock for the entire scan including file I/O. + // This prevents concurrent writers from closing/replacing the file channel + // while reads are in progress (stale offset race). + directoryLock.readLock().lock(); + try { + final List results = new ArrayList<>(); + final int tsColIdx = findTimestampColumnIndex(); + + for (final BlockEntry entry : blockDirectory) { + if (entry.maxTimestamp < fromTs || entry.minTimestamp > toTs) + continue; + + final BlockMatchResult tagMatch = tagFilter != null + ? blockMatchesTagFilter(entry, tagFilter) + : BlockMatchResult.FAST_PATH; + if (tagMatch == BlockMatchResult.SKIP) + continue; + + final long[] timestamps = decompressTimestamps(entry, tsColIdx); + final Object[][] decompressedCols = decompressColumns(entry, columnIndices, tsColIdx); + + final int resultCols = decompressedCols.length + 1; + for (int i = 0; i < timestamps.length; i++) { + if (timestamps[i] < fromTs || timestamps[i] > toTs) + continue; + + final Object[] row = new Object[resultCols]; + row[0] = timestamps[i]; + for (int c = 0; c < decompressedCols.length; c++) + row[c + 1] = decompressedCols[c][i]; + + // For SLOW_PATH blocks (mixed tag values), apply per-row filtering. + // Use matchesMapped() so the filter works correctly when columnIndices is a subset. 
+ if (tagMatch == BlockMatchResult.SLOW_PATH && !tagFilter.matchesMapped(row, columnIndices)) + continue; + + results.add(row); + } + } + return results; + } finally { + directoryLock.readLock().unlock(); + } + } + + /** + * Returns an iterator over sealed blocks overlapping the given time range. + * Eagerly collects all matching rows under the read lock to prevent stale + * file offsets after atomic file replacement by concurrent writers. + *
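+ * Example usage — an illustrative sketch; {@code store} and {@code from} are hypothetical locals,
+ * timestamps in epoch milliseconds:
+ * <pre>{@code
+ * // All columns for one hour of data, no tag filter
+ * final Iterator<Object[]> rows = store.iterateRange(from, from + 3_600_000L, null, null);
+ * while (rows.hasNext()) {
+ *   final Object[] row = rows.next();  // row[0] = timestamp, row[1..] = column values
+ * }
+ * }</pre>
+ *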

+ * Optimizations: + * - Binary search on block directory to skip to first matching block + * - Early termination when blocks are past the time range (blocks are sorted) + * - Timestamps decompressed first; value columns only if the block has matches + * - Binary search within each block's sorted timestamps for the matching range + * + * @param fromTs start timestamp (inclusive) + * @param toTs end timestamp (inclusive) + * @param columnIndices which columns to return (null = all) + * + * @return iterator yielding Object[] { timestamp, col1, col2, ... }. + * Note: all matching rows are fully materialised into memory before the iterator + * is returned, because the read lock must be held for all file I/O and released before + * the caller iterates. For very large time ranges consider using aggregation instead. + */ + public Iterator iterateRange(final long fromTs, final long toTs, final int[] columnIndices, + final TagFilter tagFilter) throws IOException { + // Hold the read lock for all file I/O to prevent stale offsets after + // atomic file replacement by concurrent writers (truncate/downsample). + directoryLock.readLock().lock(); + try { + final List results = new ArrayList<>(); + final int tsColIdx = findTimestampColumnIndex(); + final int dirSize = blockDirectory.size(); + + // Binary search: find first block whose maxTimestamp >= fromTs + int startBlockIdx = 0; + if (dirSize > 0) { + int lo = 0, hi = dirSize - 1; + while (lo < hi) { + final int mid = (lo + hi) >>> 1; + if (blockDirectory.get(mid).maxTimestamp < fromTs) + lo = mid + 1; + else + hi = mid; + } + startBlockIdx = lo; + } + + for (int blockIdx = startBlockIdx; blockIdx < dirSize; blockIdx++) { + final BlockEntry entry = blockDirectory.get(blockIdx); + + // Early termination: blocks are sorted, so if minTs > toTs all remaining are past range + if (entry.minTimestamp > toTs) + break; + + if (entry.maxTimestamp < fromTs) + continue; + + final BlockMatchResult tagMatch = tagFilter != null + ? 
blockMatchesTagFilter(entry, tagFilter) + : BlockMatchResult.FAST_PATH; + if (tagMatch == BlockMatchResult.SKIP) + continue; + + final long[] ts = decompressTimestamps(entry, tsColIdx); + final int start = lowerBound(ts, fromTs); + final int end = upperBound(ts, toTs); + + if (start >= end) + continue; + + final Object[][] decompCols = decompressColumns(entry, columnIndices, tsColIdx); + final int resultCols = decompCols.length + 1; + + for (int i = start; i < end; i++) { + final Object[] row = new Object[resultCols]; + row[0] = ts[i]; + for (int c = 0; c < decompCols.length; c++) + row[c + 1] = decompCols[c][i]; + // Use matchesMapped() so the filter works correctly when columnIndices is a subset. + if (tagMatch == BlockMatchResult.SLOW_PATH && !tagFilter.matchesMapped(row, columnIndices)) + continue; + results.add(row); + } + } + return results.iterator(); + } finally { + directoryLock.readLock().unlock(); + } + } + + /** + * Finds the first index where ts[i] >= target (lower bound). + */ + private static int lowerBound(final long[] ts, final long target) { + int lo = 0, hi = ts.length; + while (lo < hi) { + final int mid = (lo + hi) >>> 1; + if (ts[mid] < target) + lo = mid + 1; + else + hi = mid; + } + return lo; + } + + /** + * Finds the first index where ts[i] > target (upper bound). 
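+ * Example: for {@code ts} = [1, 3, 3, 7], {@code lowerBound(ts, 3)} returns 1 and
+ * {@code upperBound(ts, 3)} returns 3, so [1, 3) is the half-open range of entries equal to 3.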
+ */ + private static int upperBound(final long[] ts, final long target) { + int lo = 0, hi = ts.length; + while (lo < hi) { + final int mid = (lo + hi) >>> 1; + if (ts[mid] <= target) + lo = mid + 1; + else + hi = mid; + } + return lo; + } + + private static int lowerBound(final long[] ts, final int from, final int to, final long target) { + int lo = from, hi = to; + while (lo < hi) { + final int mid = (lo + hi) >>> 1; + if (ts[mid] < target) + lo = mid + 1; + else + hi = mid; + } + return lo; + } + + private static int upperBound(final long[] ts, final int from, final int to, final long target) { + int lo = from, hi = to; + while (lo < hi) { + final int mid = (lo + hi) >>> 1; + if (ts[mid] <= target) + lo = mid + 1; + else + hi = mid; + } + return lo; + } + + /** + * Push-down aggregation on sealed blocks. + */ + public AggregationResult aggregate(final long fromTs, final long toTs, final int columnIndex, + final AggregationType type, final long bucketIntervalMs) throws IOException { + final AggregationResult result = new AggregationResult(); + final int tsColIdx = findTimestampColumnIndex(); + final int targetColSchemaIdx = findNonTsColumnSchemaIndex(columnIndex); + + // Hold the read lock for the entire scan including file I/O to prevent stale offsets + // after atomic file replacement by concurrent writers (truncate/downsample). + directoryLock.readLock().lock(); + try { + for (final BlockEntry entry : blockDirectory) { + if (entry.maxTimestamp < fromTs || entry.minTimestamp > toTs) + continue; + + final long[] timestamps = decompressTimestamps(entry, tsColIdx); + final double[] values = decompressDoubleColumn(entry, targetColSchemaIdx); + + for (int i = 0; i < timestamps.length; i++) { + if (timestamps[i] < fromTs || timestamps[i] > toTs) + continue; + + final long bucketTs = bucketIntervalMs > 0 ? 
(timestamps[i] / bucketIntervalMs) * bucketIntervalMs : fromTs; + + accumulateSample(result, bucketTs, values[i], type); + } + } + return result; + } finally { + directoryLock.readLock().unlock(); + } + } + + /** + * Push-down multi-column aggregation on sealed blocks. + * Processes compressed blocks directly without creating Object[] row arrays. + * When a block fits entirely within a single time bucket, uses block-level + * statistics (min/max/sum/count) to skip decompression entirely. + */ + public void aggregateMultiBlocks(final long fromTs, final long toTs, + final List requests, final long bucketIntervalMs, + final MultiColumnAggregationResult result, final AggregationMetrics metrics, + final TagFilter tagFilter) throws IOException { + final int tsColIdx = findTimestampColumnIndex(); + final int reqCount = requests.size(); + + // Pre-compute schema column indices for each request + final int[] schemaColIndices = new int[reqCount]; + final boolean[] isCount = new boolean[reqCount]; + for (int r = 0; r < reqCount; r++) { + isCount[r] = requests.get(r).type() == AggregationType.COUNT; + if (!isCount[r]) + schemaColIndices[r] = requests.get(r).columnIndex(); + else + schemaColIndices[r] = -1; + } + + final double[] rowValues = new double[reqCount]; + + // Pre-allocate decode buffers reused across all blocks in this call + final long[] reusableTsBuf = new long[MAX_BLOCK_SIZE]; + final double[] reusableValBuf = new double[MAX_BLOCK_SIZE]; + + // Hold the read lock for the entire scan including file I/O to prevent stale offsets + // after atomic file replacement by concurrent writers (truncate/downsample). + directoryLock.readLock().lock(); + try { + for (final BlockEntry entry : blockDirectory) { + if (entry.maxTimestamp < fromTs || entry.minTimestamp > toTs) { + if (metrics != null) + metrics.addSkippedBlock(); + continue; + } + + // Block-level tag filter: SKIP blocks that cannot contain matching rows + final BlockMatchResult tagMatch = tagFilter != null + ? 
blockMatchesTagFilter(entry, tagFilter) + : BlockMatchResult.FAST_PATH; + if (tagMatch == BlockMatchResult.SKIP) { + if (metrics != null) + metrics.addSkippedBlock(); + continue; + } + + // Check if entire block falls within a single time bucket and is fully inside the query range + // FAST_PATH: block is homogeneous for the filtered tag, so block-level stats are valid + if (tagMatch == BlockMatchResult.FAST_PATH + && bucketIntervalMs > 0 && entry.minTimestamp >= fromTs && entry.maxTimestamp <= toTs) { + final long blockMinBucket = (entry.minTimestamp / bucketIntervalMs) * bucketIntervalMs; + final long blockMaxBucket = (entry.maxTimestamp / bucketIntervalMs) * bucketIntervalMs; + + if (blockMinBucket == blockMaxBucket) { + // FAST PATH: use block-level stats directly — no decompression needed + if (metrics != null) + metrics.addFastPathBlock(); + for (int r = 0; r < reqCount; r++) { + if (isCount[r]) + rowValues[r] = entry.sampleCount; + else { + final int sci = schemaColIndices[r]; + rowValues[r] = switch (requests.get(r).type()) { + case MIN -> entry.columnMins[sci]; + case MAX -> entry.columnMaxs[sci]; + case SUM, AVG -> entry.columnSums[sci]; + case COUNT -> entry.sampleCount; + }; + } + } + result.accumulateBlockStats(blockMinBucket, rowValues, entry.sampleCount); + continue; + } + } + + // SLOW PATH: decompress and iterate (boundary blocks spanning multiple buckets) + if (metrics != null) + metrics.addSlowPathBlock(); + + // Coalesced I/O: read all column data in one pread call + long t0 = metrics != null ? System.nanoTime() : 0; + final byte[] blockData = readBlockData(entry); + if (metrics != null) + metrics.addIo(System.nanoTime() - t0); + + // Decode timestamps into reusable buffer + t0 = metrics != null ? 
System.nanoTime() : 0; + final int tsCount = DeltaOfDeltaCodec.decode( + sliceColumn(blockData, entry, tsColIdx), reusableTsBuf); + if (metrics != null) + metrics.addDecompTs(System.nanoTime() - t0); + + // Decompress only the columns needed by the requests (deduplicated) + // Use reusable buffer for the first column; allocate for additional distinct columns + final double[][] decompressedCols = new double[columns.size()][]; + boolean reusableValBufferUsed = false; + for (int r = 0; r < reqCount; r++) { + if (!isCount[r] && decompressedCols[schemaColIndices[r]] == null) { + t0 = metrics != null ? System.nanoTime() : 0; + final byte[] colBytes = sliceColumn(blockData, entry, schemaColIndices[r]); + final ColumnDefinition col = columns.get(schemaColIndices[r]); + if (!reusableValBufferUsed && col.getCompressionHint() == TimeSeriesCodec.GORILLA_XOR) { + // Decode into reusable buffer (only safe for one column at a time) + GorillaXORCodec.decode(colBytes, reusableValBuf); + decompressedCols[schemaColIndices[r]] = reusableValBuf; + reusableValBufferUsed = true; + } else { + decompressedCols[schemaColIndices[r]] = decompressDoubleColumnFromBytes(colBytes, schemaColIndices[r]); + } + if (metrics != null) + metrics.addDecompVal(System.nanoTime() - t0); + } + } + + // Decompress tag columns for SLOW_PATH tag filtering + final boolean needRowTagFilter = tagFilter != null && tagMatch == BlockMatchResult.SLOW_PATH; + String[][] tagCols = null; + List filterConditions = null; + if (needRowTagFilter) { + filterConditions = tagFilter.getConditions(); + tagCols = new String[filterConditions.size()][]; + for (int ci = 0; ci < filterConditions.size(); ci++) { + final int schemaIdx = findNonTsColumnSchemaIndex(filterConditions.get(ci).columnIndex()); + final byte[] colBytes = sliceColumn(blockData, entry, schemaIdx); + tagCols[ci] = DictionaryCodec.decode(colBytes); + } + } + + // Use tsCount (not array length) since reusableTsBuf may be larger than actual data + final long[] 
timestamps = reusableTsBuf; + + // Aggregate using segment-based vectorized accumulation + t0 = metrics != null ? System.nanoTime() : 0; + + // Clip to query range using binary search on sorted timestamps + final int rangeStart = lowerBound(timestamps, 0, tsCount, fromTs); + final int rangeEnd = upperBound(timestamps, 0, tsCount, toTs); + + if (bucketIntervalMs > 0) { + if (needRowTagFilter) { + // Per-row accumulation with tag filtering (cannot use SIMD on mixed-tag blocks) + for (int i = rangeStart; i < rangeEnd; i++) { + if (!matchesTagConditions(tagCols, filterConditions, i)) + continue; + final long bucketTs = (timestamps[i] / bucketIntervalMs) * bucketIntervalMs; + for (int r = 0; r < reqCount; r++) { + if (isCount[r]) + result.accumulateSingleStat(bucketTs, r, 1.0, 1); + else + result.accumulateSingleStat(bucketTs, r, decompressedCols[schemaColIndices[r]][i], 1); + } + } + } else { + // Vectorized path: find contiguous segments within each bucket and use SIMD ops + final TimeSeriesVectorOps ops = TimeSeriesVectorOpsProvider.getInstance(); + + int segStart = rangeStart; + while (segStart < rangeEnd) { + final long bucketTs = (timestamps[segStart] / bucketIntervalMs) * bucketIntervalMs; + final long nextBucketTs = bucketTs + bucketIntervalMs; + + // Find end of this bucket's segment + int segEnd = segStart + 1; + while (segEnd < rangeEnd && timestamps[segEnd] < nextBucketTs) + segEnd++; + + final int segLen = segEnd - segStart; + + // Accumulate each request using vectorized ops on the segment + for (int r = 0; r < reqCount; r++) { + if (isCount[r]) { + result.accumulateSingleStat(bucketTs, r, segLen, segLen); + } else { + final double[] colData = decompressedCols[schemaColIndices[r]]; + final double val = switch (requests.get(r).type()) { + case SUM, AVG -> ops.sum(colData, segStart, segLen); + case MIN -> ops.min(colData, segStart, segLen); + case MAX -> ops.max(colData, segStart, segLen); + case COUNT -> segLen; + }; + result.accumulateSingleStat(bucketTs, 
r, val, segLen); + } + } + + segStart = segEnd; + } + } + } else { + // No bucket interval — accumulate all into one bucket + for (int i = 0; i < tsCount; i++) { + final long ts = timestamps[i]; + if (ts < fromTs || ts > toTs) + continue; + + if (needRowTagFilter && !matchesTagConditions(tagCols, filterConditions, i)) + continue; + + for (int r = 0; r < reqCount; r++) + rowValues[r] = isCount[r] ? 1.0 : decompressedCols[schemaColIndices[r]][i]; + + result.accumulateRow(fromTs, rowValues); + } + } + if (metrics != null) + metrics.addAccum(System.nanoTime() - t0); + } + } finally { + directoryLock.readLock().unlock(); + } + } + + /** + * Removes all blocks with maxTimestamp < threshold. + */ + public void truncateBefore(final long timestamp) throws IOException { + directoryLock.writeLock().lock(); + try { + final List retained = new ArrayList<>(); + for (final BlockEntry entry : blockDirectory) + if (entry.maxTimestamp >= timestamp) + retained.add(entry); + + if (retained.size() == blockDirectory.size()) + return; // Nothing to truncate + + // Rewrite the file with only retained blocks + final int colCount = columns.size(); + final String tempPath = basePath + ".ts.sealed.tmp"; + + // Build new directory in a local list — do NOT modify blockDirectory until after the + // atomic file swap. If Files.move() fails, the live file and blockDirectory remain intact. 
+ final List newDirectory = new ArrayList<>(); + try (final RandomAccessFile tempFile = new RandomAccessFile(tempPath, "rw")) { + final ByteBuffer headerBuf = ByteBuffer.allocate(HEADER_SIZE); + headerBuf.putInt(MAGIC_VALUE); + headerBuf.put((byte) CURRENT_VERSION); + headerBuf.putShort((short) colCount); + headerBuf.putInt(0); + headerBuf.putLong(Long.MAX_VALUE); + headerBuf.putLong(Long.MIN_VALUE); + headerBuf.flip(); + tempFile.getChannel().write(headerBuf); + + for (final BlockEntry oldEntry : retained) + copyBlockToFile(tempFile, oldEntry, colCount, newDirectory); + } + + // Atomic file swap: close handles first (required on Windows), then atomically replace. + // If the move fails, reopen the original file so the store remains usable. + indexChannel.close(); + indexFile.close(); + + final File oldFile = new File(basePath + ".ts.sealed"); + final File tmpFile = new File(tempPath); + try { + Files.move(tmpFile.toPath(), oldFile.toPath(), StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING); + } catch (final IOException moveEx) { + // Move failed — original file is still in place; reopen handles so the store stays usable + try { + indexFile = new RandomAccessFile(oldFile, "rw"); + indexChannel = indexFile.getChannel(); + } catch (final IOException reopenEx) { + LogManager.instance().log(this, Level.SEVERE, + "Failed to reopen sealed store after failed atomic move: %s", reopenEx, oldFile.getAbsolutePath()); + } + throw moveEx; + } + + // Only update in-memory state after the successful file swap + blockDirectory.clear(); + blockDirectory.addAll(newDirectory); + globalMinTs = Long.MAX_VALUE; + globalMaxTs = Long.MIN_VALUE; + for (final BlockEntry e : blockDirectory) { + if (e.minTimestamp < globalMinTs) globalMinTs = e.minTimestamp; + if (e.maxTimestamp > globalMaxTs) globalMaxTs = e.maxTimestamp; + } + + indexFile = new RandomAccessFile(oldFile, "rw"); + indexChannel = indexFile.getChannel(); + rewriteHeader(); + } finally { + 
directoryLock.writeLock().unlock(); + } + } + + /** + * Downsamples blocks older than cutoffTs to the given granularity. + * Blocks already at the target resolution or coarser are left untouched (idempotency). + * Numeric fields are averaged per (bucketTs, tagKey) group; tag columns preserved. + */ + public void downsampleBlocks(final long cutoffTs, final long granularityMs, + final int tsColIdx, final List tagColIndices, final List numericColIndices) throws IOException { + directoryLock.writeLock().lock(); + try { + + final List toDownsample = new ArrayList<>(); + final List toKeep = new ArrayList<>(); + + for (final BlockEntry entry : blockDirectory) { + if (entry.maxTimestamp >= cutoffTs) { + toKeep.add(entry); + continue; + } + // Check if block is already at target resolution (density check) + if (entry.sampleCount <= 1 || (entry.sampleCount > 1 + && (entry.maxTimestamp - entry.minTimestamp) / (entry.sampleCount - 1) >= granularityMs)) { + toKeep.add(entry); + continue; + } + toDownsample.add(entry); + } + + if (toDownsample.isEmpty()) + return; + + // Decompress all qualifying blocks and aggregate per (bucketTs, tagKey) + // Use List as map key (not null-byte-joined String) since tag values may contain null bytes. 
+ final Map, Map> groupedData = new HashMap<>(); // tagKey -> (bucketTs -> [sum0, count0, sum1, count1, ...]) + final int numFields = numericColIndices.size(); + final int accSize = numFields * 2; // sum + count per numeric field + + for (final BlockEntry entry : toDownsample) { + final long[] timestamps = decompressTimestamps(entry, tsColIdx); + + // Decompress tag columns + final Object[][] tagData = new Object[tagColIndices.size()][]; + for (int t = 0; t < tagColIndices.size(); t++) { + final int ci = tagColIndices.get(t); + final byte[] compressed = readBytes(entry.columnOffsets[ci], entry.columnSizes[ci]); + tagData[t] = switch (columns.get(ci).getCompressionHint()) { + case DICTIONARY -> { + final String[] vals = DictionaryCodec.decode(compressed); + final Object[] boxed = new Object[vals.length]; + System.arraycopy(vals, 0, boxed, 0, vals.length); + yield boxed; + } + default -> new Object[entry.sampleCount]; + }; + } + + // Decompress numeric columns + final double[][] numData = new double[numFields][]; + for (int n = 0; n < numFields; n++) { + final int ci = numericColIndices.get(n); + numData[n] = decompressDoubleColumn(entry, ci); + } + + // Group samples by (tagValues list, bucketTs) + for (int i = 0; i < timestamps.length; i++) { + final long bucketTs = (timestamps[i] / granularityMs) * granularityMs; + + // Build tag key as List to avoid ambiguity with null bytes in tag values + final List tagKey = new ArrayList<>(tagData.length); + for (final Object[] tagCol : tagData) + tagKey.add(tagCol[i] != null ? 
tagCol[i].toString() : ""); + + final Map buckets = groupedData.computeIfAbsent(tagKey, k -> new HashMap<>()); + final double[] acc = buckets.computeIfAbsent(bucketTs, k -> new double[accSize]); + for (int n = 0; n < numFields; n++) { + acc[n * 2] += numData[n][i]; // sum + acc[n * 2 + 1] += 1.0; // count + } + } + } + + // Build new downsampled samples from grouped data + final List newSamples = new ArrayList<>(); + for (final Map.Entry, Map> tagEntry : groupedData.entrySet()) { + final List tagParts = tagEntry.getKey(); + for (final Map.Entry bucketEntry : tagEntry.getValue().entrySet()) { + final long bucketTs = bucketEntry.getKey(); + final double[] acc = bucketEntry.getValue(); + + // Build a full row: [timestamp, tag0, tag1, ..., field0, field1, ...] + // ordered by column index + final Object[] row = new Object[columns.size()]; + row[tsColIdx] = bucketTs; + for (int t = 0; t < tagColIndices.size(); t++) + row[tagColIndices.get(t)] = t < tagParts.size() ? tagParts.get(t) : ""; + for (int n = 0; n < numFields; n++) { + final double count = acc[n * 2 + 1]; + row[numericColIndices.get(n)] = count > 0 ? 
acc[n * 2] / count : 0.0; + } + newSamples.add(row); + } + } + + // Sort by timestamp + newSamples.sort(Comparator.comparingLong(row -> (long) row[tsColIdx])); + + // Build new sealed blocks from downsampled data + final int colCount = columns.size(); + final List newBlocksCompressed = new ArrayList<>(); + final List newBlocksMeta = new ArrayList<>(); // [minTs, maxTs, sampleCount] + final List newBlocksMins = new ArrayList<>(); + final List newBlocksMaxs = new ArrayList<>(); + final List newBlocksSums = new ArrayList<>(); + final List newBlocksTagDV = new ArrayList<>(); + + int chunkStart = 0; + while (chunkStart < newSamples.size()) { + final int chunkEnd = Math.min(chunkStart + MAX_BLOCK_SIZE, newSamples.size()); + final int chunkLen = chunkEnd - chunkStart; + + // Extract timestamps for this chunk + final long[] chunkTs = new long[chunkLen]; + for (int i = 0; i < chunkLen; i++) + chunkTs[i] = (long) newSamples.get(chunkStart + i)[tsColIdx]; + + // Per-column stats + final double[] mins = new double[colCount]; + final double[] maxs = new double[colCount]; + final double[] sums = new double[colCount]; + Arrays.fill(mins, Double.NaN); + Arrays.fill(maxs, Double.NaN); + + final byte[][] compressedCols = new byte[colCount][]; + for (int c = 0; c < colCount; c++) { + if (c == tsColIdx) { + compressedCols[c] = DeltaOfDeltaCodec.encode(chunkTs); + } else { + final Object[] chunkValues = new Object[chunkLen]; + for (int i = 0; i < chunkLen; i++) + chunkValues[i] = newSamples.get(chunkStart + i)[c]; + compressedCols[c] = compressColumn(columns.get(c), chunkValues); + + // Compute stats for numeric columns + final TimeSeriesCodec codec = columns.get(c).getCompressionHint(); + if (codec == TimeSeriesCodec.GORILLA_XOR || codec == TimeSeriesCodec.SIMPLE8B) { + double min = Double.MAX_VALUE, max = -Double.MAX_VALUE, sum = 0; + for (final Object v : chunkValues) { + final double d = v != null ? 
((Number) v).doubleValue() : 0.0; + if (d < min) + min = d; + if (d > max) + max = d; + sum += d; + } + mins[c] = min; + maxs[c] = max; + sums[c] = sum; + } + } + } + + // Collect distinct tag values for this chunk + final String[][] chunkTagDV = new String[colCount][]; + for (int c = 0; c < colCount; c++) { + if (columns.get(c).getRole() == ColumnDefinition.ColumnRole.TAG) { + final LinkedHashSet distinctSet = new LinkedHashSet<>(); + for (int i = chunkStart; i < chunkEnd; i++) { + final Object val = newSamples.get(i)[c]; + distinctSet.add(val != null ? val.toString() : ""); + } + chunkTagDV[c] = distinctSet.toArray(new String[0]); + } + } + + newBlocksCompressed.add(compressedCols); + newBlocksMeta.add(new long[] { chunkTs[0], chunkTs[chunkLen - 1], chunkLen }); + newBlocksMins.add(mins); + newBlocksMaxs.add(maxs); + newBlocksSums.add(sums); + newBlocksTagDV.add(chunkTagDV); + chunkStart = chunkEnd; + } + + // Rewrite sealed file: toKeep blocks (raw copy) + new downsampled blocks + rewriteWithBlocks(toKeep, newBlocksCompressed, newBlocksMeta, newBlocksMins, newBlocksMaxs, newBlocksSums, + newBlocksTagDV); + } finally { + directoryLock.writeLock().unlock(); + } + } + + /** + * Rewrites the sealed file, copying retained blocks as raw bytes and appending new blocks. + * Blocks are written in ascending minTimestamp order so that the on-disk layout matches + * the in-memory block directory, preserving binary search correctness after a restart. + * Uses atomic tmp-file rename. + */ + private void rewriteWithBlocks(final List retained, + final List newCompressed, final List newMeta, + final List newMins, final List newMaxs, final List newSums, + final List newTagDistinctValues) throws IOException { + + final int colCount = columns.size(); + final String tempPath = basePath + ".ts.sealed.tmp"; + + // Build a merged, minTimestamp-sorted write plan so that the on-disk layout is + // always in ascending order (required by binary search in iterateRange/scanRange). 
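The ascending-minTimestamp ordering mentioned above is what makes block-level range lookup a binary search. A minimal sketch of that lookup, under the assumption of blocks sorted by minimum timestamp; the `findFirstCandidate` name and the bare `long[]{minTs, maxTs}` entries are illustrative stand-ins, not the actual `BlockEntry` API:

```java
import java.util.List;

public class BlockSearchSketch {
  // Returns the index of the first block whose maxTs >= fromTs, assuming the
  // list is sorted by minTs. Returns blocks.size() if every block ends before
  // fromTs (i.e. no block can contain samples in the queried range).
  public static int findFirstCandidate(final List<long[]> blocks, final long fromTs) {
    int lo = 0, hi = blocks.size();
    while (lo < hi) {
      final int mid = (lo + hi) >>> 1;
      if (blocks.get(mid)[1] < fromTs) // block ends before the range starts
        lo = mid + 1;
      else
        hi = mid;
    }
    return lo;
  }
}
```

The scan then walks forward from the returned index and stops at the first block whose minTs exceeds the range end, which is why an out-of-order on-disk layout would silently drop blocks after a restart.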
+ // A negative index means a "new" (downsampled) block; non-negative means retained. + record WriteSpec(long minTs, boolean retained, int idx) {} + final List writeOrder = new ArrayList<>(retained.size() + newCompressed.size()); + for (int i = 0; i < retained.size(); i++) + writeOrder.add(new WriteSpec(retained.get(i).minTimestamp, true, i)); + for (int b = 0; b < newCompressed.size(); b++) + writeOrder.add(new WriteSpec(newMeta.get(b)[0], false, b)); + writeOrder.sort(Comparator.comparingLong(WriteSpec::minTs)); + + // Build new directory in a local list — do NOT modify blockDirectory until after the + // atomic file swap. If Files.move() fails, the live file and blockDirectory remain intact. + final List newDirectory = new ArrayList<>(); + try (final RandomAccessFile tempFile = new RandomAccessFile(tempPath, "rw")) { + tempFile.setLength(0); + // Write placeholder header + final ByteBuffer headerBuf = ByteBuffer.allocate(HEADER_SIZE); + headerBuf.putInt(MAGIC_VALUE); + headerBuf.put((byte) CURRENT_VERSION); + headerBuf.putShort((short) colCount); + headerBuf.putInt(0); + headerBuf.putLong(Long.MAX_VALUE); + headerBuf.putLong(Long.MIN_VALUE); + headerBuf.flip(); + tempFile.getChannel().write(headerBuf); + + // Write all blocks in ascending minTimestamp order + for (final WriteSpec spec : writeOrder) { + if (spec.retained()) { + copyBlockToFile(tempFile, retained.get(spec.idx()), colCount, newDirectory); + } else { + final int b = spec.idx(); + final long[] meta = newMeta.get(b); + final BlockEntry entry = writeNewBlockToFile(tempFile, (int) meta[2], meta[0], meta[1], + newCompressed.get(b), newMins.get(b), newMaxs.get(b), newSums.get(b), colCount, + newTagDistinctValues != null ? newTagDistinctValues.get(b) : null); + newDirectory.add(entry); + } + } + } + + // Atomic file swap: close handles first (required on Windows), then atomically replace. + // If the move fails, reopen the original file so the store remains usable. 
+ indexChannel.close(); + indexFile.close(); + + final File oldFile = new File(basePath + ".ts.sealed"); + final File tmpFile = new File(tempPath); + try { + Files.move(tmpFile.toPath(), oldFile.toPath(), StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING); + } catch (final IOException moveEx) { + // Move failed — original file is still in place; reopen handles so the store stays usable + try { + indexFile = new RandomAccessFile(oldFile, "rw"); + indexChannel = indexFile.getChannel(); + } catch (final IOException reopenEx) { + LogManager.instance().log(this, Level.SEVERE, + "Failed to reopen sealed store after failed atomic move: %s", reopenEx, oldFile.getAbsolutePath()); + } + throw moveEx; + } + + // Only update in-memory state after the successful file swap + blockDirectory.clear(); + blockDirectory.addAll(newDirectory); + globalMinTs = Long.MAX_VALUE; + globalMaxTs = Long.MIN_VALUE; + for (final BlockEntry e : blockDirectory) { + if (e.minTimestamp < globalMinTs) globalMinTs = e.minTimestamp; + if (e.maxTimestamp > globalMaxTs) globalMaxTs = e.maxTimestamp; + } + + indexFile = new RandomAccessFile(oldFile, "rw"); + indexChannel = indexFile.getChannel(); + rewriteHeader(); + } + + private void copyBlockToFile(final RandomAccessFile tempFile, final BlockEntry oldEntry, final int colCount, + final List target) throws IOException { + final byte[][] compressedCols = new byte[colCount][]; + for (int c = 0; c < colCount; c++) + compressedCols[c] = readBytes(oldEntry.columnOffsets[c], oldEntry.columnSizes[c]); + + final BlockEntry newEntry = writeNewBlockToFile(tempFile, oldEntry.sampleCount, oldEntry.minTimestamp, + oldEntry.maxTimestamp, compressedCols, oldEntry.columnMins, oldEntry.columnMaxs, oldEntry.columnSums, + colCount, oldEntry.tagDistinctValues); + target.add(newEntry); + } + + /** + * Writes a single block to a temp file and returns the resulting {@link BlockEntry}. 
+ * Does NOT modify {@link #blockDirectory} or the global min/max timestamps — + * callers are responsible for those updates. + */ + private BlockEntry writeNewBlockToFile(final RandomAccessFile tempFile, final int sampleCount, + final long minTs, final long maxTs, final byte[][] compressedCols, + final double[] columnMins, final double[] columnMaxs, final double[] columnSums, + final int colCount, final String[][] tagDistinctValues) throws IOException { + + int numericColCount = 0; + for (int c = 0; c < colCount; c++) + if (!Double.isNaN(columnMins[c])) + numericColCount++; + + final byte[] tagMeta = buildTagMetadata(tagDistinctValues, colCount); + + final int statsSize = 4 + (8 + 8 + 8) * numericColCount; + final int metaSize = 4 + 8 + 8 + 4 + 4 * colCount + statsSize + tagMeta.length; + final ByteBuffer metaBuf = ByteBuffer.allocate(metaSize); + metaBuf.putInt(BLOCK_MAGIC_VALUE); + metaBuf.putLong(minTs); + metaBuf.putLong(maxTs); + metaBuf.putInt(sampleCount); + for (final byte[] col : compressedCols) + metaBuf.putInt(col.length); + metaBuf.putInt(numericColCount); + for (int c = 0; c < colCount; c++) { + if (!Double.isNaN(columnMins[c])) { + metaBuf.putDouble(columnMins[c]); + metaBuf.putDouble(columnMaxs[c]); + metaBuf.putDouble(columnSums[c]); + } + } + metaBuf.put(tagMeta); + metaBuf.flip(); + + // Use metaBuf.limit() (not .array().length) since the backing array may be larger after flip + final CRC32 crc = new CRC32(); + crc.update(metaBuf.array(), 0, metaBuf.limit()); + + long dataOffset = tempFile.length(); + tempFile.seek(dataOffset); + tempFile.write(metaBuf.array(), 0, metaBuf.limit()); + dataOffset += metaSize; + + final BlockEntry newEntry = new BlockEntry(minTs, maxTs, sampleCount, colCount, columnMins, columnMaxs, columnSums); + newEntry.tagDistinctValues = tagDistinctValues; + for (int c = 0; c < colCount; c++) { + newEntry.columnOffsets[c] = dataOffset; + newEntry.columnSizes[c] = compressedCols[c].length; + crc.update(compressedCols[c]); + 
tempFile.write(compressedCols[c]); + dataOffset += compressedCols[c].length; + } + + final ByteBuffer crcBuf = ByteBuffer.allocate(4); + crcBuf.putInt((int) crc.getValue()); + crcBuf.flip(); + tempFile.write(crcBuf.array()); + + return newEntry; + } + + /** + * Lock-free phase of compaction: snapshots existing sealed blocks (under a brief read lock), + * then writes all of them plus the new compressed blocks to {@code .ts.sealed.tmp} — entirely + * without holding any lock. + *
+ * Call {@link #commitTempCompactionFile(List)} under the caller's write lock to atomically + * swap the temp file for the live sealed file and install the returned block directory. + * + * @param newCompressed compressed column bytes for each new block + * @param newMeta {@code [minTs, maxTs, sampleCount]} for each new block + * @param newMins per-column min stats for each new block + * @param newMaxs per-column max stats for each new block + * @param newSums per-column sum stats for each new block + * @param newTagDistinctValues tag metadata for each new block (may be null) + * + * @return the new {@link BlockEntry} list to pass to {@link #commitTempCompactionFile(List)} + */ + List writeTempCompactionFile( + final List newCompressed, final List newMeta, + final List newMins, final List newMaxs, final List newSums, + final List newTagDistinctValues) throws IOException { + + final int colCount = columns.size(); + final String tempPath = basePath + ".ts.sealed.tmp"; + + // Snapshot the current block list and pre-read all retained block bytes under the read lock. + // This guards against concurrent truncateBefore / downsampleBlocks closing the channel. + final List retained; + final List retainedBytes; + directoryLock.readLock().lock(); + try { + retained = new ArrayList<>(blockDirectory); + retainedBytes = new ArrayList<>(retained.size()); + for (final BlockEntry e : retained) { + final byte[][] cols = new byte[colCount][]; + for (int c = 0; c < colCount; c++) + cols[c] = readBytes(e.columnOffsets[c], e.columnSizes[c]); + retainedBytes.add(cols); + } + } finally { + directoryLock.readLock().unlock(); + } + + // Build merged, minTimestamp-sorted write plan (same ordering as rewriteWithBlocks). 
+ record WriteSpec(long minTs, boolean isRetained, int idx) {} + final List writeOrder = new ArrayList<>(retained.size() + newCompressed.size()); + for (int i = 0; i < retained.size(); i++) + writeOrder.add(new WriteSpec(retained.get(i).minTimestamp, true, i)); + for (int b = 0; b < newCompressed.size(); b++) + writeOrder.add(new WriteSpec(newMeta.get(b)[0], false, b)); + writeOrder.sort(Comparator.comparingLong(WriteSpec::minTs)); + + final List newDirectory = new ArrayList<>(writeOrder.size()); + + // Write placeholder header + all blocks to the temp file (no lock held). + // Truncate first so any leftover bytes from a previous partial write are cleared. + try (final RandomAccessFile tempFile = new RandomAccessFile(tempPath, "rw")) { + tempFile.setLength(0); + final ByteBuffer headerBuf = ByteBuffer.allocate(HEADER_SIZE); + headerBuf.putInt(MAGIC_VALUE); + headerBuf.put((byte) CURRENT_VERSION); + headerBuf.putShort((short) colCount); + headerBuf.putInt(0); + headerBuf.putLong(Long.MAX_VALUE); + headerBuf.putLong(Long.MIN_VALUE); + headerBuf.flip(); + tempFile.getChannel().write(headerBuf); + + for (final WriteSpec spec : writeOrder) { + final BlockEntry entry; + if (spec.isRetained()) { + final int i = spec.idx(); + final BlockEntry old = retained.get(i); + entry = writeNewBlockToFile(tempFile, old.sampleCount, old.minTimestamp, old.maxTimestamp, + retainedBytes.get(i), old.columnMins, old.columnMaxs, old.columnSums, colCount, old.tagDistinctValues); + } else { + final int b = spec.idx(); + final long[] meta = newMeta.get(b); + entry = writeNewBlockToFile(tempFile, (int) meta[2], meta[0], meta[1], + newCompressed.get(b), newMins.get(b), newMaxs.get(b), newSums.get(b), colCount, + newTagDistinctValues != null ? 
newTagDistinctValues.get(b) : null); + } + newDirectory.add(entry); + } + } + + return newDirectory; + } + + /** + * Completes the compaction by atomically swapping {@code .ts.sealed.tmp} for the live + * {@code .ts.sealed} file and installing the given block directory. + *
+ * Must be called while the caller holds its own write lock (e.g. + * {@code compactionLock.writeLock()} in {@link TimeSeriesShard}) to prevent concurrent + * queries from reading the sealed store while the channel is being replaced. + * This method also acquires {@link #directoryLock} internally for the in-memory updates. + * + * @param newBlockDirectory the block entries returned by {@link #writeTempCompactionFile} + */ + void commitTempCompactionFile(final List newBlockDirectory) throws IOException { + directoryLock.writeLock().lock(); + try { + // Atomic file swap: close handles first (required on Windows), then atomically replace. + // If the move fails, reopen the original file so the store remains usable. + indexChannel.close(); + indexFile.close(); + + final File sealedFile = new File(basePath + ".ts.sealed"); + final File tmpFile = new File(basePath + ".ts.sealed.tmp"); + try { + Files.move(tmpFile.toPath(), sealedFile.toPath(), StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING); + } catch (final IOException moveEx) { + // Move failed — original file is still in place; reopen handles so the store stays usable + try { + indexFile = new RandomAccessFile(sealedFile, "rw"); + indexChannel = indexFile.getChannel(); + } catch (final IOException reopenEx) { + LogManager.instance().log(this, Level.SEVERE, + "Failed to reopen sealed store after failed atomic move: %s", reopenEx, sealedFile.getAbsolutePath()); + } + throw moveEx; + } + + indexFile = new RandomAccessFile(sealedFile, "rw"); + indexChannel = indexFile.getChannel(); + + blockDirectory.clear(); + blockDirectory.addAll(newBlockDirectory); + + globalMinTs = Long.MAX_VALUE; + globalMaxTs = Long.MIN_VALUE; + for (final BlockEntry e : blockDirectory) { + if (e.minTimestamp < globalMinTs) + globalMinTs = e.minTimestamp; + if (e.maxTimestamp > globalMaxTs) + globalMaxTs = e.maxTimestamp; + } + + rewriteHeader(); + } finally { + directoryLock.writeLock().unlock(); + } + } + + /** + * Appends 
additional blocks to the existing {@code .ts.sealed.tmp} file that was + * already written by {@link #writeTempCompactionFile}. + *
+ * Used by Phase 4 of lock-free compaction (called under the caller's write lock) to + * include the partial last page's data that was read after the lock was acquired. + * Since the partial page is small (≤ one page's worth of samples), this is fast. + * + * @param newCompressed compressed column bytes for each additional block + * @param newMeta {@code [minTs, maxTs, sampleCount]} for each additional block + * @param newMins per-column min stats for each block + * @param newMaxs per-column max stats for each block + * @param newSums per-column sum stats for each block + * @param newTagDV tag metadata for each block (may be null) + * @param directory block directory from {@link #writeTempCompactionFile}; new + * entries are appended in-place + */ + void appendBlocksToTempFile( + final List newCompressed, final List newMeta, + final List newMins, final List newMaxs, final List newSums, + final List newTagDV, + final List directory) throws IOException { + if (newCompressed.isEmpty()) + return; + + final String tempPath = basePath + ".ts.sealed.tmp"; + final int colCount = columns.size(); + + try (final RandomAccessFile tempFile = new RandomAccessFile(tempPath, "rw")) { + // Read the current header to get existing block count and global min/max timestamps + final ByteBuffer hdrBuf = ByteBuffer.allocate(HEADER_SIZE); + tempFile.seek(0); + tempFile.getChannel().read(hdrBuf); + hdrBuf.flip(); + hdrBuf.position(7); // skip magic(4) + version(1) + colCount(2) + final int existingBlockCount = hdrBuf.getInt(); + long curGlobalMin = hdrBuf.getLong(); + long curGlobalMax = hdrBuf.getLong(); + + // Append each new block; writeNewBlockToFile always seeks to tempFile.length() + for (int b = 0; b < newCompressed.size(); b++) { + final long[] meta = newMeta.get(b); + final BlockEntry entry = writeNewBlockToFile(tempFile, (int) meta[2], meta[0], meta[1], + newCompressed.get(b), newMins.get(b), newMaxs.get(b), newSums.get(b), colCount, + newTagDV != null ? 
newTagDV.get(b) : null); + directory.add(entry); + if (meta[0] < curGlobalMin) + curGlobalMin = meta[0]; + if (meta[1] > curGlobalMax) + curGlobalMax = meta[1]; + } + + // Update the header: block count (offset 7) and global min/max (offsets 11 and 19) + final ByteBuffer updateBuf = ByteBuffer.allocate(4 + 8 + 8); + updateBuf.putInt(existingBlockCount + newCompressed.size()); + updateBuf.putLong(curGlobalMin); + updateBuf.putLong(curGlobalMax); + updateBuf.flip(); + tempFile.getChannel().write(updateBuf, 7); + } + } + + /** + * Deletes the temp compaction file ({@code .ts.sealed.tmp}) if it exists. + * Called from error-recovery paths to leave a clean state. + */ + void deleteTempFileIfExists() { + final File tmp = new File(basePath + ".ts.sealed.tmp"); + if (tmp.exists() && !tmp.delete()) + LogManager.instance().log(this, Level.WARNING, + "Failed to delete stale compaction temp file '%s'; next compaction may fail or use stale data", + null, tmp.getAbsolutePath()); + } + + static byte[] compressColumn(final ColumnDefinition col, final Object[] values) { + final TimeSeriesCodec codec = col.getCompressionHint(); + return switch (codec) { + case GORILLA_XOR -> { + final double[] doubles = new double[values.length]; + for (int i = 0; i < values.length; i++) + doubles[i] = values[i] != null ? ((Number) values[i]).doubleValue() : 0.0; + yield GorillaXORCodec.encode(doubles); + } + case SIMPLE8B -> { + final long[] longs = new long[values.length]; + for (int i = 0; i < values.length; i++) + longs[i] = values[i] != null ? ((Number) values[i]).longValue() : 0L; + yield Simple8bCodec.encode(longs); + } + case DICTIONARY -> { + final String[] strings = new String[values.length]; + for (int i = 0; i < values.length; i++) + strings[i] = values[i] != null ? 
values[i].toString() : ""; + yield DictionaryCodec.encode(strings); + } + default -> throw new IllegalStateException("Unknown compression codec: " + codec); + }; + } + + /** + * Truncates the sealed store to exactly {@code targetBlockCount} blocks, + * removing any blocks appended after the watermark during an interrupted compaction. + */ + public void truncateToBlockCount(final long targetBlockCount) throws IOException { + directoryLock.writeLock().lock(); + try { + if (targetBlockCount >= blockDirectory.size()) + return; // nothing to truncate + + final List retained = new ArrayList<>(blockDirectory.subList(0, (int) targetBlockCount)); + final int colCount = columns.size(); + final String tempPath = basePath + ".ts.sealed.tmp"; + + // Build new directory in a local list — do NOT modify blockDirectory until after the + // atomic file swap. If Files.move() fails, the live file and blockDirectory remain intact. + final List newDirectory = new ArrayList<>(); + try (final RandomAccessFile tempFile = new RandomAccessFile(tempPath, "rw")) { + tempFile.setLength(0); + final ByteBuffer headerBuf = ByteBuffer.allocate(HEADER_SIZE); + headerBuf.putInt(MAGIC_VALUE); + headerBuf.put((byte) CURRENT_VERSION); + headerBuf.putShort((short) colCount); + headerBuf.putInt(0); + headerBuf.putLong(Long.MAX_VALUE); + headerBuf.putLong(Long.MIN_VALUE); + headerBuf.flip(); + tempFile.getChannel().write(headerBuf); + + for (final BlockEntry entry : retained) + copyBlockToFile(tempFile, entry, colCount, newDirectory); + } + + // Atomic file swap: close handles first (required on Windows), then atomically replace. + // If the move fails, reopen the original file so the store remains usable. 
+ indexChannel.close(); + indexFile.close(); + + final File oldFile = new File(basePath + ".ts.sealed"); + final File tmpFile = new File(tempPath); + try { + Files.move(tmpFile.toPath(), oldFile.toPath(), StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING); + } catch (final IOException moveEx) { + // Move failed — original file is still in place; reopen handles so the store stays usable + try { + indexFile = new RandomAccessFile(oldFile, "rw"); + indexChannel = indexFile.getChannel(); + } catch (final IOException reopenEx) { + LogManager.instance().log(this, Level.SEVERE, + "Failed to reopen sealed store after failed atomic move: %s", reopenEx, oldFile.getAbsolutePath()); + } + throw moveEx; + } + + // Only update in-memory state after the successful file swap + blockDirectory.clear(); + blockDirectory.addAll(newDirectory); + globalMinTs = Long.MAX_VALUE; + globalMaxTs = Long.MIN_VALUE; + for (final BlockEntry e : blockDirectory) { + if (e.minTimestamp < globalMinTs) globalMinTs = e.minTimestamp; + if (e.maxTimestamp > globalMaxTs) globalMaxTs = e.maxTimestamp; + } + + indexFile = new RandomAccessFile(oldFile, "rw"); + indexChannel = indexFile.getChannel(); + rewriteHeader(); + } finally { + directoryLock.writeLock().unlock(); + } + } + + public int getBlockCount() { + directoryLock.readLock().lock(); + try { + return blockDirectory.size(); + } finally { + directoryLock.readLock().unlock(); + } + } + + /** + * Returns the total number of samples across all sealed blocks. + * O(blockCount), all data already in memory from the block directory. 
+ */ + public long getTotalSampleCount() { + directoryLock.readLock().lock(); + try { + long total = 0; + for (final BlockEntry entry : blockDirectory) + total += entry.sampleCount; + return total; + } finally { + directoryLock.readLock().unlock(); + } + } + + public long getGlobalMinTimestamp() { + return globalMinTs; + } + + public long getGlobalMaxTimestamp() { + return globalMaxTs; + } + + public long getBlockMinTimestamp(final int blockIndex) { + directoryLock.readLock().lock(); + try { + return blockDirectory.get(blockIndex).minTimestamp; + } finally { + directoryLock.readLock().unlock(); + } + } + + public long getBlockMaxTimestamp(final int blockIndex) { + directoryLock.readLock().lock(); + try { + return blockDirectory.get(blockIndex).maxTimestamp; + } finally { + directoryLock.readLock().unlock(); + } + } + + @Override + public void close() throws IOException { + flushHeader(); + if (indexChannel != null && indexChannel.isOpen()) + indexChannel.close(); + if (indexFile != null) + indexFile.close(); + } + + // --- Private helpers --- + + /** + * Returns SKIP if the block cannot contain matching rows, FAST_PATH if the block + * is homogeneous for the filtered tag(s), or SLOW_PATH if per-row filtering is needed. 
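The three-way SKIP/FAST_PATH/SLOW_PATH decision described here can be sketched in isolation. This is a simplified single-column version with hypothetical names; the real method walks every `TagFilter` condition and only returns FAST_PATH when all filtered columns are single-valued in the block:

```java
import java.util.Set;

public class BlockPruneSketch {
  public enum Match { SKIP, FAST_PATH, SLOW_PATH }

  // blockDistinct: the distinct tag values stored in the block's metadata.
  // wanted: the values the query filters on.
  public static Match classify(final String[] blockDistinct, final Set<String> wanted) {
    if (blockDistinct == null)
      return Match.SLOW_PATH; // no metadata — must filter per row

    boolean any = false;
    for (final String v : blockDistinct)
      if (wanted.contains(v)) { any = true; break; }

    if (!any)
      return Match.SKIP; // no row in the block can match — skip decompression entirely

    // A single distinct value that matches means every row matches:
    // the block can be emitted without per-row tag checks.
    return blockDistinct.length == 1 ? Match.FAST_PATH : Match.SLOW_PATH;
  }
}
```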
+ */ + BlockMatchResult blockMatchesTagFilter(final BlockEntry entry, final TagFilter tagFilter) { + if (tagFilter == null) + return BlockMatchResult.FAST_PATH; + + if (entry.tagDistinctValues == null) + return BlockMatchResult.SLOW_PATH; + + boolean allSingleMatch = true; + for (final TagFilter.Condition cond : tagFilter.getConditions()) { + final int schemaIdx = findNonTsColumnSchemaIndex(cond.columnIndex()); + if (schemaIdx < 0 || schemaIdx >= entry.tagDistinctValues.length || entry.tagDistinctValues[schemaIdx] == null) + return BlockMatchResult.SLOW_PATH; + + final String[] distinctVals = entry.tagDistinctValues[schemaIdx]; + boolean anyMatch = false; + for (final String dv : distinctVals) { + if (cond.values().contains(dv)) { + anyMatch = true; + break; + } + } + if (!anyMatch) + return BlockMatchResult.SKIP; + + if (distinctVals.length != 1) + allSingleMatch = false; + } + return allSingleMatch ? BlockMatchResult.FAST_PATH : BlockMatchResult.SLOW_PATH; + } + + /** + * Checks if a row at index i matches all tag filter conditions. + */ + private static boolean matchesTagConditions(final String[][] tagCols, + final List conditions, final int i) { + for (int ci = 0; ci < tagCols.length; ci++) + if (!conditions.get(ci).values().contains(tagCols[ci][i])) + return false; + return true; + } + + /** + * Builds the tag metadata byte array for a block. 
+ */ + private byte[] buildTagMetadata(final String[][] tagDistinctValues, final int colCount) { + short tagColCount = 0; + if (tagDistinctValues != null) + for (int c = 0; c < colCount; c++) + if (columns.get(c).getRole() == ColumnDefinition.ColumnRole.TAG && tagDistinctValues[c] != null) + tagColCount++; + + if (tagColCount == 0) { + final ByteBuffer buf = ByteBuffer.allocate(2); + buf.putShort((short) 0); + return buf.array(); + } + + // Pre-compute UTF-8 bytes + final byte[][][] utf8Values = new byte[colCount][][]; + int totalSize = 2; // tagColCount + for (int c = 0; c < colCount; c++) { + if (columns.get(c).getRole() == ColumnDefinition.ColumnRole.TAG + && tagDistinctValues[c] != null) { + utf8Values[c] = new byte[tagDistinctValues[c].length][]; + totalSize += 2; // distinctCount + for (int v = 0; v < tagDistinctValues[c].length; v++) { + utf8Values[c][v] = tagDistinctValues[c][v].getBytes(StandardCharsets.UTF_8); + if (utf8Values[c][v].length > 32767) + throw new IllegalArgumentException( + "Tag value too long: UTF-8 encoding is " + utf8Values[c][v].length + + " bytes (max 32767): '" + tagDistinctValues[c][v].substring(0, Math.min(40, tagDistinctValues[c][v].length())) + "...'"); + totalSize += 2 + utf8Values[c][v].length; + } + } + } + + final ByteBuffer buf = ByteBuffer.allocate(totalSize); + buf.putShort(tagColCount); + for (int c = 0; c < colCount; c++) { + if (utf8Values[c] != null) { + buf.putShort((short) utf8Values[c].length); + for (final byte[] val : utf8Values[c]) { + buf.putShort((short) val.length); + buf.put(val); + } + } + } + return buf.array(); + } + + private void writeEmptyHeader() throws IOException { + final ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE); + buf.putInt(MAGIC_VALUE); + buf.put((byte) CURRENT_VERSION); + buf.putShort((short) columns.size()); + buf.putInt(0); // block count + buf.putLong(Long.MAX_VALUE); // min ts + buf.putLong(Long.MIN_VALUE); // max ts + buf.flip(); + indexChannel.write(buf, 0); + 
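The tag metadata layout written above — a short tag-column count, then per column a short distinct count followed by length-prefixed UTF-8 values — round-trips like this for a single column. A hedged sketch, not the actual encoder (which also skips non-TAG columns and enforces the 32767-byte value limit):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class TagMetaSketch {
  // Encodes the distinct values of one tag column: [short count][short len, utf8]...
  public static byte[] encode(final String[] values) {
    int size = 2; // distinctCount
    final byte[][] utf8 = new byte[values.length][];
    for (int i = 0; i < values.length; i++) {
      utf8[i] = values[i].getBytes(StandardCharsets.UTF_8);
      size += 2 + utf8[i].length; // length prefix + payload
    }
    final ByteBuffer buf = ByteBuffer.allocate(size);
    buf.putShort((short) values.length);
    for (final byte[] v : utf8) {
      buf.putShort((short) v.length);
      buf.put(v);
    }
    return buf.array();
  }

  // Mirrors the directory-load path: shorts are read back as unsigned.
  public static String[] decode(final byte[] bytes) {
    final ByteBuffer buf = ByteBuffer.wrap(bytes);
    final String[] out = new String[buf.getShort() & 0xFFFF];
    for (int i = 0; i < out.length; i++) {
      final byte[] v = new byte[buf.getShort() & 0xFFFF];
      buf.get(v);
      out[i] = new String(v, StandardCharsets.UTF_8);
    }
    return out;
  }
}
```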
indexChannel.force(true); + } + + private void rewriteHeader() throws IOException { + final ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE); + buf.putInt(MAGIC_VALUE); + buf.put((byte) CURRENT_VERSION); + buf.putShort((short) columns.size()); + buf.putInt(blockDirectory.size()); + buf.putLong(globalMinTs); + buf.putLong(globalMaxTs); + buf.flip(); + indexChannel.write(buf, 0); + indexChannel.force(false); + } + + private void loadDirectory() throws IOException { + final ByteBuffer headerBuf = ByteBuffer.allocate(HEADER_SIZE); + indexChannel.read(headerBuf, 0); + headerBuf.flip(); + + final int magic = headerBuf.getInt(); + if (magic != MAGIC_VALUE) + throw new IOException("Invalid sealed store magic: " + Integer.toHexString(magic)); + + final int version = headerBuf.get() & 0xFF; + if (version != CURRENT_VERSION) + throw new IOException( + "Unsupported sealed store format version " + version + " (expected: " + CURRENT_VERSION + ")"); + + final int colCount = headerBuf.getShort() & 0xFFFF; + if (colCount != columns.size()) + throw new IOException("Column count mismatch in sealed store header: file has " + colCount + + " columns, schema has " + columns.size()); + final int blockCount = headerBuf.getInt(); + globalMinTs = headerBuf.getLong(); + globalMaxTs = headerBuf.getLong(); + + // Rebuild block directory by scanning block metadata records + blockDirectory.clear(); + final long fileLength = indexFile.length(); + long pos = HEADER_SIZE; + + final int baseMetaSize = 4 + 8 + 8 + 4 + 4 * colCount; // magic + minTs + maxTs + sampleCount + colSizes + + while (pos + baseMetaSize <= fileLength) { + final ByteBuffer metaBuf = ByteBuffer.allocate(baseMetaSize); + if (indexChannel.read(metaBuf, pos) < baseMetaSize) + break; + metaBuf.flip(); + + final int blockMagic = metaBuf.getInt(); + if (blockMagic != BLOCK_MAGIC_VALUE) + break; // not a valid block header — stop scanning + + final long minTs = metaBuf.getLong(); + final long maxTs = metaBuf.getLong(); + final int 
sampleCount = metaBuf.getInt(); + + final int[] colSizes = new int[colCount]; + for (int c = 0; c < colCount; c++) + colSizes[c] = metaBuf.getInt(); + + // Read stats section: numericColCount(4) + [min(8) + max(8) + sum(8)] * numericColCount (schema order) + long statsPos = pos + baseMetaSize; + final ByteBuffer numBuf = ByteBuffer.allocate(4); + if (indexChannel.read(numBuf, statsPos) < 4) + break; + numBuf.flip(); + final int numericColCount = numBuf.getInt(); + statsPos += 4; + + final double[] mins = new double[colCount]; + final double[] maxs = new double[colCount]; + final double[] sums = new double[colCount]; + Arrays.fill(mins, Double.NaN); + Arrays.fill(maxs, Double.NaN); + + if (numericColCount > 0) { + final int tripletSize = (8 + 8 + 8) * numericColCount; + final ByteBuffer statsBuf = ByteBuffer.allocate(tripletSize); + if (indexChannel.read(statsBuf, statsPos) < tripletSize) + break; + statsBuf.flip(); + // Stats are in schema order — iterate columns, populate non-NaN entries + int numericIdx = 0; + for (int c = 0; c < colCount && numericIdx < numericColCount; c++) { + if (columns.get(c).getRole() == ColumnDefinition.ColumnRole.TIMESTAMP + || columns.get(c).getRole() == ColumnDefinition.ColumnRole.TAG) + continue; + mins[c] = statsBuf.getDouble(); + maxs[c] = statsBuf.getDouble(); + sums[c] = statsBuf.getDouble(); + numericIdx++; + } + statsPos += tripletSize; + } + + // Read tag metadata section + String[][] blockTagDistinctValues = null; + long tagEndPos = statsPos; + final ByteBuffer tagCountBuf = ByteBuffer.allocate(2); + if (indexChannel.read(tagCountBuf, tagEndPos) < 2) + break; + tagCountBuf.flip(); + // Read as unsigned short for safety + final int tagColCount = tagCountBuf.getShort() & 0xFFFF; + tagEndPos += 2; + + if (tagColCount > 0) { + blockTagDistinctValues = new String[colCount][]; + int tagIdx = 0; + for (int c = 0; c < colCount && tagIdx < tagColCount; c++) { + if (columns.get(c).getRole() != ColumnDefinition.ColumnRole.TAG) + 
continue; + final ByteBuffer dcBuf = ByteBuffer.allocate(2); + indexChannel.read(dcBuf, tagEndPos); + dcBuf.flip(); + // Read as unsigned short: values up to 65535 (MAX_DICTIONARY_SIZE) are valid + final int distinctCount = dcBuf.getShort() & 0xFFFF; + tagEndPos += 2; + + blockTagDistinctValues[c] = new String[distinctCount]; + for (int v = 0; v < distinctCount; v++) { + final ByteBuffer lenBuf = ByteBuffer.allocate(2); + indexChannel.read(lenBuf, tagEndPos); + lenBuf.flip(); + // Read as unsigned short: tag values are bounded to 32767 bytes at write time + final int valLen = lenBuf.getShort() & 0xFFFF; + tagEndPos += 2; + + final byte[] valBytes = new byte[valLen]; + final ByteBuffer valBuf = ByteBuffer.wrap(valBytes); + indexChannel.read(valBuf, tagEndPos); + blockTagDistinctValues[c][v] = new String(valBytes, StandardCharsets.UTF_8); + tagEndPos += valLen; + } + tagIdx++; + } + } + + final BlockEntry entry = new BlockEntry(minTs, maxTs, sampleCount, colCount, mins, maxs, sums); + entry.tagDistinctValues = blockTagDistinctValues; + entry.blockStartOffset = pos; + long dataPos = tagEndPos; + for (int c = 0; c < colCount; c++) { + entry.columnOffsets[c] = dataPos; + entry.columnSizes[c] = colSizes[c]; + dataPos += colSizes[c]; + } + + // Read stored CRC32 (validate lazily on first block read) + final ByteBuffer crcBuf = ByteBuffer.allocate(4); + if (indexChannel.read(crcBuf, dataPos) < 4) + throw new IOException("Unexpected end of sealed store: missing block CRC32"); + crcBuf.flip(); + entry.storedCRC = crcBuf.getInt(); + entry.crcValidated = false; + + dataPos += 4; // skip CRC + + blockDirectory.add(entry); + pos = dataPos; + } + } + + private long[] decompressTimestamps(final BlockEntry entry, final int tsColIdx) throws IOException { + validateBlockCRC(entry); + final byte[] compressed = readBytes(entry.columnOffsets[tsColIdx], entry.columnSizes[tsColIdx]); + return DeltaOfDeltaCodec.decode(compressed); + } + + private double[] decompressDoubleColumn(final 
BlockEntry entry, final int schemaColIdx) throws IOException { + final byte[] compressed = readBytes(entry.columnOffsets[schemaColIdx], entry.columnSizes[schemaColIdx]); + final ColumnDefinition col = columns.get(schemaColIdx); + + if (col.getCompressionHint() == TimeSeriesCodec.GORILLA_XOR) + return GorillaXORCodec.decode(compressed); + + // For SIMPLE8B encoded longs, convert to doubles + if (col.getCompressionHint() == TimeSeriesCodec.SIMPLE8B) { + final long[] longs = Simple8bCodec.decode(compressed); + final double[] result = new double[longs.length]; + for (int i = 0; i < longs.length; i++) + result[i] = longs[i]; + return result; + } + + throw new IllegalArgumentException( + "decompressDoubleColumn: codec " + col.getCompressionHint() + " is not a numeric codec (column " + schemaColIdx + ")"); + } + + private Object[][] decompressColumns(final BlockEntry entry, final int[] columnIndices, final int tsColIdx) throws IOException { + final List result = new ArrayList<>(); + + // Build a BitSet for O(1) column-index lookup in the hot path (avoids O(n) linear scan per column) + final BitSet colIndexSet; + if (columnIndices != null) { + colIndexSet = new BitSet(); + for (final int idx : columnIndices) + colIndexSet.set(idx); + } else { + colIndexSet = null; + } + + int nonTsIdx = 0; + for (int c = 0; c < columns.size(); c++) { + if (c == tsColIdx) + continue; + + if (colIndexSet != null && !colIndexSet.get(nonTsIdx)) { + nonTsIdx++; + continue; + } + + final byte[] compressed = readBytes(entry.columnOffsets[c], entry.columnSizes[c]); + final ColumnDefinition col = columns.get(c); + + final Object[] decompressed = switch (col.getCompressionHint()) { + case GORILLA_XOR -> { + final double[] vals = GorillaXORCodec.decode(compressed); + final Object[] boxed = new Object[vals.length]; + for (int i = 0; i < vals.length; i++) + boxed[i] = vals[i]; + yield boxed; + } + case SIMPLE8B -> { + final long[] vals = Simple8bCodec.decode(compressed); + final Object[] boxed = new 
Object[vals.length]; + if (col.getDataType() == Type.INTEGER) { + for (int i = 0; i < vals.length; i++) + boxed[i] = (int) vals[i]; + } else { + for (int i = 0; i < vals.length; i++) + boxed[i] = vals[i]; + } + yield boxed; + } + case DICTIONARY -> { + final String[] vals = DictionaryCodec.decode(compressed); + final Object[] boxed = new Object[vals.length]; + System.arraycopy(vals, 0, boxed, 0, vals.length); + yield boxed; + } + default -> new Object[entry.sampleCount]; + }; + + result.add(decompressed); + nonTsIdx++; + } + return result.toArray(new Object[0][]); + } + + private byte[] readBytes(final long offset, final int size) throws IOException { + final ByteBuffer buf = ByteBuffer.allocate(size); + int totalRead = 0; + while (totalRead < size) { + final int read = indexChannel.read(buf, offset + totalRead); + if (read == -1) + throw new IOException("Unexpected end of sealed store at offset " + (offset + totalRead)); + totalRead += read; + } + return buf.array(); + } + + /** + * Reads all column data for a block in a single I/O call. + * Columns are contiguous on disk, so one pread covers all of them. + */ + private byte[] readBlockData(final BlockEntry entry) throws IOException { + final long dataStart = entry.columnOffsets[0]; + int totalDataSize = 0; + for (final int s : entry.columnSizes) + totalDataSize += s; + final byte[] data = readBytes(dataStart, totalDataSize); + if (!entry.crcValidated) { + final int metaSize = (int) (dataStart - entry.blockStartOffset); + final byte[] metaBytes = readBytes(entry.blockStartOffset, metaSize); + final CRC32 crc = new CRC32(); + crc.update(metaBytes); + crc.update(data); + checkCRC(entry, crc); + } + return data; + } + + /** + * Validates block CRC32 on first access (used by scanRange/iterateRange paths). + * Reads the entire block (meta + data) in one call to verify. 
+ */ + private void validateBlockCRC(final BlockEntry entry) throws IOException { + if (entry.crcValidated) + return; + final long endOfData = entry.columnOffsets[entry.columnSizes.length - 1] + + entry.columnSizes[entry.columnSizes.length - 1]; + final int blockSize = (int) (endOfData - entry.blockStartOffset); + final byte[] blockBytes = readBytes(entry.blockStartOffset, blockSize); + final CRC32 crc = new CRC32(); + crc.update(blockBytes); + checkCRC(entry, crc); + } + + private void checkCRC(final BlockEntry entry, final CRC32 crc) throws IOException { + if ((int) crc.getValue() != entry.storedCRC) + throw new IOException("CRC mismatch in sealed store block at offset " + entry.blockStartOffset + + " (stored=0x" + Integer.toHexString(entry.storedCRC) + + ", computed=0x" + Integer.toHexString((int) crc.getValue()) + ")"); + entry.crcValidated = true; + } + + /** + * Slices a single column's bytes from the coalesced block data. + */ + private static byte[] sliceColumn(final byte[] blockData, final BlockEntry entry, final int colIdx) { + final int offset = (int) (entry.columnOffsets[colIdx] - entry.columnOffsets[0]); + return Arrays.copyOfRange(blockData, offset, offset + entry.columnSizes[colIdx]); + } + + /** + * Decompresses a double column from pre-read bytes (no I/O). 
+ */ + private double[] decompressDoubleColumnFromBytes(final byte[] compressed, final int schemaColIdx) throws IOException { + final ColumnDefinition col = columns.get(schemaColIdx); + + if (col.getCompressionHint() == TimeSeriesCodec.GORILLA_XOR) + return GorillaXORCodec.decode(compressed); + + if (col.getCompressionHint() == TimeSeriesCodec.SIMPLE8B) { + final long[] longs = Simple8bCodec.decode(compressed); + final double[] result = new double[longs.length]; + for (int i = 0; i < longs.length; i++) + result[i] = longs[i]; + return result; + } + + throw new IllegalArgumentException( + "decompressDoubleColumnFromBytes: codec " + col.getCompressionHint() + " is not a numeric codec (column " + schemaColIdx + ")"); + } + + private int findTimestampColumnIndex() { + for (int i = 0; i < columns.size(); i++) + if (columns.get(i).getRole() == ColumnDefinition.ColumnRole.TIMESTAMP) + return i; + return 0; + } + + private int findNonTsColumnSchemaIndex(final int nonTsIndex) { + int count = 0; + for (int i = 0; i < columns.size(); i++) { + if (columns.get(i).getRole() == ColumnDefinition.ColumnRole.TIMESTAMP) + continue; + if (count == nonTsIndex) + return i; + count++; + } + throw new IllegalArgumentException("Column index " + nonTsIndex + " out of range"); + } + + private void accumulateSample(final AggregationResult result, final long bucketTs, final double value, + final AggregationType type) { + final int idx = result.findBucketIndex(bucketTs); + if (idx >= 0) { + final double existing = result.getValue(idx); + final long count = result.getCount(idx); + final double merged = switch (type) { + case SUM -> existing + value; + case COUNT -> existing + 1; + case AVG -> existing + value; // accumulate sum, divide by count later + case MIN -> Math.min(existing, value); + case MAX -> Math.max(existing, value); + }; + result.updateValue(idx, merged); + result.updateCount(idx, count + 1); + } else { + result.addBucket(bucketTs, type == AggregationType.COUNT ? 
1 : value, 1); + } + } + +} diff --git a/engine/src/main/java/com/arcadedb/engine/timeseries/TimeSeriesShard.java b/engine/src/main/java/com/arcadedb/engine/timeseries/TimeSeriesShard.java new file mode 100644 index 0000000000..c59ea0b4b8 --- /dev/null +++ b/engine/src/main/java/com/arcadedb/engine/timeseries/TimeSeriesShard.java @@ -0,0 +1,818 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +import com.arcadedb.database.DatabaseInternal; +import com.arcadedb.engine.timeseries.codec.DeltaOfDeltaCodec; +import com.arcadedb.engine.timeseries.codec.DictionaryCodec; +import com.arcadedb.engine.timeseries.codec.TimeSeriesCodec; +import com.arcadedb.exception.ConcurrentModificationException; +import com.arcadedb.schema.LocalSchema; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.HashSet; +import java.util.Iterator; +import java.util.LinkedHashSet; +import java.util.List; +import java.util.NoSuchElementException; +import java.util.concurrent.locks.Lock; +import java.util.concurrent.locks.ReadWriteLock; +import java.util.concurrent.locks.ReentrantLock; +import java.util.concurrent.locks.ReentrantReadWriteLock; + +/** + * Pairs a mutable TimeSeriesBucket with a sealed 
TimeSeriesSealedStore. + * Implements crash-safe compaction. + * + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +public class TimeSeriesShard implements AutoCloseable { + + private final int shardIndex; + private final DatabaseInternal database; + private final List columns; + private final long compactionBucketIntervalMs; + private final TimeSeriesBucket mutableBucket; + private final TimeSeriesSealedStore sealedStore; + // Read lock: held by scan/iterate (concurrent reads allowed). + // Write lock: held by compact() to prevent queries from seeing data twice + // during the window where sealed blocks are written but mutable not yet cleared. + private final ReadWriteLock compactionLock = new ReentrantReadWriteLock(); + // Prevents concurrent compact() calls on this shard (e.g. maintenance scheduler + explicit call). + // Both writeTempCompactionFile() and truncateToBlockCount() use the same .ts.sealed.tmp path; + // concurrent execution would corrupt each other's temp files. + private final Lock compactionMutex = new ReentrantLock(); + + public TimeSeriesShard(final DatabaseInternal database, final String baseName, final int shardIndex, + final List columns) throws IOException { + this(database, baseName, shardIndex, columns, 0); + } + + public TimeSeriesShard(final DatabaseInternal database, final String baseName, final int shardIndex, + final List columns, final long compactionBucketIntervalMs) throws IOException { + this.shardIndex = shardIndex; + this.database = database; + this.columns = columns; + this.compactionBucketIntervalMs = compactionBucketIntervalMs; + + final String shardName = baseName + "_shard_" + shardIndex; + final String shardPath = database.getDatabasePath() + "/" + shardName; + final LocalSchema schema = (LocalSchema) database.getSchema(); + + // Check if the bucket was already loaded by the component factory (cold open) + final com.arcadedb.engine.Component existing = schema.getFileByName(shardName); + if (existing instanceof 
TimeSeriesBucket tsb) { + this.mutableBucket = tsb; + this.mutableBucket.setColumns(columns); + } else { + // First-time creation: register the file with the schema BEFORE initialising the header + // page, so that the nested-TX commit in initHeaderPage() can resolve the file by its ID. + this.mutableBucket = new TimeSeriesBucket(database, shardName, shardPath, columns); + schema.registerFile(mutableBucket); + // Initialise the header page in a self-contained nested transaction. Using a nested TX + // here ensures that page 0 is committed immediately and is NOT placed in any enclosing + // transaction's dirty set. This is required so that the nested TX used by appendSamples() + // can later commit page 0 without conflicting with an enclosing transaction that was open + // when this shard was created (a common test pattern). + database.begin(); + try { + mutableBucket.initHeaderPage(); + database.commit(); + } catch (final Exception e) { + if (database.isTransactionActive()) + database.rollback(); + throw e instanceof IOException ? 
(IOException) e : + new IOException("Failed to initialise header for shard " + shardIndex, e); + } + } + + // If sealedStore construction fails, close mutableBucket to avoid a resource leak + final TimeSeriesSealedStore tempSealedStore; + try { + tempSealedStore = new TimeSeriesSealedStore(shardPath, columns); + } catch (final IOException e) { + try { this.mutableBucket.close(); } catch (final Exception ignored) {} + throw e; + } + this.sealedStore = tempSealedStore; + + // Crash recovery: if a compaction was interrupted, truncate any partial sealed blocks + database.begin(); + try { + if (mutableBucket.isCompactionInProgress()) { + final long watermark = mutableBucket.getCompactionWatermark(); + sealedStore.truncateToBlockCount(watermark); + mutableBucket.setCompactionInProgress(false); + database.commit(); + } else { + database.rollback(); + } + } catch (final Exception e) { + if (database.isTransactionActive()) + database.rollback(); + // Close both stores to avoid resource leaks before propagating the error + try { this.sealedStore.close(); } catch (final Exception ignored) {} + try { this.mutableBucket.close(); } catch (final Exception ignored) {} + throw e instanceof IOException ? (IOException) e : + new IOException("Crash recovery failed for shard " + shardIndex, e); + } + } + + /** + * Appends samples to the mutable bucket. + *

+ * The read lock is held for the entire internal transaction lifecycle
+ * (begin → write → commit), not just during the page writes. This is the key invariant
+ * that prevents MVCC conflicts with Phase 4:
+ * <ul>
+ *   <li>Phase 4 acquires the write lock to clear the mutable bucket.</li>
+ *   <li>The write lock can only be granted after all read-lock holders have released.</li>
+ *   <li>Because this method releases the read lock only after the commit, Phase 4
+ * is guaranteed that every in-flight append has already persisted its page-0 modifications
+ * before Phase 4 starts its own transaction. Phase 4 always sees the latest page-0
+ * version and commits without conflict; insert transactions are never affected.</li>
+ * </ul>
+ * <p>
+ * This method always manages its own transaction. If the caller already has an active + * transaction, ArcadeDB creates a nested transaction (a new {@code TransactionContext} pushed + * onto the per-thread stack). The nested transaction commits independently; the caller's outer + * transaction remains unaffected because it holds none of the modified pages in its dirty set. + */ + public void appendSamples(final long[] timestamps, final Object[]... columnValues) throws IOException { + compactionLock.readLock().lock(); + try { + database.begin(); + try { + mutableBucket.appendSamples(timestamps, columnValues); + database.commit(); + } catch (final ConcurrentModificationException cme) { + // Roll back the nested TX only. Do NOT touch the caller's outer transaction: + // the Javadoc promises it remains unaffected, and the caller must decide whether + // to rollback/retry their own transaction. + if (database.isTransactionActive()) + database.rollback(); + throw cme; // propagate as-is so callers can catch and retry + } catch (final Exception e) { + if (database.isTransactionActive()) + database.rollback(); + throw e instanceof IOException ? (IOException) e : new IOException("Failed to append timeseries samples", e); + } + } finally { + compactionLock.readLock().unlock(); + } + } + + /** + * Scans both sealed and mutable layers, merging results by timestamp. + * Holds the read lock to prevent concurrent {@link #compact()} from clearing the + * mutable bucket after sealing, which would cause double-counting. 
+ */ + public List scanRange(final long fromTs, final long toTs, final int[] columnIndices, + final TagFilter tagFilter) throws IOException { + compactionLock.readLock().lock(); + try { + final List results = new ArrayList<>(); + + // Sealed layer first (already filtered by tagFilter inside sealedStore) + final List sealedResults = sealedStore.scanRange(fromTs, toTs, columnIndices, tagFilter); + results.addAll(sealedResults); + + // Then mutable layer + final List mutableResults = mutableBucket.scanRange(fromTs, toTs, columnIndices); + addFiltered(results, mutableResults, tagFilter, columnIndices); + + return results; + } finally { + compactionLock.readLock().unlock(); + } + } + + /** + * Returns an iterator over both sealed and mutable layers. + * Sealed data is iterated first, then mutable. Tag filter is applied inline. + * Both iterators are eagerly materialized under the read lock to prevent + * concurrent {@link #compact()} from clearing the mutable bucket and causing + * stale reads after the lock is released. + */ + public Iterator iterateRange(final long fromTs, final long toTs, final int[] columnIndices, + final TagFilter tagFilter) throws IOException { + final Iterator sealedIter; + final Iterator mutableIter; + compactionLock.readLock().lock(); + try { + sealedIter = sealedStore.iterateRange(fromTs, toTs, columnIndices, tagFilter); + // Eagerly materialize the mutable iterator under the lock. + // A lazy iterator would risk reading stale (cleared) pages if compaction + // acquires the write lock and clears the bucket before next() is called. + final List mutableRows = mutableBucket.scanRange(fromTs, toTs, columnIndices); + mutableIter = mutableRows.iterator(); + } finally { + compactionLock.readLock().unlock(); + } + + // Chain sealed then mutable, with inline tag filtering. + // The sealed iterator is fully materialised; the mutable iterator is lazy but its + // MVCC snapshot was already established under the read lock above. 
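The sealed-then-mutable chaining with single-element lookahead and inline filtering described above can be sketched generically. This is an illustrative sketch, not the ArcadeDB API; `ChainedFilteringIterator` and its names are hypothetical:

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical sketch: chain a "sealed" iterator before a "mutable" one and
// apply a row filter inline, using one-element lookahead so hasNext() is cheap.
final class ChainedFilteringIterator<T> implements Iterator<T> {
  private final Iterator<T> second;
  private final Predicate<T> filter;
  private Iterator<T> current;
  private boolean switched = false;
  private T nextItem;

  ChainedFilteringIterator(final Iterator<T> first, final Iterator<T> second, final Predicate<T> filter) {
    this.current = first;
    this.second = second;
    this.filter = filter;
    advance();
  }

  // Pull from the current iterator until an item passes the filter,
  // switching to the second iterator once the first is drained.
  private void advance() {
    nextItem = null;
    while (true) {
      if (current.hasNext()) {
        final T item = current.next();
        if (filter == null || filter.test(item)) {
          nextItem = item;
          return;
        }
      } else if (!switched) {
        current = second;
        switched = true;
      } else
        return;
    }
  }

  @Override
  public boolean hasNext() {
    return nextItem != null;
  }

  @Override
  public T next() {
    if (nextItem == null)
      throw new NoSuchElementException();
    final T result = nextItem;
    advance();
    return result;
  }
}
```

The lookahead pattern matters because a plain `hasNext()` delegation cannot express "skip filtered rows, then fall through to the next layer" without buffering one row.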
+ return new Iterator<>() { + private Iterator current = sealedIter; + private boolean switchedToMutable = false; + private Object[] nextRow = null; + + { + advance(); + } + + private void advance() { + nextRow = null; + while (true) { + if (current.hasNext()) { + final Object[] row = current.next(); + // Use matchesMapped() so the filter works correctly when columnIndices is a subset. + if (tagFilter == null || tagFilter.matchesMapped(row, columnIndices)) { + nextRow = row; + return; + } + } else if (!switchedToMutable) { + current = mutableIter; + switchedToMutable = true; + } else + return; + } + } + + @Override + public boolean hasNext() { + return nextRow != null; + } + + @Override + public Object[] next() { + if (nextRow == null) + throw new NoSuchElementException(); + final Object[] result = nextRow; + advance(); + return result; + } + }; + } + + /** + * Maximum number of samples per sealed block. Keeps decompression cost bounded. + */ + static final int SEALED_BLOCK_SIZE = 65_536; + + /** + * Compacts mutable data into sealed columnar storage. + * Data is written in chunks of {@link #SEALED_BLOCK_SIZE} rows to keep + * individual sealed blocks small for fast decompression during queries. + *

+ * LSMTree-style lock-free pattern — matches how {@code LSMTreeIndexCompactor} works:
+ * <ol>
+ *   <li>Phase 0 (brief write lock + brief TX): snapshot the current data page count
+ * N and persist the crash-recovery flag. The write lock guarantees that no concurrent
+ * {@link #appendSamples} can modify page-0 during this TX, so the commit always
+ * succeeds.</li>
+ *   <li>Phase 1 (no lock, read-only TX then rollback): read full/immutable pages
+ * 1..N-1 via a short read-only transaction that is immediately rolled back. These pages
+ * are permanently full — {@link #appendSamples} always writes to the LAST page — so
+ * their content is stable and no MVCC conflict is possible.</li>
+ *   <li>Phase 2 (lock-free, no TX): sort and compress the snapshot data.</li>
+ *   <li>Phase 3 (lock-free, no TX): write all sealed blocks to {@code .ts.sealed.tmp}.
+ * Concurrent queries still read from the live sealed file; no double-counting.</li>
+ *   <li>Phase 4 (brief write lock + brief TX): read any remaining partial pages,
+ * merge with Phase-2 spill if present, compress, atomically swap the temp file,
+ * clear the compacted mutable bucket, and reset the crash-recovery flag. The write
+ * lock blocks concurrent {@link #appendSamples} so the TX commit cannot conflict.</li>
+ * </ol>
+ * <p>
+ * The result is that {@link #appendSamples} is never blocked during the heavy I/O phases + * (1–3), and is only briefly blocked (≤ a few milliseconds) during the two write-lock + * windows (phases 0 and 4). + *

+ * Crash-safe: the {@code compactionInProgress} flag and watermark are committed in Phase 0. + * If the process crashes before Phase 4 clears the flag, the {@link #TimeSeriesShard} + * constructor truncates the sealed store back to the watermark and the mutable pages are + * left intact for re-compaction. + */ + public void compact() throws IOException { + // Serialize concurrent compact() calls: writeTempCompactionFile() and truncateToBlockCount() + // both use the same .ts.sealed.tmp path; concurrent execution would corrupt each other's + // temp files (e.g. maintenance scheduler and an explicit compactAll() running in parallel). + compactionMutex.lock(); + try { + compactInternal(); + } finally { + compactionMutex.unlock(); + } + } + + private void compactInternal() throws IOException { + final long initialBlockCount = sealedStore.getBlockCount(); + + // ── Phase 0 (brief writeLock + brief TX): snapshot page count, set crash flag ───────── + // The write lock blocks concurrent appendSamples() so the TX modifying page-0 cannot + // get an MVCC conflict from a concurrent insert. + final int snapshotDataPageCount; + compactionLock.writeLock().lock(); + try { + database.begin(); + try { + final int pageCount = mutableBucket.getDataPageCount(); + if (pageCount == 0) { + database.rollback(); + return; + } + snapshotDataPageCount = pageCount; + mutableBucket.setCompactionInProgress(true); + mutableBucket.setCompactionWatermark(initialBlockCount); + database.commit(); + } catch (final Exception e) { + if (database.isTransactionActive()) + database.rollback(); + throw e instanceof IOException ? (IOException) e : new IOException("Compaction failed in phase 0", e); + } + } finally { + compactionLock.writeLock().unlock(); + } + + // Pages 1..lastFullPage are FULL (immutable): appendSamples() never writes to full pages + // — it always appends to the LAST page. These are safe to read without the write lock. 
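The flag-plus-watermark recovery contract can be reduced to a small in-memory model. This is a hypothetical, simplified sketch (`CrashSafeLog` and its fields are invented for illustration; the real implementation persists the flag in page-0 and truncates an on-disk sealed file):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of crash-safe compaction: set a persisted in-progress flag
// and record a block-count watermark BEFORE writing any sealed blocks; on
// restart, a set flag means the previous run died mid-compaction, so any blocks
// past the watermark are partial and must be truncated away.
final class CrashSafeLog {
  final List<String> sealedBlocks = new ArrayList<>();
  boolean compactionInProgress = false;
  long watermark = 0;

  void beginCompaction() {
    compactionInProgress = true;       // persisted first (Phase 0)
    watermark = sealedBlocks.size();   // sealed state known to be complete
  }

  void finishCompaction() {
    compactionInProgress = false;      // cleared only after the atomic swap (Phase 4)
  }

  // Constructor-time recovery: roll the sealed store back to the watermark.
  void recover() {
    if (compactionInProgress) {
      while (sealedBlocks.size() > watermark)
        sealedBlocks.remove(sealedBlocks.size() - 1);
      compactionInProgress = false;
    }
  }
}
```

The ordering is the whole point: flag before any sealed write, flag cleared only after the swap, so every crash window leaves either the old state or a recoverable prefix.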
+ // Page snapshotDataPageCount is the current partial last page; it will be read in Phase 4 + // under the write lock together with any new pages created between Phase 0 and Phase 4. + final int lastFullPage = snapshotDataPageCount - 1; + + // Shared output lists for compressed blocks built in Phases 1+2. + final List allCompressedList = new ArrayList<>(); + final List allMetaList = new ArrayList<>(); + final List allMinsList = new ArrayList<>(); + final List allMaxsList = new ArrayList<>(); + final List allSumsList = new ArrayList<>(); + final List allTagDVList = new ArrayList<>(); + + // ── Phase 1 (no lock, read-only TX): read full/immutable pages ─────────────────────────── + // Full pages are never written to by concurrent appendSamples(); the read-only snapshot TX + // is always conflict-free. Roll it back immediately — nothing is modified. + // phase2Spill holds the raw samples of the last partial chunk from Phase 2 (if any). + // These samples are merged with Phase 4's partial-page data to avoid splitting a single + // sealed block across the Phase-2/Phase-4 page boundary. + Object[] phase2Spill = null; + if (lastFullPage > 0) { + database.begin(); + try { + final Object[] snapshotData = mutableBucket.readFullPagesForCompaction(lastFullPage); + if (snapshotData != null) + // ── Phase 2 (lock-free, no TX): sort + compress immutable page data ───────────── + // returnSpill=true: the last partial chunk is returned as raw data instead of being + // emitted as a block, so Phase 4 can merge it with the partial-page samples and + // produce a single correctly-sized block. 
+ phase2Spill = buildCompressedBlocks(snapshotData, allCompressedList, allMetaList, allMinsList, allMaxsList, + allSumsList, allTagDVList, true); + } finally { + database.rollback(); // read-only: rollback is always safe + } + } + + // ── Phase 3 (lock-free, no TX): write existing sealed + Phase 2 blocks to temp file ───── + // Concurrent queries still read from the CURRENT sealed file — no double-counting. + // The temp file is created even when allCompressedList is empty so that Phase 4 can + // append the partial page's data and then swap atomically in one shot. + final List newBlockDirectory; + try { + newBlockDirectory = sealedStore.writeTempCompactionFile( + allCompressedList, allMetaList, allMinsList, allMaxsList, allSumsList, allTagDVList); + } catch (final Exception e) { + sealedStore.deleteTempFileIfExists(); + clearCompactionFlagBestEffort(); + throw e instanceof IOException ? (IOException) e : new IOException("Compaction failed writing temp file", e); + } + + // ── Phase 4a (brief writeLock + read-only TX): snapshot remaining pages ────────────── + // Grab the write lock just long enough to snapshot the current page count and read the + // pages accumulated since Phase 0. Release the lock immediately so appends can resume + // while we sort + compress (Phase 4b). + final int phase4aPageCount; + Object[] phase4aData; + compactionLock.writeLock().lock(); + try { + database.begin(); + try { + phase4aPageCount = mutableBucket.getDataPageCount(); + if (phase4aPageCount > lastFullPage) + phase4aData = mutableBucket.readPagesRangeForCompaction(lastFullPage + 1, phase4aPageCount); + else + phase4aData = null; + } finally { + database.rollback(); // read-only snapshot + } + } finally { + compactionLock.writeLock().unlock(); + } + + // ── Phase 4b (lock-free, no TX): compress remaining pages + write to temp file ──────── + // Merge Phase-2 spill with the Phase-4a snapshot, then compress and append to the temp + // file. 
Concurrent appends proceed freely during this CPU/IO-intensive step. + final Object[] toCompress4b; + if (phase2Spill != null && phase4aData != null) + toCompress4b = mergeCompactionData(phase2Spill, phase4aData); + else if (phase2Spill != null) + toCompress4b = phase2Spill; + else + toCompress4b = phase4aData; + + // Use returnSpill=true so the last partial chunk is held back for Phase 4c, + // where it can be merged with any tail pages created during Phase 4b. + Object[] phase4bSpill = null; + if (toCompress4b != null) { + final List remCompressed = new ArrayList<>(); + final List remMeta = new ArrayList<>(); + final List remMins = new ArrayList<>(); + final List remMaxs = new ArrayList<>(); + final List remSums = new ArrayList<>(); + final List remTagDV = new ArrayList<>(); + phase4bSpill = buildCompressedBlocks(toCompress4b, remCompressed, remMeta, remMins, remMaxs, remSums, + remTagDV, true); + if (!remCompressed.isEmpty()) + sealedStore.appendBlocksToTempFile(remCompressed, remMeta, remMins, remMaxs, remSums, remTagDV, + newBlockDirectory); + } + + // ── Phase 4c (brief writeLock + brief TX): read tail pages, swap + clear ────────────── + // Only pages created DURING Phase 4b need to be processed under the lock. This is + // typically just 0-2 pages worth of data, keeping the lock hold time minimal. + compactionLock.writeLock().lock(); + try { + database.begin(); + try { + final int finalPageCount = mutableBucket.getDataPageCount(); + + // Read only the tail pages created after Phase 4a's snapshot. 
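The "brief write lock to snapshot a boundary, heavy work lock-free on the immutable prefix" choreography can be sketched in miniature. A hypothetical model (`SnapshotCompactor` is not part of the codebase), assuming appends only ever touch the last page:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: appenders hold the READ lock (many may run concurrently);
// the compactor takes the WRITE lock only long enough to snapshot a stable
// boundary, then processes the immutable prefix with no lock held at all.
final class SnapshotCompactor {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final List<long[]> pages = new ArrayList<>(); // only the LAST page is ever mutated

  void append(final long[] page) {
    lock.readLock().lock();
    try {
      pages.add(page);
    } finally {
      lock.readLock().unlock();
    }
  }

  // Brief write lock: snapshot the page count. Lock-free afterwards: pages
  // 0..N-2 are full and immutable, so reading them cannot race an appender.
  long sumOfFullPages() {
    final int snapshotCount;
    lock.writeLock().lock();
    try {
      snapshotCount = pages.size();
    } finally {
      lock.writeLock().unlock();
    }
    long sum = 0;
    for (int i = 0; i < snapshotCount - 1; i++)
      for (final long v : pages.get(i))
        sum += v;
    return sum;
  }
}
```

The write-lock hold time is O(1) (a size read), which is why appends are only stalled for microseconds regardless of how much data the compactor goes on to process.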
+ Object[] tailData = null; + if (finalPageCount > phase4aPageCount) + tailData = mutableBucket.readPagesRangeForCompaction(phase4aPageCount + 1, finalPageCount); + + // Merge Phase 4b spill with tail data + final Object[] toCompressFinal; + if (phase4bSpill != null && tailData != null) + toCompressFinal = mergeCompactionData(phase4bSpill, tailData); + else if (phase4bSpill != null) + toCompressFinal = phase4bSpill; + else + toCompressFinal = tailData; + + if (toCompressFinal != null) { + final List tailCompressed = new ArrayList<>(); + final List tailMeta = new ArrayList<>(); + final List tailMins = new ArrayList<>(); + final List tailMaxs = new ArrayList<>(); + final List tailSums = new ArrayList<>(); + final List tailTagDV = new ArrayList<>(); + buildCompressedBlocks(toCompressFinal, tailCompressed, tailMeta, tailMins, tailMaxs, tailSums, tailTagDV, + false); + if (!tailCompressed.isEmpty()) + sealedStore.appendBlocksToTempFile(tailCompressed, tailMeta, tailMins, tailMaxs, tailSums, tailTagDV, + newBlockDirectory); + } + + // Atomically swap temp file into the sealed store; updates in-memory blockDirectory. + sealedStore.commitTempCompactionFile(newBlockDirectory); + + // Clear the entire mutable bucket and persist the crash-recovery flag reset. + // No MVCC conflict: writeLock blocks all concurrent appendSamples(). + mutableBucket.clearDataPages(); + mutableBucket.setCompactionInProgress(false); + database.commit(); + } catch (final Exception e) { + if (database.isTransactionActive()) + database.rollback(); + // Restore sealed store to initial state so the next run re-compacts cleanly. + // (The crash-recovery flag remains set, so a restart also handles this correctly.) + try { + sealedStore.truncateToBlockCount(initialBlockCount); + } catch (final IOException te) { + throw new IOException("Compaction failed and sealed store rollback also failed: " + te.getMessage(), e); + } + throw e instanceof IOException ? 
(IOException) e : new IOException("Compaction failed in phase 4c", e); + } + } finally { + compactionLock.writeLock().unlock(); + } + } + + /** + * Sorts the given snapshot data by timestamp and builds compressed sealed blocks, + * appending the results to the supplied output lists. + * Called from Phase 2 (lock-free) and Phase 4 (under writeLock). + * + * @param returnSpill when {@code true}, the last partial chunk is NOT emitted as a block but + * is instead returned as raw compaction data (same format as {@code data}). + * The caller (Phase 4) prepends this spill to the partial-page data so the + * bucket/block boundary is never split across phases. Pass {@code false} + * in Phase 4 where all remaining data must become blocks. + * @return the spill raw data array, or {@code null} when {@code returnSpill} is {@code false} + * or there is no partial last chunk. + */ + private Object[] buildCompressedBlocks( + final Object[] data, + final List compressedOut, final List metaOut, + final List minsOut, final List maxsOut, final List sumsOut, + final List tagDVOut, final boolean returnSpill) { + + final long[] timestamps = (long[]) data[0]; + final int totalSamples = timestamps.length; + if (totalSamples == 0) + return null; + + final int[] sortedIndices = sortIndices(timestamps); + final long[] sortedTs = applyOrder(timestamps, sortedIndices); + + final int colCount = columns.size(); + final Object[][] sortedColArrays = new Object[colCount][]; + int nonTsIdx = 0; + for (int c = 0; c < colCount; c++) { + if (columns.get(c).getRole() == ColumnDefinition.ColumnRole.TIMESTAMP) + sortedColArrays[c] = null; + else { + sortedColArrays[c] = applyOrderObjects((Object[]) data[nonTsIdx + 1], sortedIndices); + nonTsIdx++; + } + } + + int chunkStart = 0; + while (chunkStart < totalSamples) { + int chunkEnd; + if (compactionBucketIntervalMs > 0) { + final long bucketStart = (sortedTs[chunkStart] / compactionBucketIntervalMs) * compactionBucketIntervalMs; + final long bucketEnd = 
bucketStart + compactionBucketIntervalMs; + chunkEnd = chunkStart; + final int maxEnd = Math.min(chunkStart + SEALED_BLOCK_SIZE, totalSamples); + while (chunkEnd < maxEnd && sortedTs[chunkEnd] < bucketEnd) + chunkEnd++; + if (chunkEnd == chunkStart) + chunkEnd = chunkStart + 1; + } else { + chunkEnd = Math.min(chunkStart + SEALED_BLOCK_SIZE, totalSamples); + } + + chunkEnd = adjustChunkForDictionaryLimit(chunkStart, chunkEnd, colCount, sortedColArrays); + + // If this is the last chunk and it may be partial, return it as spill so Phase 4 can + // merge it with partial-page data and produce a single correctly-bounded block. + if (returnSpill && chunkEnd == totalSamples) { + final boolean isPartial; + if (compactionBucketIntervalMs > 0) + // Bucket-aligned: always hold back the last chunk since Phase 4 may have more + // samples in the same bucket. + isPartial = true; + else + // Fixed-size: partial when the chunk is smaller than a full sealed block. + isPartial = (chunkEnd - chunkStart) < SEALED_BLOCK_SIZE; + if (isPartial) + return extractSpillData(sortedTs, sortedColArrays, chunkStart, chunkEnd, colCount); + } + + final int chunkLen = chunkEnd - chunkStart; + final long[] chunkTs = Arrays.copyOfRange(sortedTs, chunkStart, chunkEnd); + + final double[] mins = new double[colCount]; + final double[] maxs = new double[colCount]; + final double[] sums = new double[colCount]; + Arrays.fill(mins, Double.NaN); + Arrays.fill(maxs, Double.NaN); + + final byte[][] compressedCols = new byte[colCount][]; + for (int c = 0; c < colCount; c++) { + if (columns.get(c).getRole() == ColumnDefinition.ColumnRole.TIMESTAMP) { + compressedCols[c] = DeltaOfDeltaCodec.encode(chunkTs); + } else { + final Object[] chunkValues = Arrays.copyOfRange(sortedColArrays[c], chunkStart, chunkEnd); + compressedCols[c] = TimeSeriesSealedStore.compressColumn(columns.get(c), chunkValues); + + final TimeSeriesCodec codec = columns.get(c).getCompressionHint(); + if (codec == TimeSeriesCodec.GORILLA_XOR || 
codec == TimeSeriesCodec.SIMPLE8B) { + double min = Double.MAX_VALUE, max = -Double.MAX_VALUE, sum = 0; + for (final Object v : chunkValues) { + final double d = v != null ? ((Number) v).doubleValue() : 0.0; + if (d < min) + min = d; + if (d > max) + max = d; + sum += d; + } + mins[c] = min; + maxs[c] = max; + sums[c] = sum; + } + } + } + + final String[][] chunkTagDistinctValues = new String[colCount][]; + for (int c = 0; c < colCount; c++) { + if (columns.get(c).getRole() == ColumnDefinition.ColumnRole.TAG && sortedColArrays[c] != null) { + final LinkedHashSet distinctSet = new LinkedHashSet<>(); + for (int i = chunkStart; i < chunkEnd; i++) { + final Object val = sortedColArrays[c][i]; + distinctSet.add(val != null ? val.toString() : ""); + } + chunkTagDistinctValues[c] = distinctSet.toArray(new String[0]); + } + } + + compressedOut.add(compressedCols); + metaOut.add(new long[]{chunkTs[0], chunkTs[chunkLen - 1], chunkLen}); + minsOut.add(mins); + maxsOut.add(maxs); + sumsOut.add(sums); + tagDVOut.add(chunkTagDistinctValues); + chunkStart = chunkEnd; + } + return null; // no spill + } + + /** + * Extracts the raw samples in {@code [from, to)} from the sorted compaction arrays and + * returns them in the same format as the {@code data} parameter of + * {@link #buildCompressedBlocks}: element 0 is {@code long[]} timestamps, subsequent + * elements are {@code Object[]} non-timestamp column arrays. 
+ */ + private Object[] extractSpillData(final long[] sortedTs, final Object[][] sortedColArrays, + final int from, final int to, final int colCount) { + final Object[] spill = new Object[colCount]; + spill[0] = Arrays.copyOfRange(sortedTs, from, to); + int spillIdx = 1; + for (int c = 0; c < colCount; c++) { + if (sortedColArrays[c] == null) + continue; // TIMESTAMP column — skip; its data lives in element 0 + spill[spillIdx++] = Arrays.copyOfRange(sortedColArrays[c], from, to); + } + return spill; + } + + /** + * Concatenates two compaction data arrays (same format as the {@code data} parameter of + * {@link #buildCompressedBlocks}). Used to merge Phase-2 spill with Phase-4 partial-page + * data before compressing them together. + */ + private static Object[] mergeCompactionData(final Object[] a, final Object[] b) { + final long[] tsA = (long[]) a[0]; + final long[] tsB = (long[]) b[0]; + final long[] mergedTs = new long[tsA.length + tsB.length]; + System.arraycopy(tsA, 0, mergedTs, 0, tsA.length); + System.arraycopy(tsB, 0, mergedTs, tsA.length, tsB.length); + final Object[] result = new Object[a.length]; + result[0] = mergedTs; + for (int i = 1; i < a.length; i++) { + final Object[] colA = (Object[]) a[i]; + final Object[] colB = (Object[]) b[i]; + final Object[] merged = new Object[colA.length + colB.length]; + System.arraycopy(colA, 0, merged, 0, colA.length); + System.arraycopy(colB, 0, merged, colA.length, colB.length); + result[i] = merged; + } + return result; + } + + /** + * Best-effort: clear the compaction-in-progress flag after a non-crash error. 
+ */
+ private void clearCompactionFlagBestEffort() {
+ compactionLock.writeLock().lock();
+ try {
+ database.begin();
+ mutableBucket.setCompactionInProgress(false);
+ database.commit();
+ } catch (final Exception ignored) {
+ if (database.isTransactionActive())
+ try {
+ database.rollback();
+ } catch (final Exception re) { /* ignored */ }
+ } finally {
+ compactionLock.writeLock().unlock();
+ }
+ }
+
+ public TimeSeriesBucket getMutableBucket() {
+ return mutableBucket;
+ }
+
+ public TimeSeriesSealedStore getSealedStore() {
+ return sealedStore;
+ }
+
+ /**
+ * Returns the compaction read/write lock.
+ * Callers that aggregate both sealed and mutable data must hold the read lock for the
+ * entire duration to prevent compaction from completing between the two reads (which
+ * would cause the compacted mutable data to be invisible to the caller).
+ */
+ java.util.concurrent.locks.ReadWriteLock getCompactionLock() {
+ return compactionLock;
+ }
+
+ public int getShardIndex() {
+ return shardIndex;
+ }
+
+ @Override
+ public void close() throws IOException {
+ mutableBucket.close();
+ sealedStore.close();
+ }
+
+ // --- Private helpers ---
+
+ private static void addFiltered(final List<Object[]> results, final List<Object[]> source, final TagFilter filter,
+ final int[] columnIndices) {
+ if (filter == null)
+ results.addAll(source);
+ else
+ for (final Object[] row : source)
+ // Use matchesMapped() so the filter works correctly when columnIndices is a subset.
+ if (filter.matchesMapped(row, columnIndices)) + results.add(row); + } + + private static int[] sortIndices(final long[] timestamps) { + final int n = timestamps.length; + final int[] indices = new int[n]; + for (int i = 0; i < n; i++) + indices[i] = i; + // Merge sort on primitive int[] to avoid Integer boxing + mergeSort(indices, new int[n], 0, n, timestamps); + return indices; + } + + private static void mergeSort(final int[] arr, final int[] temp, final int from, final int to, final long[] keys) { + if (to - from <= 1) + return; + final int mid = (from + to) >>> 1; + mergeSort(arr, temp, from, mid, keys); + mergeSort(arr, temp, mid, to, keys); + // Merge + int i = from, j = mid, k = from; + while (i < mid && j < to) { + if (keys[arr[i]] <= keys[arr[j]]) + temp[k++] = arr[i++]; + else + temp[k++] = arr[j++]; + } + while (i < mid) + temp[k++] = arr[i++]; + while (j < to) + temp[k++] = arr[j++]; + System.arraycopy(temp, from, arr, from, to - from); + } + + private static long[] applyOrder(final long[] data, final int[] indices) { + final long[] result = new long[data.length]; + for (int i = 0; i < indices.length; i++) + result[i] = data[indices[i]]; + return result; + } + + private static Object[] applyOrderObjects(final Object[] data, final int[] indices) { + final Object[] result = new Object[data.length]; + for (int i = 0; i < indices.length; i++) + result[i] = data[indices[i]]; + return result; + } + + /** + * If any DICTIONARY column in [chunkStart, chunkEnd) exceeds {@link DictionaryCodec#MAX_DICTIONARY_SIZE}, + * shrinks chunkEnd until all dictionary columns fit. Guarantees at least one row per chunk. 
+ */
+ private int adjustChunkForDictionaryLimit(final int chunkStart, int chunkEnd,
+ final int colCount, final Object[][] sortedColArrays) {
+ for (int c = 0; c < colCount; c++) {
+ if (columns.get(c).getCompressionHint() != TimeSeriesCodec.DICTIONARY || sortedColArrays[c] == null)
+ continue;
+ final HashSet<Object> distinct = new HashSet<>();
+ for (int i = chunkStart; i < chunkEnd; i++) {
+ distinct.add(sortedColArrays[c][i] != null ? sortedColArrays[c][i] : "");
+ if (distinct.size() > DictionaryCodec.MAX_DICTIONARY_SIZE) {
+ // Shrink chunk to end at i (exclusive): row i introduced the distinct value that no longer fits
+ chunkEnd = Math.max(chunkStart + 1, i);
+ break;
+ }
+ }
+ }
+ return chunkEnd;
+ }
+}
diff --git a/engine/src/main/java/com/arcadedb/engine/timeseries/codec/DeltaOfDeltaCodec.java b/engine/src/main/java/com/arcadedb/engine/timeseries/codec/DeltaOfDeltaCodec.java
new file mode 100644
index 0000000000..1d57cc0297
--- /dev/null
+++ b/engine/src/main/java/com/arcadedb/engine/timeseries/codec/DeltaOfDeltaCodec.java
@@ -0,0 +1,301 @@
+/*
+ * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com)
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com)
+ * SPDX-License-Identifier: Apache-2.0
+ */
+package com.arcadedb.engine.timeseries.codec;
+
+import java.nio.ByteBuffer;
+
+/**
+ * Delta-of-delta encoding for monotonically increasing timestamps.
+ * Based on the Facebook Gorilla paper: stores first value raw, then deltas, + * then delta-of-deltas using variable-bit encoding. + *

+ * Encoding scheme for delta-of-deltas (dod):
+ * - dod == 0: store '0' (1 bit)
+ * - dod in [-64, 63]: store '10' + 7-bit ZigZag value (9 bits)
+ * - dod in [-256, 255]: store '110' + 9-bit ZigZag value (12 bits)
+ * - dod in [-2047, 2047]: store '1110' + 12-bit ZigZag value (16 bits)
+ * - otherwise: store '1111' + 64-bit raw value (68 bits)
+ *
+ * @author Luca Garulli (l.garulli@arcadedata.com)
+ */
+public final class DeltaOfDeltaCodec {
+
+ /** Maximum number of values per encoded block — mirrors TimeSeriesSealedStore.MAX_BLOCK_SIZE. */
+ public static final int MAX_BLOCK_SIZE = 65536;
+
+ private DeltaOfDeltaCodec() {
+ }
+
+ public static byte[] encode(final long[] timestamps) {
+ if (timestamps == null || timestamps.length == 0)
+ return new byte[0];
+
+ final BitWriter writer = new BitWriter(timestamps.length * 2 + 16);
+
+ // Write count
+ writer.writeBits(timestamps.length, 32);
+ // Write first value raw (64 bits)
+ writer.writeBits(timestamps[0], 64);
+
+ if (timestamps.length == 1)
+ return writer.toByteArray();
+
+ // Write first delta raw (64 bits)
+ long prevDelta = timestamps[1] - timestamps[0];
+ writer.writeBits(prevDelta, 64);
+
+ for (int i = 2; i < timestamps.length; i++) {
+ final long delta = timestamps[i] - timestamps[i - 1];
+ final long dod = delta - prevDelta;
+ prevDelta = delta;
+
+ if (dod == 0) {
+ writer.writeBit(0);
+ } else if (dod >= -64 && dod <= 63) {
+ writer.writeBits(0b10, 2);
+ writer.writeBits(zigZagEncode(dod), 7);
+ } else if (dod >= -256 && dod <= 255) {
+ writer.writeBits(0b110, 3);
+ writer.writeBits(zigZagEncode(dod), 9);
+ } else if (dod >= -2047 && dod <= 2047) {
+ writer.writeBits(0b1110, 4);
+ writer.writeBits(zigZagEncode(dod), 12);
+ } else {
+ writer.writeBits(0b1111, 4);
+ writer.writeBits(dod, 64);
+ }
+ }
+ return writer.toByteArray();
+ }
+
+ public static long[] decode(final byte[] data) {
+ if (data == null || data.length == 0)
+ return new long[0];
+
+ final BitReader reader = new BitReader(data);
+
+ final int count = (int) reader.readBits(32); + if (count <= 0 || count > MAX_BLOCK_SIZE) + throw new IllegalArgumentException("DeltaOfDelta decode: invalid count " + count + " (expected 1.." + MAX_BLOCK_SIZE + ")"); + final long[] result = new long[count]; + result[0] = reader.readBits(64); + + if (count == 1) + return result; + + long prevDelta = reader.readBits(64); + result[1] = result[0] + prevDelta; + + for (int i = 2; i < count; i++) { + long dod; + if (reader.readBit() == 0) { + dod = 0; + } else if (reader.readBit() == 0) { + // prefix '10' + dod = zigZagDecode(reader.readBits(7)); + } else if (reader.readBit() == 0) { + // prefix '110' + dod = zigZagDecode(reader.readBits(9)); + } else if (reader.readBit() == 0) { + // prefix '1110' + dod = zigZagDecode(reader.readBits(12)); + } else { + // prefix '1111' + dod = reader.readBits(64); + } + prevDelta = prevDelta + dod; + result[i] = result[i - 1] + prevDelta; + } + return result; + } + + /** + * Decodes into a pre-allocated output buffer, returning the number of decoded values. + * The output array must be at least as large as the encoded count. + */ + public static int decode(final byte[] data, final long[] output) { + if (data == null || data.length == 0) + return 0; + + final BitReader reader = new BitReader(data); + final int count = (int) reader.readBits(32); + if (count <= 0 || count > MAX_BLOCK_SIZE) + throw new IllegalArgumentException("DeltaOfDelta decode: invalid count " + count + " (expected 1.." 
+ MAX_BLOCK_SIZE + ")"); + output[0] = reader.readBits(64); + + if (count == 1) + return count; + + long prevDelta = reader.readBits(64); + output[1] = output[0] + prevDelta; + + for (int i = 2; i < count; i++) { + long dod; + if (reader.readBit() == 0) { + dod = 0; + } else if (reader.readBit() == 0) { + dod = zigZagDecode(reader.readBits(7)); + } else if (reader.readBit() == 0) { + dod = zigZagDecode(reader.readBits(9)); + } else if (reader.readBit() == 0) { + dod = zigZagDecode(reader.readBits(12)); + } else { + dod = reader.readBits(64); + } + prevDelta = prevDelta + dod; + output[i] = output[i - 1] + prevDelta; + } + return count; + } + + static long zigZagEncode(final long value) { + return (value << 1) ^ (value >> 63); + } + + static long zigZagDecode(final long encoded) { + return (encoded >>> 1) ^ -(encoded & 1); + } + + /** + * Bit-level writer backed by a growing byte array. + */ + static final class BitWriter { + private byte[] buffer; + private int bitPos = 0; + + BitWriter(final int initialCapacity) { + this.buffer = new byte[Math.max(initialCapacity, 16)]; + } + + void writeBit(final int bit) { + ensureCapacity(1); + if (bit != 0) + buffer[bitPos >> 3] |= (byte) (1 << (7 - (bitPos & 7))); + bitPos++; + } + + void writeBits(final long value, final int numBits) { + ensureCapacity(numBits); + for (int i = numBits - 1; i >= 0; i--) { + if (((value >> i) & 1) != 0) + buffer[bitPos >> 3] |= (byte) (1 << (7 - (bitPos & 7))); + bitPos++; + } + } + + byte[] toByteArray() { + final int byteLen = (bitPos + 7) >> 3; + final byte[] result = new byte[byteLen]; + System.arraycopy(buffer, 0, result, 0, byteLen); + return result; + } + + private void ensureCapacity(final int additionalBits) { + final int requiredBytes = ((bitPos + additionalBits) + 7) >> 3; + if (requiredBytes > buffer.length) { + final byte[] newBuffer = new byte[Math.max(buffer.length * 2, requiredBytes)]; + System.arraycopy(buffer, 0, newBuffer, 0, buffer.length); + buffer = newBuffer; + } + } + } 
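The ZigZag mapping used by both the delta-of-delta tiers above and the Simple-8b codec below can be sketched standalone. This is an illustrative snippet (the class name `ZigZagDemo` is not part of the codebase; only the two one-line methods mirror the `zigZagEncode`/`zigZagDecode` shown in the diff), demonstrating why small signed delta-of-deltas fit the narrow 7-, 9-, and 12-bit slots:

```java
// ZigZag maps signed longs to non-negative codes so magnitude, not sign,
// determines the bit width: n >= 0 -> 2n, n < 0 -> -2n - 1.
public class ZigZagDemo {
  static long encode(final long v) { return (v << 1) ^ (v >> 63); }   // arithmetic shift propagates the sign
  static long decode(final long e) { return (e >>> 1) ^ -(e & 1); }   // logical shift, then restore the sign

  public static void main(final String[] args) {
    // [-64, 63] maps onto 0..127 (7 bits), [-256, 255] onto 0..511 (9 bits).
    final long[] samples = { 0, -1, 1, 63, -64, 255, -256 };
    for (final long v : samples) {
      final long e = encode(v);
      if (decode(e) != v)
        throw new AssertionError("round-trip failed for " + v);
      System.out.println(v + " -> " + e);
      // 0 -> 0, -1 -> 1, 1 -> 2, 63 -> 126, -64 -> 127, 255 -> 510, -256 -> 511
    }
  }
}
```

Because the code of a value with small absolute magnitude is itself small, a stream of near-constant timestamp intervals (dod near 0) compresses to one or a few bits per sample.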
+ + /** + * Sliding-window bit reader over a byte array. + * Maintains a pre-loaded 64-bit register ({@code window}) with bits MSB-aligned. + * Each {@code readBits(n)} extracts the top n bits via a single shift, avoiding + * the per-call byte-assembly loop of the previous implementation. + *

+ * Refill happens when the window drops to ≤56 valid bits, loading up to 8 bytes + * in one pass. This amortizes array access across ~7-8 decoded values, converting + * the critical Gorilla XOR decode loop from ~10 array loads per value to ~1. + */ + static final class BitReader { + private final byte[] data; + private final int dataLen; + private long window; // up to 64 valid bits, MSB-aligned + private int bitsInWindow; // number of valid bits in window + private int bytePos; // next byte to consume from data[] + + BitReader(final byte[] data) { + this.data = data; + this.dataLen = data.length; + this.window = 0; + this.bitsInWindow = 0; + this.bytePos = 0; + refill(); + } + + int readBit() { + if (bitsInWindow == 0) + refill(); + final int bit = (int) (window >>> 63); + window <<= 1; + bitsInWindow--; + return bit; + } + + long readBits(final int numBits) { + if (numBits == 0) + return 0; + if (numBits <= bitsInWindow) { + // Fast path: extract directly from window — no array access + final long result = window >>> (64 - numBits); + // Java shift: (long << 64) is a no-op (shift distance masked to 0..63), so special-case it + window = numBits < 64 ? window << numBits : 0; + bitsInWindow -= numBits; + if (bitsInWindow <= 56) + refill(); + return result; + } + // Slow path: numBits > bitsInWindow (only for 64-bit header reads) + if (bitsInWindow > 0) { + final int have = bitsInWindow; + long result = window >>> (64 - have); + window = 0; + bitsInWindow = 0; + refill(); + final int remaining = numBits - have; + result = (result << remaining) | (window >>> (64 - remaining)); + window <<= remaining; + bitsInWindow -= remaining; + if (bitsInWindow <= 56) + refill(); + return result; + } + // bitsInWindow == 0 + refill(); + final long result = window >>> (64 - numBits); + window = numBits < 64 ? 
window << numBits : 0; + bitsInWindow -= numBits; + if (bitsInWindow <= 56) + refill(); + return result; + } + + private void refill() { + // Pack bytes into the lower portion of the window until we have >56 bits or exhaust input. + // The threshold of 56 ensures adding 8 bits never overflows the 64-bit register. + while (bitsInWindow <= 56 && bytePos < dataLen) { + window |= (long) (data[bytePos++] & 0xFF) << (56 - bitsInWindow); + bitsInWindow += 8; + } + } + } +} diff --git a/engine/src/main/java/com/arcadedb/engine/timeseries/codec/DictionaryCodec.java b/engine/src/main/java/com/arcadedb/engine/timeseries/codec/DictionaryCodec.java new file mode 100644 index 0000000000..d1ba8c7979 --- /dev/null +++ b/engine/src/main/java/com/arcadedb/engine/timeseries/codec/DictionaryCodec.java @@ -0,0 +1,130 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries.codec; + +import java.nio.ByteBuffer; +import java.nio.charset.StandardCharsets; +import java.util.HashMap; +import java.util.Map; + +/** + * Dictionary encoding for low-cardinality string columns (e.g., tags). + * Builds a per-block dictionary (String → int16), emits dictionary + int16[] indices. + *

+ * Important: The maximum number of distinct values per block is {@link #MAX_DICTIONARY_SIZE} (65535). + * This limit is enforced at encode time (throws {@link IllegalArgumentException}). + * During compaction, {@code TimeSeriesShard} automatically splits chunks that would exceed + * this limit into smaller blocks, so high-cardinality tag data is handled gracefully. + *

+ * Format:
+ * - 4 bytes: value count
+ * - 2 bytes: dictionary size
+ * - For each dictionary entry: 2 bytes length + UTF-8 bytes
+ * - For each value: 2 bytes dictionary index
+ *
+ * @author Luca Garulli (l.garulli@arcadedata.com)
+ */
+public final class DictionaryCodec {
+
+ public static final int MAX_DICTIONARY_SIZE = 65535;
+
+ private DictionaryCodec() {
+ }
+
+ public static byte[] encode(final String[] values) {
+ if (values == null || values.length == 0)
+ return new byte[0];
+
+ // Build dictionary (use int counter to avoid short overflow)
+ final Map<String, Integer> dict = new HashMap<>();
+ final String[] dictEntries = new String[Math.min(values.length, MAX_DICTIONARY_SIZE)];
+ int nextIndex = 0;
+
+ final int[] indices = new int[values.length];
+ for (int i = 0; i < values.length; i++) {
+ Integer idx = dict.get(values[i]);
+ if (idx == null) {
+ if (nextIndex >= MAX_DICTIONARY_SIZE)
+ throw new IllegalArgumentException("Dictionary overflow: more than " + MAX_DICTIONARY_SIZE + " unique values");
+ idx = nextIndex;
+ dict.put(values[i], idx);
+ dictEntries[nextIndex] = values[i];
+ nextIndex++;
+ }
+ indices[i] = idx;
+ }
+
+ // Pre-compute UTF-8 bytes once and reuse for both size calculation and writing
+ final byte[][] utf8Entries = new byte[nextIndex][];
+ int size = 4 + 2; // count + dict size
+ for (int i = 0; i < nextIndex; i++) {
+ utf8Entries[i] = dictEntries[i].getBytes(StandardCharsets.UTF_8);
+ if (utf8Entries[i].length > 65535)
+ throw new IllegalArgumentException(
+ "Dictionary entry too long: UTF-8 encoding of '" + dictEntries[i].substring(0, Math.min(20, dictEntries[i].length())) +
+ "...' is " + utf8Entries[i].length + " bytes (max 65535)");
+ size += 2 + utf8Entries[i].length;
+ }
+ size += values.length * 2; // indices
+
+ final ByteBuffer buf = ByteBuffer.allocate(size);
+ buf.putInt(values.length);
+ buf.putShort((short) nextIndex);
+
+ for (int i = 0; i < nextIndex; i++) {
+ buf.putShort((short) utf8Entries[i].length);
+ buf.put(utf8Entries[i]);
+ }
+
+ for (final int index : indices)
+ buf.putShort((short) index);
+
+ return buf.array();
+ }
+
+ public static String[] decode(final byte[] data) throws java.io.IOException {
+ if (data == null || data.length == 0)
+ return new String[0];
+
+ try {
+ final ByteBuffer buf = ByteBuffer.wrap(data);
+ final int count = buf.getInt();
+ final int dictSize = buf.getShort() & 0xFFFF;
+
+ final String[] dictEntries = new String[dictSize];
+ for (int i = 0; i < dictSize; i++) {
+ final int len = buf.getShort() & 0xFFFF;
+ final byte[] utf8 = new byte[len];
+ buf.get(utf8);
+ dictEntries[i] = new String(utf8, StandardCharsets.UTF_8);
+ }
+
+ final String[] result = new String[count];
+ for (int i = 0; i < count; i++) {
+ final int idx = buf.getShort() & 0xFFFF;
+ if (idx >= dictSize)
+ throw new java.io.IOException("DictionaryCodec: invalid dictionary index " + idx + " (dict size=" + dictSize + ")");
+ result[i] = dictEntries[idx];
+ }
+ return result;
+ } catch (final java.nio.BufferUnderflowException e) {
+ throw new java.io.IOException("DictionaryCodec: malformed data (truncated buffer, size=" + data.length + ")", e);
+ }
+ }
+}
diff --git a/engine/src/main/java/com/arcadedb/engine/timeseries/codec/GorillaXORCodec.java b/engine/src/main/java/com/arcadedb/engine/timeseries/codec/GorillaXORCodec.java
new file mode 100644
index 0000000000..87c9c0091e
--- /dev/null
+++ b/engine/src/main/java/com/arcadedb/engine/timeseries/codec/GorillaXORCodec.java
@@ -0,0 +1,183 @@
+/*
+ * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com)
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+
* you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries.codec; + +/** + * Gorilla XOR encoding for floating-point values. + * XOR consecutive IEEE 754 doubles; store only meaningful bits + * (leading zeros + trailing zeros + middle block). + *

+ * Encoding scheme for XOR'd value: + * - xor == 0: store '0' (1 bit) — same as previous + * - leading/trailing same as previous: store '10' + meaningful bits + * - otherwise: store '11' + 6-bit leading zeros + 6-bit block length + block bits + * + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +public final class GorillaXORCodec { + + private GorillaXORCodec() { + } + + public static byte[] encode(final double[] values) { + if (values == null || values.length == 0) + return new byte[0]; + + final DeltaOfDeltaCodec.BitWriter writer = new DeltaOfDeltaCodec.BitWriter(values.length * 2 + 16); + + // Write count + writer.writeBits(values.length, 32); + // Write first value raw + writer.writeBits(Double.doubleToRawLongBits(values[0]), 64); + + if (values.length == 1) + return writer.toByteArray(); + + int prevLeading = Integer.MAX_VALUE; + int prevTrailing = 0; + long prevBits = Double.doubleToRawLongBits(values[0]); + + for (int i = 1; i < values.length; i++) { + final long currentBits = Double.doubleToRawLongBits(values[i]); + final long xor = currentBits ^ prevBits; + + if (xor == 0) { + writer.writeBit(0); + } else { + writer.writeBit(1); + + final int leading = Long.numberOfLeadingZeros(xor); + final int trailing = Long.numberOfTrailingZeros(xor); + + if (leading >= prevLeading && trailing >= prevTrailing) { + // Case '10': reuse previous block position + writer.writeBit(0); + final int blockSize = 64 - prevLeading - prevTrailing; + writer.writeBits(xor >>> prevTrailing, blockSize); + } else { + // Case '11': new block position + writer.writeBit(1); + // Cap leading zeros at 63 (6 bits) + final int cappedLeading = Math.min(leading, 63); + writer.writeBits(cappedLeading, 6); + final int blockSize = 64 - cappedLeading - trailing; + // blockSize ranges 1..64; store (blockSize - 1) to fit in 6 bits + writer.writeBits(blockSize - 1, 6); + writer.writeBits(xor >>> trailing, blockSize); + + prevLeading = cappedLeading; + prevTrailing = trailing; + } + } + 
prevBits = currentBits; + } + return writer.toByteArray(); + } + + public static double[] decode(final byte[] data) { + if (data == null || data.length == 0) + return new double[0]; + + final DeltaOfDeltaCodec.BitReader reader = new DeltaOfDeltaCodec.BitReader(data); + + final int count = (int) reader.readBits(32); + if (count <= 0 || count > DeltaOfDeltaCodec.MAX_BLOCK_SIZE) + throw new IllegalArgumentException("GorillaXOR decode: invalid count " + count + " (expected 1.." + DeltaOfDeltaCodec.MAX_BLOCK_SIZE + ")"); + final double[] result = new double[count]; + + long prevBits = reader.readBits(64); + result[0] = Double.longBitsToDouble(prevBits); + + if (count == 1) + return result; + + // Initialize to MAX_VALUE to match the encoder's initial state: the first XOR'd value + // always encodes with a new block position (case '11'), which writes prevLeading/prevTrailing. + int prevLeading = Integer.MAX_VALUE; + int prevTrailing = 0; + + for (int i = 1; i < count; i++) { + if (reader.readBit() == 0) { + // Same as previous + result[i] = Double.longBitsToDouble(prevBits); + } else { + long xor; + if (reader.readBit() == 0) { + // Case '10': reuse previous block position + final int blockSize = 64 - prevLeading - prevTrailing; + xor = reader.readBits(blockSize) << prevTrailing; + } else { + // Case '11': new block position + prevLeading = (int) reader.readBits(6); + final int blockSize = (int) reader.readBits(6) + 1; + prevTrailing = 64 - prevLeading - blockSize; + xor = reader.readBits(blockSize) << prevTrailing; + } + prevBits = prevBits ^ xor; + result[i] = Double.longBitsToDouble(prevBits); + } + } + return result; + } + + /** + * Decodes into a pre-allocated output buffer, returning the number of decoded values. + * The output array must be at least as large as the encoded count. 
+ */ + public static int decode(final byte[] data, final double[] output) { + if (data == null || data.length == 0) + return 0; + + final DeltaOfDeltaCodec.BitReader reader = new DeltaOfDeltaCodec.BitReader(data); + final int count = (int) reader.readBits(32); + if (count <= 0 || count > DeltaOfDeltaCodec.MAX_BLOCK_SIZE) + throw new IllegalArgumentException("GorillaXOR decode: invalid count " + count + " (expected 1.." + DeltaOfDeltaCodec.MAX_BLOCK_SIZE + ")"); + + long prevBits = reader.readBits(64); + output[0] = Double.longBitsToDouble(prevBits); + + if (count == 1) + return count; + + // Initialize to MAX_VALUE to match the encoder's initial state: the first XOR'd value + // always encodes with a new block position (case '11'), which writes prevLeading/prevTrailing. + int prevLeading = Integer.MAX_VALUE; + int prevTrailing = 0; + + for (int i = 1; i < count; i++) { + if (reader.readBit() == 0) { + output[i] = Double.longBitsToDouble(prevBits); + } else { + long xor; + if (reader.readBit() == 0) { + final int blockSize = 64 - prevLeading - prevTrailing; + xor = reader.readBits(blockSize) << prevTrailing; + } else { + prevLeading = (int) reader.readBits(6); + final int blockSize = (int) reader.readBits(6) + 1; + prevTrailing = 64 - prevLeading - blockSize; + xor = reader.readBits(blockSize) << prevTrailing; + } + prevBits = prevBits ^ xor; + output[i] = Double.longBitsToDouble(prevBits); + } + } + return count; + } +} diff --git a/engine/src/main/java/com/arcadedb/engine/timeseries/codec/Simple8bCodec.java b/engine/src/main/java/com/arcadedb/engine/timeseries/codec/Simple8bCodec.java new file mode 100644 index 0000000000..d2b6cb7093 --- /dev/null +++ b/engine/src/main/java/com/arcadedb/engine/timeseries/codec/Simple8bCodec.java @@ -0,0 +1,180 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries.codec; + +import java.nio.ByteBuffer; + +/** + * Simple-8b encoding for signed integer arrays using zigzag encoding. + * Signed values are converted to non-negative via zigzag encoding before packing. + *

+ * Supported value range: [-(2^59), (2^59)-1]. + * Values outside this range cause encode() to throw {@link IllegalArgumentException} + * because the maximum selector packs 1 value × 60 bits and ZigZag encoding of the + * boundary value -(2^59) produces exactly (1L<<60)-1, which is the largest encodable value. + * Values with |v| >= 2^59 would silently truncate — validation prevents silent data corruption. + *

+ * Packs multiple integers into 64-bit words using a selector scheme.
+ * The top 4 bits of each word are the selector (0-15), determining how many
+ * integers are packed and at what bit width.
+ *

+ * Selector table (selector → count × bits): + * 0: 240×0 (all zeros), 1: 120×0 (all zeros, half), 2: 60×1, 3: 30×2, + * 4: 20×3, 5: 15×4, 6: 12×5, 7: 10×6, 8: 8×7, 9: 7×8, 10: 6×10, + * 11: 5×12, 12: 4×15, 13: 3×20, 14: 2×30, 15: 1×60 + * + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +public final class Simple8bCodec { + + // selector → number of integers packed + private static final int[] SELECTOR_COUNT = { 240, 120, 60, 30, 20, 15, 12, 10, 8, 7, 6, 5, 4, 3, 2, 1 }; + // selector → bits per integer + private static final int[] SELECTOR_BITS = { 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 15, 20, 30, 60 }; + + // Maximum zigzag-encoded value that fits in 60 bits (selector 15 = 1 value × 60 bits) + static final long MAX_ZIGZAG_VALUE = (1L << 60) - 1; + + private Simple8bCodec() { + } + + public static byte[] encode(final long[] values) { + if (values == null || values.length == 0) + return new byte[0]; + + // Zigzag-encode signed longs to non-negative values before packing. + // Validate that each zigzag-encoded value fits in 60 bits; values outside + // [-(2^59), (2^59)-1] cannot be represented and would silently truncate. 
+ final long[] zigzagged = new long[values.length]; + for (int i = 0; i < values.length; i++) { + final long encoded = zigzagEncode(values[i]); + if (encoded > MAX_ZIGZAG_VALUE) + throw new IllegalArgumentException( + "Value " + values[i] + " at index " + i + " is outside the Simple-8b supported range [-(2^59), (2^59)-1]"); + zigzagged[i] = encoded; + } + + // Worst case: each value needs its own word + header + final ByteBuffer buf = ByteBuffer.allocate(4 + (zigzagged.length + 1) * 8); + buf.putInt(zigzagged.length); + + int pos = 0; + while (pos < zigzagged.length) { + final int remaining = zigzagged.length - pos; + + // Find the best selector + int bestSelector = 15; // fallback: 1 value × 60 bits + for (int sel = 0; sel < 16; sel++) { + final int count = Math.min(SELECTOR_COUNT[sel], remaining); + final int bits = SELECTOR_BITS[sel]; + + if (count <= 0) + continue; + + boolean fits = true; + if (bits == 0) { + // All must be zero + for (int j = 0; j < count; j++) { + if (zigzagged[pos + j] != 0) { + fits = false; + break; + } + } + } else { + final long maxVal = (1L << bits) - 1; + for (int j = 0; j < count; j++) { + if (zigzagged[pos + j] > maxVal) { + fits = false; + break; + } + } + } + + if (fits) { + bestSelector = sel; + break; // Take the first (most compact) selector that fits + } + } + + // Encode the word + final int count = Math.min(SELECTOR_COUNT[bestSelector], remaining); + final int bits = SELECTOR_BITS[bestSelector]; + long word = (long) bestSelector << 60; + + if (bits > 0) { + for (int j = 0; j < count; j++) + word |= (zigzagged[pos + j] & ((1L << bits) - 1)) << (j * bits); + } + + buf.putLong(word); + pos += count; + } + + buf.flip(); + final byte[] result = new byte[buf.remaining()]; + buf.get(result); + return result; + } + + public static long[] decode(final byte[] data) throws java.io.IOException { + if (data == null || data.length == 0) + return new long[0]; + + try { + final ByteBuffer buf = ByteBuffer.wrap(data); + final int totalCount 
= buf.getInt(); + if (totalCount < 0) + throw new java.io.IOException("Simple8bCodec: negative count " + totalCount + " in header"); + final long[] result = new long[totalCount]; + + int pos = 0; + while (pos < totalCount) { + final long word = buf.getLong(); + final int selector = (int) (word >>> 60) & 0xF; + final int count = Math.min(SELECTOR_COUNT[selector], totalCount - pos); + final int bits = SELECTOR_BITS[selector]; + + if (bits == 0) { + // All zeros — result is already initialized to 0 + pos += count; + } else { + final long mask = (1L << bits) - 1; + for (int j = 0; j < count; j++) { + result[pos + j] = (word >>> (j * bits)) & mask; + } + pos += count; + } + } + // Zigzag-decode back to signed values + for (int i = 0; i < result.length; i++) + result[i] = zigzagDecode(result[i]); + return result; + } catch (final java.nio.BufferUnderflowException e) { + throw new java.io.IOException("Simple8bCodec: malformed data (truncated buffer, size=" + data.length + ")", e); + } + } + + private static long zigzagEncode(final long n) { + return (n << 1) ^ (n >> 63); + } + + private static long zigzagDecode(final long n) { + return (n >>> 1) ^ -(n & 1); + } +} diff --git a/engine/src/main/java/com/arcadedb/engine/timeseries/codec/TimeSeriesCodec.java b/engine/src/main/java/com/arcadedb/engine/timeseries/codec/TimeSeriesCodec.java new file mode 100644 index 0000000000..08bc6364c5 --- /dev/null +++ b/engine/src/main/java/com/arcadedb/engine/timeseries/codec/TimeSeriesCodec.java @@ -0,0 +1,49 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries.codec; + +/** + * Defines the compression codec types used by TimeSeries columnar storage. + * + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +public enum TimeSeriesCodec { + DELTA_OF_DELTA(0), + GORILLA_XOR(1), + DICTIONARY(2), + SIMPLE8B(3), + NONE(255); + + private final int code; + + TimeSeriesCodec(final int code) { + this.code = code; + } + + public int getCode() { + return code; + } + + public static TimeSeriesCodec fromCode(final int code) { + for (final TimeSeriesCodec codec : values()) + if (codec.code == code) + return codec; + throw new IllegalArgumentException("Unknown codec code: " + code); + } +} diff --git a/engine/src/main/java/com/arcadedb/engine/timeseries/promql/PromQLEvaluator.java b/engine/src/main/java/com/arcadedb/engine/timeseries/promql/PromQLEvaluator.java new file mode 100644 index 0000000000..8c02eef680 --- /dev/null +++ b/engine/src/main/java/com/arcadedb/engine/timeseries/promql/PromQLEvaluator.java @@ -0,0 +1,731 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries.promql; + +import com.arcadedb.database.DatabaseInternal; +import com.arcadedb.log.LogManager; +import com.arcadedb.engine.timeseries.ColumnDefinition; +import com.arcadedb.engine.timeseries.TagFilter; +import com.arcadedb.engine.timeseries.TimeSeriesEngine; +import com.arcadedb.engine.timeseries.promql.PromQLResult.InstantVector; +import com.arcadedb.engine.timeseries.promql.PromQLResult.MatrixResult; +import com.arcadedb.engine.timeseries.promql.PromQLResult.MatrixSeries; +import com.arcadedb.engine.timeseries.promql.PromQLResult.RangeSeries; +import com.arcadedb.engine.timeseries.promql.PromQLResult.RangeVector; +import com.arcadedb.engine.timeseries.promql.PromQLResult.ScalarResult; +import com.arcadedb.engine.timeseries.promql.PromQLResult.VectorSample; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr.AggregationExpr; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr.BinaryExpr; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr.BinaryOp; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr.FunctionCallExpr; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr.LabelMatcher; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr.MatchOp; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr.MatrixSelector; +import 
com.arcadedb.engine.timeseries.promql.ast.PromQLExpr.NumberLiteral; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr.StringLiteral; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr.UnaryExpr; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr.VectorSelector; +import com.arcadedb.schema.DocumentType; +import com.arcadedb.schema.LocalTimeSeriesType; + +import java.util.ArrayList; +import java.util.Collections; +import java.util.Comparator; +import java.util.HashMap; +import java.util.HashSet; +import java.util.Iterator; +import java.util.LinkedHashMap; +import java.util.List; +import java.util.Map; +import java.util.Set; +import java.util.concurrent.ConcurrentHashMap; +import java.util.logging.Level; +import java.util.regex.Pattern; +import java.util.regex.PatternSyntaxException; + +/** + * Tree-walking interpreter for PromQL AST. + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +public class PromQLEvaluator { + + private static final long DEFAULT_LOOKBACK_MS = 5 * 60_000; // 5 minutes + private static final int MAX_RECURSION_DEPTH = 64; + private static final long MAX_RANGE_STEPS = 1_000_000; + + private final DatabaseInternal database; + private final long lookbackMs; + private static final int MAX_REGEX_LENGTH = 1024; + private static final int MAX_PATTERN_CACHE = 1024; + // Detects patterns that cause catastrophic backtracking (ReDoS): + // 1. Nested quantifiers: (a+)+, (a*b)*, (a+){n,} + // 2. 
Alternation groups with outer quantifier: (a|aa)+ — overlapping alternatives + private static final Pattern REDOS_CHECK = Pattern.compile( + "\\((?:[^()\\\\]|\\\\.)*[+*](?:[^()\\\\]|\\\\.)*\\)[+*{]" // nested quantifier (including {n,}) + + "|\\((?:[^()\\\\]|\\\\.)*\\|(?:[^()\\\\]|\\\\.)*\\)[+*{]" // alternation with quantifier + ); + private final Set<String> warnedMultiFieldTypes = Collections.newSetFromMap(new ConcurrentHashMap<>()); + // LRU-evicting pattern cache: synchronized LinkedHashMap removes eldest entry when full + private final Map<String, Pattern> patternCache; + + { + // LinkedHashMap with access-order = true gives LRU eviction (oldest access evicted first) + patternCache = Collections.synchronizedMap(new java.util.LinkedHashMap<>(MAX_PATTERN_CACHE, 0.75f, true) { + @Override + protected boolean removeEldestEntry(final Map.Entry<String, Pattern> eldest) { + return size() > MAX_PATTERN_CACHE; + } + }); + } + + public PromQLEvaluator(final DatabaseInternal database) { + this(database, DEFAULT_LOOKBACK_MS); + } + + public PromQLEvaluator(final DatabaseInternal database, final long lookbackMs) { + this.database = database; + this.lookbackMs = lookbackMs; + } + + /** + * Evaluate an instant query at a single timestamp. + */ + public PromQLResult evaluateInstant(final PromQLExpr expr, final long evalTimeMs) { + return evaluate(expr, evalTimeMs, evalTimeMs, evalTimeMs, 0, 0); + } + + /** + * Evaluate a range query, returning a matrix result with values at each step. 
+ */ + public PromQLResult evaluateRange(final PromQLExpr expr, final long startMs, final long endMs, final long stepMs) { + if (endMs < startMs) + throw new IllegalArgumentException("endMs (" + endMs + ") must be >= startMs (" + startMs + ")"); + if (stepMs <= 0) + throw new IllegalArgumentException("stepMs must be positive, got: " + stepMs); + + final long maxSteps = (endMs - startMs) / stepMs + 1; + if (maxSteps > MAX_RANGE_STEPS) + throw new IllegalArgumentException( + "Range query would produce " + maxSteps + " steps, exceeding maximum of " + MAX_RANGE_STEPS + + ". Increase stepMs or reduce the time range"); + + // For range queries, evaluate at each step point and collect into MatrixResult + final Map<String, List<double[]>> seriesMap = new LinkedHashMap<>(); + final Map<String, Map<String, String>> labelsMap = new LinkedHashMap<>(); + + for (long t = startMs; t <= endMs; t += stepMs) { + final PromQLResult result = evaluate(expr, t, startMs, endMs, stepMs, 0); + if (result instanceof InstantVector iv) { + for (final VectorSample sample : iv.samples()) { + final String key = labelKey(sample.labels()); + seriesMap.computeIfAbsent(key, k -> new ArrayList<>()).add(new double[] { sample.timestampMs(), sample.value() }); + labelsMap.putIfAbsent(key, sample.labels()); + } + } else if (result instanceof ScalarResult sr) { + final String key = "{}"; + seriesMap.computeIfAbsent(key, k -> new ArrayList<>()).add(new double[] { t, sr.value() }); + labelsMap.putIfAbsent(key, Map.of()); + } + } + + final List<MatrixSeries> series = new ArrayList<>(); + for (final Map.Entry<String, List<double[]>> entry : seriesMap.entrySet()) + series.add(new MatrixSeries(labelsMap.get(entry.getKey()), entry.getValue())); + return new MatrixResult(series); + } + + private PromQLResult evaluate(final PromQLExpr expr, final long evalTimeMs, final long queryStartMs, final long queryEndMs, + final long stepMs, final int depth) { + if (depth > MAX_RECURSION_DEPTH) + throw new IllegalArgumentException("PromQL expression exceeds maximum nesting depth of " + MAX_RECURSION_DEPTH); + 
final int next = depth + 1; + return switch (expr) { + case NumberLiteral nl -> new ScalarResult(nl.value(), evalTimeMs); + case StringLiteral ignored -> new ScalarResult(Double.NaN, evalTimeMs); + case VectorSelector vs -> evaluateVectorSelector(vs, evalTimeMs); + case MatrixSelector ms -> evaluateMatrixSelector(ms, evalTimeMs); + case AggregationExpr agg -> evaluateAggregation(agg, evalTimeMs, queryStartMs, queryEndMs, stepMs, next); + case FunctionCallExpr fn -> evaluateFunction(fn, evalTimeMs, queryStartMs, queryEndMs, stepMs, next); + case BinaryExpr bin -> evaluateBinary(bin, evalTimeMs, queryStartMs, queryEndMs, stepMs, next); + case UnaryExpr un -> evaluateUnary(un, evalTimeMs, queryStartMs, queryEndMs, stepMs, next); + }; + } + + private PromQLResult evaluateVectorSelector(final VectorSelector vs, final long evalTimeMs) { + final String typeName = sanitizeTypeName(vs.metricName()); + if (!database.getSchema().existsType(typeName)) + return new InstantVector(List.of()); + + final DocumentType docType = database.getSchema().getType(typeName); + if (!(docType instanceof LocalTimeSeriesType tsType) || tsType.getEngine() == null) + return new InstantVector(List.of()); + + final TimeSeriesEngine engine = tsType.getEngine(); + final List<ColumnDefinition> columns = tsType.getTsColumns(); + + warnIfMultipleFields(columns, vs.metricName()); + + final TagFilter tagFilter = buildTagFilter(vs.matchers(), columns); + final long offset = vs.offsetMs(); + final long queryEnd = evalTimeMs - offset; + final long queryStart = queryEnd - lookbackMs; + + final Iterator<Object[]> rowIter; + try { + rowIter = engine.iterateQuery(queryStart, queryEnd, null, tagFilter); + } catch (final Exception e) { + LogManager.instance().log(this, Level.WARNING, + "Error querying TimeSeries type '%s': %s", null, typeName, e.getMessage()); + return new InstantVector(List.of()); + } + + // Post-filter for NEQ/RE/NRE and group by label combination + final Map<String, VectorSample> latestByLabels = new LinkedHashMap<>(); + while 
(rowIter.hasNext()) { + final Object[] row = rowIter.next(); + if (!matchesPostFilters(row, vs.matchers(), columns)) + continue; + final Map<String, String> labels = extractLabels(row, columns, vs.metricName()); + final String key = labelKey(labels); + final long ts = (long) row[0]; + final double value = extractValue(row, columns); + // Keep the latest sample for each label combination + latestByLabels.put(key, new VectorSample(labels, value, ts)); + } + + return new InstantVector(new ArrayList<>(latestByLabels.values())); + } + + private PromQLResult evaluateMatrixSelector(final MatrixSelector ms, final long evalTimeMs) { + final VectorSelector vs = ms.selector(); + final String typeName = sanitizeTypeName(vs.metricName()); + if (!database.getSchema().existsType(typeName)) + return new RangeVector(List.of()); + + final DocumentType docType = database.getSchema().getType(typeName); + if (!(docType instanceof LocalTimeSeriesType tsType) || tsType.getEngine() == null) + return new RangeVector(List.of()); + + final TimeSeriesEngine engine = tsType.getEngine(); + final List<ColumnDefinition> columns = tsType.getTsColumns(); + + warnIfMultipleFields(columns, vs.metricName()); + + final TagFilter tagFilter = buildTagFilter(vs.matchers(), columns); + final long offset = vs.offsetMs(); + final long queryEnd = evalTimeMs - offset; + final long queryStart = queryEnd - ms.rangeMs(); + + final Iterator<Object[]> rowIter; + try { + rowIter = engine.iterateQuery(queryStart, queryEnd, null, tagFilter); + } catch (final Exception e) { + LogManager.instance().log(this, Level.WARNING, + "Error querying TimeSeries type '%s': %s", null, typeName, e.getMessage()); + return new RangeVector(List.of()); + } + + // Group rows by label combination + final Map<String, List<double[]>> seriesByLabels = new LinkedHashMap<>(); + final Map<String, Map<String, String>> labelsMap = new LinkedHashMap<>(); + while (rowIter.hasNext()) { + final Object[] row = rowIter.next(); + if (!matchesPostFilters(row, vs.matchers(), columns)) + continue; + final Map<String, String> labels = extractLabels(row, columns, 
vs.metricName()); + final String key = labelKey(labels); + seriesByLabels.computeIfAbsent(key, k -> new ArrayList<>()).add(new double[] { (long) row[0], extractValue(row, columns) }); + labelsMap.putIfAbsent(key, labels); + } + + final List<RangeSeries> result = new ArrayList<>(); + for (final Map.Entry<String, List<double[]>> entry : seriesByLabels.entrySet()) + result.add(new RangeSeries(labelsMap.get(entry.getKey()), entry.getValue())); + return new RangeVector(result); + } + + private PromQLResult evaluateAggregation(final AggregationExpr agg, final long evalTimeMs, final long queryStartMs, + final long queryEndMs, final long stepMs, final int depth) { + final PromQLResult inner = evaluate(agg.expr(), evalTimeMs, queryStartMs, queryEndMs, stepMs, depth); + if (!(inner instanceof InstantVector iv)) + return new InstantVector(List.of()); + + // Group samples by labels + final Map<String, List<VectorSample>> groups = new LinkedHashMap<>(); + for (final VectorSample sample : iv.samples()) { + final Map<String, String> groupKey = computeGroupLabels(sample.labels(), agg.groupLabels(), agg.without()); + final String key = labelKey(groupKey); + groups.computeIfAbsent(key, k -> new ArrayList<>()).add(sample); + } + + final List<VectorSample> result = new ArrayList<>(); + for (final Map.Entry<String, List<VectorSample>> entry : groups.entrySet()) { + final List<VectorSample> group = entry.getValue(); + final Map<String, String> groupLabels = computeGroupLabels(group.getFirst().labels(), agg.groupLabels(), agg.without()); + final long ts = group.getFirst().timestampMs(); + + final double value = switch (agg.op()) { + case SUM -> { + double sum = 0; + for (final VectorSample s : group) sum += s.value(); + yield sum; + } + case AVG -> { + double sum = 0; + for (final VectorSample s : group) sum += s.value(); + yield sum / group.size(); + } + case MIN -> { + double min = Double.POSITIVE_INFINITY; + for (final VectorSample s : group) if (s.value() < min) min = s.value(); + yield min; + } + case MAX -> { + double max = Double.NEGATIVE_INFINITY; + for (final VectorSample s : group) if (s.value() > max) max = s.value(); + 
yield max; + } + case COUNT -> (double) group.size(); + case TOPK -> { + final int k = agg.param() != null ? (int) ((NumberLiteral) agg.param()).value() : 1; + group.sort(Comparator.comparingDouble(VectorSample::value).reversed()); + for (int i = 0; i < Math.min(k, group.size()); i++) + result.add(group.get(i)); + yield Double.NaN; // topk adds samples directly + } + case BOTTOMK -> { + final int k = agg.param() != null ? (int) ((NumberLiteral) agg.param()).value() : 1; + group.sort(Comparator.comparingDouble(VectorSample::value)); + for (int i = 0; i < Math.min(k, group.size()); i++) + result.add(group.get(i)); + yield Double.NaN; // bottomk adds samples directly + } + }; + + if (agg.op() != PromQLExpr.AggOp.TOPK && agg.op() != PromQLExpr.AggOp.BOTTOMK) + result.add(new VectorSample(groupLabels, value, ts)); + } + + return new InstantVector(result); + } + + private PromQLResult evaluateFunction(final FunctionCallExpr fn, final long evalTimeMs, final long queryStartMs, + final long queryEndMs, final long stepMs, final int depth) { + final String name = fn.name().toLowerCase(); + + // Range-vector functions + if (isRangeFunction(name)) { + if (fn.args().isEmpty()) + throw new IllegalArgumentException("Function '" + fn.name() + "' requires a range vector argument"); + final PromQLResult argResult = evaluate(fn.args().getFirst(), evalTimeMs, queryStartMs, queryEndMs, stepMs, depth); + if (!(argResult instanceof RangeVector rv)) + throw new IllegalArgumentException("Function '" + fn.name() + "' requires a range vector argument"); + + final List<VectorSample> samples = new ArrayList<>(); + for (final RangeSeries series : rv.series()) { + final double value = switch (name) { + case "rate" -> PromQLFunctions.rate(series); + case "irate" -> PromQLFunctions.irate(series); + case "increase" -> PromQLFunctions.increase(series); + case "sum_over_time" -> PromQLFunctions.sumOverTime(series); + case "avg_over_time" -> PromQLFunctions.avgOverTime(series); + case "min_over_time" -> 
PromQLFunctions.minOverTime(series); + case "max_over_time" -> PromQLFunctions.maxOverTime(series); + case "count_over_time" -> PromQLFunctions.countOverTime(series); + default -> throw new IllegalArgumentException("Unknown range function: " + fn.name()); + }; + samples.add(new VectorSample(series.labels(), value, evalTimeMs)); + } + return new InstantVector(samples); + } + + // Scalar functions + if ("abs".equals(name) || "ceil".equals(name) || "floor".equals(name) || "round".equals(name)) + return evaluateScalarFunction(fn, evalTimeMs, queryStartMs, queryEndMs, stepMs, depth); + + throw new IllegalArgumentException("Unknown function: " + fn.name()); + } + + private PromQLResult evaluateScalarFunction(final FunctionCallExpr fn, final long evalTimeMs, final long queryStartMs, + final long queryEndMs, final long stepMs, final int depth) { + final String name = fn.name().toLowerCase(); + final PromQLResult argResult = evaluate(fn.args().getFirst(), evalTimeMs, queryStartMs, queryEndMs, stepMs, depth); + final double param = extractSecondParam(fn, evalTimeMs, queryStartMs, queryEndMs, stepMs, depth); + + if (argResult instanceof ScalarResult sr) + return new ScalarResult(applyScalarFn(name, sr.value(), param), evalTimeMs); + + if (argResult instanceof InstantVector iv) { + final List<VectorSample> samples = new ArrayList<>(); + for (final VectorSample sample : iv.samples()) + samples.add(new VectorSample(sample.labels(), applyScalarFn(name, sample.value(), param), sample.timestampMs())); + return new InstantVector(samples); + } + + return argResult; + } + + private double extractSecondParam(final FunctionCallExpr fn, final long evalTimeMs, final long queryStartMs, + final long queryEndMs, final long stepMs, final int depth) { + if (fn.args().size() <= 1) + return 1.0; + final PromQLResult paramResult = evaluate(fn.args().get(1), evalTimeMs, queryStartMs, queryEndMs, stepMs, depth); + if (paramResult instanceof ScalarResult sr) + return sr.value(); + return 1.0; + } + + private 
double applyScalarFn(final String name, final double value, final double param) { + return switch (name) { + case "abs" -> PromQLFunctions.abs(value); + case "ceil" -> PromQLFunctions.ceil(value); + case "floor" -> PromQLFunctions.floor(value); + case "round" -> PromQLFunctions.round(value, param); + default -> value; + }; + } + + private PromQLResult evaluateBinary(final BinaryExpr bin, final long evalTimeMs, final long queryStartMs, + final long queryEndMs, final long stepMs, final int depth) { + final PromQLResult leftResult = evaluate(bin.left(), evalTimeMs, queryStartMs, queryEndMs, stepMs, depth); + final PromQLResult rightResult = evaluate(bin.right(), evalTimeMs, queryStartMs, queryEndMs, stepMs, depth); + + // scalar-scalar + if (leftResult instanceof ScalarResult ls && rightResult instanceof ScalarResult rs) + return new ScalarResult(applyBinaryOp(bin.op(), ls.value(), rs.value()), evalTimeMs); + + // vector-scalar + if (leftResult instanceof InstantVector iv && rightResult instanceof ScalarResult rs) + return applyVectorScalar(iv, rs.value(), bin.op(), false); + + // scalar-vector + if (leftResult instanceof ScalarResult ls && rightResult instanceof InstantVector iv) + return applyVectorScalar(iv, ls.value(), bin.op(), true); + + // vector-vector + if (leftResult instanceof InstantVector lv && rightResult instanceof InstantVector rv) + return applyVectorVector(lv, rv, bin.op()); + + return new ScalarResult(Double.NaN, evalTimeMs); + } + + private PromQLResult evaluateUnary(final UnaryExpr un, final long evalTimeMs, final long queryStartMs, final long queryEndMs, + final long stepMs, final int depth) { + final PromQLResult result = evaluate(un.expr(), evalTimeMs, queryStartMs, queryEndMs, stepMs, depth); + if (un.op() == '-') { + if (result instanceof ScalarResult sr) + return new ScalarResult(-sr.value(), sr.timestampMs()); + if (result instanceof InstantVector iv) { + final List<VectorSample> samples = new ArrayList<>(); + for (final VectorSample s : iv.samples()) + 
samples.add(new VectorSample(s.labels(), -s.value(), s.timestampMs())); + return new InstantVector(samples); + } + } + return result; + } + + // --- Helper methods --- + + private Pattern compilePattern(final String regex) { + if (regex.length() > MAX_REGEX_LENGTH) + throw new IllegalArgumentException("Regex pattern exceeds maximum length of " + MAX_REGEX_LENGTH + " characters"); + // Reject patterns that cause catastrophic backtracking (ReDoS): + // covers nested quantifiers (a+)+, (a*b)*, (a+){n,} and alternation groups (a|aa)+ + if (REDOS_CHECK.matcher(regex).find()) + throw new IllegalArgumentException( + "Regex pattern is not allowed (ReDoS risk): " + regex); + // LRU-evicting cache: the synchronized LinkedHashMap automatically removes the + // eldest entry when size exceeds MAX_PATTERN_CACHE, avoiding thundering-herd + // re-compilation that bulk-clear caused. + synchronized (patternCache) { + return patternCache.computeIfAbsent(regex, r -> { + try { + return Pattern.compile(r); + } catch (final PatternSyntaxException e) { + throw new IllegalArgumentException("Invalid regex pattern: " + r, e); + } + }); + } + } + + private TagFilter buildTagFilter(final List<LabelMatcher> matchers, final List<ColumnDefinition> columns) { + TagFilter filter = null; + for (final LabelMatcher m : matchers) { + if (m.op() != MatchOp.EQ || "__name__".equals(m.name())) + continue; + final int idx = findNonTsColumnIndex(m.name(), columns); + if (idx < 0) + continue; + filter = filter == null ? TagFilter.eq(idx, m.value()) : filter.and(idx, m.value()); + } + return filter; + } + + private boolean matchesPostFilters(final Object[] row, final List<LabelMatcher> matchers, + final List<ColumnDefinition> columns) { + for (final LabelMatcher m : matchers) { + if (m.op() == MatchOp.EQ || "__name__".equals(m.name())) + continue; + final int rowIdx = findNonTsRowIndex(m.name(), columns); + if (rowIdx < 0) + return false; + final Object val = rowIdx < row.length ? row[rowIdx] : null; + final String strVal = val != null ? 
val.toString() : ""; + switch (m.op()) { + case NEQ: + if (strVal.equals(m.value())) + return false; + break; + case RE: + if (!compilePattern(m.value()).matcher(strVal).matches()) + return false; + break; + case NRE: + if (compilePattern(m.value()).matcher(strVal).matches()) + return false; + break; + default: + break; + } + } + return true; + } + + /** + * Extracts label key/value pairs from a row. + * The row format is always {@code [timestamp, non-ts-col-0, non-ts-col-1, ...]}, so we + * iterate non-TIMESTAMP columns and map them to row positions 1, 2, 3… regardless of + * where the TIMESTAMP column appears in the schema definition. + */ + private Map<String, String> extractLabels(final Object[] row, final List<ColumnDefinition> columns, final String metricName) { + final Map<String, String> labels = new LinkedHashMap<>(); + labels.put("__name__", metricName); + int nonTsIdx = 0; + for (final ColumnDefinition col : columns) { + if (col.getRole() == ColumnDefinition.ColumnRole.TIMESTAMP) + continue; + final int rowPos = 1 + nonTsIdx; + nonTsIdx++; + if (col.getRole() == ColumnDefinition.ColumnRole.TAG && rowPos < row.length && row[rowPos] != null) + labels.put(col.getName(), row[rowPos].toString()); + } + return labels; + } + + /** + * Extracts the first FIELD column value from a row. + * The row format is always {@code [timestamp, non-ts-col-0, non-ts-col-1, ...]}, so we + * iterate non-TIMESTAMP columns and map them to row positions 1, 2, 3… regardless of + * where the TIMESTAMP column appears in the schema definition. + */ + private double extractValue(final Object[] row, final List<ColumnDefinition> columns) { + int nonTsIdx = 0; + for (final ColumnDefinition col : columns) { + if (col.getRole() == ColumnDefinition.ColumnRole.TIMESTAMP) + continue; + final int rowPos = 1 + nonTsIdx; + nonTsIdx++; + if (col.getRole() == ColumnDefinition.ColumnRole.FIELD) + return rowPos < row.length && row[rowPos] instanceof Number ? 
((Number) row[rowPos]).doubleValue() : Double.NaN; + } + return Double.NaN; + } + + /** + * Logs a warning if the given column list has more than one FIELD column, + * since PromQL evaluation only uses the first one. Logged once per type name. + */ + private void warnIfMultipleFields(final List<ColumnDefinition> columns, final String typeName) { + if (warnedMultiFieldTypes.contains(typeName)) + return; + int fieldCount = 0; + String firstName = null; + for (final ColumnDefinition col : columns) { + if (col.getRole() == ColumnDefinition.ColumnRole.FIELD) { + if (firstName == null) + firstName = col.getName(); + fieldCount++; + } + } + if (fieldCount > 1) { + warnedMultiFieldTypes.add(typeName); + LogManager.instance().log(this, Level.WARNING, + "PromQL evaluation: type '%s' has %d FIELD columns but only the first ('%s') is used. " + + "Use explicit column selection or split into separate types", + null, typeName, fieldCount, firstName); + } + } + + private int findNonTsColumnIndex(final String name, final List<ColumnDefinition> columns) { + int nonTsIdx = -1; + for (final ColumnDefinition col : columns) { + if (col.getRole() == ColumnDefinition.ColumnRole.TIMESTAMP) + continue; + nonTsIdx++; + if (col.getName().equals(name)) + return nonTsIdx; + } + return -1; + } + + /** + * Returns the row index for a named column in the engine row format + * {@code [timestamp, non-ts-col-0, non-ts-col-1, ...]}. + * Returns -1 if the column is not found or is the TIMESTAMP column. 
+ */ + private int findNonTsRowIndex(final String name, final List<ColumnDefinition> columns) { + int rowIdx = 1; // row[0] is always the timestamp + for (final ColumnDefinition col : columns) { + if (col.getRole() == ColumnDefinition.ColumnRole.TIMESTAMP) + continue; + if (col.getName().equals(name)) + return rowIdx; + rowIdx++; + } + return -1; + } + + private Map<String, String> computeGroupLabels(final Map<String, String> labels, final List<String> groupLabels, + final boolean without) { + if (groupLabels.isEmpty() && !without) + return Map.of(); + final Map<String, String> result = new LinkedHashMap<>(); + if (without) { + final Set<String> exclude = new HashSet<>(groupLabels); + for (final Map.Entry<String, String> entry : labels.entrySet()) + if (!exclude.contains(entry.getKey())) + result.put(entry.getKey(), entry.getValue()); + } else { + for (final String label : groupLabels) + if (labels.containsKey(label)) + result.put(label, labels.get(label)); + } + return result; + } + + private String labelKey(final Map<String, String> labels) { + if (labels.isEmpty()) + return "{}"; + final List<String> sorted = new ArrayList<>(labels.keySet()); + Collections.sort(sorted); + final StringBuilder sb = new StringBuilder("{"); + for (int i = 0; i < sorted.size(); i++) { + if (i > 0) + sb.append(','); + sb.append(sorted.get(i)).append('=').append(labels.get(sorted.get(i))); + } + sb.append('}'); + return sb.toString(); + } + + private InstantVector applyVectorScalar(final InstantVector iv, final double scalar, final BinaryOp op, + final boolean scalarOnLeft) { + final List<VectorSample> result = new ArrayList<>(); + for (final VectorSample s : iv.samples()) { + final double value = scalarOnLeft ? applyBinaryOp(op, scalar, s.value()) : applyBinaryOp(op, s.value(), scalar); + if (!isComparisonOp(op) || value != 0) + result.add(new VectorSample(s.labels(), isComparisonOp(op) ? 
s.value() : value, s.timestampMs())); + } + return new InstantVector(result); + } + + private InstantVector applyVectorVector(final InstantVector left, final InstantVector right, final BinaryOp op) { + // Simple vector-vector matching by label identity + final Map<String, VectorSample> rightMap = new HashMap<>(); + for (final VectorSample s : right.samples()) + rightMap.put(labelKey(s.labels()), s); + + final List<VectorSample> result = new ArrayList<>(); + for (final VectorSample ls : left.samples()) { + final VectorSample rs = rightMap.get(labelKey(ls.labels())); + if (rs != null) { + if (op == BinaryOp.AND) { + result.add(ls); + } else if (op == BinaryOp.UNLESS) { + // skip — matched, so excluded + } else { + final double value = applyBinaryOp(op, ls.value(), rs.value()); + if (!isComparisonOp(op) || value != 0) + result.add(new VectorSample(ls.labels(), isComparisonOp(op) ? ls.value() : value, ls.timestampMs())); + } + } else if (op == BinaryOp.OR || op == BinaryOp.UNLESS) { + result.add(ls); + } + } + // For OR, also add unmatched right-side samples + if (op == BinaryOp.OR) { + final Set<String> leftKeys = new HashSet<>(); + for (final VectorSample s : left.samples()) + leftKeys.add(labelKey(s.labels())); + for (final VectorSample rs : right.samples()) + if (!leftKeys.contains(labelKey(rs.labels()))) + result.add(rs); + } + return new InstantVector(result); + } + + private double applyBinaryOp(final BinaryOp op, final double left, final double right) { + return switch (op) { + case ADD -> left + right; + case SUB -> left - right; + case MUL -> left * right; + case DIV -> right == 0 ? Double.NaN : left / right; + case MOD -> right == 0 ? Double.NaN : left % right; + case POW -> Math.pow(left, right); + case EQ -> left == right ? 1.0 : 0.0; + case NEQ -> left != right ? 1.0 : 0.0; + case LT -> left < right ? 1.0 : 0.0; + case GT -> left > right ? 1.0 : 0.0; + case LTE -> left <= right ? 1.0 : 0.0; + case GTE -> left >= right ? 
1.0 : 0.0; + case AND, OR, UNLESS -> Double.NaN; // handled at vector level + }; + } + + private boolean isComparisonOp(final BinaryOp op) { + return op == BinaryOp.EQ || op == BinaryOp.NEQ || op == BinaryOp.LT || op == BinaryOp.GT || op == BinaryOp.LTE + || op == BinaryOp.GTE; + } + + private boolean isRangeFunction(final String name) { + return switch (name) { + case "rate", "irate", "increase", "sum_over_time", "avg_over_time", "min_over_time", "max_over_time", "count_over_time" -> + true; + default -> false; + }; + } + + private static final int MAX_TYPE_NAME_LENGTH = 256; + private static final Pattern VALID_TYPE_NAME_PATTERN = Pattern.compile("[a-zA-Z_][a-zA-Z0-9_]*"); + + public static String sanitizeTypeName(final String name) { + final String sanitized = name.replace('.', '_').replace('-', '_').replace(':', '_'); + if (!sanitized.equals(name)) + LogManager.instance().log(PromQLEvaluator.class, Level.WARNING, + "Metric name '%s' was sanitized to '%s'. Distinct Prometheus names that differ only by '.', '-', ':' vs '_' " + + "will map to the same ArcadeDB type and may return merged data", null, name, sanitized); + if (sanitized.length() > MAX_TYPE_NAME_LENGTH) + throw new IllegalArgumentException("Metric name too long: " + sanitized.length() + " chars (max " + MAX_TYPE_NAME_LENGTH + ")"); + if (!VALID_TYPE_NAME_PATTERN.matcher(sanitized).matches()) + throw new IllegalArgumentException("Invalid metric name: '" + name + "'"); + return sanitized; + } +} diff --git a/engine/src/main/java/com/arcadedb/engine/timeseries/promql/PromQLFunctions.java b/engine/src/main/java/com/arcadedb/engine/timeseries/promql/PromQLFunctions.java new file mode 100644 index 0000000000..a2c75acb21 --- /dev/null +++ b/engine/src/main/java/com/arcadedb/engine/timeseries/promql/PromQLFunctions.java @@ -0,0 +1,149 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file 
except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries.promql; + +import com.arcadedb.engine.timeseries.promql.PromQLResult.RangeSeries; + +import java.util.List; + +/** + * Static function implementations for PromQL functions. + * + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +public final class PromQLFunctions { + + private PromQLFunctions() { + } + + /** + * Per-second rate of increase, with counter-reset handling. + */ + public static double rate(final RangeSeries series) { + final List<double[]> values = series.values(); + if (values.size() < 2) + return 0.0; + final double[] first = values.getFirst(); + final double[] last = values.getLast(); + final double durationSec = (last[0] - first[0]) / 1000.0; + if (durationSec <= 0) + return 0.0; + + double totalIncrease = 0; + double prev = first[1]; + for (int i = 1; i < values.size(); i++) { + final double current = values.get(i)[1]; + if (current < prev) + totalIncrease += current; // counter reset + else + totalIncrease += current - prev; + prev = current; + } + return totalIncrease / durationSec; + } + + /** + * Instant rate from the last two points. 
+ */ + public static double irate(final RangeSeries series) { + final List<double[]> values = series.values(); + if (values.size() < 2) + return 0.0; + final double[] prev = values.get(values.size() - 2); + final double[] last = values.getLast(); + final double durationSec = (last[0] - prev[0]) / 1000.0; + if (durationSec <= 0) + return 0.0; + double diff = last[1] - prev[1]; + if (diff < 0) + diff = last[1]; // counter reset + return diff / durationSec; + } + + /** + * Total increase over the range, with counter-reset handling. + */ + public static double increase(final RangeSeries series) { + final List<double[]> values = series.values(); + if (values.size() < 2) + return 0.0; + double totalIncrease = 0; + double prev = values.getFirst()[1]; + for (int i = 1; i < values.size(); i++) { + final double current = values.get(i)[1]; + if (current < prev) + totalIncrease += current; // counter reset + else + totalIncrease += current - prev; + prev = current; + } + return totalIncrease; + } + + public static double sumOverTime(final RangeSeries series) { + double sum = 0; + for (final double[] v : series.values()) + sum += v[1]; + return sum; + } + + public static double avgOverTime(final RangeSeries series) { + if (series.values().isEmpty()) + return 0.0; + return sumOverTime(series) / series.values().size(); + } + + public static double minOverTime(final RangeSeries series) { + double min = Double.POSITIVE_INFINITY; + for (final double[] v : series.values()) + if (v[1] < min) + min = v[1]; + return min; + } + + public static double maxOverTime(final RangeSeries series) { + double max = Double.NEGATIVE_INFINITY; + for (final double[] v : series.values()) + if (v[1] > max) + max = v[1]; + return max; + } + + public static double countOverTime(final RangeSeries series) { + return series.values().size(); + } + + public static double abs(final double value) { + return Math.abs(value); + } + + public static double ceil(final double value) { + return Math.ceil(value); + } + + public static
double floor(final double value) { + return Math.floor(value); + } + + public static double round(final double value, final double toNearest) { + if (toNearest == 0) + return value; + return Math.round(value / toNearest) * toNearest; + } +} diff --git a/engine/src/main/java/com/arcadedb/engine/timeseries/promql/PromQLParser.java b/engine/src/main/java/com/arcadedb/engine/timeseries/promql/PromQLParser.java new file mode 100644 index 0000000000..80337ec952 --- /dev/null +++ b/engine/src/main/java/com/arcadedb/engine/timeseries/promql/PromQLParser.java @@ -0,0 +1,573 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries.promql; + +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr.AggOp; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr.AggregationExpr; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr.BinaryExpr; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr.BinaryOp; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr.FunctionCallExpr; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr.LabelMatcher; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr.MatchOp; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr.MatrixSelector; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr.NumberLiteral; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr.StringLiteral; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr.UnaryExpr; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr.VectorSelector; + +import java.util.ArrayList; +import java.util.List; +import java.util.Set; + +/** + * Recursive-descent parser for PromQL expressions. 
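+ * <p>Precedence, lowest to highest, mirrors the parse chain below: or; and/unless; comparisons (==, !=, <, >, <=, >=); + and -; *, / and %; ^ (right-associative); unary +/-. For example, "a - b / c" parses as "a - (b / c)" and "2 ^ 3 ^ 2" as "2 ^ (3 ^ 2)".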
+ * @author Luca Garulli (l.garulli@arcadedata.com) + */ +public class PromQLParser { + + private static final Set AGG_OPS = Set.of("sum", "avg", "min", "max", "count", "topk", "bottomk"); + private static final int MAX_PARSE_DEPTH = 128; + + private final Lexer lexer; + private int parseDepth; + + public PromQLParser(final String input) { + this.lexer = new Lexer(input); + } + + public PromQLExpr parse() { + final PromQLExpr expr = parseOr(); + if (lexer.hasMore()) + throw new IllegalArgumentException("Unexpected token: '" + lexer.peek() + "' at position " + lexer.pos); + return expr; + } + + // --- Operator precedence chain --- + + private PromQLExpr parseOr() { + if (++parseDepth > MAX_PARSE_DEPTH) + throw new IllegalArgumentException("PromQL expression exceeds maximum nesting depth of " + MAX_PARSE_DEPTH); + try { + PromQLExpr left = parseAndUnless(); + while (lexer.matchKeyword("or")) { + left = new BinaryExpr(left, BinaryOp.OR, parseAndUnless()); + } + return left; + } finally { + parseDepth--; + } + } + + private PromQLExpr parseAndUnless() { + PromQLExpr left = parseComparison(); + while (true) { + if (lexer.matchKeyword("and")) + left = new BinaryExpr(left, BinaryOp.AND, parseComparison()); + else if (lexer.matchKeyword("unless")) + left = new BinaryExpr(left, BinaryOp.UNLESS, parseComparison()); + else + break; + } + return left; + } + + private PromQLExpr parseComparison() { + PromQLExpr left = parseAddSub(); + while (true) { + final BinaryOp op; + if (lexer.match("==")) + op = BinaryOp.EQ; + else if (lexer.match("!=")) + op = BinaryOp.NEQ; + else if (lexer.match("<=")) + op = BinaryOp.LTE; + else if (lexer.match(">=")) + op = BinaryOp.GTE; + else if (lexer.match("<")) + op = BinaryOp.LT; + else if (lexer.match(">")) + op = BinaryOp.GT; + else + break; + left = new BinaryExpr(left, op, parseAddSub()); + } + return left; + } + + private PromQLExpr parseAddSub() { + PromQLExpr left = parseMulDiv(); + while (true) { + if (lexer.match("+")) + left = new 
BinaryExpr(left, BinaryOp.ADD, parseMulDiv()); + else if (lexer.matchMinus()) + left = new BinaryExpr(left, BinaryOp.SUB, parseMulDiv()); + else + break; + } + return left; + } + + private PromQLExpr parseMulDiv() { + PromQLExpr left = parsePow(); + while (true) { + if (lexer.match("*")) + left = new BinaryExpr(left, BinaryOp.MUL, parsePow()); + else if (lexer.match("/")) + left = new BinaryExpr(left, BinaryOp.DIV, parsePow()); + else if (lexer.match("%")) + left = new BinaryExpr(left, BinaryOp.MOD, parsePow()); + else + break; + } + return left; + } + + private PromQLExpr parsePow() { + PromQLExpr left = parseUnary(); + if (lexer.match("^")) { + if (++parseDepth > MAX_PARSE_DEPTH) + throw new IllegalArgumentException("PromQL expression exceeds maximum nesting depth of " + MAX_PARSE_DEPTH); + try { + left = new BinaryExpr(left, BinaryOp.POW, parsePow()); // right-associative + } finally { + parseDepth--; + } + } + return left; + } + + private PromQLExpr parseUnary() { + if (lexer.matchMinus()) { + if (++parseDepth > MAX_PARSE_DEPTH) + throw new IllegalArgumentException("PromQL expression exceeds maximum nesting depth of " + MAX_PARSE_DEPTH); + try { + return new UnaryExpr('-', parseUnary()); + } finally { + parseDepth--; + } + } + if (lexer.match("+")) { + if (++parseDepth > MAX_PARSE_DEPTH) + throw new IllegalArgumentException("PromQL expression exceeds maximum nesting depth of " + MAX_PARSE_DEPTH); + try { + return parseUnary(); + } finally { + parseDepth--; + } + } + return parsePrimary(); + } + + private PromQLExpr parsePrimary() { + lexer.skipWhitespace(); + + // Parenthesized expression + if (lexer.match("(")) { + final PromQLExpr expr = parseOr(); + lexer.expect(")"); + return maybeMatrixOrOffset(expr); + } + + // String literal + if (lexer.peekChar() == '"' || lexer.peekChar() == '\'') + return new StringLiteral(lexer.readString()); + + // Number literal + if (isNumberStart()) + return new NumberLiteral(lexer.readNumber()); + + // Identifier: could be 
aggregation, function, or metric name + final String ident = lexer.readIdent(); + if (ident.isEmpty()) + throw new IllegalArgumentException("Expected expression at position " + lexer.pos); + + final String lower = ident.toLowerCase(); + + // Aggregation operator + if (AGG_OPS.contains(lower)) + return parseAggregation(lower); + + // Function call or vector selector + lexer.skipWhitespace(); + if (lexer.peekChar() == '(' && !lexer.peekChar(1, '{')) + return parseFunctionCall(ident); + + // Vector selector + return parseVectorSelector(ident); + } + + private boolean isNumberStart() { + final char c = lexer.peekChar(); + if (c >= '0' && c <= '9') + return true; + // Check for .5 style numbers + return c == '.' && lexer.pos + 1 < lexer.input.length() && lexer.input.charAt(lexer.pos + 1) >= '0' + && lexer.input.charAt(lexer.pos + 1) <= '9'; + } + + private PromQLExpr parseAggregation(final String opName) { + final AggOp op = AggOp.valueOf(opName.toUpperCase()); + + lexer.skipWhitespace(); + List groupLabels = List.of(); + boolean without = false; + + // Check for by/without BEFORE the parenthesized expression + if (lexer.matchKeyword("by")) { + groupLabels = parseLabelList(); + } else if (lexer.matchKeyword("without")) { + without = true; + groupLabels = parseLabelList(); + } + + lexer.skipWhitespace(); + lexer.expect("("); + PromQLExpr param = null; + + // topk/bottomk have a parameter + if (op == AggOp.TOPK || op == AggOp.BOTTOMK) { + param = parseOr(); + lexer.expect(","); + } + + final PromQLExpr expr = parseOr(); + lexer.expect(")"); + + // Check for by/without AFTER the parenthesized expression + if (groupLabels.isEmpty()) { + lexer.skipWhitespace(); + if (lexer.matchKeyword("by")) { + groupLabels = parseLabelList(); + } else if (lexer.matchKeyword("without")) { + without = true; + groupLabels = parseLabelList(); + } + } + + return new AggregationExpr(op, expr, groupLabels, without, param); + } + + private List parseLabelList() { + lexer.skipWhitespace(); + 
lexer.expect("("); + final List labels = new ArrayList<>(); + while (!lexer.match(")")) { + if (!labels.isEmpty()) + lexer.expect(","); + labels.add(lexer.readIdent()); + } + return labels; + } + + private PromQLExpr parseFunctionCall(final String name) { + lexer.expect("("); + final List args = new ArrayList<>(); + while (!lexer.match(")")) { + if (!args.isEmpty()) + lexer.expect(","); + args.add(parseOr()); + } + return new FunctionCallExpr(name, args); + } + + private PromQLExpr parseVectorSelector(final String metricName) { + final List matchers = new ArrayList<>(); + lexer.skipWhitespace(); + + // Optional label matchers: {key="value", ...} + if (lexer.match("{")) { + while (!lexer.match("}")) { + if (!matchers.isEmpty()) + lexer.expect(","); + final String labelName = lexer.readIdent(); + final MatchOp matchOp = lexer.readMatchOp(); + final String labelValue = lexer.readString(); + matchers.add(new LabelMatcher(labelName, matchOp, labelValue)); + } + } + + PromQLExpr result = new VectorSelector(metricName, matchers, 0); + return maybeMatrixOrOffset(result); + } + + private PromQLExpr maybeMatrixOrOffset(final PromQLExpr expr) { + lexer.skipWhitespace(); + + // Range vector: [5m] + if (lexer.match("[")) { + final long rangeMs = lexer.readDuration(); + lexer.expect("]"); + VectorSelector vs; + if (expr instanceof VectorSelector v) + vs = v; + else + throw new IllegalArgumentException("Range selector requires a vector selector"); + + // Check for offset after range + long offsetMs = 0; + lexer.skipWhitespace(); + if (lexer.matchKeyword("offset")) + offsetMs = lexer.readDuration(); + + if (offsetMs != 0 || vs.offsetMs() != 0) + vs = new VectorSelector(vs.metricName(), vs.matchers(), offsetMs != 0 ? 
offsetMs : vs.offsetMs()); + + return new MatrixSelector(vs, rangeMs); + } + + // Offset modifier + if (lexer.matchKeyword("offset")) { + final long offsetMs = lexer.readDuration(); + if (expr instanceof VectorSelector vs) + return new VectorSelector(vs.metricName(), vs.matchers(), offsetMs); + throw new IllegalArgumentException("Offset modifier requires a vector selector"); + } + + return expr; + } + + // --- Duration parsing --- + + public static long parseDuration(final String s) { + long totalMs = 0; + long current = 0; + for (int i = 0; i < s.length(); i++) { + final char c = s.charAt(i); + if (c >= '0' && c <= '9') { + current = current * 10 + (c - '0'); + } else { + // Check for 'ms' (milliseconds) before consuming 'm' (minutes) + if (c == 'm' && i + 1 < s.length() && s.charAt(i + 1) == 's') { + totalMs += current; // already in milliseconds + current = 0; + i++; // skip the 's' + continue; + } + final long unitMs = switch (c) { + case 's' -> 1_000L; + case 'm' -> 60_000L; + case 'h' -> 3_600_000L; + case 'd' -> 86_400_000L; + case 'w' -> 604_800_000L; + case 'y' -> 31_536_000_000L; + default -> throw new IllegalArgumentException("Unknown duration unit: " + c); + }; + if (current > Long.MAX_VALUE / unitMs) + throw new IllegalArgumentException("Duration value too large: " + s); + totalMs += current * unitMs; + current = 0; + } + } + if (current != 0) + throw new IllegalArgumentException("Duration must end with a unit (ms/s/m/h/d/w/y): " + s); + return totalMs; + } + + // --- Inner Lexer --- + + static class Lexer { + final String input; + int pos; + + Lexer(final String input) { + this.input = input; + this.pos = 0; + } + + boolean hasMore() { + skipWhitespace(); + return pos < input.length(); + } + + char peekChar() { + skipWhitespace(); + return pos < input.length() ? 
input.charAt(pos) : 0; + } + + boolean peekChar(final int offset, final char c) { + final int idx = pos + offset; + return idx < input.length() && input.charAt(idx) == c; + } + + String peek() { + skipWhitespace(); + if (pos >= input.length()) + return ""; + return input.substring(pos, Math.min(pos + 10, input.length())); + } + + void skipWhitespace() { + while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) + pos++; + } + + boolean match(final String s) { + skipWhitespace(); + if (input.startsWith(s, pos)) { + // For multi-char operators, make sure we don't match a prefix + if (s.length() == 1 && isOperatorChar(s.charAt(0))) { + // Check that we're not part of a longer operator + final int next = pos + 1; + if (s.equals("=") && next < input.length() && (input.charAt(next) == '=' || input.charAt(next) == '~')) + return false; + if (s.equals("!") && next < input.length() && (input.charAt(next) == '=' || input.charAt(next) == '~')) + return false; + if (s.equals("<") && next < input.length() && input.charAt(next) == '=') + return false; + if (s.equals(">") && next < input.length() && input.charAt(next) == '=') + return false; + } + pos += s.length(); + return true; + } + return false; + } + + /** + * Match a minus sign as a binary operator (not part of a number). 
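+ * <p>The caller's position in the precedence chain decides the meaning: parseAddSub turns a matched '-' into BinaryOp.SUB, while parseUnary wraps the operand in a UnaryExpr, so "a - 1" and "-rate(m[5m])" both parse as intended; signs inside numeric literals are handled separately by readNumber (exponent signs only).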
+ */ + boolean matchMinus() { + skipWhitespace(); + if (pos < input.length() && input.charAt(pos) == '-') { + pos++; + return true; + } + return false; + } + + boolean matchKeyword(final String keyword) { + skipWhitespace(); + if (pos + keyword.length() > input.length()) + return false; + if (!input.substring(pos, pos + keyword.length()).equalsIgnoreCase(keyword)) + return false; + // Make sure it's not part of a longer identifier + final int end = pos + keyword.length(); + if (end < input.length() && isIdentChar(input.charAt(end))) + return false; + pos = end; + return true; + } + + void expect(final String s) { + skipWhitespace(); + if (!input.startsWith(s, pos)) + throw new IllegalArgumentException("Expected '" + s + "' at position " + pos + ", found: '" + + input.substring(pos, Math.min(pos + 10, input.length())) + "'"); + pos += s.length(); + } + + String readIdent() { + skipWhitespace(); + final int start = pos; + while (pos < input.length() && isIdentChar(input.charAt(pos))) + pos++; + if (pos == start) + throw new IllegalArgumentException("Expected identifier at position " + pos); + return input.substring(start, pos); + } + + double readNumber() { + skipWhitespace(); + final int start = pos; + // Handle NaN and Inf + if (input.startsWith("NaN", pos)) { + pos += 3; + return Double.NaN; + } + if (input.startsWith("Inf", pos) || input.startsWith("+Inf", pos)) { + pos += input.charAt(pos) == '+' ? 4 : 3; + return Double.POSITIVE_INFINITY; + } + if (input.startsWith("-Inf", pos)) { + pos += 4; + return Double.NEGATIVE_INFINITY; + } + + while (pos < input.length() && (Character.isDigit(input.charAt(pos)) || input.charAt(pos) == '.' 
+ || input.charAt(pos) == 'e' || input.charAt(pos) == 'E' + || ((input.charAt(pos) == '+' || input.charAt(pos) == '-') && pos > start + && (input.charAt(pos - 1) == 'e' || input.charAt(pos - 1) == 'E')))) + pos++; + + if (pos == start) + throw new IllegalArgumentException("Expected number at position " + pos); + return Double.parseDouble(input.substring(start, pos)); + } + + String readString() { + skipWhitespace(); + final char quote = input.charAt(pos); + if (quote != '"' && quote != '\'' && quote != '`') + throw new IllegalArgumentException("Expected string at position " + pos); + pos++; + final StringBuilder sb = new StringBuilder(); + while (pos < input.length() && input.charAt(pos) != quote) { + if (input.charAt(pos) == '\\' && pos + 1 < input.length()) { + pos++; + sb.append(switch (input.charAt(pos)) { + case 'n' -> '\n'; + case 't' -> '\t'; + case '\\' -> '\\'; + default -> input.charAt(pos); + }); + } else { + sb.append(input.charAt(pos)); + } + pos++; + } + if (pos >= input.length()) + throw new IllegalArgumentException("Unterminated string"); + pos++; // closing quote + return sb.toString(); + } + + MatchOp readMatchOp() { + skipWhitespace(); + if (match("=~")) + return MatchOp.RE; + if (match("!~")) + return MatchOp.NRE; + if (match("!=")) + return MatchOp.NEQ; + if (match("=")) + return MatchOp.EQ; + throw new IllegalArgumentException("Expected match operator at position " + pos); + } + + long readDuration() { + skipWhitespace(); + final int start = pos; + while (pos < input.length() && (Character.isDigit(input.charAt(pos)) || Character.isLetter(input.charAt(pos)))) + pos++; + if (pos == start) + throw new IllegalArgumentException("Expected duration at position " + pos); + return parseDuration(input.substring(start, pos)); + } + + private static boolean isIdentChar(final char c) { + return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') || c == '_' || c == ':'; + } + + private static boolean isOperatorChar(final char c) { + 
return c == '=' || c == '!' || c == '<' || c == '>'; + } + } +} diff --git a/engine/src/main/java/com/arcadedb/engine/timeseries/promql/PromQLResult.java b/engine/src/main/java/com/arcadedb/engine/timeseries/promql/PromQLResult.java new file mode 100644 index 0000000000..e648695c62 --- /dev/null +++ b/engine/src/main/java/com/arcadedb/engine/timeseries/promql/PromQLResult.java @@ -0,0 +1,50 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries.promql; + +import java.util.List; +import java.util.Map; + +/** + * Sealed interface representing PromQL evaluation results. 
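+ * <p>ScalarResult, InstantVector and MatrixResult correspond to the scalar, vector and matrix result types of the Prometheus HTTP API response format; RangeSeries/RangeVector carry the raw (timestampMs, value) pairs consumed by the rate and *_over_time family in PromQLFunctions.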
+ * @author Luca Garulli (l.garulli@arcadedata.com) + */ +public sealed interface PromQLResult { + + record ScalarResult(double value, long timestampMs) implements PromQLResult { + } + + record VectorSample(Map<String, String> labels, double value, long timestampMs) { + } + + record InstantVector(List<VectorSample> samples) implements PromQLResult { + } + + record RangeSeries(Map<String, String> labels, List<double[]> values) { + } + + record RangeVector(List<RangeSeries> series) implements PromQLResult { + } + + record MatrixSeries(Map<String, String> labels, List<double[]> values) { + } + + record MatrixResult(List<MatrixSeries> series) implements PromQLResult { + } +} diff --git a/engine/src/main/java/com/arcadedb/engine/timeseries/promql/ast/PromQLExpr.java b/engine/src/main/java/com/arcadedb/engine/timeseries/promql/ast/PromQLExpr.java new file mode 100644 index 0000000000..53e1c2b2b7 --- /dev/null +++ b/engine/src/main/java/com/arcadedb/engine/timeseries/promql/ast/PromQLExpr.java @@ -0,0 +1,80 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries.promql.ast; + +import java.util.List; + +/** + * Sealed interface representing all PromQL AST node types.
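+ * <p>Example (hypothetical metric name): sum by (host) (rate(requests_total[5m])) parses to AggregationExpr(SUM, FunctionCallExpr("rate", [MatrixSelector(VectorSelector("requests_total", [], 0), 300000)]), ["host"], false, null), where 5m is 300,000 ms.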
+ * @author Luca Garulli (l.garulli@arcadedata.com) + */ +public sealed interface PromQLExpr { + + record NumberLiteral(double value) implements PromQLExpr { + } + + record StringLiteral(String value) implements PromQLExpr { + } + + record VectorSelector(String metricName, List<LabelMatcher> matchers, long offsetMs) implements PromQLExpr { + } + + record MatrixSelector(VectorSelector selector, long rangeMs) implements PromQLExpr { + } + + record AggregationExpr(AggOp op, PromQLExpr expr, List<String> groupLabels, boolean without, + PromQLExpr param) implements PromQLExpr { + } + + record FunctionCallExpr(String name, List<PromQLExpr> args) implements PromQLExpr { + } + + record BinaryExpr(PromQLExpr left, BinaryOp op, PromQLExpr right) implements PromQLExpr { + } + + record UnaryExpr(char op, PromQLExpr expr) implements PromQLExpr { + } + + enum AggOp { + SUM, AVG, MIN, MAX, COUNT, TOPK, BOTTOMK + } + + enum BinaryOp { + ADD("+"), SUB("-"), MUL("*"), DIV("/"), MOD("%"), POW("^"), + EQ("=="), NEQ("!="), LT("<"), GT(">"), LTE("<="), GTE(">="), + AND("and"), OR("or"), UNLESS("unless"); + + private final String symbol; + + BinaryOp(final String symbol) { + this.symbol = symbol; + } + + public String symbol() { + return symbol; + } + } + + enum MatchOp { + EQ, NEQ, RE, NRE + } + + record LabelMatcher(String name, MatchOp op, String value) { + } +} diff --git a/engine/src/main/java/com/arcadedb/engine/timeseries/simd/ScalarTimeSeriesVectorOps.java b/engine/src/main/java/com/arcadedb/engine/timeseries/simd/ScalarTimeSeriesVectorOps.java new file mode 100644 index 0000000000..be308c0ccb --- /dev/null +++ b/engine/src/main/java/com/arcadedb/engine/timeseries/simd/ScalarTimeSeriesVectorOps.java @@ -0,0 +1,127 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries.simd; + +/** + * Pure Java scalar implementation of vector operations. Always available as fallback. + * + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +public final class ScalarTimeSeriesVectorOps implements TimeSeriesVectorOps { + + @Override + public double sum(final double[] data, final int offset, final int length) { + double s = 0; + for (int i = offset; i < offset + length; i++) + s += data[i]; + return s; + } + + @Override + public double min(final double[] data, final int offset, final int length) { + double m = Double.POSITIVE_INFINITY; + for (int i = offset; i < offset + length; i++) + if (data[i] < m) + m = data[i]; + return m; + } + + @Override + public double max(final double[] data, final int offset, final int length) { + double m = Double.NEGATIVE_INFINITY; + for (int i = offset; i < offset + length; i++) + if (data[i] > m) + m = data[i]; + return m; + } + + @Override + public long sumLong(final long[] data, final int offset, final int length) { + long s = 0; + for (int i = offset; i < offset + length; i++) + s += data[i]; + return s; + } + + @Override + public long minLong(final long[] data, final int offset, final int length) { + long m = Long.MAX_VALUE; + for (int i = offset; i < offset + length; i++) + if (data[i] < m) + m = data[i]; + return m; + } + + @Override + public long maxLong(final long[] data, final int offset, final int length) { + long m = 
Long.MIN_VALUE; + for (int i = offset; i < offset + length; i++) + if (data[i] > m) + m = data[i]; + return m; + } + + @Override + public double sumFiltered(final double[] data, final long[] bitmask, final int offset, final int length) { + double s = 0; + for (int i = 0; i < length; i++) { + final int maskWord = (offset + i) >> 6; + final int maskBit = (offset + i) & 63; + if ((bitmask[maskWord] & (1L << maskBit)) != 0) + s += data[offset + i]; + } + return s; + } + + @Override + public int countFiltered(final long[] bitmask, final int offset, final int length) { + int count = 0; + for (int i = 0; i < length; i++) { + final int maskWord = (offset + i) >> 6; + final int maskBit = (offset + i) & 63; + if ((bitmask[maskWord] & (1L << maskBit)) != 0) + count++; + } + return count; + } + + @Override + public void greaterThan(final double[] data, final double threshold, final long[] out, final int offset, final int length) { + for (int i = 0; i < length; i++) { + final int maskWord = (offset + i) >> 6; + final int maskBit = (offset + i) & 63; + if (data[offset + i] > threshold) + out[maskWord] |= (1L << maskBit); + else + out[maskWord] &= ~(1L << maskBit); + } + } + + @Override + public void bitmaskAnd(final long[] a, final long[] b, final long[] out, final int length) { + for (int i = 0; i < length; i++) + out[i] = a[i] & b[i]; + } + + @Override + public void bitmaskOr(final long[] a, final long[] b, final long[] out, final int length) { + for (int i = 0; i < length; i++) + out[i] = a[i] | b[i]; + } +} diff --git a/engine/src/main/java/com/arcadedb/engine/timeseries/simd/SimdTimeSeriesVectorOps.java b/engine/src/main/java/com/arcadedb/engine/timeseries/simd/SimdTimeSeriesVectorOps.java new file mode 100644 index 0000000000..307b18fec4 --- /dev/null +++ b/engine/src/main/java/com/arcadedb/engine/timeseries/simd/SimdTimeSeriesVectorOps.java @@ -0,0 +1,181 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, 
Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries.simd; + +import jdk.incubator.vector.DoubleVector; +import jdk.incubator.vector.LongVector; +import jdk.incubator.vector.VectorMask; +import jdk.incubator.vector.VectorOperators; +import jdk.incubator.vector.VectorSpecies; + +/** + * SIMD-accelerated implementation using the Java Vector API (jdk.incubator.vector). + * Uses SPECIES_PREFERRED for automatic lane width selection. 
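+ * <p>Note: jdk.incubator.vector is an incubator module, so both compilation and launch need --add-modules jdk.incubator.vector; callers should probe for it at startup and fall back to ScalarTimeSeriesVectorOps when the module (or this class) fails to load.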
+ * + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +public final class SimdTimeSeriesVectorOps implements TimeSeriesVectorOps { + + private static final VectorSpecies DOUBLE_SPECIES = DoubleVector.SPECIES_PREFERRED; + private static final VectorSpecies LONG_SPECIES = LongVector.SPECIES_PREFERRED; + + @Override + public double sum(final double[] data, final int offset, final int length) { + final int lanes = DOUBLE_SPECIES.length(); + double s = 0; + int i = 0; + for (; i + lanes <= length; i += lanes) { + final DoubleVector v = DoubleVector.fromArray(DOUBLE_SPECIES, data, offset + i); + s += v.reduceLanes(VectorOperators.ADD); + } + for (; i < length; i++) + s += data[offset + i]; + return s; + } + + @Override + public double min(final double[] data, final int offset, final int length) { + final int lanes = DOUBLE_SPECIES.length(); + double m = Double.POSITIVE_INFINITY; + int i = 0; + for (; i + lanes <= length; i += lanes) { + final DoubleVector v = DoubleVector.fromArray(DOUBLE_SPECIES, data, offset + i); + final double laneMin = v.reduceLanes(VectorOperators.MIN); + if (laneMin < m) + m = laneMin; + } + for (; i < length; i++) + if (data[offset + i] < m) + m = data[offset + i]; + return m; + } + + @Override + public double max(final double[] data, final int offset, final int length) { + final int lanes = DOUBLE_SPECIES.length(); + double m = Double.NEGATIVE_INFINITY; + int i = 0; + for (; i + lanes <= length; i += lanes) { + final DoubleVector v = DoubleVector.fromArray(DOUBLE_SPECIES, data, offset + i); + final double laneMax = v.reduceLanes(VectorOperators.MAX); + if (laneMax > m) + m = laneMax; + } + for (; i < length; i++) + if (data[offset + i] > m) + m = data[offset + i]; + return m; + } + + @Override + public long sumLong(final long[] data, final int offset, final int length) { + final int lanes = LONG_SPECIES.length(); + long s = 0; + int i = 0; + for (; i + lanes <= length; i += lanes) { + final LongVector v = 
LongVector.fromArray(LONG_SPECIES, data, offset + i); + s += v.reduceLanes(VectorOperators.ADD); + } + for (; i < length; i++) + s += data[offset + i]; + return s; + } + + @Override + public long minLong(final long[] data, final int offset, final int length) { + final int lanes = LONG_SPECIES.length(); + long m = Long.MAX_VALUE; + int i = 0; + for (; i + lanes <= length; i += lanes) { + final LongVector v = LongVector.fromArray(LONG_SPECIES, data, offset + i); + final long laneMin = v.reduceLanes(VectorOperators.MIN); + if (laneMin < m) + m = laneMin; + } + for (; i < length; i++) + if (data[offset + i] < m) + m = data[offset + i]; + return m; + } + + @Override + public long maxLong(final long[] data, final int offset, final int length) { + final int lanes = LONG_SPECIES.length(); + long m = Long.MIN_VALUE; + int i = 0; + for (; i + lanes <= length; i += lanes) { + final LongVector v = LongVector.fromArray(LONG_SPECIES, data, offset + i); + final long laneMax = v.reduceLanes(VectorOperators.MAX); + if (laneMax > m) + m = laneMax; + } + for (; i < length; i++) + if (data[offset + i] > m) + m = data[offset + i]; + return m; + } + + // The scalar fallback instance used for bitmask-based operations that do not benefit from SIMD + // (bitmask layout with word/bit addressing doesn't map cleanly to SIMD masks) + private static final ScalarTimeSeriesVectorOps SCALAR = new ScalarTimeSeriesVectorOps(); + + @Override + public double sumFiltered(final double[] data, final long[] bitmask, final int offset, final int length) { + // Scalar fallback — bitmask operations are not SIMD-accelerated + return SCALAR.sumFiltered(data, bitmask, offset, length); + } + + @Override + public int countFiltered(final long[] bitmask, final int offset, final int length) { + // Scalar fallback — bitmask operations are not SIMD-accelerated + return SCALAR.countFiltered(bitmask, offset, length); + } + + @Override + public void greaterThan(final double[] data, final double threshold, final long[] 
out, final int offset, final int length) { + // Scalar fallback — bitmask output doesn't map cleanly to SIMD result masks + SCALAR.greaterThan(data, threshold, out, offset, length); + } + + @Override + public void bitmaskAnd(final long[] a, final long[] b, final long[] out, final int length) { + final int lanes = LONG_SPECIES.length(); + int i = 0; + for (; i + lanes <= length; i += lanes) { + final LongVector va = LongVector.fromArray(LONG_SPECIES, a, i); + final LongVector vb = LongVector.fromArray(LONG_SPECIES, b, i); + va.and(vb).intoArray(out, i); + } + for (; i < length; i++) + out[i] = a[i] & b[i]; + } + + @Override + public void bitmaskOr(final long[] a, final long[] b, final long[] out, final int length) { + final int lanes = LONG_SPECIES.length(); + int i = 0; + for (; i + lanes <= length; i += lanes) { + final LongVector va = LongVector.fromArray(LONG_SPECIES, a, i); + final LongVector vb = LongVector.fromArray(LONG_SPECIES, b, i); + va.or(vb).intoArray(out, i); + } + for (; i < length; i++) + out[i] = a[i] | b[i]; + } +} diff --git a/engine/src/main/java/com/arcadedb/engine/timeseries/simd/TimeSeriesVectorOps.java b/engine/src/main/java/com/arcadedb/engine/timeseries/simd/TimeSeriesVectorOps.java new file mode 100644 index 0000000000..38b463c798 --- /dev/null +++ b/engine/src/main/java/com/arcadedb/engine/timeseries/simd/TimeSeriesVectorOps.java @@ -0,0 +1,66 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries.simd; + +/** + * Interface for vectorized aggregation operations on primitive arrays. + * Two implementations: scalar (pure Java loops) and SIMD (Java Vector API). + * + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +public interface TimeSeriesVectorOps { + + double sum(double[] data, int offset, int length); + + double min(double[] data, int offset, int length); + + double max(double[] data, int offset, int length); + + long sumLong(long[] data, int offset, int length); + + long minLong(long[] data, int offset, int length); + + long maxLong(long[] data, int offset, int length); + + /** + * Sums only elements where the corresponding bitmask bit is set. + * Bitmask is a long[] where each long covers 64 elements. + */ + double sumFiltered(double[] data, long[] bitmask, int offset, int length); + + /** + * Counts elements where the corresponding bitmask bit is set. + */ + int countFiltered(long[] bitmask, int offset, int length); + + /** + * Produces a bitmask where data[i] > threshold. + */ + void greaterThan(double[] data, double threshold, long[] out, int offset, int length); + + /** + * Bitwise AND of two bitmasks. + */ + void bitmaskAnd(long[] a, long[] b, long[] out, int length); + + /** + * Bitwise OR of two bitmasks. 
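The bitmask contract documented above (one `long` per 64 elements) can be sketched as a standalone scalar reference. The bit ordering (bit `i & 63` of word `i >> 6`) and the class name are assumptions for illustration, not the module's actual `ScalarTimeSeriesVectorOps`:

```java
// Sketch of the filtered-sum/count contract under an ASSUMED bit layout:
// element i is selected when bit (i & 63) of bitmask[i >> 6] is set.
public final class BitmaskSumSketch {
  public static double sumFiltered(final double[] data, final long[] bitmask, final int offset, final int length) {
    double s = 0;
    for (int i = 0; i < length; i++) {
      final int idx = offset + i;
      if ((bitmask[idx >> 6] & (1L << (idx & 63))) != 0) // test the element's bit
        s += data[idx];
    }
    return s;
  }

  public static int countFiltered(final long[] bitmask, final int offset, final int length) {
    int c = 0;
    for (int i = 0; i < length; i++) {
      final int idx = offset + i;
      if ((bitmask[idx >> 6] & (1L << (idx & 63))) != 0)
        c++;
    }
    return c;
  }

  public static void main(final String[] args) {
    final double[] data = { 1.0, 2.0, 3.0, 4.0 };
    final long[] mask = { 0b1010L }; // bits 1 and 3 set -> selects 2.0 and 4.0
    System.out.println(sumFiltered(data, mask, 0, 4)); // 6.0
    System.out.println(countFiltered(mask, 0, 4));     // 2
  }
}
```

This word/bit addressing is also why the SIMD implementation delegates these methods to the scalar fallback: a per-element branch on packed bits does not map cleanly onto vector lanes.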
+ */ + void bitmaskOr(long[] a, long[] b, long[] out, int length); +} diff --git a/engine/src/main/java/com/arcadedb/engine/timeseries/simd/TimeSeriesVectorOpsProvider.java b/engine/src/main/java/com/arcadedb/engine/timeseries/simd/TimeSeriesVectorOpsProvider.java new file mode 100644 index 0000000000..a7d61acafd --- /dev/null +++ b/engine/src/main/java/com/arcadedb/engine/timeseries/simd/TimeSeriesVectorOpsProvider.java @@ -0,0 +1,59 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries.simd; + +import com.arcadedb.log.LogManager; + +import java.util.logging.Level; + +/** + * Singleton provider for {@link TimeSeriesVectorOps}. + * Tries to load the SIMD implementation at class init time; falls back to scalar. 
+ * + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +public final class TimeSeriesVectorOpsProvider { + + private static final TimeSeriesVectorOps INSTANCE; + + static { + TimeSeriesVectorOps ops; + try { + ops = new SimdTimeSeriesVectorOps(); + // Quick smoke test — verify the implementation returns correct results + final double smokeResult = ops.sum(new double[] { 1.0, 2.0 }, 0, 2); + if (smokeResult != 3.0) + throw new IllegalStateException("SIMD smoke test failed: expected 3.0 but got " + smokeResult); + LogManager.instance().log(TimeSeriesVectorOpsProvider.class, Level.INFO, "TimeSeries SIMD vector ops enabled"); + } catch (final Exception | LinkageError t) { + ops = new ScalarTimeSeriesVectorOps(); + LogManager.instance() + .log(TimeSeriesVectorOpsProvider.class, Level.INFO, "TimeSeries SIMD not available, using scalar fallback: %s", + t.getMessage()); + } + INSTANCE = ops; + } + + private TimeSeriesVectorOpsProvider() { + } + + public static TimeSeriesVectorOps getInstance() { + return INSTANCE; + } +} diff --git a/engine/src/main/java/com/arcadedb/function/sql/DefaultSQLFunctionFactory.java b/engine/src/main/java/com/arcadedb/function/sql/DefaultSQLFunctionFactory.java index ee3a62828e..e3f3ceca3b 100644 --- a/engine/src/main/java/com/arcadedb/function/sql/DefaultSQLFunctionFactory.java +++ b/engine/src/main/java/com/arcadedb/function/sql/DefaultSQLFunctionFactory.java @@ -86,6 +86,20 @@ import com.arcadedb.function.sql.time.SQLFunctionDate; import com.arcadedb.function.sql.time.SQLFunctionDuration; import com.arcadedb.function.sql.time.SQLFunctionSysdate; +import com.arcadedb.function.sql.time.SQLFunctionTimeBucket; +import com.arcadedb.function.sql.time.SQLFunctionCorrelate; +import com.arcadedb.function.sql.time.SQLFunctionDelta; +import com.arcadedb.function.sql.time.SQLFunctionInterpolate; +import com.arcadedb.function.sql.time.SQLFunctionMovingAvg; +import com.arcadedb.function.sql.time.SQLFunctionRate; +import 
com.arcadedb.function.sql.time.SQLFunctionTsPercentile; +import com.arcadedb.function.sql.time.SQLFunctionLag; +import com.arcadedb.function.sql.time.SQLFunctionLead; +import com.arcadedb.function.sql.time.SQLFunctionRank; +import com.arcadedb.function.sql.time.SQLFunctionRowNumber; +import com.arcadedb.function.sql.time.SQLFunctionTsFirst; +import com.arcadedb.function.sql.time.SQLFunctionTsLast; +import com.arcadedb.function.sql.time.SQLFunctionPromQL; import com.arcadedb.function.sql.vector.SQLFunctionDenseVectorToSparse; import com.arcadedb.function.sql.vector.SQLFunctionMultiVectorScore; import com.arcadedb.function.sql.vector.SQLFunctionSparseVectorCreate; @@ -220,6 +234,22 @@ private DefaultSQLFunctionFactory() { register(SQLFunctionDate.NAME, new SQLFunctionDate()); register(SQLFunctionDuration.NAME, new SQLFunctionDuration()); register(SQLFunctionSysdate.NAME, SQLFunctionSysdate.class); + // TimeSeries (ts.* namespace) + register(SQLFunctionTimeBucket.NAME, new SQLFunctionTimeBucket()); + register(SQLFunctionCorrelate.NAME, SQLFunctionCorrelate.class); + register(SQLFunctionDelta.NAME, SQLFunctionDelta.class); + register(SQLFunctionTsFirst.NAME, SQLFunctionTsFirst.class); + register(SQLFunctionTsLast.NAME, SQLFunctionTsLast.class); + register(SQLFunctionInterpolate.NAME, SQLFunctionInterpolate.class); + register(SQLFunctionMovingAvg.NAME, SQLFunctionMovingAvg.class); + register(SQLFunctionRate.NAME, SQLFunctionRate.class); + register(SQLFunctionTsPercentile.NAME, SQLFunctionTsPercentile.class); + register(SQLFunctionPromQL.NAME, new SQLFunctionPromQL()); + // Window functions + register(SQLFunctionLag.NAME, SQLFunctionLag.class); + register(SQLFunctionLead.NAME, SQLFunctionLead.class); + register(SQLFunctionRowNumber.NAME, SQLFunctionRowNumber.class); + register(SQLFunctionRank.NAME, SQLFunctionRank.class); // Vectors // Basic Operations diff --git a/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionCorrelate.java 
b/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionCorrelate.java new file mode 100644 index 0000000000..3b05c24751 --- /dev/null +++ b/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionCorrelate.java @@ -0,0 +1,87 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.function.sql.time; + +import com.arcadedb.database.Identifiable; +import com.arcadedb.function.sql.SQLAggregatedFunction; +import com.arcadedb.query.sql.executor.CommandContext; + +/** + * Computes the Pearson correlation coefficient between two series. + * Syntax: correlate(value_a, value_b) + * Returns a value between -1.0 and 1.0, or null if fewer than 2 samples or zero variance. + * Uses Welford's online algorithm for numerical stability. 
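The single-pass Welford-style update this function performs can be shown as a standalone sketch; `PearsonSketch` and its array-based API are illustrative only, not part of the module:

```java
// Minimal sketch of one-pass Pearson correlation via running means and co-moments.
public final class PearsonSketch {
  public static Double correlate(final double[] a, final double[] b) {
    long n = 0;
    double meanA = 0, meanB = 0, m2A = 0, m2B = 0, covAB = 0;
    for (int i = 0; i < a.length; i++) {
      n++;
      final double dA = a[i] - meanA; // delta against the OLD mean
      final double dB = b[i] - meanB;
      meanA += dA / n;
      meanB += dB / n;
      m2A += dA * (a[i] - meanA);     // old-delta times new-delta: sum of squares
      m2B += dB * (b[i] - meanB);
      covAB += dA * (b[i] - meanB);   // cross term: co-moment
    }
    if (n < 2)
      return null;                    // undefined below 2 samples
    final double denom = Math.sqrt(m2A * m2B);
    return denom == 0.0 ? null : covAB / denom; // null on zero variance
  }

  public static void main(final String[] args) {
    // A perfectly linear relationship yields correlation 1.0
    System.out.println(correlate(new double[]{1, 2, 3}, new double[]{2, 4, 6})); // 1.0
  }
}
```

Mixing the old and new deltas in each co-moment update is what keeps the computation numerically stable compared with the naive `E[xy] - E[x]E[y]` formula.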
+ */ +public class SQLFunctionCorrelate extends SQLAggregatedFunction { + public static final String NAME = "ts.correlate"; + + private long n; + private double meanA; + private double meanB; + private double m2A; + private double m2B; + private double covAB; + + public SQLFunctionCorrelate() { + super(NAME); + } + + @Override + public Object execute(final Object self, final Identifiable currentRecord, final Object currentResult, final Object[] params, + final CommandContext context) { + if (params[0] == null || params[1] == null) + return null; + + final double a = ((Number) params[0]).doubleValue(); + final double b = ((Number) params[1]).doubleValue(); + + n++; + final double dA = a - meanA; + final double dB = b - meanB; + meanA += dA / n; + meanB += dB / n; + final double dA2 = a - meanA; + final double dB2 = b - meanB; + m2A += dA * dA2; + m2B += dB * dB2; + covAB += dA * dB2; + + return null; + } + + @Override + public boolean aggregateResults() { + return true; + } + + @Override + public Object getResult() { + if (n < 2) + return null; + final double denom = Math.sqrt(m2A * m2B); + if (denom == 0.0) + return null; + return covAB / denom; + } + + @Override + public String getSyntax() { + return NAME + "(, )"; + } +} diff --git a/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionDelta.java b/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionDelta.java new file mode 100644 index 0000000000..b8d73a85c2 --- /dev/null +++ b/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionDelta.java @@ -0,0 +1,81 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.function.sql.time; + +import com.arcadedb.database.Identifiable; +import com.arcadedb.function.sql.SQLAggregatedFunction; +import com.arcadedb.query.sql.executor.CommandContext; + +/** + * Computes the difference between the last and first value ordered by timestamp. + * Syntax: delta(value, timestamp) + */ +public class SQLFunctionDelta extends SQLAggregatedFunction { + public static final String NAME = "ts.delta"; + + private double firstValue; + private long firstTimestamp = Long.MAX_VALUE; + private double lastValue; + private long lastTimestamp = Long.MIN_VALUE; + private int count; + + public SQLFunctionDelta() { + super(NAME); + } + + @Override + public Object execute(final Object self, final Identifiable currentRecord, final Object currentResult, final Object[] params, + final CommandContext context) { + if (params[0] == null || params[1] == null) + return null; + + final double value = ((Number) params[0]).doubleValue(); + final long ts = SQLFunctionRate.toEpochMillis(params[1]); + count++; + + if (ts < firstTimestamp) { + firstTimestamp = ts; + firstValue = value; + } + if (ts > lastTimestamp) { + lastTimestamp = ts; + lastValue = value; + } + return null; + } + + @Override + public boolean aggregateResults() { + return true; + } + + @Override + public Object getResult() { + if (count == 0) + return null; + if (count == 1) + return 0.0; + return lastValue - firstValue; + } + + @Override + 
public String getSyntax() { + return NAME + "(&lt;value&gt;, &lt;timestamp&gt;)"; + } +} diff --git a/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionInterpolate.java b/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionInterpolate.java new file mode 100644 index 0000000000..efd6b2b6a3 --- /dev/null +++ b/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionInterpolate.java @@ -0,0 +1,163 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.function.sql.time; + +import com.arcadedb.database.Identifiable; +import com.arcadedb.function.sql.SQLAggregatedFunction; +import com.arcadedb.query.sql.executor.CommandContext; + +import java.util.ArrayList; +import java.util.List; + +/** + * Fills null values in a series using the specified method. 
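The timestamp-weighted 'linear' method described below can be sketched in isolation, assuming input already sorted by timestamp; `LinearFillSketch` and its array API are hypothetical, not the module's:

```java
import java.util.Arrays;

// Sketch of linear gap filling: each null between two known samples is placed
// on the straight line through its nearest non-null neighbours, weighted by timestamp.
public final class LinearFillSketch {
  public static Double[] fill(final Double[] values, final long[] ts) {
    final Double[] out = Arrays.copyOf(values, values.length);
    for (int i = 0; i < out.length; i++) {
      if (out[i] != null)
        continue;
      final int prev = i - 1;
      if (prev < 0 || out[prev] == null)
        continue;                                  // no left anchor: leave null
      int next = i + 1;
      while (next < out.length && values[next] == null)
        next++;                                    // scan for the right anchor
      if (next >= out.length)
        break;                                     // no right anchor: leave trailing nulls
      final double f = (double) (ts[i] - ts[prev]) / (ts[next] - ts[prev]);
      out[i] = out[prev] + f * (values[next] - out[prev]); // chained points stay collinear
    }
    return out;
  }

  public static void main(final String[] args) {
    // value at t=20 is missing; halfway between 1.0@10 and 3.0@30 -> 2.0
    System.out.println(Arrays.toString(fill(new Double[]{1.0, null, 3.0}, new long[]{10, 20, 30})));
  }
}
```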
+ * Syntax: interpolate(value, method [, timestamp]) + * Methods: 'prev' (carry forward), 'zero' (replace with 0), 'none' (leave nulls), + * 'linear' (linear interpolation between surrounding non-null values, requires timestamp parameter) + * + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +public class SQLFunctionInterpolate extends SQLAggregatedFunction { + public static final String NAME = "ts.interpolate"; + + private final List values = new ArrayList<>(); + private final List timestamps = new ArrayList<>(); + private String method; + + public SQLFunctionInterpolate() { + super(NAME); + } + + @Override + public Object execute(final Object self, final Identifiable currentRecord, final Object currentResult, final Object[] params, + final CommandContext context) { + if (method == null && params.length > 1 && params[1] != null) + method = params[1].toString(); + + values.add(params[0]); + + // Capture timestamp if provided (needed for linear interpolation) + if (params.length > 2 && params[2] != null) + timestamps.add(SQLFunctionRate.toEpochMillis(params[2])); + else + timestamps.add((long) timestamps.size()); // Use index as fallback + + return null; + } + + @Override + public boolean aggregateResults() { + return true; + } + + @Override + public Object getResult() { + if (values.isEmpty()) + return new ArrayList<>(); + + final String m = method != null ? method : "none"; + final List result = new ArrayList<>(values.size()); + + switch (m) { + case "zero": + for (final Object v : values) + result.add(v != null ? 
v : 0.0); + break; + + case "prev": + Object lastNonNull = null; + for (final Object v : values) { + if (v != null) + lastNonNull = v; + result.add(lastNonNull); + } + break; + + case "linear": + applyLinearInterpolation(result); + break; + + default: // "none" + result.addAll(values); + break; + } + return result; + } + + private void applyLinearInterpolation(final List result) { + final int size = values.size(); + + // First pass: copy all values + for (final Object v : values) + result.add(v); + + // Second pass: interpolate nulls between non-null values + int i = 0; + while (i < size) { + if (result.get(i) != null) { + i++; + continue; + } + + // Find the previous non-null value + final int prevIdx = i - 1; + if (prevIdx < 0 || result.get(prevIdx) == null) { + // No previous value - leave null + i++; + continue; + } + + // Find the next non-null value + int nextIdx = i + 1; + while (nextIdx < size && values.get(nextIdx) == null) + nextIdx++; + + if (nextIdx >= size) { + // No next value - leave nulls + break; + } + + // Interpolate all nulls between prevIdx and nextIdx + final double prevVal = ((Number) result.get(prevIdx)).doubleValue(); + final double nextVal = ((Number) values.get(nextIdx)).doubleValue(); + final long prevTs = timestamps.get(prevIdx); + final long nextTs = timestamps.get(nextIdx); + final long tsDelta = nextTs - prevTs; + + if (tsDelta == 0) { + // Same timestamp - just use prev value + for (int j = i; j < nextIdx; j++) + result.set(j, prevVal); + } else { + for (int j = i; j < nextIdx; j++) { + final long currentTs = timestamps.get(j); + final double fraction = (double) (currentTs - prevTs) / tsDelta; + result.set(j, prevVal + fraction * (nextVal - prevVal)); + } + } + + i = nextIdx + 1; + } + } + + @Override + public String getSyntax() { + return NAME + "(, [, ])"; + } +} diff --git a/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionLag.java b/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionLag.java new file 
mode 100644 index 0000000000..afdc58c10e --- /dev/null +++ b/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionLag.java @@ -0,0 +1,91 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.function.sql.time; + +import com.arcadedb.database.Identifiable; +import com.arcadedb.function.sql.SQLAggregatedFunction; +import com.arcadedb.query.sql.executor.CommandContext; + +import java.util.ArrayList; +import java.util.Comparator; +import java.util.List; + +/** + * Returns the value from a previous row, ordered by timestamp. + * Syntax: ts.lag(value, offset, timestamp [, default]) + *
<ul>
+ *   <li>value — the field to retrieve from the lagging row</li>
+ *   <li>offset — how many rows back (default 1)</li>
+ *   <li>timestamp — the ordering field</li>
+ *   <li>default — optional value returned when there is no previous row</li>
+ * </ul>
+ */ +public class SQLFunctionLag extends SQLAggregatedFunction { + public static final String NAME = "ts.lag"; + + private final List pairs = new ArrayList<>(); + private int offset = 1; + private Object defaultValue = null; + private boolean paramsRead = false; + + public SQLFunctionLag() { + super(NAME); + } + + @Override + public Object execute(final Object self, final Identifiable currentRecord, final Object currentResult, final Object[] params, + final CommandContext context) { + if (!paramsRead) { + if (params.length >= 2) + offset = ((Number) params[1]).intValue(); + if (params.length >= 4) + defaultValue = params[3]; + paramsRead = true; + } + + final Object value = params[0]; + final Object timestamp = params.length >= 3 ? params[2] : null; + pairs.add(new Object[] { value, timestamp }); + return null; + } + + @Override + public boolean aggregateResults() { + return true; + } + + @Override + public Object getResult() { + if (pairs.isEmpty()) + return new ArrayList<>(); + + pairs.sort(Comparator.comparing(p -> ((Comparable) p[1]))); + + final List result = new ArrayList<>(pairs.size()); + for (int i = 0; i < pairs.size(); i++) + result.add(i >= offset ? pairs.get(i - offset)[0] : defaultValue); + + return result; + } + + @Override + public String getSyntax() { + return NAME + "(, , [, ])"; + } +} diff --git a/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionLead.java b/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionLead.java new file mode 100644 index 0000000000..ecbf8cfe63 --- /dev/null +++ b/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionLead.java @@ -0,0 +1,92 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.function.sql.time; + +import com.arcadedb.database.Identifiable; +import com.arcadedb.function.sql.SQLAggregatedFunction; +import com.arcadedb.query.sql.executor.CommandContext; + +import java.util.ArrayList; +import java.util.Comparator; +import java.util.List; + +/** + * Returns the value from a subsequent row, ordered by timestamp. + * Syntax: ts.lead(value, offset, timestamp [, default]) + *
<ul>
+ *   <li>value — the field to retrieve from the leading row</li>
+ *   <li>offset — how many rows forward (default 1)</li>
+ *   <li>timestamp — the ordering field</li>
+ *   <li>default — optional value returned when there is no subsequent row</li>
+ * </ul>
+ */ +public class SQLFunctionLead extends SQLAggregatedFunction { + public static final String NAME = "ts.lead"; + + private final List pairs = new ArrayList<>(); + private int offset = 1; + private Object defaultValue = null; + private boolean paramsRead = false; + + public SQLFunctionLead() { + super(NAME); + } + + @Override + public Object execute(final Object self, final Identifiable currentRecord, final Object currentResult, final Object[] params, + final CommandContext context) { + if (!paramsRead) { + if (params.length >= 2) + offset = ((Number) params[1]).intValue(); + if (params.length >= 4) + defaultValue = params[3]; + paramsRead = true; + } + + final Object value = params[0]; + final Object timestamp = params.length >= 3 ? params[2] : null; + pairs.add(new Object[] { value, timestamp }); + return null; + } + + @Override + public boolean aggregateResults() { + return true; + } + + @Override + public Object getResult() { + if (pairs.isEmpty()) + return new ArrayList<>(); + + pairs.sort(Comparator.comparing(p -> ((Comparable) p[1]))); + + final int size = pairs.size(); + final List result = new ArrayList<>(size); + for (int i = 0; i < size; i++) + result.add(i + offset < size ? pairs.get(i + offset)[0] : defaultValue); + + return result; + } + + @Override + public String getSyntax() { + return NAME + "(, , [, ])"; + } +} diff --git a/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionMovingAvg.java b/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionMovingAvg.java new file mode 100644 index 0000000000..f1ef1dced3 --- /dev/null +++ b/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionMovingAvg.java @@ -0,0 +1,83 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.function.sql.time; + +import com.arcadedb.database.Identifiable; +import com.arcadedb.function.sql.SQLAggregatedFunction; +import com.arcadedb.query.sql.executor.CommandContext; + +import java.util.ArrayList; +import java.util.List; + +/** + * Computes a sliding window moving average over accumulated values. + * Syntax: moving_avg(value, window_size) + * Returns a list of moving averages with the same length as the input. + */ +public class SQLFunctionMovingAvg extends SQLAggregatedFunction { + public static final String NAME = "ts.movingAvg"; + + private final List values = new ArrayList<>(); + private int windowSize = -1; + + public SQLFunctionMovingAvg() { + super(NAME); + } + + @Override + public Object execute(final Object self, final Identifiable currentRecord, final Object currentResult, final Object[] params, + final CommandContext context) { + if (windowSize < 0) + windowSize = ((Number) params[1]).intValue(); + + if (params[0] instanceof Number number) + values.add(number.doubleValue()); + + return null; + } + + @Override + public boolean aggregateResults() { + return true; + } + + @Override + public Object getResult() { + if (values.isEmpty()) + return new ArrayList<>(); + + final int w = Math.max(1, windowSize); + final List result = new ArrayList<>(values.size()); + double windowSum = 0; + + for (int i = 0; i < values.size(); i++) { + windowSum += values.get(i); + if (i >= w) + windowSum 
-= values.get(i - w); + final int count = Math.min(i + 1, w); + result.add(windowSum / count); + } + return result; + } + + @Override + public String getSyntax() { + return NAME + "(&lt;value&gt;, &lt;window_size&gt;)"; + } +} diff --git a/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionPromQL.java b/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionPromQL.java new file mode 100644 index 0000000000..e466fca83c --- /dev/null +++ b/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionPromQL.java @@ -0,0 +1,107 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 

+ * Evaluates a PromQL instant-vector expression over the current database's TimeSeries data
+ * and returns the result as a list of maps. Each map contains the metric labels
+ * plus a special {@code "__value__"} key holding the numeric sample value.
+ * <p>
+ * Examples:
+ * <pre>
+ *   SELECT promql('cpu_usage') FROM #1:0
+ *   SELECT promql('sum(cpu_usage) by (host)', 1700000000000) FROM #1:0
+ *   SELECT promql('rate(http_requests[5m])', System.currentTimeMillis()) FROM #1:0
+ * </pre>
+ * + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +public class SQLFunctionPromQL extends SQLFunctionConfigurableAbstract { + + public static final String NAME = "promql"; + + public SQLFunctionPromQL() { + super(NAME); + } + + @Override + public Object execute(final Object self, final Identifiable currentRecord, final Object currentResult, + final Object[] params, final CommandContext context) { + if (params.length < 1 || params[0] == null) + throw new IllegalArgumentException("promql() requires at least 1 parameter: the PromQL expression string"); + + final String expr = params[0].toString(); + final long evalTimeMs; + if (params.length >= 2 && params[1] != null) + evalTimeMs = params[1] instanceof Number n ? n.longValue() : Long.parseLong(params[1].toString()); + else + evalTimeMs = System.currentTimeMillis(); + + final DatabaseInternal database = (DatabaseInternal) context.getDatabase(); + final PromQLEvaluator evaluator = new PromQLEvaluator(database); + final PromQLResult result = evaluator.evaluateInstant(new PromQLParser(expr).parse(), evalTimeMs); + return toList(result); + } + + private static List> toList(final PromQLResult result) { + final List> list = new ArrayList<>(); + if (result instanceof PromQLResult.InstantVector iv) { + for (final PromQLResult.VectorSample sample : iv.samples()) { + final Map entry = new LinkedHashMap<>(sample.labels()); + entry.put("__value__", sample.value()); + list.add(entry); + } + } else if (result instanceof PromQLResult.ScalarResult sr) { + final Map entry = new LinkedHashMap<>(); + entry.put("__value__", sr.value()); + list.add(entry); + } + return list; + } + + @Override + public int getMinArgs() { + return 1; + } + + @Override + public int getMaxArgs() { + return 2; + } + + @Override + public String getSyntax() { + return "promql( [,])"; + } +} diff --git a/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionRank.java b/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionRank.java 
new file mode 100644 index 0000000000..2e15968206 --- /dev/null +++ b/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionRank.java @@ -0,0 +1,80 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.function.sql.time; + +import com.arcadedb.database.Identifiable; +import com.arcadedb.function.sql.SQLAggregatedFunction; +import com.arcadedb.query.sql.executor.CommandContext; + +import java.util.ArrayList; +import java.util.Comparator; +import java.util.List; +import java.util.Objects; + +/** + * Returns the rank of each row based on value, ordered by timestamp. + * Same values receive the same rank; the next distinct value skips ranks. 
+ * Syntax: ts.rank(value, timestamp)
+ */
+public class SQLFunctionRank extends SQLAggregatedFunction {
+  public static final String NAME = "ts.rank";
+
+  private final List<Object[]> pairs = new ArrayList<>();
+
+  public SQLFunctionRank() {
+    super(NAME);
+  }
+
+  @Override
+  public Object execute(final Object self, final Identifiable currentRecord, final Object currentResult, final Object[] params,
+      final CommandContext context) {
+    pairs.add(new Object[] { params[0], params[1] });
+    return null;
+  }
+
+  @Override
+  public boolean aggregateResults() {
+    return true;
+  }
+
+  @Override
+  public Object getResult() {
+    if (pairs.isEmpty())
+      return new ArrayList<>();
+
+    pairs.sort(Comparator.comparing(p -> ((Comparable) p[1])));
+
+    final List<Integer> result = new ArrayList<>(pairs.size());
+    result.add(1);
+
+    for (int i = 1; i < pairs.size(); i++) {
+      if (Objects.equals(pairs.get(i)[0], pairs.get(i - 1)[0]))
+        result.add(result.get(i - 1));
+      else
+        result.add(i + 1);
+    }
+
+    return result;
+  }
+
+  @Override
+  public String getSyntax() {
+    return NAME + "(<value>, <timestamp>)";
+  }
+}
diff --git a/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionRate.java b/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionRate.java
new file mode 100644
index 0000000000..115bbe0253
--- /dev/null
+++ b/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionRate.java
@@ -0,0 +1,122 @@
+/*
+ * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com)
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.function.sql.time; + +import com.arcadedb.database.Identifiable; +import com.arcadedb.function.sql.SQLAggregatedFunction; +import com.arcadedb.query.sql.executor.CommandContext; + +import java.util.ArrayList; +import java.util.Comparator; +import java.util.Date; +import java.util.List; + +/** + * Computes the per-second rate of change. + *

+ * Syntax: ts.rate(value, timestamp [, counterResetDetection])
+ * <ul>
+ * <li>Default (no 3rd param or false): simple rate = (last - first) / time_delta</li>
+ * <li>counterResetDetection = true: detects counter resets (where value decreases)
+ * and treats the post-reset value as an increment from 0, matching Prometheus rate() semantics</li>
+ * </ul>
+ *
+ * @author Luca Garulli (l.garulli@arcadedata.com)
+ */
+public class SQLFunctionRate extends SQLAggregatedFunction {
+  public static final String NAME = "ts.rate";
+
+  private final List<long[]> samples = new ArrayList<>(); // [timestamp, Double.doubleToRawLongBits(value)]
+  private boolean counterResetDetection;
+  private boolean counterResetDetectionSet;
+
+  public SQLFunctionRate() {
+    super(NAME);
+  }
+
+  @Override
+  public Object execute(final Object self, final Identifiable currentRecord, final Object currentResult, final Object[] params,
+      final CommandContext context) {
+    if (params[0] == null || params[1] == null)
+      return null;
+
+    if (!counterResetDetectionSet && params.length > 2 && params[2] != null) {
+      counterResetDetection = Boolean.TRUE.equals(params[2]) || "true".equalsIgnoreCase(params[2].toString());
+      counterResetDetectionSet = true;
+    }
+
+    final double value = ((Number) params[0]).doubleValue();
+    final long ts = toEpochMillis(params[1]);
+    samples.add(new long[] { ts, Double.doubleToRawLongBits(value) });
+    return null;
+  }
+
+  @Override
+  public boolean aggregateResults() {
+    return true;
+  }
+
+  @Override
+  public Object getResult() {
+    if (samples.size() < 2)
+      return null;
+
+    // Sort by timestamp
+    samples.sort(Comparator.comparingLong(a -> a[0]));
+
+    final long firstTs = samples.getFirst()[0];
+    final long lastTs = samples.getLast()[0];
+    if (lastTs == firstTs)
+      return null;
+
+    if (counterResetDetection) {
+      // Compute total increase, accounting for counter resets
+      double totalIncrease = 0.0;
+      double prevValue = Double.longBitsToDouble(samples.getFirst()[1]);
+
+      for (int i = 1; i < samples.size(); i++) {
+        final double currentValue = Double.longBitsToDouble(samples.get(i)[1]);
+        if (currentValue < prevValue)
+          // Counter reset detected: treat the current value as the increase from 0
+          totalIncrease += currentValue;
+        else
+          totalIncrease += (currentValue - prevValue);
+        prevValue = currentValue;
+      }
+
+      return totalIncrease /
((lastTs - firstTs) / 1000.0);
+    } else {
+      // Simple rate: (last - first) / time_delta
+      final double firstValue = Double.longBitsToDouble(samples.getFirst()[1]);
+      final double lastValue = Double.longBitsToDouble(samples.getLast()[1]);
+      return (lastValue - firstValue) / ((lastTs - firstTs) / 1000.0);
+    }
+  }
+
+  @Override
+  public String getSyntax() {
+    return NAME + "(<value>, <timestamp> [, <counterResetDetection>])";
+  }
+
+  static long toEpochMillis(final Object ts) {
+    if (ts instanceof Date date)
+      return date.getTime();
+    return ((Number) ts).longValue();
+  }
+}
diff --git a/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionRowNumber.java b/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionRowNumber.java
new file mode 100644
index 0000000000..de0f348234
--- /dev/null
+++ b/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionRowNumber.java
@@ -0,0 +1,65 @@
+/*
+ * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com)
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com)
+ * SPDX-License-Identifier: Apache-2.0
+ */
+package com.arcadedb.function.sql.time;
+
+import com.arcadedb.database.Identifiable;
+import com.arcadedb.function.sql.SQLAggregatedFunction;
+import com.arcadedb.query.sql.executor.CommandContext;
+
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * Returns a sequential 1-based row number ordered by timestamp.
+ * Syntax: ts.rowNumber(timestamp)
+ */
+public class SQLFunctionRowNumber extends SQLAggregatedFunction {
+  public static final String NAME = "ts.rowNumber";
+
+  private int count = 0;
+
+  public SQLFunctionRowNumber() {
+    super(NAME);
+  }
+
+  @Override
+  public Object execute(final Object self, final Identifiable currentRecord, final Object currentResult, final Object[] params,
+      final CommandContext context) {
+    count++;
+    return null;
+  }
+
+  @Override
+  public boolean aggregateResults() {
+    return true;
+  }
+
+  @Override
+  public Object getResult() {
+    final List<Integer> result = new ArrayList<>(count);
+    for (int i = 1; i <= count; i++)
+      result.add(i);
+    return result;
+  }
+
+  @Override
+  public String getSyntax() {
+    return NAME + "(<timestamp>)";
+  }
+}
diff --git a/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionTimeBucket.java b/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionTimeBucket.java
new file mode 100644
index 0000000000..e7ae82a520
--- /dev/null
+++ b/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionTimeBucket.java
@@ -0,0 +1,117 @@
+/*
+ * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com)
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.function.sql.time; + +import com.arcadedb.database.Identifiable; +import com.arcadedb.function.sql.SQLFunctionConfigurableAbstract; +import com.arcadedb.query.sql.executor.CommandContext; +import java.time.Instant; +import java.time.LocalDateTime; +import java.time.ZoneOffset; +import java.util.Date; + +/** + * SQL function: time_bucket(interval_string, timestamp) + * Returns the start of the time bucket containing the given timestamp. + *

+ * Intervals: '1s', '5s', '1m', '5m', '1h', '1d', '1w'
+ *
+ * Example: SELECT time_bucket('1h', ts) AS hour, avg(temperature) FROM SensorData GROUP BY hour + * + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +public class SQLFunctionTimeBucket extends SQLFunctionConfigurableAbstract { + public static final String NAME = "ts.timeBucket"; + + public SQLFunctionTimeBucket() { + super(NAME); + } + + @Override + public Object execute(final Object self, final Identifiable currentRecord, final Object currentResult, final Object[] params, + final CommandContext context) { + if (params.length < 2) + throw new IllegalArgumentException("time_bucket() requires 2 parameters: interval and timestamp"); + + final String interval = params[0].toString(); + final long intervalMs = parseInterval(interval); + + final long timestampMs = toEpochMs(params[1]); + + // Truncate to bucket boundary + final long bucketStart = (timestampMs / intervalMs) * intervalMs; + + return new Date(bucketStart); + } + + public static long parseInterval(final String interval) { + if (interval == null || interval.isEmpty()) + throw new IllegalArgumentException("Invalid time_bucket interval: empty"); + + // Parse numeric part and unit suffix + int unitStart = 0; + for (int i = 0; i < interval.length(); i++) { + if (!Character.isDigit(interval.charAt(i))) { + unitStart = i; + break; + } + } + + if (unitStart == 0) + throw new IllegalArgumentException("Invalid time_bucket interval: '" + interval + "'"); + + final long value = Long.parseLong(interval.substring(0, unitStart)); + final String unit = interval.substring(unitStart).trim().toLowerCase(); + + return switch (unit) { + case "s" -> value * 1000L; + case "m" -> value * 60_000L; + case "h" -> value * 3_600_000L; + case "d" -> value * 86_400_000L; + case "w" -> value * 7 * 86_400_000L; + default -> throw new IllegalArgumentException("Unknown time_bucket unit: '" + unit + "'. 
Supported: s, m, h, d, w");
+    };
+  }
+
+  private static long toEpochMs(final Object value) {
+    if (value instanceof Long l)
+      return l;
+    if (value instanceof Date d)
+      return d.getTime();
+    if (value instanceof Instant i)
+      return i.toEpochMilli();
+    if (value instanceof LocalDateTime ldt)
+      return ldt.toInstant(ZoneOffset.UTC).toEpochMilli();
+    if (value instanceof Number n)
+      return n.longValue();
+    if (value instanceof String s) {
+      try {
+        return Instant.parse(s).toEpochMilli();
+      } catch (final Exception e) {
+        throw new IllegalArgumentException("Cannot parse timestamp for time_bucket: '" + s + "'", e);
+      }
+    }
+    throw new IllegalArgumentException("Unsupported timestamp type for time_bucket: " + value.getClass().getName());
+  }
+
+  @Override
+  public String getSyntax() {
+    return "time_bucket(<interval>, <timestamp>)";
+  }
+}
diff --git a/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionTsFirst.java b/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionTsFirst.java
new file mode 100644
index 0000000000..54eaf3be36
--- /dev/null
+++ b/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionTsFirst.java
@@ -0,0 +1,67 @@
+/*
+ * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com)
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com)
+ * SPDX-License-Identifier: Apache-2.0
+ */
+package com.arcadedb.function.sql.time;
+
+import com.arcadedb.database.Identifiable;
+import com.arcadedb.function.sql.SQLAggregatedFunction;
+import com.arcadedb.query.sql.executor.CommandContext;
+
+/**
+ * Returns the value associated with the earliest timestamp.
+ * Syntax: ts.first(value, timestamp)
+ */
+public class SQLFunctionTsFirst extends SQLAggregatedFunction {
+  public static final String NAME = "ts.first";
+
+  private Object firstValue;
+  private long minTimestamp = Long.MAX_VALUE;
+
+  public SQLFunctionTsFirst() {
+    super(NAME);
+  }
+
+  @Override
+  public Object execute(final Object self, final Identifiable currentRecord, final Object currentResult, final Object[] params,
+      final CommandContext context) {
+    if (params[0] == null || params[1] == null)
+      return firstValue;
+
+    final long ts = SQLFunctionRate.toEpochMillis(params[1]);
+    if (ts < minTimestamp) {
+      minTimestamp = ts;
+      firstValue = params[0];
+    }
+    return firstValue;
+  }
+
+  @Override
+  public boolean aggregateResults() {
+    return true;
+  }
+
+  @Override
+  public Object getResult() {
+    return firstValue;
+  }
+
+  @Override
+  public String getSyntax() {
+    return NAME + "(<value>, <timestamp>)";
+  }
+}
diff --git a/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionTsLast.java b/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionTsLast.java
new file mode 100644
index 0000000000..44f16a1d72
--- /dev/null
+++ b/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionTsLast.java
@@ -0,0 +1,67 @@
+/*
+ * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com)
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com)
+ * SPDX-License-Identifier: Apache-2.0
+ */
+package com.arcadedb.function.sql.time;
+
+import com.arcadedb.database.Identifiable;
+import com.arcadedb.function.sql.SQLAggregatedFunction;
+import com.arcadedb.query.sql.executor.CommandContext;
+
+/**
+ * Returns the value associated with the latest timestamp.
+ * Syntax: ts.last(value, timestamp)
+ */
+public class SQLFunctionTsLast extends SQLAggregatedFunction {
+  public static final String NAME = "ts.last";
+
+  private Object lastValue;
+  private long maxTimestamp = Long.MIN_VALUE;
+
+  public SQLFunctionTsLast() {
+    super(NAME);
+  }
+
+  @Override
+  public Object execute(final Object self, final Identifiable currentRecord, final Object currentResult, final Object[] params,
+      final CommandContext context) {
+    if (params[0] == null || params[1] == null)
+      return lastValue;
+
+    final long ts = SQLFunctionRate.toEpochMillis(params[1]);
+    if (ts > maxTimestamp) {
+      maxTimestamp = ts;
+      lastValue = params[0];
+    }
+    return lastValue;
+  }
+
+  @Override
+  public boolean aggregateResults() {
+    return true;
+  }
+
+  @Override
+  public Object getResult() {
+    return lastValue;
+  }
+
+  @Override
+  public String getSyntax() {
+    return NAME + "(<value>, <timestamp>)";
+  }
+}
diff --git a/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionTsPercentile.java b/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionTsPercentile.java
new file mode 100644
index 0000000000..e314c23edb
--- /dev/null
+++
b/engine/src/main/java/com/arcadedb/function/sql/time/SQLFunctionTsPercentile.java @@ -0,0 +1,98 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.function.sql.time; + +import com.arcadedb.database.Identifiable; +import com.arcadedb.function.sql.SQLAggregatedFunction; +import com.arcadedb.query.sql.executor.CommandContext; + +import java.util.ArrayList; +import java.util.Collections; +import java.util.List; + +/** + * Computes the approximate percentile of a numeric series. + * Uses exact sorting for the collected values and linear interpolation + * between ranks. + *

+ * Syntax: ts.percentile(value, percentile)
+ * where percentile is 0.0..1.0 (e.g. 0.95 for p95, 0.99 for p99)
+ *
+ * @author Luca Garulli (l.garulli@arcadedata.com)
+ */
+public class SQLFunctionTsPercentile extends SQLAggregatedFunction {
+  public static final String NAME = "ts.percentile";
+
+  private final List<Double> values = new ArrayList<>();
+  private double percentile;
+  private boolean percentileSet;
+
+  public SQLFunctionTsPercentile() {
+    super(NAME);
+  }
+
+  @Override
+  public Object execute(final Object self, final Identifiable currentRecord, final Object currentResult, final Object[] params,
+      final CommandContext context) {
+    if (params[0] == null)
+      return null;
+
+    if (!percentileSet && params.length > 1 && params[1] != null) {
+      percentile = ((Number) params[1]).doubleValue();
+      if (percentile < 0.0 || percentile > 1.0)
+        throw new IllegalArgumentException("Percentile must be between 0.0 and 1.0, got: " + percentile);
+      percentileSet = true;
+    }
+
+    values.add(((Number) params[0]).doubleValue());
+    return null;
+  }
+
+  @Override
+  public boolean aggregateResults() {
+    return true;
+  }
+
+  @Override
+  public Object getResult() {
+    if (values.isEmpty())
+      return null;
+
+    Collections.sort(values);
+
+    final int n = values.size();
+
+    // Rank index percentile * (n - 1): the "linear" interpolation method (NumPy's default)
+    final double index = percentile * (n - 1);
+    final int lower = (int) Math.floor(index);
+    final int upper = (int) Math.ceil(index);
+
+    if (lower == upper || upper >= n)
+      return values.get(lower);
+
+    // Linear interpolation between the two closest ranks
+    final double fraction = index - lower;
+    return values.get(lower) + fraction * (values.get(upper) - values.get(lower));
+  }
+
+  @Override
+  public String getSyntax() {
+    return NAME + "(<value>, <percentile>)";
+  }
+}
diff --git a/engine/src/main/java/com/arcadedb/query/sql/antlr/SQLASTBuilder.java b/engine/src/main/java/com/arcadedb/query/sql/antlr/SQLASTBuilder.java
index 14469df2bf..c9b896a236 100644
---
a/engine/src/main/java/com/arcadedb/query/sql/antlr/SQLASTBuilder.java +++ b/engine/src/main/java/com/arcadedb/query/sql/antlr/SQLASTBuilder.java @@ -19,6 +19,7 @@ package com.arcadedb.query.sql.antlr; import com.arcadedb.database.Identifiable; +import com.arcadedb.engine.timeseries.DownsamplingTier; import com.arcadedb.exception.CommandSQLParsingException; import com.arcadedb.index.lsm.LSMTreeIndexAbstract; import com.arcadedb.query.sql.executor.CommandContext; @@ -34,6 +35,7 @@ import java.util.Collections; import java.util.List; import java.util.Map; +import java.util.Set; /** * ANTLR4 visitor that builds ArcadeDB's internal AST from the SQL parse tree. @@ -46,6 +48,13 @@ */ public class SQLASTBuilder extends SQLParserBaseVisitor { + /** + * Known function namespace prefixes. When the parser sees {@code namespace.method(args)} and the namespace + * is in this set, the AST builder produces a {@link FunctionCall} node with the qualified name + * (e.g., "ts.first") instead of an identifier chain with a method modifier. + */ + private static final Set FUNCTION_NAMESPACES = Set.of("ts"); + private int positionalParamCounter = 0; // ENTRY POINTS @@ -3078,6 +3087,36 @@ public BaseExpression visitIdentifierChain(final SQLParser.IdentifierChainContex baseExpr.identifier = baseId; } + // Check for namespaced function call pattern: namespace.method(args) + // e.g., ts.first(value, ts) → builds FunctionCall with name "ts.first" + if (ctx.identifier().size() == 1 + && ctx.methodCall() != null && ctx.methodCall().size() == 1 + && (ctx.arraySelector() == null || ctx.arraySelector().isEmpty()) + && (ctx.modifier() == null || ctx.modifier().isEmpty())) { + final String baseIdName = ctx.identifier(0).getText(); + + if (FUNCTION_NAMESPACES.contains(baseIdName)) { + final SQLParser.MethodCallContext methodCtx = ctx.methodCall(0); + final String qualifiedName = baseIdName + "." 
+ methodCtx.identifier().getText(); + + final FunctionCall funcCall = new FunctionCall(-1); + funcCall.name = new Identifier(qualifiedName); + funcCall.params = new ArrayList<>(); + if (methodCtx.expression() != null) + for (final SQLParser.ExpressionContext exprCtx : methodCtx.expression()) + funcCall.params.add((Expression) visit(exprCtx)); + + final LevelZeroIdentifier levelZero = new LevelZeroIdentifier(-1); + levelZero.functionCall = funcCall; + + final BaseIdentifier baseId2 = new BaseIdentifier(-1); + baseId2.levelZero = levelZero; + + baseExpr.identifier = baseId2; + return baseExpr; + } + } + // Build modifier chain from additional identifiers, methodCalls, arraySelectors and modifiers Modifier firstModifier = null; Modifier currentModifier = null; @@ -5722,6 +5761,180 @@ else if (unitCtx.HOUR() != null) return stmt; } + @Override + public CreateTimeSeriesTypeStatement visitCreateTimeSeriesTypeStmt( + final SQLParser.CreateTimeSeriesTypeStmtContext ctx) { + final CreateTimeSeriesTypeStatement stmt = new CreateTimeSeriesTypeStatement(-1); + final SQLParser.CreateTimeSeriesTypeBodyContext bodyCtx = ctx.createTimeSeriesTypeBody(); + + stmt.name = (Identifier) visit(bodyCtx.identifier(0)); + stmt.ifNotExists = bodyCtx.IF() != null && bodyCtx.NOT() != null && bodyCtx.EXISTS() != null; + + // TIMESTAMP column + if (bodyCtx.TIMESTAMP() != null && bodyCtx.identifier().size() > 1) + stmt.timestampColumn = (Identifier) visit(bodyCtx.identifier(1)); + + // TAGS (name type, ...) + if (bodyCtx.TAGS() != null) { + for (final SQLParser.TsTagColumnDefContext colCtx : bodyCtx.tsTagColumnDef()) { + final Identifier colName = (Identifier) visit(colCtx.identifier(0)); + final Identifier colType = (Identifier) visit(colCtx.identifier(1)); + stmt.tags.add(new CreateTimeSeriesTypeStatement.ColumnDef(colName, colType)); + } + } + + // FIELDS (name type, ...) 
+ if (bodyCtx.FIELDS() != null) { + for (final SQLParser.TsFieldColumnDefContext colCtx : bodyCtx.tsFieldColumnDef()) { + final Identifier colName = (Identifier) visit(colCtx.identifier(0)); + final Identifier colType = (Identifier) visit(colCtx.identifier(1)); + stmt.fields.add(new CreateTimeSeriesTypeStatement.ColumnDef(colName, colType)); + } + } + + // SHARDS count + if (bodyCtx.SHARDS() != null) { + for (int i = 0; i < bodyCtx.children.size(); i++) { + if (bodyCtx.children.get(i) instanceof org.antlr.v4.runtime.tree.TerminalNode tn + && tn.getSymbol().getType() == SQLParser.SHARDS) { + // Next INTEGER_LITERAL + for (int j = i + 1; j < bodyCtx.children.size(); j++) { + if (bodyCtx.children.get(j) instanceof org.antlr.v4.runtime.tree.TerminalNode tn2 + && tn2.getSymbol().getType() == SQLParser.INTEGER_LITERAL) { + stmt.shards = new PInteger(-1); + stmt.shards.setValue(Integer.parseInt(tn2.getText())); + break; + } + } + break; + } + } + } + + // RETENTION value with optional time unit + if (bodyCtx.RETENTION() != null) { + long retentionValue = 0; + for (int i = 0; i < bodyCtx.children.size(); i++) { + if (bodyCtx.children.get(i) instanceof org.antlr.v4.runtime.tree.TerminalNode tn + && tn.getSymbol().getType() == SQLParser.RETENTION) { + for (int j = i + 1; j < bodyCtx.children.size(); j++) { + if (bodyCtx.children.get(j) instanceof org.antlr.v4.runtime.tree.TerminalNode tn2 + && tn2.getSymbol().getType() == SQLParser.INTEGER_LITERAL) { + retentionValue = Long.parseLong(tn2.getText()); + break; + } + } + break; + } + } + + // Determine time unit by looking at tokens after RETENTION + INTEGER_LITERAL + long multiplier = 86400000L; // default: DAYS + boolean foundRetention = false; + boolean foundValue = false; + for (int i = 0; i < bodyCtx.children.size(); i++) { + if (bodyCtx.children.get(i) instanceof org.antlr.v4.runtime.tree.TerminalNode tn) { + if (tn.getSymbol().getType() == SQLParser.RETENTION) + foundRetention = true; + else if (foundRetention && 
tn.getSymbol().getType() == SQLParser.INTEGER_LITERAL) + foundValue = true; + else if (foundRetention && foundValue) { + if (tn.getSymbol().getType() == SQLParser.HOURS) + multiplier = 3600000L; + else if (tn.getSymbol().getType() == SQLParser.MINUTES) + multiplier = 60000L; + break; + } + } + } + + stmt.retentionMs = retentionValue * multiplier; + } + + // COMPACTION_INTERVAL value with optional time unit + if (bodyCtx.COMPACTION_INTERVAL() != null) { + long compactionValue = 0; + for (int i = 0; i < bodyCtx.children.size(); i++) { + if (bodyCtx.children.get(i) instanceof org.antlr.v4.runtime.tree.TerminalNode tn + && tn.getSymbol().getType() == SQLParser.COMPACTION_INTERVAL) { + for (int j = i + 1; j < bodyCtx.children.size(); j++) { + if (bodyCtx.children.get(j) instanceof org.antlr.v4.runtime.tree.TerminalNode tn2 + && tn2.getSymbol().getType() == SQLParser.INTEGER_LITERAL) { + compactionValue = Long.parseLong(tn2.getText()); + break; + } + } + break; + } + } + + // Determine time unit (default: HOURS for compaction interval) + long multiplier = 3600000L; // HOURS + // Check for unit keywords AFTER the COMPACTION_INTERVAL token + // We need to look at the remaining children after the integer literal + boolean foundCompaction = false; + for (int i = 0; i < bodyCtx.children.size(); i++) { + if (bodyCtx.children.get(i) instanceof org.antlr.v4.runtime.tree.TerminalNode tn + && tn.getSymbol().getType() == SQLParser.COMPACTION_INTERVAL) + foundCompaction = true; + else if (foundCompaction && bodyCtx.children.get(i) instanceof org.antlr.v4.runtime.tree.TerminalNode tn) { + if (tn.getSymbol().getType() == SQLParser.DAYS) { + multiplier = 86400000L; + break; + } else if (tn.getSymbol().getType() == SQLParser.HOURS) { + multiplier = 3600000L; + break; + } else if (tn.getSymbol().getType() == SQLParser.MINUTES) { + multiplier = 60000L; + break; + } + } + } + + stmt.compactionIntervalMs = compactionValue * multiplier; + } + + return stmt; + } + + @Override + public 
AlterTimeSeriesTypeStatement visitAlterTimeSeriesTypeStmt( + final SQLParser.AlterTimeSeriesTypeStmtContext ctx) { + final AlterTimeSeriesTypeStatement stmt = new AlterTimeSeriesTypeStatement(-1); + final SQLParser.AlterTimeSeriesTypeBodyContext bodyCtx = ctx.alterTimeSeriesTypeBody(); + + stmt.name = (Identifier) visit(bodyCtx.identifier()); + + if (bodyCtx.ADD() != null) { + stmt.addPolicy = true; + for (final SQLParser.DownsamplingTierClauseContext tierCtx : bodyCtx.downsamplingTierClause()) { + final long afterValue = Long.parseLong(tierCtx.INTEGER_LITERAL(0).getText()); + final long afterMs = afterValue * parseTimeUnitMs(tierCtx.tsTimeUnit(0)); + + final long granValue = Long.parseLong(tierCtx.INTEGER_LITERAL(1).getText()); + final long granMs = granValue * parseTimeUnitMs(tierCtx.tsTimeUnit(1)); + + stmt.tiers.add(new DownsamplingTier(afterMs, granMs)); + } + // Sort tiers by afterMs ascending + stmt.tiers.sort((a, b) -> Long.compare(a.afterMs(), b.afterMs())); + } else { + stmt.addPolicy = false; + } + + return stmt; + } + + private static long parseTimeUnitMs(final SQLParser.TsTimeUnitContext unitCtx) { + if (unitCtx.DAYS() != null) + return 86400000L; + if (unitCtx.HOURS() != null || unitCtx.HOUR() != null) + return 3600000L; + if (unitCtx.MINUTES() != null || unitCtx.MINUTE() != null) + return 60000L; + return 86400000L; // default to days + } + @Override public DropMaterializedViewStatement visitDropMaterializedViewStmt( final SQLParser.DropMaterializedViewStmtContext ctx) { @@ -5769,6 +5982,41 @@ else if (unitCtx.HOUR() != null) return stmt; } + // ========================================================================= + // CONTINUOUS AGGREGATE MANAGEMENT + // ========================================================================= + + @Override + public CreateContinuousAggregateStatement visitCreateContinuousAggregateStmt( + final SQLParser.CreateContinuousAggregateStmtContext ctx) { + final CreateContinuousAggregateStatement stmt = new 
CreateContinuousAggregateStatement(-1); + final SQLParser.CreateContinuousAggregateBodyContext bodyCtx = ctx.createContinuousAggregateBody(); + + stmt.ifNotExists = bodyCtx.IF() != null && bodyCtx.NOT() != null && bodyCtx.EXISTS() != null; + stmt.name = (Identifier) visit(bodyCtx.identifier()); + stmt.selectStatement = (SelectStatement) visit(bodyCtx.selectStatement()); + + return stmt; + } + + @Override + public DropContinuousAggregateStatement visitDropContinuousAggregateStmt( + final SQLParser.DropContinuousAggregateStmtContext ctx) { + final DropContinuousAggregateStatement stmt = new DropContinuousAggregateStatement(-1); + final SQLParser.DropContinuousAggregateBodyContext bodyCtx = ctx.dropContinuousAggregateBody(); + stmt.name = (Identifier) visit(bodyCtx.identifier()); + stmt.ifExists = bodyCtx.IF() != null && bodyCtx.EXISTS() != null; + return stmt; + } + + @Override + public RefreshContinuousAggregateStatement visitRefreshContinuousAggregateStmt( + final SQLParser.RefreshContinuousAggregateStmtContext ctx) { + final RefreshContinuousAggregateStatement stmt = new RefreshContinuousAggregateStatement(-1); + stmt.name = (Identifier) visit(ctx.refreshContinuousAggregateBody().identifier()); + return stmt; + } + /** * Visit trigger timing (BEFORE or AFTER). */ diff --git a/engine/src/main/java/com/arcadedb/query/sql/executor/AggregateFromTimeSeriesStep.java b/engine/src/main/java/com/arcadedb/query/sql/executor/AggregateFromTimeSeriesStep.java new file mode 100644 index 0000000000..b070d7ea98 --- /dev/null +++ b/engine/src/main/java/com/arcadedb/query/sql/executor/AggregateFromTimeSeriesStep.java @@ -0,0 +1,172 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.query.sql.executor; + +import com.arcadedb.engine.timeseries.AggregationMetrics; +import com.arcadedb.engine.timeseries.MultiColumnAggregationRequest; +import com.arcadedb.engine.timeseries.MultiColumnAggregationResult; +import com.arcadedb.engine.timeseries.TagFilter; +import com.arcadedb.engine.timeseries.TimeSeriesEngine; +import com.arcadedb.exception.CommandExecutionException; +import com.arcadedb.exception.TimeoutException; +import com.arcadedb.schema.LocalTimeSeriesType; + +import java.io.IOException; +import java.util.Date; +import java.util.Iterator; +import java.util.List; +import java.util.Map; + +/** + * Push-down execution step that performs aggregation directly in the TimeSeries engine. + * Replaces the combination of FetchFromTimeSeriesStep + ProjectionCalculationStep + AggregateProjectionCalculationStep + * for eligible queries with ts.timeBucket GROUP BY and simple aggregate functions. 
+ */
+public class AggregateFromTimeSeriesStep extends AbstractExecutionStep {
+
+  private final LocalTimeSeriesType tsType;
+  private final long fromTs;
+  private final long toTs;
+  private final List<MultiColumnAggregationRequest> requests;
+  private final long bucketIntervalMs;
+  private final String timeBucketAlias;
+  private final Map<String, String> requestAliasToOutputAlias;
+  private final TagFilter tagFilter;
+  private Iterator<ResultInternal> resultIterator;
+  private boolean fetched = false;
+  private AggregationMetrics aggregationMetrics;
+
+  public AggregateFromTimeSeriesStep(final LocalTimeSeriesType tsType, final long fromTs, final long toTs,
+      final List<MultiColumnAggregationRequest> requests, final long bucketIntervalMs, final String timeBucketAlias,
+      final Map<String, String> requestAliasToOutputAlias, final CommandContext context) {
+    this(tsType, fromTs, toTs, requests, bucketIntervalMs, timeBucketAlias, requestAliasToOutputAlias, null, context);
+  }
+
+  public AggregateFromTimeSeriesStep(final LocalTimeSeriesType tsType, final long fromTs, final long toTs,
+      final List<MultiColumnAggregationRequest> requests, final long bucketIntervalMs, final String timeBucketAlias,
+      final Map<String, String> requestAliasToOutputAlias, final TagFilter tagFilter, final CommandContext context) {
+    super(context);
+    this.tsType = tsType;
+    this.fromTs = fromTs;
+    this.toTs = toTs;
+    this.requests = requests;
+    this.bucketIntervalMs = bucketIntervalMs;
+    this.timeBucketAlias = timeBucketAlias;
+    this.requestAliasToOutputAlias = requestAliasToOutputAlias;
+    this.tagFilter = tagFilter;
+  }
+
+  @Override
+  public ResultSet syncPull(final CommandContext context, final int nRecords) throws TimeoutException {
+    final long begin = context.isProfiling() ?
System.nanoTime() : 0;
+    try {
+      if (!fetched) {
+        try {
+          final TimeSeriesEngine engine = tsType.getEngine();
+          if (engine == null)
+            throw new CommandExecutionException(
+                "TimeSeries engine for type '" + tsType.getName() + "' is not initialized");
+          if (context.isProfiling())
+            aggregationMetrics = new AggregationMetrics();
+          final MultiColumnAggregationResult aggResult = engine.aggregateMulti(fromTs, toTs, requests, bucketIntervalMs, tagFilter, aggregationMetrics);
+
+          // Lazy conversion: wrap the bucket timestamp iterator instead of materializing all rows
+          final Iterator<Long> bucketIterator = aggResult.getBucketTimestamps().iterator();
+          resultIterator = new Iterator<>() {
+            @Override
+            public boolean hasNext() {
+              return bucketIterator.hasNext();
+            }
+
+            @Override
+            public ResultInternal next() {
+              final long bucketTs = bucketIterator.next();
+              final ResultInternal row = new ResultInternal(context.getDatabase());
+              row.setProperty(timeBucketAlias, new Date(bucketTs));
+              for (int i = 0; i < requests.size(); i++) {
+                final MultiColumnAggregationRequest req = requests.get(i);
+                final String outputAlias = requestAliasToOutputAlias.getOrDefault(req.alias(), req.alias());
+                row.setProperty(outputAlias, aggResult.getValue(bucketTs, i));
+              }
+              rowCount++;
+              return row;
+            }
+          };
+          fetched = true;
+        } catch (final CommandExecutionException e) {
+          throw e;
+        } catch (final IOException e) {
+          throw new CommandExecutionException("Error in TimeSeries push-down aggregation", e);
+        }
+      }
+
+      return new ResultSet() {
+        private int count = 0;
+
+        @Override
+        public boolean hasNext() {
+          return count < nRecords && resultIterator.hasNext();
+        }
+
+        @Override
+        public Result next() {
+          if (!hasNext())
+            throw new IllegalStateException("No more results");
+          count++;
+          return resultIterator.next();
+        }
+
+        @Override
+        public void close() {
+          // no-op
+        }
+      };
+    } finally {
+      if (context.isProfiling())
+        cost += (System.nanoTime() - begin);
+    }
+  }
+
+  @Override
+  public String
prettyPrint(final int depth, final int indent) { + final String spaces = ExecutionStepInternal.getIndent(depth, indent); + final StringBuilder sb = new StringBuilder(); + sb.append(spaces).append("+ AGGREGATE FROM TIMESERIES ").append(tsType.getName()); + sb.append(" [").append(fromTs).append(" - ").append(toTs).append("] bucket=").append(bucketIntervalMs).append("ms"); + sb.append("\n").append(spaces).append(" "); + for (int i = 0; i < requests.size(); i++) { + if (i > 0) + sb.append(", "); + final MultiColumnAggregationRequest req = requests.get(i); + sb.append(req.type().name().toLowerCase()).append("(col").append(req.columnIndex()).append(")"); + } + if (context.isProfiling()) { + sb.append("\n").append(spaces).append(" (").append(getCostFormatted()).append(", ").append(getRowCountFormatted()).append(")"); + if (aggregationMetrics != null) + sb.append("\n").append(spaces).append(" ").append(aggregationMetrics); + } + return sb.toString(); + } + + @Override + public ExecutionStep copy(final CommandContext context) { + return new AggregateFromTimeSeriesStep(tsType, fromTs, toTs, requests, bucketIntervalMs, timeBucketAlias, + requestAliasToOutputAlias, tagFilter, context); + } +} diff --git a/engine/src/main/java/com/arcadedb/query/sql/executor/FetchFromSchemaContinuousAggregatesStep.java b/engine/src/main/java/com/arcadedb/query/sql/executor/FetchFromSchemaContinuousAggregatesStep.java new file mode 100644 index 0000000000..7f6c13c39f --- /dev/null +++ b/engine/src/main/java/com/arcadedb/query/sql/executor/FetchFromSchemaContinuousAggregatesStep.java @@ -0,0 +1,115 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com)
+ * SPDX-License-Identifier: Apache-2.0
+ */
+package com.arcadedb.query.sql.executor;
+
+import com.arcadedb.exception.TimeoutException;
+import com.arcadedb.schema.ContinuousAggregate;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Comparator;
+import java.util.List;
+import java.util.stream.Collectors;
+
+public class FetchFromSchemaContinuousAggregatesStep extends AbstractExecutionStep {
+
+  private final List<ResultInternal> result = new ArrayList<>();
+
+  private int cursor = 0;
+
+  public FetchFromSchemaContinuousAggregatesStep(final CommandContext context) {
+    super(context);
+  }
+
+  @Override
+  public ResultSet syncPull(final CommandContext context, final int nRecords) throws TimeoutException {
+    pullPrevious(context, nRecords);
+
+    if (cursor == 0) {
+      final long begin = context.isProfiling() ?
System.nanoTime() : 0;
+      try {
+        final ContinuousAggregate[] aggregates = context.getDatabase().getSchema().getContinuousAggregates();
+
+        final List<ContinuousAggregate> ordered = Arrays.stream(aggregates)
+            .sorted(Comparator.comparing(ContinuousAggregate::getName, String::compareToIgnoreCase))
+            .collect(Collectors.toList());
+
+        for (final ContinuousAggregate ca : ordered) {
+          final ResultInternal r = new ResultInternal(context.getDatabase());
+          result.add(r);
+
+          r.setProperty("name", ca.getName());
+          r.setProperty("query", ca.getQuery());
+          r.setProperty("backingType", ca.getBackingType().getName());
+          r.setProperty("sourceType", ca.getSourceTypeName());
+          r.setProperty("bucketIntervalMs", ca.getBucketIntervalMs());
+          r.setProperty("bucketColumn", ca.getBucketColumn());
+          r.setProperty("timestampColumn", ca.getTimestampColumn());
+          r.setProperty("watermarkTs", ca.getWatermarkTs());
+          r.setProperty("lastRefreshTime", ca.getLastRefreshTime());
+          r.setProperty("status", ca.getStatus());
+
+          // Runtime metrics
+          r.setProperty("refreshCount", ca.getRefreshCount());
+          r.setProperty("refreshTotalTimeMs", ca.getRefreshTotalTimeMs());
+          r.setProperty("refreshMinTimeMs", ca.getRefreshMinTimeMs());
+          r.setProperty("refreshMaxTimeMs", ca.getRefreshMaxTimeMs());
+          final long count = ca.getRefreshCount();
+          r.setProperty("refreshAvgTimeMs", count > 0 ?
ca.getRefreshTotalTimeMs() / count : 0L); + r.setProperty("errorCount", ca.getErrorCount()); + r.setProperty("lastRefreshDurationMs", ca.getLastRefreshDurationMs()); + + context.setVariable("current", r); + } + } finally { + if (context.isProfiling()) + cost += (System.nanoTime() - begin); + } + } + return new ResultSet() { + @Override + public boolean hasNext() { + return cursor < result.size(); + } + + @Override + public Result next() { + return result.get(cursor++); + } + + @Override + public void close() { + } + + @Override + public void reset() { + cursor = 0; + } + }; + } + + @Override + public String prettyPrint(final int depth, final int indent) { + final String spaces = ExecutionStepInternal.getIndent(depth, indent); + String result = spaces + "+ FETCH DATABASE METADATA CONTINUOUS AGGREGATES"; + if (context.isProfiling()) + result += " (" + getCostFormatted() + ")"; + return result; + } +} diff --git a/engine/src/main/java/com/arcadedb/query/sql/executor/FetchFromSchemaTypesStep.java b/engine/src/main/java/com/arcadedb/query/sql/executor/FetchFromSchemaTypesStep.java index ec434a426c..308a55e549 100644 --- a/engine/src/main/java/com/arcadedb/query/sql/executor/FetchFromSchemaTypesStep.java +++ b/engine/src/main/java/com/arcadedb/query/sql/executor/FetchFromSchemaTypesStep.java @@ -19,11 +19,16 @@ package com.arcadedb.query.sql.executor; import com.arcadedb.database.Document; +import com.arcadedb.engine.timeseries.ColumnDefinition; +import com.arcadedb.engine.timeseries.DownsamplingTier; +import com.arcadedb.engine.timeseries.TimeSeriesEngine; +import com.arcadedb.engine.timeseries.TimeSeriesShard; import com.arcadedb.exception.TimeoutException; import com.arcadedb.graph.Edge; import com.arcadedb.graph.Vertex; import com.arcadedb.index.Index; import com.arcadedb.schema.DocumentType; +import com.arcadedb.schema.LocalTimeSeriesType; import com.arcadedb.schema.Schema; import java.util.*; @@ -63,9 +68,12 @@ public ResultSet syncPull(final CommandContext 
context, final int nRecords) thro
       r.setProperty("name", type.getName());
+      final boolean isTimeSeries = type instanceof LocalTimeSeriesType;
       String t = "?";
-      if (type.getType() == Document.RECORD_TYPE)
+      if (isTimeSeries)
+        t = LocalTimeSeriesType.KIND_CODE;
+      else if (type.getType() == Document.RECORD_TYPE)
         t = "document";
       else if (type.getType() == Vertex.RECORD_TYPE)
         t = "vertex";
@@ -73,7 +81,11 @@ else if (type.getType() == Edge.RECORD_TYPE)
         t = "edge";
       r.setProperty("type", t);
-      r.setProperty("records", context.getDatabase().countType(typeName, false));
+
+      if (isTimeSeries)
+        populateTimeSeriesMetadata(r, (LocalTimeSeriesType) type);
+      else
+        r.setProperty("records", context.getDatabase().countType(typeName, false));
       r.setProperty("buckets", type.getBuckets(false).stream().map((b) -> b.getName()).collect(Collectors.toList()));
       r.setProperty("bucketSelectionStrategy", type.getBucketSelectionStrategy().getName());
@@ -164,6 +176,102 @@ public void reset() {
     };
   }
 
+  private void populateTimeSeriesMetadata(final ResultInternal r, final LocalTimeSeriesType tsType) {
+    r.setProperty("timestampColumn", tsType.getTimestampColumn());
+    r.setProperty("shardCount", tsType.getShardCount());
+    r.setProperty("retentionMs", tsType.getRetentionMs());
+    r.setProperty("compactionBucketIntervalMs", tsType.getCompactionBucketIntervalMs());
+
+    // Column definitions
+    final List<ResultInternal> tsColResults = new ArrayList<>();
+    for (final ColumnDefinition col : tsType.getTsColumns()) {
+      final ResultInternal colR = new ResultInternal();
+      colR.setProperty("name", col.getName());
+      colR.setProperty("dataType", col.getDataType().name());
+      colR.setProperty("role", col.getRole().name());
+      tsColResults.add(colR);
+    }
+    r.setProperty("tsColumns", tsColResults);
+
+    // Downsampling tiers
+    final List<DownsamplingTier> tiers = tsType.getDownsamplingTiers();
+    if (tiers != null && !tiers.isEmpty()) {
+      final List<ResultInternal> tierResults = new ArrayList<>();
+      for (final DownsamplingTier tier : tiers) {
+        final ResultInternal
tierR = new ResultInternal();
+        tierR.setProperty("afterMs", tier.afterMs());
+        tierR.setProperty("granularityMs", tier.granularityMs());
+        tierResults.add(tierR);
+      }
+      r.setProperty("downsamplingTiers", tierResults);
+    }
+
+    // Engine runtime stats (per-shard diagnostics)
+    final TimeSeriesEngine engine = tsType.getEngine();
+    if (engine != null) {
+      long totalSamples = 0;
+      long globalMin = Long.MAX_VALUE;
+      long globalMax = Long.MIN_VALUE;
+
+      final List<ResultInternal> shardStats = new ArrayList<>();
+      for (int s = 0; s < engine.getShardCount(); s++) {
+        final TimeSeriesShard shard = engine.getShard(s);
+        final ResultInternal shardR = new ResultInternal();
+        shardR.setProperty("shard", s);
+
+        try {
+          final long sealedSamples = shard.getSealedStore().getTotalSampleCount();
+          final long mutableSamples = shard.getMutableBucket().getSampleCount();
+          final int sealedBlocks = shard.getSealedStore().getBlockCount();
+
+          shardR.setProperty("sealedBlocks", sealedBlocks);
+          shardR.setProperty("sealedSamples", sealedSamples);
+          shardR.setProperty("mutableSamples", mutableSamples);
+          shardR.setProperty("totalSamples", sealedSamples + mutableSamples);
+
+          totalSamples += sealedSamples + mutableSamples;
+
+          if (sealedBlocks > 0) {
+            final long min = shard.getSealedStore().getGlobalMinTimestamp();
+            final long max = shard.getSealedStore().getGlobalMaxTimestamp();
+            shardR.setProperty("minTimestamp", min);
+            shardR.setProperty("maxTimestamp", max);
+            if (min < globalMin)
+              globalMin = min;
+            if (max > globalMax)
+              globalMax = max;
+          }
+
+          if (mutableSamples > 0) {
+            final long mMin = shard.getMutableBucket().getMinTimestamp();
+            final long mMax = shard.getMutableBucket().getMaxTimestamp();
+            shardR.setProperty("mutableMinTimestamp", mMin);
+            shardR.setProperty("mutableMaxTimestamp", mMax);
+            if (mMin < globalMin)
+              globalMin = mMin;
+            if (mMax > globalMax)
+              globalMax = mMax;
+          }
+
+        } catch (final Exception e) {
+          shardR.setProperty("error", e.getMessage());
+        }
+
+        shardStats.add(shardR);
+ } + + r.setProperty("records", totalSamples); + r.setProperty("shards", shardStats); + + if (globalMin != Long.MAX_VALUE) + r.setProperty("globalMinTimestamp", globalMin); + if (globalMax != Long.MIN_VALUE) + r.setProperty("globalMaxTimestamp", globalMax); + } else { + r.setProperty("records", 0L); + } + } + @Override public String prettyPrint(final int depth, final int indent) { final String spaces = ExecutionStepInternal.getIndent(depth, indent); diff --git a/engine/src/main/java/com/arcadedb/query/sql/executor/FetchFromTimeSeriesStep.java b/engine/src/main/java/com/arcadedb/query/sql/executor/FetchFromTimeSeriesStep.java new file mode 100644 index 0000000000..c96761a4f9 --- /dev/null +++ b/engine/src/main/java/com/arcadedb/query/sql/executor/FetchFromTimeSeriesStep.java @@ -0,0 +1,153 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ *
+ * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com)
+ * SPDX-License-Identifier: Apache-2.0
+ */
+package com.arcadedb.query.sql.executor;
+
+import com.arcadedb.engine.timeseries.ColumnDefinition;
+import com.arcadedb.engine.timeseries.TagFilter;
+import com.arcadedb.engine.timeseries.TimeSeriesEngine;
+import com.arcadedb.exception.CommandExecutionException;
+import com.arcadedb.exception.TimeoutException;
+import com.arcadedb.schema.LocalTimeSeriesType;
+
+import java.io.IOException;
+import java.util.Date;
+import java.util.Iterator;
+import java.util.List;
+
+/**
+ * Execution step that fetches data from a TimeSeries engine.
+ * Supports profiling via the standard {@code context.isProfiling()} mechanism.
+ *
+ * @author Luca Garulli (l.garulli@arcadedata.com)
+ */
+public class FetchFromTimeSeriesStep extends AbstractExecutionStep {
+
+  private final LocalTimeSeriesType tsType;
+  private final long fromTs;
+  private final long toTs;
+  private final TagFilter tagFilter;
+  private Iterator<Object[]> resultIterator;
+  private boolean fetched = false;
+
+  public FetchFromTimeSeriesStep(final LocalTimeSeriesType tsType, final long fromTs, final long toTs,
+      final CommandContext context) {
+    this(tsType, fromTs, toTs, null, context);
+  }
+
+  public FetchFromTimeSeriesStep(final LocalTimeSeriesType tsType, final long fromTs, final long toTs,
+      final TagFilter tagFilter, final CommandContext context) {
+    super(context);
+    this.tsType = tsType;
+    this.fromTs = fromTs;
+    this.toTs = toTs;
+    this.tagFilter = tagFilter;
+  }
+
+  @Override
+  public ResultSet syncPull(final CommandContext context, final int nRecords) throws TimeoutException {
+    final long begin = context.isProfiling() ?
System.nanoTime() : 0;
+    try {
+      if (!fetched) {
+        try {
+          final TimeSeriesEngine engine = tsType.getEngine();
+          if (engine == null)
+            throw new CommandExecutionException(
+                "TimeSeries engine for type '" + tsType.getName() + "' is not initialized");
+          resultIterator = engine.iterateQuery(fromTs, toTs, null, tagFilter);
+          fetched = true;
+        } catch (final CommandExecutionException e) {
+          throw e;
+        } catch (final IOException e) {
+          throw new CommandExecutionException("Error querying TimeSeries engine", e);
+        }
+      }
+
+      final List<ColumnDefinition> columns = tsType.getTsColumns();
+
+      return new ResultSet() {
+        private int count = 0;
+
+        @Override
+        public boolean hasNext() {
+          final long begin1 = context.isProfiling() ? System.nanoTime() : 0;
+          try {
+            return count < nRecords && resultIterator.hasNext();
+          } finally {
+            if (context.isProfiling())
+              cost += (System.nanoTime() - begin1);
+          }
+        }
+
+        @Override
+        public Result next() {
+          final long begin1 = context.isProfiling() ? System.nanoTime() : 0;
+          try {
+            if (!hasNext())
+              throw new IllegalStateException("No more results");
+
+            count++;
+            final Object[] row = resultIterator.next();
+            final ResultInternal result = new ResultInternal(context.getDatabase());
+
+            for (int i = 0; i < columns.size() && i < row.length; i++) {
+              final ColumnDefinition col = columns.get(i);
+              Object value = row[i];
+
+              // Convert timestamp long to Date for SQL compatibility
+              if (col.getRole() == ColumnDefinition.ColumnRole.TIMESTAMP && value instanceof Long)
+                value = new Date((Long) value);
+
+              result.setProperty(col.getName(), value);
+            }
+
+            rowCount++;
+            return result;
+          } finally {
+            if (context.isProfiling())
+              cost += (System.nanoTime() - begin1);
+          }
+        }
+
+        @Override
+        public void close() {
+          // no-op
+        }
+      };
+    } finally {
+      if (context.isProfiling())
+        cost += (System.nanoTime() - begin);
+    }
+  }
+
+  @Override
+  public String prettyPrint(final int depth, final int indent) {
+    final String spaces = ExecutionStepInternal.getIndent(depth, indent);
+ final StringBuilder sb = new StringBuilder(); + sb.append(spaces).append("+ FETCH FROM TIMESERIES ").append(tsType.getName()); + sb.append(" [").append(fromTs).append(" - ").append(toTs).append("]"); + if (context.isProfiling()) + sb.append(" (").append(getCostFormatted()).append(", ").append(getRowCountFormatted()).append(")"); + return sb.toString(); + } + + @Override + public ExecutionStep copy(final CommandContext context) { + return new FetchFromTimeSeriesStep(tsType, fromTs, toTs, tagFilter, context); + } +} diff --git a/engine/src/main/java/com/arcadedb/query/sql/executor/SaveElementStep.java b/engine/src/main/java/com/arcadedb/query/sql/executor/SaveElementStep.java index ccbfacc5a3..f9d6bea536 100644 --- a/engine/src/main/java/com/arcadedb/query/sql/executor/SaveElementStep.java +++ b/engine/src/main/java/com/arcadedb/query/sql/executor/SaveElementStep.java @@ -20,8 +20,26 @@ import com.arcadedb.database.Document; import com.arcadedb.database.MutableDocument; +import com.arcadedb.database.TransactionContext; +import com.arcadedb.engine.timeseries.ColumnDefinition; +import com.arcadedb.engine.timeseries.TimeSeriesEngine; +import com.arcadedb.exception.CommandExecutionException; import com.arcadedb.exception.TimeoutException; +import com.arcadedb.log.LogManager; import com.arcadedb.query.sql.parser.Identifier; +import com.arcadedb.schema.ContinuousAggregate; +import com.arcadedb.schema.ContinuousAggregateImpl; +import com.arcadedb.schema.ContinuousAggregateRefresher; +import com.arcadedb.schema.LocalSchema; +import com.arcadedb.schema.LocalTimeSeriesType; +import com.arcadedb.schema.Type; + +import java.io.IOException; +import java.time.Instant; +import java.time.ZoneId; +import java.util.Date; +import java.util.List; +import java.util.logging.Level; /** * @author Luigi Dell'Aquila (luigi.dellaquila-(at)-gmail.com) @@ -54,17 +72,15 @@ public Result next() { if (doc == null) throw new IllegalArgumentException("Cannot save a null document"); - final 
MutableDocument modifiableDoc;
-//    if (createAlways) {
-//      // STRIPE OFF ANY IDENTITY TO FORCE AN INSERT. THIS IS NECESSARY IF THE RECORD IS COMING FROM A SELECT
-//      if (doc instanceof Vertex)
-//        modifiableDoc = context.getDatabase().newVertex(doc.getTypeName()).fromMap(doc.toMap(false));
-//      else if (doc instanceof Edge)
-//        throw new IllegalArgumentException("Cannot duplicate an edge");
-//      else
-//        modifiableDoc = context.getDatabase().newDocument(doc.getTypeName()).fromMap(doc.toMap(false));
-//    } else
-      modifiableDoc = doc.modify();
+      // Check if this is a TimeSeries type — route to TimeSeriesEngine
+      final var docType = context.getDatabase().getSchema().getType(doc.getTypeName());
+      if (docType instanceof LocalTimeSeriesType tsType && tsType.getEngine() != null) {
+        saveToTimeSeries(tsType, doc, context);
+        scheduleContinuousAggregateRefresh(context, tsType);
+        return result;
+      }
+
+      final MutableDocument modifiableDoc = doc.modify();
 
       if (bucket == null)
         modifiableDoc.save();
@@ -81,6 +97,103 @@ public void close() {
     };
   }
 
+  private void saveToTimeSeries(final LocalTimeSeriesType tsType, final Document doc, final CommandContext context) {
+    final TimeSeriesEngine engine = tsType.getEngine();
+    final List<ColumnDefinition> columns = tsType.getTsColumns();
+    final ZoneId zoneId = context.getDatabase().getSchema().getZoneId();
+
+    final long[] timestamps = new long[1];
+    int nonTsCount = 0;
+    for (final ColumnDefinition col : columns)
+      if (col.getRole() != ColumnDefinition.ColumnRole.TIMESTAMP)
+        nonTsCount++;
+    final Object[][] columnValues = new Object[nonTsCount][1];
+
+    int colIdx = 0;
+    for (int i = 0; i < columns.size(); i++) {
+      final ColumnDefinition col = columns.get(i);
+      final Object value = doc.get(col.getName());
+
+      if (col.getRole() == ColumnDefinition.ColumnRole.TIMESTAMP) {
+        timestamps[0] = toEpochMs(value, zoneId);
+      } else {
+        columnValues[colIdx][0] = convertValue(value, col.getDataType());
+        colIdx++;
+      }
+    }
+
+    try {
+      engine.appendSamples(timestamps,
columnValues); + } catch (final IOException e) { + throw new CommandExecutionException("Error appending to TimeSeries engine", e); + } + } + + private void scheduleContinuousAggregateRefresh(final CommandContext context, final LocalTimeSeriesType tsType) { + final LocalSchema schema = (LocalSchema) context.getDatabase().getSchema(); + final ContinuousAggregate[] aggregates = schema.getContinuousAggregates(); + if (aggregates.length == 0) + return; + + final String typeName = tsType.getName(); + final TransactionContext tx = context.getDatabase().getTransaction(); + + for (final ContinuousAggregate ca : aggregates) { + if (typeName.equals(ca.getSourceTypeName())) { + final String callbackKey = "ca-refresh:" + ca.getName(); + final ContinuousAggregateImpl caImpl = (ContinuousAggregateImpl) ca; + tx.addAfterCommitCallbackIfAbsent(callbackKey, () -> { + try { + ContinuousAggregateRefresher.incrementalRefresh(context.getDatabase(), caImpl); + } catch (final Exception e) { + LogManager.instance().log(SaveElementStep.class, Level.WARNING, + "Error refreshing continuous aggregate '%s' after commit: %s", e, ca.getName(), e.getMessage()); + } + }); + } + } + } + + private static long toEpochMs(final Object value, final ZoneId zoneId) { + if (value instanceof Long l) + return l; + if (value instanceof Date d) + return d.getTime(); + if (value instanceof Instant i) + return i.toEpochMilli(); + if (value instanceof Number n) + return n.longValue(); + if (value instanceof java.time.LocalDateTime ldt) + return ldt.atZone(zoneId).toInstant().toEpochMilli(); + if (value instanceof java.time.LocalDate ld) + return ld.atStartOfDay(zoneId).toInstant().toEpochMilli(); + if (value instanceof String s) { + try { + return Instant.parse(s).toEpochMilli(); + } catch (final Exception e) { + try { + return java.time.LocalDate.parse(s).atStartOfDay(zoneId).toInstant().toEpochMilli(); + } catch (final Exception e2) { + throw new CommandExecutionException("Cannot parse timestamp: '" + s + "'", 
e); + } + } + } + throw new CommandExecutionException("Cannot convert to timestamp: " + (value != null ? value.getClass().getName() : "null")); + } + + private static Object convertValue(final Object value, final Type targetType) { + if (value == null) + return null; + return switch (targetType) { + case DOUBLE -> value instanceof Number n ? n.doubleValue() : Double.parseDouble(value.toString()); + case LONG -> value instanceof Number n ? n.longValue() : Long.parseLong(value.toString()); + case INTEGER -> value instanceof Number n ? n.intValue() : Integer.parseInt(value.toString()); + case FLOAT -> value instanceof Number n ? n.floatValue() : Float.parseFloat(value.toString()); + case SHORT -> value instanceof Number n ? n.shortValue() : Short.parseShort(value.toString()); + default -> value; + }; + } + @Override public String prettyPrint(final int depth, final int indent) { final String spaces = ExecutionStepInternal.getIndent(depth, indent); diff --git a/engine/src/main/java/com/arcadedb/query/sql/executor/SelectExecutionPlanner.java b/engine/src/main/java/com/arcadedb/query/sql/executor/SelectExecutionPlanner.java index 51a2ffb8e5..80538b3e3e 100644 --- a/engine/src/main/java/com/arcadedb/query/sql/executor/SelectExecutionPlanner.java +++ b/engine/src/main/java/com/arcadedb/query/sql/executor/SelectExecutionPlanner.java @@ -33,6 +33,7 @@ import com.arcadedb.query.sql.parser.BaseExpression; import com.arcadedb.query.sql.parser.BinaryCompareOperator; import com.arcadedb.query.sql.parser.BinaryCondition; +import com.arcadedb.query.sql.parser.BetweenCondition; import com.arcadedb.query.sql.parser.BooleanExpression; import com.arcadedb.query.sql.parser.Bucket; import com.arcadedb.query.sql.parser.ContainsTextCondition; @@ -68,8 +69,16 @@ import com.arcadedb.query.sql.parser.SubQueryCollector; import com.arcadedb.query.sql.parser.Timeout; import com.arcadedb.query.sql.parser.WhereClause; +import com.arcadedb.engine.timeseries.AggregationType; +import 
com.arcadedb.engine.timeseries.ColumnDefinition; +import com.arcadedb.engine.timeseries.MultiColumnAggregationRequest; +import com.arcadedb.engine.timeseries.TagFilter; +import com.arcadedb.function.sql.time.SQLFunctionTimeBucket; +import com.arcadedb.query.sql.parser.BaseIdentifier; +import com.arcadedb.query.sql.parser.LevelZeroIdentifier; import com.arcadedb.schema.DocumentType; import com.arcadedb.schema.LocalDocumentType; +import com.arcadedb.schema.LocalTimeSeriesType; import com.arcadedb.schema.Property; import com.arcadedb.schema.Schema; import com.arcadedb.schema.Type; @@ -1459,6 +1468,7 @@ private void handleSchemaAsTarget(final SelectExecutionPlan plan, final SchemaId case "database" -> plan.chain(new FetchFromSchemaDatabaseStep(context)); case "buckets" -> plan.chain(new FetchFromSchemaBucketsStep(context)); case "materializedviews" -> plan.chain(new FetchFromSchemaMaterializedViewsStep(context)); + case "continuousaggregates" -> plan.chain(new FetchFromSchemaContinuousAggregatesStep(context)); case "stats" -> plan.chain(new FetchFromSchemaStatsStep(context)); case "dictionary" -> plan.chain(new FetchFromSchemaDictionaryStep(context)); default -> { @@ -1621,6 +1631,41 @@ private void handleTypeAsTarget(final SelectExecutionPlan plan, final Set filterClusters, final FromClause from, final QueryPlanningInfo info, final CommandContext context) { final Identifier identifier = from.getItem().getIdentifier(); + + // Check if this is a TimeSeries type — use the engine for range queries + final DocumentType docType = context.getDatabase().getSchema().getType(identifier.getStringValue()); + if (docType instanceof LocalTimeSeriesType tsType && tsType.getEngine() != null) { + // Extract time range from WHERE clause (if available) + long fromTs = Long.MIN_VALUE; + long toTs = Long.MAX_VALUE; + + if (info.flattenedWhereClause != null) { + for (final AndBlock andBlock : info.flattenedWhereClause) { + for (final BooleanExpression expr : andBlock.getSubBlocks()) { + 
final long[] range = extractTimeRange(expr, tsType.getTimestampColumn(), context); + if (range != null) { + // Tighten bounds: take the most restrictive range + if (range[0] != Long.MIN_VALUE) + fromTs = Math.max(fromTs, range[0]); + if (range[1] != Long.MAX_VALUE) + toTs = Math.min(toTs, range[1]); + } + } + } + } + + // Extract tag filter from WHERE clause + final TagFilter tagFilter = extractTagFilter(info.flattenedWhereClause, tsType.getTsColumns(), + tsType.getTimestampColumn(), context); + + // Try push-down aggregation before falling back to full row fetch + if (tryTimeSeriesAggregationPushDown(plan, tsType, fromTs, toTs, info, context)) + return; + + plan.chain(new FetchFromTimeSeriesStep(tsType, fromTs, toTs, tagFilter, context)); + return; + } + if (handleTypeAsTargetWithIndexedFunction(plan, filterClusters, identifier, info, context)) { plan.chain(new FilterByTypeStep(identifier, context)); return; @@ -1650,6 +1695,329 @@ else if (isOrderByRidDesc(info)) plan.chain(fetcher); } + /** + * Extracts a time range from a BETWEEN or comparison expression on the timestamp column. + * Returns [fromTs, toTs] or null if not a matching expression. + * Supports: BETWEEN, >, >=, <, <=, = operators. + */ + private long[] extractTimeRange(final BooleanExpression expr, final String timestampColumn, final CommandContext context) { + if (expr instanceof BetweenCondition between) { + final String fieldName = between.getFirst() != null ? between.getFirst().toString().trim() : null; + if (timestampColumn.equals(fieldName)) { + final Object fromVal = between.getSecond().execute((Identifiable) null, context); + final Object toVal = between.getThird().execute((Identifiable) null, context); + return new long[] { toEpochMs(fromVal), toEpochMs(toVal) }; + } + } else if (expr instanceof BinaryCondition binary) { + // Check if one side is the timestamp column and the other is a value + final String leftStr = binary.left != null ? 
binary.left.toString().trim() : null; + final String rightStr = binary.right != null ? binary.right.toString().trim() : null; + final boolean leftIsTs = timestampColumn.equals(leftStr); + final boolean rightIsTs = timestampColumn.equals(rightStr); + + if (leftIsTs || rightIsTs) { + final Expression valueExpr = leftIsTs ? binary.right : binary.left; + final Object rawVal = valueExpr.execute((Identifiable) null, context); + final long val = toEpochMs(rawVal); + if (val == Long.MIN_VALUE) + return null; + + final BinaryCompareOperator op = binary.operator; + // When field is on the right side, invert the operator semantics + if (leftIsTs) { + if (op instanceof GtOperator) + return new long[] { val + 1, Long.MAX_VALUE }; + if (op instanceof GeOperator) + return new long[] { val, Long.MAX_VALUE }; + if (op instanceof LtOperator) + return new long[] { Long.MIN_VALUE, val - 1 }; + if (op instanceof LeOperator) + return new long[] { Long.MIN_VALUE, val }; + if (op instanceof EqualsCompareOperator) + return new long[] { val, val }; + } else { + // timestamp is on the right: "value > ts" means "ts < value" + if (op instanceof GtOperator) + return new long[] { Long.MIN_VALUE, val - 1 }; + if (op instanceof GeOperator) + return new long[] { Long.MIN_VALUE, val }; + if (op instanceof LtOperator) + return new long[] { val + 1, Long.MAX_VALUE }; + if (op instanceof LeOperator) + return new long[] { val, Long.MAX_VALUE }; + if (op instanceof EqualsCompareOperator) + return new long[] { val, val }; + } + } + } + return null; + } + + private static long toEpochMs(final Object value) { + if (value instanceof Long l) + return l; + if (value instanceof java.util.Date d) + return d.getTime(); + if (value instanceof Number n) + return n.longValue(); + if (value instanceof String s) { + try { + return java.time.Instant.parse(s).toEpochMilli(); + } catch (final Exception e) { + // Try parsing as ISO date without time (assumes UTC) + try { + return 
java.time.LocalDate.parse(s).atStartOfDay(java.time.ZoneOffset.UTC).toInstant().toEpochMilli(); + } catch (final Exception e2) { + throw new CommandExecutionException("Cannot parse timestamp: '" + s + "'", e); + } + } + return Long.MIN_VALUE; + } + + /** + * Extracts a TagFilter from the flattened WHERE clause by matching equality predicates on TAG columns. + * Only simple equality conditions (column = 'value') on TAG columns are extracted. + */ + private static TagFilter extractTagFilter(final List<AndBlock> flattenedWhere, final List<ColumnDefinition> columns, + final String timestampColumn, final CommandContext context) { + if (flattenedWhere == null) + return null; + + TagFilter filter = null; + for (final AndBlock andBlock : flattenedWhere) { + for (final BooleanExpression expr : andBlock.getSubBlocks()) { + if (!(expr instanceof BinaryCondition binary)) + continue; + if (!(binary.operator instanceof EqualsCompareOperator)) + continue; + final String leftStr = binary.left != null ? binary.left.toString().trim() : null; + final String rightStr = binary.right != null ? binary.right.toString().trim() : null; + if (leftStr == null || rightStr == null) + continue; + // Skip timestamp predicates — already handled by time range extraction + if (timestampColumn.equals(leftStr) || timestampColumn.equals(rightStr)) + continue; + + // Determine which side is the column name and which is the value + for (int i = 0; i < columns.size(); i++) { + final ColumnDefinition col = columns.get(i); + if (col.getRole() != ColumnDefinition.ColumnRole.TAG) + continue; + final boolean leftIsCol = col.getName().equals(leftStr); + final boolean rightIsCol = col.getName().equals(rightStr); + if (!leftIsCol && !rightIsCol) + continue; + final Expression valueExpr = leftIsCol ? 
binary.right : binary.left; + final Object value = valueExpr.execute((Identifiable) null, context); + if (value == null) + continue; + // Column index for TagFilter is the non-timestamp column index + int nonTsIdx = -1; + for (int j = 0; j <= i; j++) + if (columns.get(j).getRole() != ColumnDefinition.ColumnRole.TIMESTAMP) + nonTsIdx++; + filter = filter == null ? TagFilter.eq(nonTsIdx, value.toString()) : filter.and(nonTsIdx, value.toString()); + break; + } + } + } + return filter; + } + + /** + * Returns true if the WHERE clause contains conditions that are NOT consumed by time-series + * push-down (i.e., not time-range predicates and not tag equality filters). + */ + private static boolean hasNonPushDownConditions(final List<AndBlock> flattenedWhere, + final List<ColumnDefinition> columns, final String timestampColumn) { + for (final AndBlock andBlock : flattenedWhere) { + for (final BooleanExpression expr : andBlock.getSubBlocks()) { + if (expr instanceof BetweenCondition between) { + final String fieldName = between.getFirst() != null ? between.getFirst().toString().trim() : null; + if (timestampColumn.equals(fieldName)) + continue; // consumed by time-range extraction + return true; // BETWEEN on a non-timestamp field — not consumed + } + if (!(expr instanceof BinaryCondition binary)) + return true; // unknown condition type — not consumed + final String leftStr = binary.left != null ? binary.left.toString().trim() : null; + final String rightStr = binary.right != null ? 
binary.right.toString().trim() : null; + // Time range predicate on timestamp column + if (timestampColumn.equals(leftStr) || timestampColumn.equals(rightStr)) + continue; + // Tag equality predicate + if (binary.operator instanceof EqualsCompareOperator) { + boolean isTagPredicate = false; + for (final ColumnDefinition col : columns) + if (col.getRole() == ColumnDefinition.ColumnRole.TAG && (col.getName().equals(leftStr) || col.getName().equals(rightStr))) { + isTagPredicate = true; + break; + } + if (isTagPredicate) + continue; + } + return true; // anything else is not consumed by push-down + } + } + return false; + } + + /** + * Attempts to push down aggregation into the TimeSeries engine. + * Eligible queries have: ts.timeBucket GROUP BY, simple aggregate functions (avg, max, min, sum, count), + * no DISTINCT, no HAVING, no UNWIND, no LET. + */ + private boolean tryTimeSeriesAggregationPushDown(final SelectExecutionPlan plan, final LocalTimeSeriesType tsType, + final long fromTs, final long toTs, final QueryPlanningInfo info, final CommandContext context) { + // Must have aggregate projection (set by splitProjectionsForGroupBy) + if (info.aggregateProjection == null) + return false; + + // No DISTINCT + if (info.distinct) + return false; + + // Must have exactly one GROUP BY + if (info.groupBy == null || info.groupBy.getItems() == null || info.groupBy.getItems().size() != 1) + return false; + + // No unsupported clauses + if (info.unwind != null || info.perRecordLetClause != null || info.globalLetPresent) + return false; + + // The original projection from the statement (before splitting) + final Projection originalProjection = statement.getProjection(); + if (originalProjection == null || originalProjection.getItems() == null) + return false; + + // Find the timeBucket item and aggregate items + String timeBucketAlias = null; + String intervalStr = null; + final List<MultiColumnAggregationRequest> requests = new ArrayList<>(); + final Map<String, String> requestAliasToOutputAlias = new HashMap<>(); + 
final List<ColumnDefinition> columns = tsType.getTsColumns(); + + for (final ProjectionItem item : originalProjection.getItems()) { + final FunctionCall funcCall = extractFunctionCall(item.expression); + if (funcCall == null) + return false; // not a simple function call — bail out + + final String funcName = funcCall.getName().getStringValue(); + + if ("ts.timeBucket".equalsIgnoreCase(funcName)) { + // This is the time bucket function + if (timeBucketAlias != null) + return false; // duplicate timeBucket + timeBucketAlias = item.getProjectionAliasAsString(); + // Extract interval from first parameter + if (funcCall.getParams().size() < 2) + return false; + final Object intervalVal = funcCall.getParams().get(0).execute((Identifiable) null, context); + if (!(intervalVal instanceof String)) + return false; + intervalStr = (String) intervalVal; + } else { + // Must be an aggregate function + final String aggFuncName = funcName.toLowerCase(); + final AggregationType aggType = switch (aggFuncName) { + case "avg" -> AggregationType.AVG; + case "max" -> AggregationType.MAX; + case "min" -> AggregationType.MIN; + case "sum" -> AggregationType.SUM; + case "count" -> AggregationType.COUNT; + default -> null; + }; + if (aggType == null) + return false; // unsupported aggregate + + // For COUNT(*), columnIndex doesn't matter + int columnIndex = 0; + if (aggType != AggregationType.COUNT) { + // Extract field name from first parameter + if (funcCall.getParams().isEmpty()) + return false; + final String fieldName = funcCall.getParams().get(0).toString().trim(); + columnIndex = findColumnIndex(columns, fieldName); + if (columnIndex < 0) + return false; // field not found in timeseries columns + } + + final String alias = item.getProjectionAliasAsString(); + requests.add(new MultiColumnAggregationRequest(columnIndex, aggType, alias)); + requestAliasToOutputAlias.put(alias, alias); + } + } + + // Must have found both timeBucket and at least one aggregate + if (timeBucketAlias == null || intervalStr 
== null || requests.isEmpty()) + return false; + + // Verify GROUP BY references the timeBucket alias + final String groupByStr = info.groupBy.getItems().get(0).toString().trim(); + if (!groupByStr.equals(timeBucketAlias)) + return false; + + // Parse interval + final long bucketIntervalMs; + try { + bucketIntervalMs = SQLFunctionTimeBucket.parseInterval(intervalStr); + } catch (final IllegalArgumentException e) { + return false; + } + + // Extract tag filter from WHERE clause for push-down + final TagFilter tagFilter = extractTagFilter(info.flattenedWhereClause, columns, tsType.getTimestampColumn(), context); + + // Verify all WHERE conditions are consumed by push-down (time-range or tag equality). + // If any field-value predicate remains (e.g., WHERE value > 100), bail out to avoid + // silently dropping it — the standard filter step will handle it instead. + if (info.flattenedWhereClause != null && hasNonPushDownConditions(info.flattenedWhereClause, columns, tsType.getTimestampColumn())) + return false; + + // Chain the push-down step + plan.chain(new AggregateFromTimeSeriesStep(tsType, fromTs, toTs, requests, bucketIntervalMs, + timeBucketAlias, requestAliasToOutputAlias, tagFilter, context)); + + // Null out the aggregate projections so handleProjections doesn't add duplicate steps + info.preAggregateProjection = null; + info.aggregateProjection = null; + info.groupBy = null; + info.projectionsCalculated = true; + // The time range and tag filters are consumed by the push-down step + info.whereClause = null; + info.flattenedWhereClause = null; + + return true; + } + + /** + * Extracts a FunctionCall from an Expression if it's a simple function call. + * Returns null if the expression is not a simple function call. 
+ */ + private static FunctionCall extractFunctionCall(final Expression expr) { + if (expr == null || expr.mathExpression == null) + return null; + if (!(expr.mathExpression instanceof BaseExpression base)) + return null; + if (base.identifier == null) + return null; + if (base.identifier.levelZero != null && base.identifier.levelZero.functionCall != null) + return base.identifier.levelZero.functionCall; + return null; + } + + /** + * Finds the index of a column by name in the timeseries column definitions. + * Returns -1 if not found. + */ + private static int findColumnIndex(final List<ColumnDefinition> columns, final String fieldName) { + for (int i = 0; i < columns.size(); i++) + if (columns.get(i).getName().equals(fieldName)) + return i; + return -1; + } + private boolean handleTypeAsTargetWithIndexedFunction(final SelectExecutionPlan plan, final Set<String> filterClusters, final Identifier queryTarget, final QueryPlanningInfo info, final CommandContext context) { if (queryTarget == null) diff --git a/engine/src/main/java/com/arcadedb/query/sql/parser/AlterTimeSeriesTypeStatement.java b/engine/src/main/java/com/arcadedb/query/sql/parser/AlterTimeSeriesTypeStatement.java new file mode 100644 index 0000000000..10a9c93769 --- /dev/null +++ b/engine/src/main/java/com/arcadedb/query/sql/parser/AlterTimeSeriesTypeStatement.java @@ -0,0 +1,106 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.query.sql.parser; + +import com.arcadedb.engine.timeseries.DownsamplingTier; +import com.arcadedb.exception.CommandExecutionException; +import com.arcadedb.query.sql.executor.CommandContext; +import com.arcadedb.query.sql.executor.InternalResultSet; +import com.arcadedb.query.sql.executor.ResultInternal; +import com.arcadedb.query.sql.executor.ResultSet; +import com.arcadedb.schema.DocumentType; +import com.arcadedb.schema.LocalSchema; +import com.arcadedb.schema.LocalTimeSeriesType; + +import java.util.ArrayList; +import java.util.List; +import java.util.Map; +import java.util.Objects; + +/** + * SQL statement: ALTER TIMESERIES TYPE + */ +public class AlterTimeSeriesTypeStatement extends DDLStatement { + + public Identifier name; + public boolean addPolicy; + public List<DownsamplingTier> tiers = new ArrayList<>(); + + public AlterTimeSeriesTypeStatement(final int id) { + super(id); + } + + @Override + public ResultSet executeDDL(final CommandContext context) { + final DocumentType type = context.getDatabase().getSchema().getType(name.getStringValue()); + if (!(type instanceof LocalTimeSeriesType tsType)) + throw new CommandExecutionException("Type '" + name.getStringValue() + "' is not a TimeSeries type"); + + if (addPolicy) + tsType.setDownsamplingTiers(tiers); + else + tsType.setDownsamplingTiers(new ArrayList<>()); + + ((LocalSchema) context.getDatabase().getSchema()).saveConfiguration(); + + final ResultInternal result = new ResultInternal(context.getDatabase()); + result.setProperty("operation", addPolicy ? 
"add downsampling policy" : "drop downsampling policy"); + result.setProperty("typeName", name.getStringValue()); + return new InternalResultSet(result); + } + + @Override + public void toString(final Map params, final StringBuilder builder) { + builder.append("ALTER TIMESERIES TYPE "); + name.toString(params, builder); + + if (addPolicy) { + builder.append(" ADD DOWNSAMPLING POLICY"); + for (final DownsamplingTier tier : tiers) { + builder.append(" AFTER ").append(tier.afterMs()); + builder.append(" GRANULARITY ").append(tier.granularityMs()); + } + } else + builder.append(" DROP DOWNSAMPLING POLICY"); + } + + @Override + public AlterTimeSeriesTypeStatement copy() { + final AlterTimeSeriesTypeStatement result = new AlterTimeSeriesTypeStatement(-1); + result.name = name == null ? null : name.copy(); + result.addPolicy = addPolicy; + result.tiers = new ArrayList<>(tiers); + return result; + } + + @Override + public boolean equals(final Object o) { + if (this == o) + return true; + if (o == null || getClass() != o.getClass()) + return false; + final AlterTimeSeriesTypeStatement that = (AlterTimeSeriesTypeStatement) o; + return addPolicy == that.addPolicy && Objects.equals(name, that.name) && Objects.equals(tiers, that.tiers); + } + + @Override + public int hashCode() { + return Objects.hash(name, addPolicy, tiers); + } +} diff --git a/engine/src/main/java/com/arcadedb/query/sql/parser/CreateContinuousAggregateStatement.java b/engine/src/main/java/com/arcadedb/query/sql/parser/CreateContinuousAggregateStatement.java new file mode 100644 index 0000000000..5357654e88 --- /dev/null +++ b/engine/src/main/java/com/arcadedb/query/sql/parser/CreateContinuousAggregateStatement.java @@ -0,0 +1,64 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.query.sql.parser; + +import com.arcadedb.database.Database; +import com.arcadedb.query.sql.executor.CommandContext; +import com.arcadedb.query.sql.executor.InternalResultSet; +import com.arcadedb.query.sql.executor.ResultInternal; +import com.arcadedb.query.sql.executor.ResultSet; + +public class CreateContinuousAggregateStatement extends DDLStatement { + public Identifier name; + public SelectStatement selectStatement; + public boolean ifNotExists = false; + + public CreateContinuousAggregateStatement(final int id) { + super(id); + } + + @Override + public ResultSet executeDDL(final CommandContext context) { + final Database database = context.getDatabase(); + final String caName = name.getStringValue(); + + database.getSchema().buildContinuousAggregate() + .withName(caName) + .withQuery(selectStatement.toString()) + .withIgnoreIfExists(ifNotExists) + .create(); + + final InternalResultSet result = new InternalResultSet(); + final ResultInternal r = new ResultInternal(); + r.setProperty("operation", "create continuous aggregate"); + r.setProperty("name", caName); + result.add(r); + return result; + } + + @Override + public String toString() { + final StringBuilder sb = new StringBuilder("CREATE CONTINUOUS AGGREGATE "); + if (ifNotExists) + sb.append("IF NOT EXISTS "); + sb.append(name); + sb.append(" AS ").append(selectStatement); + return sb.toString(); + } +} diff --git 
a/engine/src/main/java/com/arcadedb/query/sql/parser/CreateTimeSeriesTypeStatement.java b/engine/src/main/java/com/arcadedb/query/sql/parser/CreateTimeSeriesTypeStatement.java new file mode 100644 index 0000000000..78196ea9c1 --- /dev/null +++ b/engine/src/main/java/com/arcadedb/query/sql/parser/CreateTimeSeriesTypeStatement.java @@ -0,0 +1,193 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.query.sql.parser; + +import com.arcadedb.exception.CommandExecutionException; +import com.arcadedb.query.sql.executor.CommandContext; +import com.arcadedb.query.sql.executor.InternalResultSet; +import com.arcadedb.query.sql.executor.ResultInternal; +import com.arcadedb.query.sql.executor.ResultSet; +import com.arcadedb.schema.Schema; +import com.arcadedb.schema.TimeSeriesTypeBuilder; +import com.arcadedb.schema.Type; + +import java.util.ArrayList; +import java.util.List; +import java.util.Map; +import java.util.Objects; + +/** + * SQL statement: CREATE TIMESERIES TYPE + * + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +public class CreateTimeSeriesTypeStatement extends DDLStatement { + + public Identifier name; + public boolean ifNotExists; + public Identifier timestampColumn; + public PInteger shards; + public long retentionMs; + public 
long compactionIntervalMs; + + public List<ColumnDef> tags = new ArrayList<>(); + public List<ColumnDef> fields = new ArrayList<>(); + + public CreateTimeSeriesTypeStatement(final int id) { + super(id); + } + + @Override + public ResultSet executeDDL(final CommandContext context) { + final Schema schema = context.getDatabase().getSchema(); + + if (schema.existsType(name.getStringValue())) { + if (ifNotExists) + return new InternalResultSet(); + else + throw new CommandExecutionException("Type '" + name.getStringValue() + "' already exists"); + } + + TimeSeriesTypeBuilder builder = schema.buildTimeSeriesType().withName(name.getStringValue()); + + if (timestampColumn != null) + builder = builder.withTimestamp(timestampColumn.getStringValue()); + + for (final ColumnDef tag : tags) + builder = builder.withTag(tag.name.getStringValue(), Type.getTypeByName(tag.type.getStringValue())); + + for (final ColumnDef field : fields) + builder = builder.withField(field.name.getStringValue(), Type.getTypeByName(field.type.getStringValue())); + + if (shards != null) + builder = builder.withShards(shards.getValue().intValue()); + + if (retentionMs > 0) + builder = builder.withRetention(retentionMs); + + if (compactionIntervalMs > 0) + builder = builder.withCompactionBucketInterval(compactionIntervalMs); + + builder.create(); + + final ResultInternal result = new ResultInternal(context.getDatabase()); + result.setProperty("operation", "create timeseries type"); + result.setProperty("typeName", name.getStringValue()); + return new InternalResultSet(result); + } + + @Override + public void toString(final Map<Object, Object> params, final StringBuilder builder) { + builder.append("CREATE TIMESERIES TYPE "); + name.toString(params, builder); + + if (ifNotExists) + builder.append(" IF NOT EXISTS"); + + if (timestampColumn != null) { + builder.append(" TIMESTAMP "); + timestampColumn.toString(params, builder); + } + + if (!tags.isEmpty()) { + builder.append(" TAGS ("); + for (int i = 0; i < tags.size(); i++) { + if (i > 0) + 
builder.append(", "); + tags.get(i).name.toString(params, builder); + builder.append(" "); + tags.get(i).type.toString(params, builder); + } + builder.append(")"); + } + + if (!fields.isEmpty()) { + builder.append(" FIELDS ("); + for (int i = 0; i < fields.size(); i++) { + if (i > 0) + builder.append(", "); + fields.get(i).name.toString(params, builder); + builder.append(" "); + fields.get(i).type.toString(params, builder); + } + builder.append(")"); + } + + if (shards != null) { + builder.append(" SHARDS "); + shards.toString(params, builder); + } + + if (retentionMs > 0) { + builder.append(" RETENTION "); + builder.append(retentionMs); + } + + if (compactionIntervalMs > 0) { + builder.append(" COMPACTION_INTERVAL "); + builder.append(compactionIntervalMs); + } + } + + @Override + public CreateTimeSeriesTypeStatement copy() { + final CreateTimeSeriesTypeStatement result = new CreateTimeSeriesTypeStatement(-1); + result.name = name == null ? null : name.copy(); + result.ifNotExists = ifNotExists; + result.timestampColumn = timestampColumn == null ? null : timestampColumn.copy(); + result.shards = shards == null ? null : shards.copy(); + result.retentionMs = retentionMs; + result.compactionIntervalMs = compactionIntervalMs; + result.tags = new ArrayList<>(tags.size()); + for (final ColumnDef cd : tags) + result.tags.add(new ColumnDef(cd.name == null ? null : cd.name.copy(), cd.type == null ? null : cd.type.copy())); + result.fields = new ArrayList<>(fields.size()); + for (final ColumnDef cd : fields) + result.fields.add(new ColumnDef(cd.name == null ? null : cd.name.copy(), cd.type == null ? 
null : cd.type.copy())); + return result; + } + + @Override + public boolean equals(final Object o) { + if (this == o) + return true; + if (o == null || getClass() != o.getClass()) + return false; + final CreateTimeSeriesTypeStatement that = (CreateTimeSeriesTypeStatement) o; + return ifNotExists == that.ifNotExists && retentionMs == that.retentionMs + && compactionIntervalMs == that.compactionIntervalMs && Objects.equals(name, that.name) + && Objects.equals(timestampColumn, that.timestampColumn) && Objects.equals(shards, that.shards) + && Objects.equals(tags, that.tags) && Objects.equals(fields, that.fields); + } + + @Override + public int hashCode() { + return Objects.hash(name, ifNotExists, timestampColumn, shards, retentionMs, compactionIntervalMs, tags, fields); + } + + public static class ColumnDef { + public Identifier name; + public Identifier type; + + public ColumnDef(final Identifier name, final Identifier type) { + this.name = name; + this.type = type; + } + } +} diff --git a/engine/src/main/java/com/arcadedb/query/sql/parser/DropContinuousAggregateStatement.java b/engine/src/main/java/com/arcadedb/query/sql/parser/DropContinuousAggregateStatement.java new file mode 100644 index 0000000000..4a32c73183 --- /dev/null +++ b/engine/src/main/java/com/arcadedb/query/sql/parser/DropContinuousAggregateStatement.java @@ -0,0 +1,69 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.query.sql.parser; + +import com.arcadedb.database.Database; +import com.arcadedb.exception.CommandExecutionException; +import com.arcadedb.query.sql.executor.CommandContext; +import com.arcadedb.query.sql.executor.InternalResultSet; +import com.arcadedb.query.sql.executor.ResultInternal; +import com.arcadedb.query.sql.executor.ResultSet; + +public class DropContinuousAggregateStatement extends DDLStatement { + public Identifier name; + public boolean ifExists = false; + + public DropContinuousAggregateStatement(final int id) { + super(id); + } + + @Override + public ResultSet executeDDL(final CommandContext context) { + final Database database = context.getDatabase(); + final String caName = name.getStringValue(); + + if (!database.getSchema().existsContinuousAggregate(caName)) { + if (ifExists) { + final InternalResultSet result = new InternalResultSet(); + final ResultInternal r = new ResultInternal(); + r.setProperty("operation", "drop continuous aggregate"); + r.setProperty("name", caName); + r.setProperty("dropped", false); + result.add(r); + return result; + } + throw new CommandExecutionException("Continuous aggregate '" + caName + "' does not exist"); + } + + database.getSchema().dropContinuousAggregate(caName); + + final InternalResultSet result = new InternalResultSet(); + final ResultInternal r = new ResultInternal(); + r.setProperty("operation", "drop continuous aggregate"); + r.setProperty("name", caName); + r.setProperty("dropped", true); + result.add(r); + return result; + } + + @Override + public String toString() { + return "DROP CONTINUOUS AGGREGATE " + (ifExists ? 
"IF EXISTS " : "") + name; + } +} diff --git a/engine/src/main/java/com/arcadedb/query/sql/parser/RefreshContinuousAggregateStatement.java b/engine/src/main/java/com/arcadedb/query/sql/parser/RefreshContinuousAggregateStatement.java new file mode 100644 index 0000000000..c7f3046c89 --- /dev/null +++ b/engine/src/main/java/com/arcadedb/query/sql/parser/RefreshContinuousAggregateStatement.java @@ -0,0 +1,53 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.query.sql.parser; + +import com.arcadedb.database.Database; +import com.arcadedb.query.sql.executor.CommandContext; +import com.arcadedb.query.sql.executor.InternalResultSet; +import com.arcadedb.query.sql.executor.ResultInternal; +import com.arcadedb.query.sql.executor.ResultSet; + +public class RefreshContinuousAggregateStatement extends DDLStatement { + public Identifier name; + + public RefreshContinuousAggregateStatement(final int id) { + super(id); + } + + @Override + public ResultSet executeDDL(final CommandContext context) { + final Database database = context.getDatabase(); + final String caName = name.getStringValue(); + + database.getSchema().getContinuousAggregate(caName).refresh(); + + final InternalResultSet result = new InternalResultSet(); + final ResultInternal r = new ResultInternal(); + r.setProperty("operation", "refresh continuous aggregate"); + r.setProperty("name", caName); + result.add(r); + return result; + } + + @Override + public String toString() { + return "REFRESH CONTINUOUS AGGREGATE " + name; + } +} diff --git a/engine/src/main/java/com/arcadedb/schema/ContinuousAggregate.java b/engine/src/main/java/com/arcadedb/schema/ContinuousAggregate.java new file mode 100644 index 0000000000..92ecc462c6 --- /dev/null +++ b/engine/src/main/java/com/arcadedb/schema/ContinuousAggregate.java @@ -0,0 +1,62 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.schema; + +import com.arcadedb.serializer.json.JSONObject; + +public interface ContinuousAggregate { + String getName(); + + String getQuery(); + + DocumentType getBackingType(); + + String getSourceTypeName(); + + String getStatus(); + + long getWatermarkTs(); + + long getBucketIntervalMs(); + + String getBucketColumn(); + + String getTimestampColumn(); + + long getLastRefreshTime(); + + void refresh(); + + void drop(); + + JSONObject toJSON(); + + // Runtime metrics (not persisted) + long getRefreshCount(); + + long getRefreshTotalTimeMs(); + + long getRefreshMinTimeMs(); + + long getRefreshMaxTimeMs(); + + long getErrorCount(); + + long getLastRefreshDurationMs(); +} diff --git a/engine/src/main/java/com/arcadedb/schema/ContinuousAggregateBuilder.java b/engine/src/main/java/com/arcadedb/schema/ContinuousAggregateBuilder.java new file mode 100644 index 0000000000..e5ae933a64 --- /dev/null +++ b/engine/src/main/java/com/arcadedb/schema/ContinuousAggregateBuilder.java @@ -0,0 +1,224 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.schema; + +import com.arcadedb.database.DatabaseInternal; +import com.arcadedb.exception.SchemaException; +import com.arcadedb.log.LogManager; +import com.arcadedb.query.sql.parser.FromClause; +import com.arcadedb.query.sql.parser.FromItem; +import com.arcadedb.query.sql.parser.SelectStatement; +import com.arcadedb.query.sql.parser.Statement; + +import java.util.logging.Level; +import java.util.regex.Matcher; +import java.util.regex.Pattern; + +public class ContinuousAggregateBuilder { + private static final Pattern TIME_BUCKET_PATTERN = Pattern.compile( + "ts\\.timeBucket\\s*\\(\\s*'([^']+)'\\s*,\\s*(\\w+)\\s*\\)", + Pattern.CASE_INSENSITIVE); + + private static final Pattern ALIAS_PATTERN = Pattern.compile( + "ts\\.timeBucket\\s*\\([^)]+\\)\\s+(?:AS\\s+)?(\\w+)", + Pattern.CASE_INSENSITIVE); + + private final DatabaseInternal database; + private String name; + private String query; + private boolean ifNotExists = false; + + public ContinuousAggregateBuilder(final DatabaseInternal database) { + this.database = database; + } + + public ContinuousAggregateBuilder withName(final String name) { + this.name = name; + return this; + } + + public ContinuousAggregateBuilder withQuery(final String query) { + this.query = query; + return this; + } + + public ContinuousAggregateBuilder withIgnoreIfExists(final boolean ignore) { + this.ifNotExists = ignore; + return this; + } + + public ContinuousAggregate 
create() { + if (name == null || name.isEmpty()) + throw new IllegalArgumentException("Continuous aggregate name is required"); + if (name.contains("`")) + throw new IllegalArgumentException("Continuous aggregate name must not contain backtick characters"); + if (query == null || query.isEmpty()) + throw new IllegalArgumentException("Continuous aggregate query is required"); + + final LocalSchema schema = (LocalSchema) database.getSchema(); + + if (schema.existsContinuousAggregate(name)) { + if (ifNotExists) + return schema.getContinuousAggregate(name); + throw new SchemaException("Continuous aggregate '" + name + "' already exists"); + } + + if (schema.existsType(name)) + throw new SchemaException("Cannot create continuous aggregate '" + name + + "': a type with the same name already exists"); + + // Parse and validate the query + final String sourceTypeName = extractSourceType(query); + if (sourceTypeName == null) + throw new SchemaException("Continuous aggregate query must SELECT FROM a single type"); + + if (!schema.existsType(sourceTypeName)) + throw new SchemaException("Source type '" + sourceTypeName + "' does not exist"); + + final DocumentType sourceType = schema.getType(sourceTypeName); + if (!(sourceType instanceof LocalTimeSeriesType)) + throw new SchemaException("Source type '" + sourceTypeName + "' is not a TimeSeries type. " + + "Continuous aggregates can only be created on TimeSeries types."); + + // Extract ts.timeBucket parameters + final Matcher bucketMatcher = TIME_BUCKET_PATTERN.matcher(query); + if (!bucketMatcher.find()) + throw new SchemaException("Continuous aggregate query must include ts.timeBucket(interval, timestamp) " + + "in the projection. 
Example: SELECT ts.timeBucket('1h', ts) AS hour, ..."); + + final String intervalStr = bucketMatcher.group(1); + final String tsColumnInQuery = bucketMatcher.group(2); + final long bucketIntervalMs = parseInterval(intervalStr); + + // Validate that the timestamp column name doesn't contain backticks (SQL injection prevention) + if (tsColumnInQuery.contains("`")) + throw new SchemaException("Timestamp column name must not contain backtick characters: '" + tsColumnInQuery + "'"); + + // Extract the alias for the time bucket column + final Matcher aliasMatcher = ALIAS_PATTERN.matcher(query); + String bucketAlias = null; + if (aliasMatcher.find()) + bucketAlias = aliasMatcher.group(1); + if (bucketAlias == null) + throw new SchemaException("The ts.timeBucket() projection must have an alias. " + + "Example: ts.timeBucket('1h', ts) AS hour"); + if (bucketAlias.contains("`")) + throw new SchemaException("Bucket alias must not contain backtick characters: '" + bucketAlias + "'"); + + // Validate GROUP BY is present + if (!query.toUpperCase().contains("GROUP BY")) + throw new SchemaException("Continuous aggregate query must include a GROUP BY clause"); + + // Validate query structure: buildFilteredQuery uses string manipulation so it cannot + // handle subqueries, CTEs, or inline comments. Reject unsupported patterns at creation time. 
+ validateQueryStructure(query); + + final String finalBucketAlias = bucketAlias; + final String finalTsColumn = tsColumnInQuery; + + return schema.recordFileChanges(() -> { + // Create backing document type + schema.buildDocumentType().withName(name).create(); + + // Create and register the continuous aggregate + final ContinuousAggregateImpl ca = new ContinuousAggregateImpl( + database, name, query, name, sourceTypeName, + bucketIntervalMs, finalBucketAlias, finalTsColumn); + ca.setStatus(MaterializedViewStatus.BUILDING); + schema.continuousAggregates.put(name, ca); + schema.saveConfiguration(); + + // Perform initial full refresh (watermark=0 means all data) + try { + ContinuousAggregateRefresher.incrementalRefresh(database, ca); + } catch (final Exception e) { + schema.continuousAggregates.remove(name); + try { + schema.dropType(name); + } catch (final Exception dropEx) { + LogManager.instance().log(ContinuousAggregateBuilder.class, Level.WARNING, + "Failed to clean up backing type '%s' after continuous aggregate creation failure: %s", + dropEx, name, dropEx.getMessage()); + } + throw e; + } + schema.saveConfiguration(); + + return ca; + }); + } + + private static void validateQueryStructure(final String query) { + final String upper = query.toUpperCase().trim(); + // Reject CTEs (WITH ... AS) + if (upper.startsWith("WITH ")) + throw new SchemaException("Continuous aggregate queries must not use CTEs (WITH ... AS). " + + "Use a simple SELECT ... FROM ... GROUP BY query."); + // Reject subqueries in FROM clause + if (upper.contains("(SELECT ") || upper.contains("( SELECT ")) + throw new SchemaException("Continuous aggregate queries must not contain subqueries. " + + "Use a simple SELECT ... FROM ... GROUP BY query."); + // Reject UNION/INTERSECT/EXCEPT + for (final String keyword : new String[] { " UNION ", " INTERSECT ", " EXCEPT " }) + if (upper.contains(keyword)) + throw new SchemaException("Continuous aggregate queries must not use " + keyword.trim() + ". 
" + + "Use a simple SELECT ... FROM ... GROUP BY query."); + } + + private String extractSourceType(final String sql) { + final Statement parsed = database.getStatementCache().get(sql); + if (parsed instanceof SelectStatement select) { + final FromClause from = select.getTarget(); + if (from != null) { + final FromItem item = from.getItem(); + if (item != null && item.getIdentifier() != null) + return item.getIdentifier().getStringValue(); + } + } + return null; + } + + static long parseInterval(final String interval) { + if (interval == null || interval.isEmpty()) + throw new IllegalArgumentException("Invalid interval: empty"); + + int unitStart = 0; + for (int i = 0; i < interval.length(); i++) { + if (!Character.isDigit(interval.charAt(i))) { + unitStart = i; + break; + } + } + + if (unitStart == 0) + throw new IllegalArgumentException("Invalid interval: '" + interval + "'"); + + final long value = Long.parseLong(interval.substring(0, unitStart)); + final String unit = interval.substring(unitStart).trim().toLowerCase(); + + return switch (unit) { + case "s" -> value * 1000L; + case "m" -> value * 60_000L; + case "h" -> value * 3_600_000L; + case "d" -> value * 86_400_000L; + case "w" -> value * 7 * 86_400_000L; + default -> throw new IllegalArgumentException("Unknown interval unit: '" + unit + "'. Supported: s, m, h, d, w"); + }; + } +} diff --git a/engine/src/main/java/com/arcadedb/schema/ContinuousAggregateImpl.java b/engine/src/main/java/com/arcadedb/schema/ContinuousAggregateImpl.java new file mode 100644 index 0000000000..d951ce0ac9 --- /dev/null +++ b/engine/src/main/java/com/arcadedb/schema/ContinuousAggregateImpl.java @@ -0,0 +1,262 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.schema; + +import com.arcadedb.database.Database; +import com.arcadedb.serializer.json.JSONObject; + +import java.util.Objects; +import java.util.concurrent.atomic.AtomicBoolean; +import java.util.concurrent.atomic.AtomicLong; + +public class ContinuousAggregateImpl implements ContinuousAggregate { + private final Database database; + private final String name; + private final String query; + private final String backingTypeName; + private final String sourceTypeName; + private final long bucketIntervalMs; + private final String bucketColumn; + private final String timestampColumn; + private volatile long watermarkTs; + private volatile long lastRefreshTime; + private volatile MaterializedViewStatus status; + private final AtomicBoolean refreshInProgress = new AtomicBoolean(false); + + // Runtime metrics (not persisted) + private final AtomicLong refreshCount = new AtomicLong(0); + private final AtomicLong refreshTotalTimeMs = new AtomicLong(0); + private final AtomicLong refreshMinTimeMs = new AtomicLong(Long.MAX_VALUE); + private final AtomicLong refreshMaxTimeMs = new AtomicLong(0); + private final AtomicLong errorCount = new AtomicLong(0); + private final AtomicLong lastRefreshDurationMs = new AtomicLong(0); + + public ContinuousAggregateImpl(final Database database, final String name, final String query, + final String backingTypeName, final String sourceTypeName, + final long bucketIntervalMs, 
final String bucketColumn, final String timestampColumn) { + this.database = database; + this.name = name; + this.query = query; + this.backingTypeName = backingTypeName; + this.sourceTypeName = sourceTypeName; + this.bucketIntervalMs = bucketIntervalMs; + this.bucketColumn = bucketColumn; + this.timestampColumn = timestampColumn; + this.watermarkTs = 0; + this.lastRefreshTime = 0; + this.status = MaterializedViewStatus.VALID; + } + + @Override + public String getName() { + return name; + } + + @Override + public String getQuery() { + return query; + } + + @Override + public DocumentType getBackingType() { + return database.getSchema().getType(backingTypeName); + } + + public String getBackingTypeName() { + return backingTypeName; + } + + @Override + public String getSourceTypeName() { + return sourceTypeName; + } + + @Override + public String getStatus() { + return status.name(); + } + + @Override + public long getWatermarkTs() { + return watermarkTs; + } + + @Override + public long getBucketIntervalMs() { + return bucketIntervalMs; + } + + @Override + public String getBucketColumn() { + return bucketColumn; + } + + @Override + public String getTimestampColumn() { + return timestampColumn; + } + + @Override + public long getLastRefreshTime() { + return lastRefreshTime; + } + + public void setStatus(final MaterializedViewStatus status) { + this.status = status; + } + + public void setWatermarkTs(final long watermarkTs) { + this.watermarkTs = watermarkTs; + } + + public void setLastRefreshTime(final long lastRefreshTime) { + this.lastRefreshTime = lastRefreshTime; + } + + public void updateLastRefreshTime() { + this.lastRefreshTime = System.currentTimeMillis(); + } + + public boolean tryBeginRefresh() { + return refreshInProgress.compareAndSet(false, true); + } + + public void endRefresh() { + refreshInProgress.set(false); + } + + @Override + public long getRefreshCount() { + return refreshCount.get(); + } + + @Override + public long getRefreshTotalTimeMs() { + 
return refreshTotalTimeMs.get(); + } + + @Override + public long getRefreshMinTimeMs() { + final long v = refreshMinTimeMs.get(); + return v == Long.MAX_VALUE ? 0 : v; + } + + @Override + public long getRefreshMaxTimeMs() { + return refreshMaxTimeMs.get(); + } + + @Override + public long getErrorCount() { + return errorCount.get(); + } + + @Override + public long getLastRefreshDurationMs() { + return lastRefreshDurationMs.get(); + } + + public void recordRefreshSuccess(final long durationMs) { + refreshCount.incrementAndGet(); + refreshTotalTimeMs.addAndGet(durationMs); + lastRefreshDurationMs.set(durationMs); + long prev; + do { + prev = refreshMinTimeMs.get(); + if (durationMs >= prev) + break; + } while (!refreshMinTimeMs.compareAndSet(prev, durationMs)); + do { + prev = refreshMaxTimeMs.get(); + if (durationMs <= prev) + break; + } while (!refreshMaxTimeMs.compareAndSet(prev, durationMs)); + } + + public void recordRefreshError() { + errorCount.incrementAndGet(); + } + + @Override + public void refresh() { + ContinuousAggregateRefresher.incrementalRefresh(database, this); + } + + @Override + public void drop() { + database.getSchema().dropContinuousAggregate(name); + } + + @Override + public JSONObject toJSON() { + final JSONObject json = new JSONObject(); + json.put("name", name); + json.put("query", query); + json.put("backingType", backingTypeName); + json.put("sourceType", sourceTypeName); + json.put("bucketIntervalMs", bucketIntervalMs); + json.put("bucketColumn", bucketColumn); + json.put("timestampColumn", timestampColumn); + json.put("watermarkTs", watermarkTs); + json.put("lastRefreshTime", lastRefreshTime); + json.put("status", status.name()); + return json; + } + + public static ContinuousAggregateImpl fromJSON(final Database database, final JSONObject json) { + final String loadedName = json.getString("name"); + if (loadedName != null && loadedName.contains("`")) + throw new IllegalArgumentException("Continuous aggregate name loaded from schema 
contains illegal backtick character: " + loadedName); + + final ContinuousAggregateImpl ca = new ContinuousAggregateImpl( + database, + loadedName, + json.getString("query"), + json.getString("backingType"), + json.getString("sourceType"), + json.getLong("bucketIntervalMs", 0), + json.getString("bucketColumn"), + json.getString("timestampColumn")); + ca.watermarkTs = json.getLong("watermarkTs", 0); + ca.lastRefreshTime = json.getLong("lastRefreshTime", 0); + ca.status = MaterializedViewStatus.valueOf(json.getString("status", "VALID")); + return ca; + } + + @Override + public boolean equals(final Object o) { + if (this == o) + return true; + if (o == null || getClass() != o.getClass()) + return false; + final ContinuousAggregateImpl that = (ContinuousAggregateImpl) o; + return Objects.equals(name, that.name); + } + + @Override + public int hashCode() { + return Objects.hash(name); + } + + @Override + public String toString() { + return "ContinuousAggregate{name='" + name + "', status=" + status + + ", watermarkTs=" + watermarkTs + ", bucketColumn='" + bucketColumn + "'}"; + } +} diff --git a/engine/src/main/java/com/arcadedb/schema/ContinuousAggregateRefresher.java b/engine/src/main/java/com/arcadedb/schema/ContinuousAggregateRefresher.java new file mode 100644 index 0000000000..2e82d509df --- /dev/null +++ b/engine/src/main/java/com/arcadedb/schema/ContinuousAggregateRefresher.java @@ -0,0 +1,253 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.schema; + +import com.arcadedb.database.Database; +import com.arcadedb.database.MutableDocument; +import com.arcadedb.log.LogManager; +import com.arcadedb.query.sql.executor.Result; +import com.arcadedb.query.sql.executor.ResultSet; + +import java.util.Date; +import java.util.Locale; +import java.util.logging.Level; + +public class ContinuousAggregateRefresher { + + public static void incrementalRefresh(final Database database, final ContinuousAggregateImpl ca) { + if (!ca.tryBeginRefresh()) { + LogManager.instance().log(ContinuousAggregateRefresher.class, Level.FINE, + "Skipping concurrent refresh for continuous aggregate '%s' — already in progress", null, ca.getName()); + return; + } + ca.setStatus(MaterializedViewStatus.BUILDING); + final long startNs = System.nanoTime(); + try { + final String backingTypeName = ca.getBackingTypeName(); + final String bucketColumn = ca.getBucketColumn(); + final long watermark = ca.getWatermarkTs(); + + // Validate interpolated names to prevent backtick injection + if (!SAFE_COLUMN_NAME.matcher(backingTypeName).matches()) + throw new IllegalArgumentException("Unsafe backing type name: '" + backingTypeName + "'"); + if (!SAFE_COLUMN_NAME.matcher(bucketColumn).matches()) + throw new IllegalArgumentException("Unsafe bucket column name: '" + bucketColumn + "'"); + + database.transaction(() -> { + // Delete rows in the current (possibly incomplete) bucket and all newer buckets + if (watermark > 0) + database.command("sql", "DELETE FROM `" + backingTypeName + "` WHERE `" + bucketColumn + "` >= ?", + new Date(watermark)); + + // Build the filtered query: append WHERE clause with watermark filter on the source timestamp + final String filteredQuery = buildFilteredQuery(ca, watermark); 
+ + // Execute and insert results + long maxBucketTs = watermark; + try (final ResultSet rs = database.query("sql", filteredQuery)) { + while (rs.hasNext()) { + final Result result = rs.next(); + final MutableDocument doc = database.newDocument(backingTypeName); + for (final String prop : result.getPropertyNames()) { + if (!prop.startsWith("@")) + doc.set(prop, result.getProperty(prop)); + } + doc.save(); + + // Track maximum bucket timestamp for advancing watermark + final Object bucketVal = result.getProperty(bucketColumn); + if (bucketVal != null) { + final long bucketMs = toEpochMs(bucketVal); + if (bucketMs > maxBucketTs) + maxBucketTs = bucketMs; + } + } + } + + // Advance watermark to the max bucket boundary found + if (maxBucketTs > watermark) + ca.setWatermarkTs(maxBucketTs); + }); + + final long durationMs = (System.nanoTime() - startNs) / 1_000_000; + ca.recordRefreshSuccess(durationMs); + ca.updateLastRefreshTime(); + ca.setStatus(MaterializedViewStatus.VALID); + + // Persist updated watermark only if it actually advanced. + // If saveConfiguration fails, revert the in-memory watermark to the original value + // so the next refresh re-processes the same window (delete-first design makes it safe). + if (ca.getWatermarkTs() > watermark) { + final LocalSchema schema = (LocalSchema) database.getSchema(); + try { + schema.saveConfiguration(); + } catch (final Exception saveEx) { + ca.setWatermarkTs(watermark); + throw saveEx; + } + } + + } catch (final Exception e) { + ca.recordRefreshError(); + ca.setStatus(MaterializedViewStatus.ERROR); + LogManager.instance().log(ContinuousAggregateRefresher.class, Level.SEVERE, + "Error refreshing continuous aggregate '%s': %s", e, ca.getName(), e.getMessage()); + throw e; + } finally { + ca.endRefresh(); + } + } + + // Allows letters, digits, and underscores only — consistent with ArcadeDB identifier rules. + // Backtick, dot, hyphen, and other injection-enabling characters are excluded. 
+ private static final java.util.regex.Pattern SAFE_COLUMN_NAME = java.util.regex.Pattern.compile("[A-Za-z0-9_]+"); + + static String buildFilteredQuery(final ContinuousAggregateImpl ca, final long watermark) { + if (watermark <= 0) + return ca.getQuery(); + + final String query = ca.getQuery(); + final String tsColumn = ca.getTimestampColumn(); + + // Validate column name to prevent backtick injection + if (!SAFE_COLUMN_NAME.matcher(tsColumn).matches()) + throw new IllegalArgumentException("Unsafe timestamp column name: '" + tsColumn + "'"); + + // Find WHERE clause position at the outermost level (case-insensitive). + // Note: CTEs and subqueries with their own WHERE clauses are not supported + // in continuous-aggregate queries. + final String upperQuery = query.toUpperCase(Locale.ROOT); + final int whereIdx = findWhereIndex(upperQuery); + + if (whereIdx >= 0) { + // Insert the watermark filter right after WHERE + final String before = query.substring(0, whereIdx + 5); // "WHERE" is 5 chars + final String after = query.substring(whereIdx + 5); + return before + " `" + tsColumn + "` >= " + watermark + " AND " + after.stripLeading(); + } else { + // No WHERE clause — insert before GROUP BY, ORDER BY, LIMIT, or at end + int insertIdx = findKeywordIndex(upperQuery, "GROUP BY"); + if (insertIdx < 0) + insertIdx = findKeywordIndex(upperQuery, "ORDER BY"); + if (insertIdx < 0) + insertIdx = findKeywordIndex(upperQuery, "LIMIT"); + if (insertIdx >= 0) { + final String before = query.substring(0, insertIdx); + final String after = query.substring(insertIdx); + return before + "WHERE `" + tsColumn + "` >= " + watermark + " " + after; + } + return query + " WHERE `" + tsColumn + "` >= " + watermark; + } + } + + private static int findWhereIndex(final String upperQuery) { + // Find standalone WHERE keyword at the outermost nesting level (depth 0), + // skipping over string literals (single or double quoted), block comments (/* */), + // line comments (--), and parenthesized 
subqueries so that WHERE keywords inside + // them are not mistaken for the top-level WHERE. + // E.g.: SELECT func('(foo)') FROM t WHERE ts > 0 + // SELECT /* WHERE not here */ * FROM t WHERE ts > 0 + int depth = 0; + int idx = 0; + final int len = upperQuery.length(); + while (idx < len) { + final char ch = upperQuery.charAt(idx); + // Skip over block comments: /* ... */ + if (ch == '/' && idx + 1 < len && upperQuery.charAt(idx + 1) == '*') { + idx += 2; + while (idx + 1 < len && !(upperQuery.charAt(idx) == '*' && upperQuery.charAt(idx + 1) == '/')) + idx++; + idx += 2; // skip closing */ + continue; + } + // Skip over line comments: -- ... \n + if (ch == '-' && idx + 1 < len && upperQuery.charAt(idx + 1) == '-') { + idx += 2; + while (idx < len && upperQuery.charAt(idx) != '\n') + idx++; + continue; + } + // Skip over quoted string literals to avoid counting parens inside them + if (ch == '\'' || ch == '"') { + final char quote = ch; + idx++; + while (idx < len) { + final char c2 = upperQuery.charAt(idx); + idx++; + if (c2 == '\\') { + idx++; // skip escaped character + } else if (c2 == quote) { + break; + } + } + continue; + } + if (ch == '(') { + depth++; + idx++; + continue; + } + if (ch == ')') { + depth--; + idx++; + continue; + } + if (depth > 0) { + idx++; + continue; + } + if (ch == 'W' && upperQuery.startsWith("WHERE", idx)) { + final boolean leftBound = idx == 0 || !Character.isLetterOrDigit(upperQuery.charAt(idx - 1)); + final boolean rightBound = idx + 5 >= len || !Character.isLetterOrDigit(upperQuery.charAt(idx + 5)); + if (leftBound && rightBound) + return idx; + idx += 5; + continue; + } + idx++; + } + return -1; + } + + private static int findKeywordIndex(final String upperQuery, final String keyword) { + int idx = 0; + while (idx < upperQuery.length()) { + final int found = upperQuery.indexOf(keyword, idx); + if (found < 0) + return -1; + final boolean leftBound = found == 0 || !Character.isLetterOrDigit(upperQuery.charAt(found - 1)); + final 
boolean rightBound = found + keyword.length() >= upperQuery.length() + || !Character.isLetterOrDigit(upperQuery.charAt(found + keyword.length())); + if (leftBound && rightBound) + return found; + idx = found + keyword.length(); + } + return -1; + } + + private static long toEpochMs(final Object value) { + if (value instanceof Date d) + return d.getTime(); + if (value instanceof Long l) + return l; + if (value instanceof Number n) + return n.longValue(); + return 0; + } +} diff --git a/engine/src/main/java/com/arcadedb/schema/LocalDocumentType.java b/engine/src/main/java/com/arcadedb/schema/LocalDocumentType.java index 7d8d8071f4..307eb29c51 100644 --- a/engine/src/main/java/com/arcadedb/schema/LocalDocumentType.java +++ b/engine/src/main/java/com/arcadedb/schema/LocalDocumentType.java @@ -1051,7 +1051,9 @@ public JSONObject toJSON() { final JSONObject type = new JSONObject(); final String kind; - if (this instanceof LocalVertexType) + if (this instanceof LocalTimeSeriesType) + kind = "t"; + else if (this instanceof LocalVertexType) kind = "v"; else if (this instanceof LocalEdgeType) kind = "e"; diff --git a/engine/src/main/java/com/arcadedb/schema/LocalSchema.java b/engine/src/main/java/com/arcadedb/schema/LocalSchema.java index 32701082fb..a9ed55dcef 100644 --- a/engine/src/main/java/com/arcadedb/schema/LocalSchema.java +++ b/engine/src/main/java/com/arcadedb/schema/LocalSchema.java @@ -35,6 +35,8 @@ import com.arcadedb.engine.ComponentFile; import com.arcadedb.engine.Dictionary; import com.arcadedb.engine.LocalBucket; +import com.arcadedb.engine.timeseries.TimeSeriesBucket; +import com.arcadedb.engine.timeseries.TimeSeriesMaintenanceScheduler; import com.arcadedb.event.*; import com.arcadedb.exception.ConfigurationException; import com.arcadedb.exception.DatabaseMetadataException; @@ -93,6 +95,7 @@ public class LocalSchema implements Schema { protected final Map indexMap = new HashMap<>(); protected final Map triggers = new HashMap<>(); protected final Map<String, MaterializedViewImpl>
materializedViews = new LinkedHashMap<>(); + protected final Map<String, ContinuousAggregateImpl> continuousAggregates = new LinkedHashMap<>(); private final Map triggerAdapters = new HashMap<>(); private final String databasePath; private final File configurationFile; @@ -110,6 +113,7 @@ public class LocalSchema implements Schema { private final Map functionLibraries = new ConcurrentHashMap<>(); private final Map migratedFileIds = new ConcurrentHashMap<>(); private MaterializedViewScheduler materializedViewScheduler; + private TimeSeriesMaintenanceScheduler timeSeriesMaintenanceScheduler; public LocalSchema(final DatabaseInternal database, final String databasePath, final SecurityManager security) { this.database = database; @@ -128,6 +132,7 @@ public LocalSchema(final DatabaseInternal database, final String databasePath, f componentFactory.registerComponent(LSMTreeIndexCompacted.NOTUNIQUE_INDEX_EXT, new LSMTreeIndex.PaginatedComponentFactoryHandlerNotUnique()); componentFactory.registerComponent(LSMVectorIndex.FILE_EXT, new LSMVectorIndex.PaginatedComponentFactoryHandlerUnique()); + componentFactory.registerComponent(TimeSeriesBucket.BUCKET_EXT, new TimeSeriesBucket.PaginatedComponentFactoryHandler()); // Note: LSMVectorIndexGraphFile is NOT registered here - it's a sub-component discovered by its parent LSMVectorIndex indexFactory.register(INDEX_TYPE.LSM_TREE.name(), new LSMTreeIndex.LSMTreeIndexFactoryHandler()); @@ -266,6 +271,13 @@ public Component getFileByIdIfExists(final int id) { return files.get(id); } + public Component getFileByName(final String name) { + for (final Component f : files) + if (f != null && name.equals(f.getName())) + return f; + return null; + } + public void removeFile(final int fileId) { if (fileId >= files.size()) return; @@ -655,6 +667,48 @@ public MaterializedViewBuilder buildMaterializedView() { return new MaterializedViewBuilder((DatabaseInternal) database); } + // -- Continuous Aggregate management -- + + @Override + public synchronized boolean
existsContinuousAggregate(final String name) { + return continuousAggregates.containsKey(name); + } + + @Override + public synchronized ContinuousAggregate getContinuousAggregate(final String name) { + final ContinuousAggregateImpl ca = continuousAggregates.get(name); + if (ca == null) + throw new SchemaException("Continuous aggregate '" + name + "' not found"); + return ca; + } + + @Override + public synchronized ContinuousAggregate[] getContinuousAggregates() { + return continuousAggregates.values().toArray(new ContinuousAggregate[0]); + } + + @Override + public synchronized void dropContinuousAggregate(final String name) { + final ContinuousAggregateImpl ca = continuousAggregates.get(name); + if (ca == null) + throw new SchemaException("Continuous aggregate '" + name + "' not found"); + + recordFileChanges(() -> { + continuousAggregates.remove(name); + + if (existsType(ca.getBackingTypeName())) + dropType(ca.getBackingTypeName()); + + saveConfiguration(); + return null; + }); + } + + @Override + public ContinuousAggregateBuilder buildContinuousAggregate() { + return new ContinuousAggregateBuilder((DatabaseInternal) database); + } + /** * Register a trigger as an event listener on the appropriate type. 
*/ @@ -868,9 +922,19 @@ public void close() { materializedViewScheduler = null; } + if (timeSeriesMaintenanceScheduler != null) { + timeSeriesMaintenanceScheduler.shutdown(); + timeSeriesMaintenanceScheduler = null; + } + writeStatisticsFile(); materializedViews.clear(); + continuousAggregates.clear(); files.clear(); + for (final DocumentType type : types.values()) { + if (type instanceof LocalTimeSeriesType tsType) + tsType.close(); + } types.clear(); bucketMap.clear(); indexMap.clear(); @@ -884,6 +948,12 @@ public synchronized MaterializedViewScheduler getMaterializedViewScheduler() { return materializedViewScheduler; } + public synchronized TimeSeriesMaintenanceScheduler getTimeSeriesMaintenanceScheduler() { + if (timeSeriesMaintenanceScheduler == null) + timeSeriesMaintenanceScheduler = new TimeSeriesMaintenanceScheduler(); + return timeSeriesMaintenanceScheduler; + } + private void readStatisticsFile() { try { boolean legacyFile = false; @@ -989,7 +1059,7 @@ public boolean existsType(final String typeName) { public void dropType(final String typeName) { database.checkPermissionsOnDatabase(SecurityDatabaseUser.DATABASE_ACCESS.UPDATE_SCHEMA); - // Prevent dropping a type that is a backing type or source type for a materialized view + // Prevent dropping a type that is a backing type or source type for a materialized view or continuous aggregate synchronized (this) { for (final MaterializedViewImpl view : materializedViews.values()) { if (view.getBackingTypeName().equals(typeName)) @@ -1001,6 +1071,16 @@ public void dropType(final String typeName) { "Cannot drop type '" + typeName + "' because it is a source type for materialized view '" + view.getName() + "'. 
" + "Drop the materialized view first with: DROP MATERIALIZED VIEW " + view.getName()); } + for (final ContinuousAggregateImpl ca : continuousAggregates.values()) { + if (ca.getBackingTypeName().equals(typeName)) + throw new SchemaException( + "Cannot drop type '" + typeName + "' because it is the backing type for continuous aggregate '" + ca.getName() + "'. " + + "Drop the continuous aggregate first with: DROP CONTINUOUS AGGREGATE " + ca.getName()); + if (ca.getSourceTypeName().equals(typeName)) + throw new SchemaException( + "Cannot drop type '" + typeName + "' because it is the source type for continuous aggregate '" + ca.getName() + "'. " + + "Drop the continuous aggregate first with: DROP CONTINUOUS AGGREGATE " + ca.getName()); + } } recordFileChanges(() -> { @@ -1037,6 +1117,9 @@ public void dropType(final String typeName) { dropBucket(b.getName()); } + if (type instanceof LocalTimeSeriesType tsType) + tsType.close(); + if (types.remove(typeName) == null) throw new SchemaException("Type '" + typeName + "' not found"); } finally { @@ -1228,7 +1311,22 @@ public TypeBuilder buildEdgeType() { return new TypeBuilder<>(database, EdgeType.class); } + @Override + public TimeSeriesTypeBuilder buildTimeSeriesType() { + return new TimeSeriesTypeBuilder(database); + } + protected synchronized void readConfiguration() { + for (final DocumentType type : types.values()) { + if (type instanceof LocalTimeSeriesType tsType) { + try { + tsType.close(); + } catch (final Exception e) { + LogManager.instance().log(this, Level.WARNING, "Error closing TimeSeries type '%s' during schema reload: %s", null, + tsType.getName(), e.getMessage()); + } + } + } types.clear(); loadInRamCompleted = false; @@ -1286,6 +1384,18 @@ protected synchronized void readConfiguration() { case "v" -> new LocalVertexType(this, typeName); case "e" -> new LocalEdgeType(this, typeName, !schemaType.has("bidirectional") || schemaType.getBoolean("bidirectional")); case "d" -> new LocalDocumentType(this, 
typeName); + case "t" -> { + final LocalTimeSeriesType tsType = new LocalTimeSeriesType(this, typeName); + tsType.fromJSON(schemaType); + try { + tsType.initEngine(); + } catch (final IOException e) { + throw new ConfigurationException("Error initializing TimeSeries engine for type '" + typeName + "'", e); + } + // Schedule automatic retention/downsampling if policies are defined + getTimeSeriesMaintenanceScheduler().schedule(database, tsType); + yield tsType; + } case null, default -> throw new ConfigurationException("Type '" + kind + "' is not supported"); }; @@ -1524,6 +1634,21 @@ protected synchronized void readConfiguration() { } } + // Load continuous aggregates + continuousAggregates.clear(); + if (root.has("continuousAggregates")) { + final JSONObject caJSON = root.getJSONObject("continuousAggregates"); + for (final String caName : caJSON.keySet()) { + final JSONObject caDef = caJSON.getJSONObject(caName); + final ContinuousAggregateImpl ca = ContinuousAggregateImpl.fromJSON(database, caDef); + continuousAggregates.put(caName, ca); + + // Crash recovery: if status is BUILDING, it was interrupted + if (MaterializedViewStatus.BUILDING.name().equals(ca.getStatus())) + ca.setStatus(MaterializedViewStatus.STALE); + } + } + } catch (final Exception e) { LogManager.instance().log(this, Level.SEVERE, "Error on loading schema. 
The schema will be reset", e); } finally { @@ -1592,9 +1717,19 @@ public synchronized JSONObject toJSON() { mvJSON.put(entry.getKey(), entry.getValue().toJSON()); root.put("materializedViews", mvJSON); + // Serialize continuous aggregates + final JSONObject caJSON = new JSONObject(); + for (final Map.Entry entry : continuousAggregates.entrySet()) + caJSON.put(entry.getKey(), entry.getValue().toJSON()); + root.put("continuousAggregates", caJSON); + return root; } + void registerType(final LocalDocumentType type) { + types.put(type.getName(), type); + } + public void registerFile(final Component file) { final int fileId = file.getFileId(); diff --git a/engine/src/main/java/com/arcadedb/schema/LocalTimeSeriesType.java b/engine/src/main/java/com/arcadedb/schema/LocalTimeSeriesType.java new file mode 100644 index 0000000000..cacb6169b7 --- /dev/null +++ b/engine/src/main/java/com/arcadedb/schema/LocalTimeSeriesType.java @@ -0,0 +1,217 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ *
+ * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com)
+ * SPDX-License-Identifier: Apache-2.0
+ */
+package com.arcadedb.schema;
+
+import com.arcadedb.database.DatabaseInternal;
+import com.arcadedb.engine.timeseries.ColumnDefinition;
+import com.arcadedb.engine.timeseries.DownsamplingTier;
+import com.arcadedb.engine.timeseries.TimeSeriesBucket;
+import com.arcadedb.engine.timeseries.TimeSeriesEngine;
+import com.arcadedb.engine.timeseries.TimeSeriesSealedStore;
+import com.arcadedb.log.LogManager;
+import com.arcadedb.serializer.json.JSONArray;
+import com.arcadedb.serializer.json.JSONObject;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.logging.Level;
+
+/**
+ * Schema type for TimeSeries data. Extends LocalDocumentType and
+ * owns a TimeSeriesEngine for managing sharded time-series storage.
+ *
+ * @author Luca Garulli (l.garulli@arcadedata.com)
+ */
+public class LocalTimeSeriesType extends LocalDocumentType {
+
+  public static final String KIND_CODE = "t";
+
+  private String timestampColumn;
+  private int shardCount;
+  private long retentionMs;
+  private long compactionBucketIntervalMs;
+  private int sealedFormatVersion;
+  private int mutableFormatVersion;
+  private final List<ColumnDefinition> tsColumns = new ArrayList<>();
+  private List<DownsamplingTier> downsamplingTiers = new ArrayList<>();
+  private volatile TimeSeriesEngine engine;
+
+  public LocalTimeSeriesType(final LocalSchema schema, final String name) {
+    super(schema, name);
+  }
+
+  /**
+   * Initializes the TimeSeriesEngine. Called after all column definitions are set.
+   * Thread-safe: the method is synchronized and idempotent, and the engine is
+   * published to readers through the volatile {@code engine} field.
+   */
+  public synchronized void initEngine() throws IOException {
+    if (engine != null)
+      return;
+    engine = new TimeSeriesEngine((DatabaseInternal) schema.getDatabase(), name, tsColumns, shardCount > 0 ?
shardCount : 1,
+        compactionBucketIntervalMs);
+  }
+
+  public TimeSeriesEngine getEngine() {
+    return engine;
+  }
+
+  public void close() {
+    if (engine != null) {
+      try {
+        engine.close();
+      } catch (final IOException e) {
+        LogManager.instance().log(this, Level.WARNING, "Error closing TimeSeriesEngine for type '%s': %s", e, name, e.getMessage());
+      }
+      engine = null;
+    }
+  }
+
+  public String getTimestampColumn() {
+    return timestampColumn;
+  }
+
+  public void setTimestampColumn(final String timestampColumn) {
+    this.timestampColumn = timestampColumn;
+  }
+
+  public int getShardCount() {
+    return shardCount;
+  }
+
+  public void setShardCount(final int shardCount) {
+    this.shardCount = shardCount;
+  }
+
+  public long getRetentionMs() {
+    return retentionMs;
+  }
+
+  public void setRetentionMs(final long retentionMs) {
+    this.retentionMs = retentionMs;
+  }
+
+  public long getCompactionBucketIntervalMs() {
+    return compactionBucketIntervalMs;
+  }
+
+  public void setCompactionBucketIntervalMs(final long compactionBucketIntervalMs) {
+    this.compactionBucketIntervalMs = compactionBucketIntervalMs;
+  }
+
+  public List<ColumnDefinition> getTsColumns() {
+    return tsColumns;
+  }
+
+  public void addTsColumn(final ColumnDefinition column) {
+    tsColumns.add(column);
+  }
+
+  public List<DownsamplingTier> getDownsamplingTiers() {
+    return downsamplingTiers;
+  }
+
+  public void setDownsamplingTiers(final List<DownsamplingTier> tiers) {
+    this.downsamplingTiers = tiers != null ?
new ArrayList<>(tiers) : new ArrayList<>(); + } + + @Override + public JSONObject toJSON() { + final JSONObject json = super.toJSON(); + // Override kind to "t" + json.put("type", KIND_CODE); + + // TimeSeries-specific fields + json.put("timestampColumn", timestampColumn); + json.put("shardCount", shardCount); + json.put("retentionMs", retentionMs); + if (compactionBucketIntervalMs > 0) + json.put("compactionBucketIntervalMs", compactionBucketIntervalMs); + json.put("sealedFormatVersion", TimeSeriesSealedStore.CURRENT_VERSION); + json.put("mutableFormatVersion", TimeSeriesBucket.CURRENT_VERSION); + + final JSONArray colArray = new JSONArray(); + for (final ColumnDefinition col : tsColumns) { + final JSONObject colJson = new JSONObject(); + colJson.put("name", col.getName()); + colJson.put("dataType", col.getDataType().name()); + colJson.put("role", col.getRole().name()); + colArray.put(colJson); + } + json.put("tsColumns", colArray); + + if (!downsamplingTiers.isEmpty()) { + final JSONArray tierArray = new JSONArray(); + for (final DownsamplingTier tier : downsamplingTiers) { + final JSONObject tierJson = new JSONObject(); + tierJson.put("afterMs", tier.afterMs()); + tierJson.put("granularityMs", tier.granularityMs()); + tierArray.put(tierJson); + } + json.put("downsamplingTiers", tierArray); + } + + return json; + } + + /** + * Restores TimeSeries-specific fields from schema JSON. 
+ */ + public void fromJSON(final JSONObject json) { + timestampColumn = json.getString("timestampColumn", null); + shardCount = json.getInt("shardCount", 1); + retentionMs = json.getLong("retentionMs", 0L); + compactionBucketIntervalMs = json.getLong("compactionBucketIntervalMs", 0L); + sealedFormatVersion = json.getInt("sealedFormatVersion", 0); + if (sealedFormatVersion != TimeSeriesSealedStore.CURRENT_VERSION) + throw new IllegalStateException( + "Unsupported sealed store format version " + sealedFormatVersion + " (expected " + + TimeSeriesSealedStore.CURRENT_VERSION + ") for TimeSeries type '" + name + "'"); + mutableFormatVersion = json.getInt("mutableFormatVersion", 0); + if (mutableFormatVersion != TimeSeriesBucket.CURRENT_VERSION) + throw new IllegalStateException( + "Unsupported mutable bucket format version " + mutableFormatVersion + " (expected " + + TimeSeriesBucket.CURRENT_VERSION + ") for TimeSeries type '" + name + "'"); + + tsColumns.clear(); + final JSONArray colArray = json.getJSONArray("tsColumns", null); + if (colArray != null) { + for (int i = 0; i < colArray.length(); i++) { + final JSONObject colJson = colArray.getJSONObject(i); + tsColumns.add(new ColumnDefinition( + colJson.getString("name"), + Type.getTypeByName(colJson.getString("dataType")), + ColumnDefinition.ColumnRole.valueOf(colJson.getString("role")) + )); + } + } + + downsamplingTiers.clear(); + final JSONArray tierArray = json.getJSONArray("downsamplingTiers", null); + if (tierArray != null) { + for (int i = 0; i < tierArray.length(); i++) { + final JSONObject tierJson = tierArray.getJSONObject(i); + downsamplingTiers.add(new DownsamplingTier( + tierJson.getLong("afterMs"), + tierJson.getLong("granularityMs") + )); + } + } + } +} diff --git a/engine/src/main/java/com/arcadedb/schema/Schema.java b/engine/src/main/java/com/arcadedb/schema/Schema.java index 7f941952d3..b53cabe50b 100644 --- a/engine/src/main/java/com/arcadedb/schema/Schema.java +++ 
b/engine/src/main/java/com/arcadedb/schema/Schema.java @@ -201,12 +201,26 @@ Index createManualIndex(Schema.INDEX_TYPE indexType, boolean unique, String inde MaterializedViewBuilder buildMaterializedView(); + // -- Continuous Aggregate management -- + + boolean existsContinuousAggregate(String name); + + ContinuousAggregate getContinuousAggregate(String name); + + ContinuousAggregate[] getContinuousAggregates(); + + void dropContinuousAggregate(String name); + + ContinuousAggregateBuilder buildContinuousAggregate(); + TypeBuilder buildDocumentType(); TypeBuilder buildVertexType(); TypeBuilder buildEdgeType(); + TimeSeriesTypeBuilder buildTimeSeriesType(); + /** * Creates a new document type with the default settings of buckets. * This is the same as using `buildDocumentType().withName(typeName).create()`. diff --git a/engine/src/main/java/com/arcadedb/schema/TimeSeriesTypeBuilder.java b/engine/src/main/java/com/arcadedb/schema/TimeSeriesTypeBuilder.java new file mode 100644 index 0000000000..c53ec6d7c9 --- /dev/null +++ b/engine/src/main/java/com/arcadedb/schema/TimeSeriesTypeBuilder.java @@ -0,0 +1,134 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.schema; + +import com.arcadedb.GlobalConfiguration; +import com.arcadedb.database.DatabaseInternal; +import com.arcadedb.engine.timeseries.ColumnDefinition; +import com.arcadedb.engine.timeseries.DownsamplingTier; +import com.arcadedb.exception.SchemaException; + +import java.util.ArrayList; +import java.util.List; + +/** + * Fluent builder for creating TimeSeries types. + * + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +public class TimeSeriesTypeBuilder { + + private final DatabaseInternal database; + private String typeName; + private String timestampColumn; + private int shards = 0; // 0 = default (async worker threads) + private long retentionMs = 0; + private long compactionBucketIntervalMs = 0; + private List downsamplingTiers = new ArrayList<>(); + private final List columns = new ArrayList<>(); + + public TimeSeriesTypeBuilder(final DatabaseInternal database) { + this.database = database; + } + + public TimeSeriesTypeBuilder withName(final String name) { + this.typeName = name; + return this; + } + + public TimeSeriesTypeBuilder withTimestamp(final String name) { + this.timestampColumn = name; + this.columns.add(new ColumnDefinition(name, Type.LONG, ColumnDefinition.ColumnRole.TIMESTAMP)); + return this; + } + + public TimeSeriesTypeBuilder withTag(final String name, final Type type) { + this.columns.add(new ColumnDefinition(name, type, ColumnDefinition.ColumnRole.TAG)); + return this; + } + + public TimeSeriesTypeBuilder withField(final String name, final Type type) { + this.columns.add(new ColumnDefinition(name, type, ColumnDefinition.ColumnRole.FIELD)); + return this; + } + + public TimeSeriesTypeBuilder withShards(final int shards) { + this.shards = shards; + return this; + } + + public TimeSeriesTypeBuilder withRetention(final long retentionMs) { + this.retentionMs = retentionMs; + return 
this; + } + + public TimeSeriesTypeBuilder withCompactionBucketInterval(final long compactionBucketIntervalMs) { + this.compactionBucketIntervalMs = compactionBucketIntervalMs; + return this; + } + + public TimeSeriesTypeBuilder withDownsamplingTiers(final List tiers) { + this.downsamplingTiers = tiers != null ? new ArrayList<>(tiers) : new ArrayList<>(); + return this; + } + + public LocalTimeSeriesType create() { + if (typeName == null || typeName.isEmpty()) + throw new SchemaException("TimeSeries type name is required"); + if (timestampColumn == null) + throw new SchemaException("TimeSeries type requires a TIMESTAMP column"); + + final LocalSchema schema = (LocalSchema) database.getSchema(); + if (schema.existsType(typeName)) + throw new SchemaException("Type '" + typeName + "' already exists"); + + final LocalTimeSeriesType type = new LocalTimeSeriesType(schema, typeName); + type.setTimestampColumn(timestampColumn); + type.setShardCount(shards > 0 ? shards : database.getConfiguration().getValueAsInteger(GlobalConfiguration.ASYNC_WORKER_THREADS)); + type.setRetentionMs(retentionMs); + type.setCompactionBucketIntervalMs(compactionBucketIntervalMs); + type.setDownsamplingTiers(downsamplingTiers); + + for (final ColumnDefinition col : columns) + type.addTsColumn(col); + + // Register properties for each column + for (final ColumnDefinition col : columns) + type.createProperty(col.getName(), col.getDataType()); + + try { + database.begin(); + type.initEngine(); + database.commit(); + } catch (final Exception e) { + if (database.isTransactionActive()) + database.rollback(); + throw new SchemaException("Failed to initialize TimeSeries engine for type '" + typeName + "'", e); + } + + // Register the type with the schema only after successful engine initialization + schema.registerType(type); + + // Schedule automatic retention/downsampling if policies are defined + schema.getTimeSeriesMaintenanceScheduler().schedule(database, type); + + schema.saveConfiguration(); + 
return type; + } +} diff --git a/engine/src/test/java/com/arcadedb/engine/PageManagerFlushQueueRaceTest.java b/engine/src/test/java/com/arcadedb/engine/PageManagerFlushQueueRaceTest.java new file mode 100644 index 0000000000..2bba02ad81 --- /dev/null +++ b/engine/src/test/java/com/arcadedb/engine/PageManagerFlushQueueRaceTest.java @@ -0,0 +1,99 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine; + +import com.arcadedb.GlobalConfiguration; +import com.arcadedb.TestHelper; +import com.arcadedb.schema.LocalTimeSeriesType; +import org.junit.jupiter.api.Test; + +import java.util.concurrent.atomic.AtomicLong; + +import static org.assertj.core.api.Assertions.assertThat; + +/** + * Regression test for flush-queue race condition where pages polled from the queue + * but not yet flushed to disk become invisible to getMostRecentVersionOfPage(), + * causing spurious ConcurrentModificationException (MVCC conflicts). + *
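
> **Editor's note.** One detail worth highlighting in `create()` above: the type is registered with the schema only *after* `initEngine()` succeeds, so a failed initialization leaves no half-registered schema entry behind. A self-contained miniature of that ordering follows; `RegisterAfterInit` and its `registry` map are hypothetical stand-ins for illustration, not the ArcadeDB API.

```java
import java.util.HashMap;
import java.util.Map;

// Miniature of the register-after-init ordering used by TimeSeriesTypeBuilder.create():
// if engine initialization throws, the shared registry is never touched.
public class RegisterAfterInit {
  static final Map<String, String> registry = new HashMap<>();

  static void create(final String name, final boolean initFails) {
    // 1. Build the in-memory object (cheap, no side effects on the registry).
    final String type = "type:" + name;
    // 2. Initialize the storage engine; may throw.
    if (initFails)
      throw new IllegalStateException("engine init failed for '" + name + "'");
    // 3. Only now publish the type to the shared registry.
    registry.put(name, type);
  }

  public static void main(final String[] args) {
    create("SensorData", false);
    try {
      create("Broken", true);
    } catch (final IllegalStateException e) {
      // expected: a failed init must not leave a registry entry
    }
    System.out.println(registry.containsKey("SensorData")); // true
    System.out.println(registry.containsKey("Broken"));     // false
  }
}
```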

+ * The fix ensures {@link PageManagerFlushThread#getCachedPageFromMutablePageInQueue} + * also checks the {@code nextPagesToFlush} entry (currently being flushed), and that + * the entry is published immediately after polling from the queue. + * + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +class PageManagerFlushQueueRaceTest extends TestHelper { + + private static final int PARALLEL_LEVEL = 4; + private static final int BATCH_SIZE = 5_000; + private static final int TOTAL_BATCHES = 20; + private static final int NUM_SENSORS = 50; + + @Override + protected void beginTest() { + // Use a small page cache to force frequent evictions, which is what triggers + // the race condition (evicted pages must be found in the flush queue). + database.command("sql", + "CREATE TIMESERIES TYPE SensorData TIMESTAMP ts TAGS (sensor_id STRING) FIELDS (temperature DOUBLE) SHARDS " + + PARALLEL_LEVEL); + } + + @Test + void testAsyncAppendDoesNotCauseMVCCErrors() throws Exception { + final AtomicLong errors = new AtomicLong(0); + + database.async().setParallelLevel(PARALLEL_LEVEL); + database.async().setCommitEvery(5); + database.async().setBackPressure(50); + database.setReadYourWrites(false); + + database.async().onError(exception -> errors.incrementAndGet()); + + final long baseTimestamp = System.currentTimeMillis() - (long) TOTAL_BATCHES * BATCH_SIZE * 100; + + for (int batch = 0; batch < TOTAL_BATCHES; batch++) { + final long batchStart = baseTimestamp + (long) batch * BATCH_SIZE * 100; + final long[] timestamps = new long[BATCH_SIZE]; + final Object[] sensorIds = new Object[BATCH_SIZE]; + final Object[] temperatures = new Object[BATCH_SIZE]; + + for (int i = 0; i < BATCH_SIZE; i++) { + timestamps[i] = batchStart + i * 100L; + sensorIds[i] = "sensor_" + (i % NUM_SENSORS); + temperatures[i] = 20.0 + (Math.random() * 15.0); + } + + database.async().appendSamples("SensorData", timestamps, sensorIds, temperatures); + } + + database.async().waitCompletion(); + + 
assertThat(errors.get()).as("No MVCC errors should occur during async ingestion").isEqualTo(0); + + // Compact and verify data integrity + ((LocalTimeSeriesType) database.getSchema().getType("SensorData")).getEngine().compactAll(); + + final long expectedTotal = (long) TOTAL_BATCHES * BATCH_SIZE; + try (final var rs = database.query("sql", "SELECT count(*) AS cnt FROM SensorData")) { + assertThat(rs.hasNext()).isTrue(); + final long count = ((Number) rs.next().getProperty("cnt")).longValue(); + assertThat(count).as("All samples should be stored").isEqualTo(expectedTotal); + } + } +} diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/AggregationMetricsTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/AggregationMetricsTest.java new file mode 100644 index 0000000000..58a7ae4620 --- /dev/null +++ b/engine/src/test/java/com/arcadedb/engine/timeseries/AggregationMetricsTest.java @@ -0,0 +1,119 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +import com.arcadedb.database.Database; +import com.arcadedb.database.DatabaseFactory; +import com.arcadedb.schema.LocalTimeSeriesType; +import com.arcadedb.utility.FileUtils; +import org.junit.jupiter.api.AfterEach; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.Test; + +import java.io.File; +import java.util.List; + +import static org.assertj.core.api.Assertions.assertThat; + +/** + * Tests for {@link AggregationMetrics} instrumentation. + */ +class AggregationMetricsTest { + + private static final String DB_PATH = "target/databases/AggregationMetricsTest"; + private Database database; + + @BeforeEach + void setUp() { + FileUtils.deleteRecursively(new File(DB_PATH)); + database = new DatabaseFactory(DB_PATH).create(); + database.command("sql", + "CREATE TIMESERIES TYPE Sensor TIMESTAMP ts TAGS (id STRING) FIELDS (value DOUBLE) SHARDS 1"); + } + + @AfterEach + void tearDown() { + if (database != null && database.isOpen()) + database.close(); + FileUtils.deleteRecursively(new File(DB_PATH)); + } + + @Test + void metricsCountersArePopulated() throws Exception { + final TimeSeriesEngine engine = ((LocalTimeSeriesType) database.getSchema().getType("Sensor")).getEngine(); + + // Insert enough data to create sealed blocks + final int batchSize = 10_000; + final long baseTs = 1_000_000_000L; + final long[] timestamps = new long[batchSize]; + final Object[] ids = new Object[batchSize]; + final Object[] values = new Object[batchSize]; + for (int i = 0; i < batchSize; i++) { + timestamps[i] = baseTs + i * 100L; + ids[i] = "s1"; + values[i] = 10.0 + i; + } + + database.begin(); + engine.appendSamples(timestamps, ids, values); + database.commit(); + engine.compactAll(); + + // Run aggregation with metrics + final AggregationMetrics metrics = new AggregationMetrics(); + final 
MultiColumnAggregationResult result = engine.aggregateMulti( + Long.MIN_VALUE, Long.MAX_VALUE, + List.of(new MultiColumnAggregationRequest(2, AggregationType.AVG, "avg_val")), + 3_600_000L, null, metrics); + + // Verify counters are consistent + final int totalBlocks = metrics.getFastPathBlocks() + metrics.getSlowPathBlocks() + metrics.getSkippedBlocks(); + assertThat(totalBlocks).isGreaterThan(0); + assertThat(result.size()).isGreaterThan(0); + + // toString should contain readable output + final String str = metrics.toString(); + assertThat(str).contains("AggMetrics["); + assertThat(str).contains("io="); + assertThat(str).contains("fast="); + } + + @Test + void mergeFromCombinesCounters() { + final AggregationMetrics a = new AggregationMetrics(); + a.addIo(100); + a.addDecompTs(200); + a.addFastPathBlock(); + a.addSlowPathBlock(); + + final AggregationMetrics b = new AggregationMetrics(); + b.addIo(50); + b.addDecompVal(300); + b.addSkippedBlock(); + b.addSlowPathBlock(); + + a.mergeFrom(b); + assertThat(a.getIoNanos()).isEqualTo(150); + assertThat(a.getDecompTsNanos()).isEqualTo(200); + assertThat(a.getDecompValNanos()).isEqualTo(300); + assertThat(a.getFastPathBlocks()).isEqualTo(1); + assertThat(a.getSlowPathBlocks()).isEqualTo(2); + assertThat(a.getSkippedBlocks()).isEqualTo(1); + } +} diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/BucketAlignedCompactionTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/BucketAlignedCompactionTest.java new file mode 100644 index 0000000000..9c52d061ca --- /dev/null +++ b/engine/src/test/java/com/arcadedb/engine/timeseries/BucketAlignedCompactionTest.java @@ -0,0 +1,247 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +import com.arcadedb.database.Database; +import com.arcadedb.database.DatabaseFactory; +import com.arcadedb.schema.LocalTimeSeriesType; +import com.arcadedb.utility.FileUtils; +import org.junit.jupiter.api.AfterEach; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.Test; + +import java.io.File; +import java.util.List; + +import static org.assertj.core.api.Assertions.assertThat; +import static org.assertj.core.api.Assertions.within; + +/** + * Tests for bucket-aligned compaction in TimeSeries. + * When compactionBucketIntervalMs is set, sealed blocks are split at + * bucket boundaries so each block fits entirely within one time bucket, + * enabling 100% fast-path aggregation. 
+ *
+ * @author Luca Garulli (l.garulli@arcadedata.com)
+ */
+class BucketAlignedCompactionTest {
+
+  private static final String DB_PATH = "target/databases/BucketAlignedCompactionTest";
+  private Database database;
+
+  @BeforeEach
+  void setUp() {
+    FileUtils.deleteRecursively(new File(DB_PATH));
+    database = new DatabaseFactory(DB_PATH).create();
+  }
+
+  @AfterEach
+  void tearDown() {
+    if (database != null && database.isOpen())
+      database.close();
+    FileUtils.deleteRecursively(new File(DB_PATH));
+  }
+
+  @Test
+  void testBucketAlignedCompactionProducesSingleBucketBlocks() throws Exception {
+    // Create type with a 1-hour compaction bucket interval
+    database.command("sql",
+        "CREATE TIMESERIES TYPE Sensor TIMESTAMP ts TAGS (id STRING) FIELDS (value DOUBLE) " +
+            "SHARDS 1 COMPACTION_INTERVAL 1 HOURS");
+
+    final TimeSeriesEngine engine = ((LocalTimeSeriesType) database.getSchema().getType("Sensor")).getEngine();
+
+    // Insert data spanning 3 hours (at 100ms intervals = 108,000 samples)
+    final int samplesPerHour = 36_000; // 1h / 100ms
+    final int totalSamples = samplesPerHour * 3;
+    final long baseTs = 0L; // start at epoch 0 for simplicity
+
+    final long[] timestamps = new long[totalSamples];
+    final Object[] ids = new Object[totalSamples];
+    final Object[] values = new Object[totalSamples];
+    for (int i = 0; i < totalSamples; i++) {
+      timestamps[i] = baseTs + i * 100L;
+      ids[i] = "s1";
+      values[i] = 10.0 + (i % 100);
+    }
+
+    database.begin();
+    engine.appendSamples(timestamps, ids, values);
+    database.commit();
+
+    // Compact with bucket-aligned splitting
+    engine.compactAll();
+
+    // Verify: each sealed block should fit within one 1-hour bucket
+    final TimeSeriesShard shard = engine.getShard(0);
+    final TimeSeriesSealedStore store = shard.getSealedStore();
+    final int blockCount = store.getBlockCount();
+
+    // With 3 hours of data and 1h buckets, we expect exactly 3 blocks
+    assertThat(blockCount).isEqualTo(3);
+
+    // Verify each block's timestamp range
fits within one hour bucket + for (int b = 0; b < blockCount; b++) { + final long blockMin = store.getBlockMinTimestamp(b); + final long blockMax = store.getBlockMaxTimestamp(b); + final long bucketOfMin = (blockMin / 3_600_000L) * 3_600_000L; + final long bucketOfMax = (blockMax / 3_600_000L) * 3_600_000L; + assertThat(bucketOfMin).as("Block %d should fit in one bucket", b).isEqualTo(bucketOfMax); + } + + // Verify data integrity: count should match + assertThat(engine.countSamples()).isEqualTo(totalSamples); + } + + @Test + void testBucketAlignedAggregationUses100PercentFastPath() throws Exception { + database.command("sql", + "CREATE TIMESERIES TYPE Sensor TIMESTAMP ts TAGS (id STRING) FIELDS (value DOUBLE) " + + "SHARDS 1 COMPACTION_INTERVAL 1 HOURS"); + + final TimeSeriesEngine engine = ((LocalTimeSeriesType) database.getSchema().getType("Sensor")).getEngine(); + + // Insert data spanning 2 hours + final int samplesPerHour = 36_000; + final int totalSamples = samplesPerHour * 2; + final long baseTs = 0L; + + final long[] timestamps = new long[totalSamples]; + final Object[] ids = new Object[totalSamples]; + final Object[] values = new Object[totalSamples]; + for (int i = 0; i < totalSamples; i++) { + timestamps[i] = baseTs + i * 100L; + ids[i] = "s1"; + values[i] = (double) (i + 1); + } + + database.begin(); + engine.appendSamples(timestamps, ids, values); + database.commit(); + engine.compactAll(); + + // Aggregate with metrics to verify fast path usage + final AggregationMetrics metrics = new AggregationMetrics(); + final MultiColumnAggregationResult result = engine.aggregateMulti( + Long.MIN_VALUE, Long.MAX_VALUE, + List.of( + new MultiColumnAggregationRequest(2, AggregationType.SUM, "sum_val"), + new MultiColumnAggregationRequest(-1, AggregationType.COUNT, "cnt") + ), + 3_600_000L, null, metrics); + + // With bucket-aligned compaction, ALL blocks should use fast path + assertThat(metrics.getFastPathBlocks()).isEqualTo(2); + 
assertThat(metrics.getSlowPathBlocks()).isEqualTo(0); + + // Verify 2 buckets + assertThat(result.size()).isEqualTo(2); + + // Verify count per bucket + assertThat(result.getValue(0L, 1)).isCloseTo((double) samplesPerHour, within(0.01)); + assertThat(result.getValue(3_600_000L, 1)).isCloseTo((double) samplesPerHour, within(0.01)); + } + + @Test + void testDefaultCompactionDoesNotSplitAtBuckets() throws Exception { + // Without COMPACTION_INTERVAL, blocks use fixed SEALED_BLOCK_SIZE chunking + database.command("sql", + "CREATE TIMESERIES TYPE Sensor TIMESTAMP ts TAGS (id STRING) FIELDS (value DOUBLE) SHARDS 1"); + + final TimeSeriesEngine engine = ((LocalTimeSeriesType) database.getSchema().getType("Sensor")).getEngine(); + + // Insert 100,000 samples spanning ~2.78 hours at 100ms intervals + final int totalSamples = 100_000; + final long baseTs = 0L; + + final long[] timestamps = new long[totalSamples]; + final Object[] ids = new Object[totalSamples]; + final Object[] values = new Object[totalSamples]; + for (int i = 0; i < totalSamples; i++) { + timestamps[i] = baseTs + i * 100L; + ids[i] = "s1"; + values[i] = 1.0; + } + + database.begin(); + engine.appendSamples(timestamps, ids, values); + database.commit(); + engine.compactAll(); + + // Without bucket alignment, blocks use SEALED_BLOCK_SIZE=65536 → 2 blocks + final TimeSeriesShard shard = engine.getShard(0); + assertThat(shard.getSealedStore().getBlockCount()).isEqualTo(2); + + // First block spans ~1.82 hours → crosses 1h boundary → slow path + final AggregationMetrics metrics = new AggregationMetrics(); + engine.aggregateMulti(Long.MIN_VALUE, Long.MAX_VALUE, + List.of(new MultiColumnAggregationRequest(-1, AggregationType.COUNT, "cnt")), + 3_600_000L, null, metrics); + + // At least one block should use slow path (crossing bucket boundary) + assertThat(metrics.getSlowPathBlocks()).isGreaterThan(0); + } + + @Test + void testSqlDdlWithCompactionInterval() { + // Test that COMPACTION_INTERVAL is properly parsed 
and persisted + database.command("sql", + "CREATE TIMESERIES TYPE SensorHourly TIMESTAMP ts TAGS (id STRING) FIELDS (temp DOUBLE) " + + "SHARDS 2 COMPACTION_INTERVAL 1 HOURS"); + + final LocalTimeSeriesType type = (LocalTimeSeriesType) database.getSchema().getType("SensorHourly"); + assertThat(type).isNotNull(); + assertThat(type.getCompactionBucketIntervalMs()).isEqualTo(3_600_000L); + } + + @Test + void testSqlDdlWithCompactionIntervalMinutes() { + database.command("sql", + "CREATE TIMESERIES TYPE SensorMinute TIMESTAMP ts TAGS (id STRING) FIELDS (temp DOUBLE) " + + "SHARDS 1 COMPACTION_INTERVAL 15 MINUTES"); + + final LocalTimeSeriesType type = (LocalTimeSeriesType) database.getSchema().getType("SensorMinute"); + assertThat(type).isNotNull(); + assertThat(type.getCompactionBucketIntervalMs()).isEqualTo(15 * 60_000L); + } + + @Test + void testCompactionBucketIntervalPersistedAndReloaded() throws Exception { + database.command("sql", + "CREATE TIMESERIES TYPE Sensor TIMESTAMP ts TAGS (id STRING) FIELDS (value DOUBLE) " + + "SHARDS 1 COMPACTION_INTERVAL 1 HOURS"); + + // Insert some data and compact + final TimeSeriesEngine engine = ((LocalTimeSeriesType) database.getSchema().getType("Sensor")).getEngine(); + final long[] timestamps = { 0L, 100L, 200L }; + final Object[] ids = { "s1", "s1", "s1" }; + final Object[] values = { 1.0, 2.0, 3.0 }; + + database.begin(); + engine.appendSamples(timestamps, ids, values); + database.commit(); + + database.close(); + + // Reopen and verify the config is preserved + database = new DatabaseFactory(DB_PATH).open(); + final LocalTimeSeriesType reloaded = (LocalTimeSeriesType) database.getSchema().getType("Sensor"); + assertThat(reloaded.getCompactionBucketIntervalMs()).isEqualTo(3_600_000L); + } +} diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/ContinuousAggregateSQLTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/ContinuousAggregateSQLTest.java new file mode 100644 index 0000000000..d58e4888e8 
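
> **Editor's note.** The bucket-boundary check these compaction tests rely on reduces to flooring timestamps to the bucket interval, exactly as the assertions above compute `(blockMin / 3_600_000L) * 3_600_000L` for 1-hour buckets. A self-contained sketch of that arithmetic in plain Java (no ArcadeDB dependency; `BucketMath` is an illustrative name, not part of this patch):

```java
public class BucketMath {
  // Floor a timestamp to the start of its bucket.
  static long bucketStart(final long timestampMs, final long bucketIntervalMs) {
    return (timestampMs / bucketIntervalMs) * bucketIntervalMs;
  }

  // A sealed block is fast-path eligible only when its min and max
  // timestamps fall into the same bucket.
  static boolean fitsOneBucket(final long minTs, final long maxTs, final long bucketIntervalMs) {
    return bucketStart(minTs, bucketIntervalMs) == bucketStart(maxTs, bucketIntervalMs);
  }

  public static void main(final String[] args) {
    final long hour = 3_600_000L;
    // 36,000 samples at 100 ms intervals span exactly one hour: [0, 3_599_900]
    System.out.println(fitsOneBucket(0L, 3_599_900L, hour));     // true
    // A 65,536-sample block at 100 ms spans ~1.82 h and crosses the boundary
    System.out.println(fitsOneBucket(0L, 65_535L * 100L, hour)); // false
  }
}
```

This is why the default `SEALED_BLOCK_SIZE` chunking in `testDefaultCompactionDoesNotSplitAtBuckets` produces slow-path blocks: a 65,536-sample block at 100 ms intervals necessarily straddles a 1-hour boundary.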
--- /dev/null +++ b/engine/src/test/java/com/arcadedb/engine/timeseries/ContinuousAggregateSQLTest.java @@ -0,0 +1,180 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +import com.arcadedb.TestHelper; +import com.arcadedb.query.sql.executor.Result; +import com.arcadedb.query.sql.executor.ResultSet; +import org.junit.jupiter.api.Test; + +import java.util.ArrayList; +import java.util.List; + +import static org.assertj.core.api.Assertions.assertThat; + +public class ContinuousAggregateSQLTest extends TestHelper { + + @Test + public void testCreateViaSql() { + createSensorType(); + insertInitialData(); + + database.command("sql", + "CREATE CONTINUOUS AGGREGATE hourly_temps AS " + + "SELECT sensor_id, ts.timeBucket('1h', ts) AS hour, avg(temperature) AS avg_temp " + + "FROM SensorReading GROUP BY sensor_id, hour"); + + assertThat(database.getSchema().existsContinuousAggregate("hourly_temps")).isTrue(); + assertThat(database.getSchema().existsType("hourly_temps")).isTrue(); + } + + @Test + public void testCreateIfNotExistsViaSql() { + createSensorType(); + insertInitialData(); + + database.command("sql", + "CREATE CONTINUOUS AGGREGATE IF NOT EXISTS hourly_temps AS " + + "SELECT sensor_id, ts.timeBucket('1h', ts) AS hour, 
avg(temperature) AS avg_temp " + + "FROM SensorReading GROUP BY sensor_id, hour"); + + // Should not throw + database.command("sql", + "CREATE CONTINUOUS AGGREGATE IF NOT EXISTS hourly_temps AS " + + "SELECT sensor_id, ts.timeBucket('1h', ts) AS hour, avg(temperature) AS avg_temp " + + "FROM SensorReading GROUP BY sensor_id, hour"); + + assertThat(database.getSchema().existsContinuousAggregate("hourly_temps")).isTrue(); + } + + @Test + public void testDropViaSql() { + createSensorType(); + insertInitialData(); + + database.command("sql", + "CREATE CONTINUOUS AGGREGATE hourly_temps AS " + + "SELECT sensor_id, ts.timeBucket('1h', ts) AS hour, avg(temperature) AS avg_temp " + + "FROM SensorReading GROUP BY sensor_id, hour"); + + database.command("sql", "DROP CONTINUOUS AGGREGATE hourly_temps"); + + assertThat(database.getSchema().existsContinuousAggregate("hourly_temps")).isFalse(); + } + + @Test + public void testDropIfExistsViaSql() { + // Should not throw even if it doesn't exist + database.command("sql", "DROP CONTINUOUS AGGREGATE IF EXISTS nonexistent"); + } + + @Test + public void testRefreshViaSql() { + createSensorType(); + insertInitialData(); + + database.command("sql", + "CREATE CONTINUOUS AGGREGATE hourly_temps AS " + + "SELECT sensor_id, ts.timeBucket('1h', ts) AS hour, avg(temperature) AS avg_temp " + + "FROM SensorReading GROUP BY sensor_id, hour"); + + database.command("sql", "REFRESH CONTINUOUS AGGREGATE hourly_temps"); + + assertThat(database.getSchema().getContinuousAggregate("hourly_temps").getStatus()).isEqualTo("VALID"); + } + + @Test + public void testSelectFromSchemaMetadata() { + createSensorType(); + insertInitialData(); + + database.command("sql", + "CREATE CONTINUOUS AGGREGATE hourly_temps AS " + + "SELECT sensor_id, ts.timeBucket('1h', ts) AS hour, avg(temperature) AS avg_temp " + + "FROM SensorReading GROUP BY sensor_id, hour"); + + final ResultSet rs = database.query("sql", "SELECT FROM schema:continuousAggregates"); + final List<Result> results
= collectResults(rs); + assertThat(results).hasSize(1); + + final Result r = results.get(0); + assertThat(r.getProperty("name")).isEqualTo("hourly_temps"); + assertThat(r.getProperty("sourceType")).isEqualTo("SensorReading"); + assertThat(r.getProperty("bucketColumn")).isEqualTo("hour"); + assertThat(r.getProperty("bucketIntervalMs")).isEqualTo(3_600_000L); + assertThat(r.getProperty("status")).isEqualTo("VALID"); + } + + @Test + public void testEndToEndIncrementalUpdate() { + createSensorType(); + + // Insert initial data (hour 0) + database.transaction(() -> { + database.command("sql", + "INSERT INTO SensorReading SET ts = 1000, sensor_id = 'A', temperature = 20.0"); + database.command("sql", + "INSERT INTO SensorReading SET ts = 2000, sensor_id = 'A', temperature = 22.0"); + }); + + database.command("sql", + "CREATE CONTINUOUS AGGREGATE hourly_temps AS " + + "SELECT sensor_id, ts.timeBucket('1h', ts) AS hour, avg(temperature) AS avg_temp " + + "FROM SensorReading GROUP BY sensor_id, hour"); + + // Verify initial aggregate + List<Result> results = collectResults(database.query("sql", "SELECT FROM hourly_temps")); + assertThat(results).hasSize(1); + + // Insert more data (hour 1) + database.transaction(() -> { + database.command("sql", + "INSERT INTO SensorReading SET ts = 3600000, sensor_id = 'A', temperature = 30.0"); + database.command("sql", + "INSERT INTO SensorReading SET ts = 3601000, sensor_id = 'A', temperature = 32.0"); + }); + + // Verify incrementally updated aggregate + results = collectResults(database.query("sql", "SELECT FROM hourly_temps")); + assertThat(results).hasSizeGreaterThanOrEqualTo(2); + } + + private void createSensorType() { + database.command("sql", + "CREATE TIMESERIES TYPE SensorReading TIMESTAMP ts TAGS (sensor_id STRING) FIELDS (temperature DOUBLE)"); + } + + private void insertInitialData() { + database.transaction(() -> { + database.command("sql", + "INSERT INTO SensorReading SET ts = 1000, sensor_id = 'A', temperature = 22.5"); +
database.command("sql", + "INSERT INTO SensorReading SET ts = 2000, sensor_id = 'B', temperature = 23.1"); + database.command("sql", + "INSERT INTO SensorReading SET ts = 3000, sensor_id = 'A', temperature = 21.8"); + }); + } + + private List<Result> collectResults(final ResultSet rs) { + final List<Result> results = new ArrayList<>(); + while (rs.hasNext()) + results.add(rs.next()); + return results; + } +} diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/ContinuousAggregateTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/ContinuousAggregateTest.java new file mode 100644 index 0000000000..e5251d6f9c --- /dev/null +++ b/engine/src/test/java/com/arcadedb/engine/timeseries/ContinuousAggregateTest.java @@ -0,0 +1,289 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License.
+ * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +import com.arcadedb.TestHelper; +import com.arcadedb.exception.SchemaException; +import com.arcadedb.query.sql.executor.Result; +import com.arcadedb.query.sql.executor.ResultSet; +import com.arcadedb.schema.ContinuousAggregate; +import org.junit.jupiter.api.Test; + +import java.util.ArrayList; +import java.util.List; + +import static org.assertj.core.api.Assertions.assertThat; +import static org.assertj.core.api.Assertions.assertThatThrownBy; + +public class ContinuousAggregateTest extends TestHelper { + + @Test + public void testCreateAndInitialPopulation() { + createSensorType(); + insertInitialData(); + + final ContinuousAggregate ca = database.getSchema().buildContinuousAggregate() + .withName("hourly_temps") + .withQuery("SELECT sensor_id, ts.timeBucket('1h', ts) AS hour, avg(temperature) AS avg_temp FROM SensorReading GROUP BY sensor_id, hour") + .create(); + + assertThat(ca.getName()).isEqualTo("hourly_temps"); + assertThat(ca.getSourceTypeName()).isEqualTo("SensorReading"); + assertThat(ca.getBucketColumn()).isEqualTo("hour"); + assertThat(ca.getBucketIntervalMs()).isEqualTo(3_600_000L); + assertThat(ca.getStatus()).isEqualTo("VALID"); + assertThat(ca.getWatermarkTs()).isGreaterThanOrEqualTo(0); + + // Verify backing type has data + final ResultSet rs = database.query("sql", "SELECT FROM hourly_temps"); + final List<Result> results = collectResults(rs); + assertThat(results).isNotEmpty(); + } + + @Test + public void testIncrementalRefreshOnInsert() { + createSensorType(); + insertInitialData(); + + database.getSchema().buildContinuousAggregate() + .withName("hourly_temps") + .withQuery("SELECT sensor_id, ts.timeBucket('1h', ts) AS hour, avg(temperature) AS avg_temp FROM SensorReading GROUP BY sensor_id, hour") + .create(); + + final long initialWatermark =
database.getSchema().getContinuousAggregate("hourly_temps").getWatermarkTs(); + + // Insert new data in a later hour + database.transaction(() -> { + database.command("sql", + "INSERT INTO SensorReading SET ts = 7200000, sensor_id = 'A', temperature = 30.0"); + database.command("sql", + "INSERT INTO SensorReading SET ts = 7201000, sensor_id = 'A', temperature = 32.0"); + }); + + // The post-commit callback should have triggered an incremental refresh + final ContinuousAggregate ca = database.getSchema().getContinuousAggregate("hourly_temps"); + assertThat(ca.getWatermarkTs()).isGreaterThanOrEqualTo(initialWatermark); + assertThat(ca.getStatus()).isEqualTo("VALID"); + + // Verify the new bucket data exists + final ResultSet rs = database.query("sql", + "SELECT FROM hourly_temps WHERE hour >= ?", 7200000L); + final List<Result> results = collectResults(rs); + assertThat(results).isNotEmpty(); + } + + @Test + public void testWatermarkAdvances() { + createSensorType(); + + // Insert data at hour 0 + database.transaction(() -> { + database.command("sql", + "INSERT INTO SensorReading SET ts = 100, sensor_id = 'A', temperature = 20.0"); + }); + + database.getSchema().buildContinuousAggregate() + .withName("hourly_temps") + .withQuery("SELECT sensor_id, ts.timeBucket('1h', ts) AS hour, avg(temperature) AS avg_temp FROM SensorReading GROUP BY sensor_id, hour") + .create(); + + final long wm1 = database.getSchema().getContinuousAggregate("hourly_temps").getWatermarkTs(); + assertThat(wm1).isEqualTo(0L); // bucket start for ts=100 with 1h interval is 0 + + // Insert data at hour 1 + database.transaction(() -> { + database.command("sql", + "INSERT INTO SensorReading SET ts = 3600000, sensor_id = 'A', temperature = 25.0"); + }); + + final long wm2 = database.getSchema().getContinuousAggregate("hourly_temps").getWatermarkTs(); + assertThat(wm2).isGreaterThanOrEqualTo(wm1); + } + + @Test + public void testDropContinuousAggregate() { + createSensorType(); + insertInitialData(); + +
database.getSchema().buildContinuousAggregate() + .withName("hourly_temps") + .withQuery("SELECT sensor_id, ts.timeBucket('1h', ts) AS hour, avg(temperature) AS avg_temp FROM SensorReading GROUP BY sensor_id, hour") + .create(); + + assertThat(database.getSchema().existsContinuousAggregate("hourly_temps")).isTrue(); + assertThat(database.getSchema().existsType("hourly_temps")).isTrue(); + + database.getSchema().dropContinuousAggregate("hourly_temps"); + + assertThat(database.getSchema().existsContinuousAggregate("hourly_temps")).isFalse(); + assertThat(database.getSchema().existsType("hourly_temps")).isFalse(); + } + + @Test + public void testIfNotExistsIdempotent() { + createSensorType(); + insertInitialData(); + + database.getSchema().buildContinuousAggregate() + .withName("hourly_temps") + .withQuery("SELECT sensor_id, ts.timeBucket('1h', ts) AS hour, avg(temperature) AS avg_temp FROM SensorReading GROUP BY sensor_id, hour") + .create(); + + // Should not throw + final ContinuousAggregate ca2 = database.getSchema().buildContinuousAggregate() + .withName("hourly_temps") + .withQuery("SELECT sensor_id, ts.timeBucket('1h', ts) AS hour, avg(temperature) AS avg_temp FROM SensorReading GROUP BY sensor_id, hour") + .withIgnoreIfExists(true) + .create(); + + assertThat(ca2.getName()).isEqualTo("hourly_temps"); + } + + @Test + public void testManualRefresh() { + createSensorType(); + insertInitialData(); + + database.getSchema().buildContinuousAggregate() + .withName("hourly_temps") + .withQuery("SELECT sensor_id, ts.timeBucket('1h', ts) AS hour, avg(temperature) AS avg_temp FROM SensorReading GROUP BY sensor_id, hour") + .create(); + + final ContinuousAggregate ca = database.getSchema().getContinuousAggregate("hourly_temps"); + final long countBefore = ca.getRefreshCount(); + + ca.refresh(); + + assertThat(ca.getRefreshCount()).isGreaterThan(countBefore); + assertThat(ca.getStatus()).isEqualTo("VALID"); + } + + @Test + public void testSchemaPersistence() { + 
createSensorType(); + insertInitialData(); + + database.getSchema().buildContinuousAggregate() + .withName("hourly_temps") + .withQuery("SELECT sensor_id, ts.timeBucket('1h', ts) AS hour, avg(temperature) AS avg_temp FROM SensorReading GROUP BY sensor_id, hour") + .create(); + + // Close and reopen + final String dbPath = database.getDatabasePath(); + database.close(); + database = factory.open(); + + assertThat(database.getSchema().existsContinuousAggregate("hourly_temps")).isTrue(); + final ContinuousAggregate ca = database.getSchema().getContinuousAggregate("hourly_temps"); + assertThat(ca.getSourceTypeName()).isEqualTo("SensorReading"); + assertThat(ca.getBucketColumn()).isEqualTo("hour"); + } + + @Test + public void testInvalidQueryNoTimeBucket() { + createSensorType(); + + assertThatThrownBy(() -> + database.getSchema().buildContinuousAggregate() + .withName("bad_ca") + .withQuery("SELECT sensor_id, avg(temperature) AS avg_temp FROM SensorReading GROUP BY sensor_id") + .create() + ).isInstanceOf(SchemaException.class) + .hasMessageContaining("ts.timeBucket"); + } + + @Test + public void testInvalidQueryNonTimeSeriesSource() { + database.getSchema().buildDocumentType().withName("RegularDoc").create(); + + assertThatThrownBy(() -> + database.getSchema().buildContinuousAggregate() + .withName("bad_ca") + .withQuery("SELECT ts.timeBucket('1h', ts) AS hour, count(*) AS cnt FROM RegularDoc GROUP BY hour") + .create() + ).isInstanceOf(SchemaException.class) + .hasMessageContaining("not a TimeSeries type"); + } + + @Test + public void testInvalidQueryNoGroupBy() { + createSensorType(); + + assertThatThrownBy(() -> + database.getSchema().buildContinuousAggregate() + .withName("bad_ca") + .withQuery("SELECT ts.timeBucket('1h', ts) AS hour, avg(temperature) AS avg_temp FROM SensorReading") + .create() + ).isInstanceOf(SchemaException.class) + .hasMessageContaining("GROUP BY"); + } + + @Test + public void testGetContinuousAggregates() { + createSensorType(); + 
insertInitialData(); + + database.getSchema().buildContinuousAggregate() + .withName("hourly_temps") + .withQuery("SELECT sensor_id, ts.timeBucket('1h', ts) AS hour, avg(temperature) AS avg_temp FROM SensorReading GROUP BY sensor_id, hour") + .create(); + + final ContinuousAggregate[] aggregates = database.getSchema().getContinuousAggregates(); + assertThat(aggregates).hasSize(1); + assertThat(aggregates[0].getName()).isEqualTo("hourly_temps"); + } + + @Test + public void testProtectSourceTypeFromDrop() { + createSensorType(); + insertInitialData(); + + database.getSchema().buildContinuousAggregate() + .withName("hourly_temps") + .withQuery("SELECT sensor_id, ts.timeBucket('1h', ts) AS hour, avg(temperature) AS avg_temp FROM SensorReading GROUP BY sensor_id, hour") + .create(); + + assertThatThrownBy(() -> database.getSchema().dropType("SensorReading")) + .isInstanceOf(SchemaException.class) + .hasMessageContaining("continuous aggregate"); + } + + private void createSensorType() { + database.command("sql", + "CREATE TIMESERIES TYPE SensorReading TIMESTAMP ts TAGS (sensor_id STRING) FIELDS (temperature DOUBLE)"); + } + + private void insertInitialData() { + database.transaction(() -> { + database.command("sql", + "INSERT INTO SensorReading SET ts = 1000, sensor_id = 'A', temperature = 22.5"); + database.command("sql", + "INSERT INTO SensorReading SET ts = 2000, sensor_id = 'B', temperature = 23.1"); + database.command("sql", + "INSERT INTO SensorReading SET ts = 3000, sensor_id = 'A', temperature = 21.8"); + }); + } + + private List<Result> collectResults(final ResultSet rs) { + final List<Result> results = new ArrayList<>(); + while (rs.hasNext()) + results.add(rs.next()); + return results; + } +} diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/CreateTimeSeriesTypeStatementTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/CreateTimeSeriesTypeStatementTest.java new file mode 100644 index 0000000000..f8311d508b --- /dev/null +++
b/engine/src/test/java/com/arcadedb/engine/timeseries/CreateTimeSeriesTypeStatementTest.java @@ -0,0 +1,111 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +import com.arcadedb.GlobalConfiguration; +import com.arcadedb.TestHelper; +import com.arcadedb.query.sql.executor.Result; +import com.arcadedb.query.sql.executor.ResultSet; +import com.arcadedb.schema.DocumentType; +import com.arcadedb.schema.LocalTimeSeriesType; +import org.junit.jupiter.api.Test; + +import static org.assertj.core.api.Assertions.assertThat; + +/** + * Tests for CREATE TIMESERIES TYPE SQL statement. 
+ * + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +public class CreateTimeSeriesTypeStatementTest extends TestHelper { + + @Test + public void testBasicCreateTimeSeriesType() { + final ResultSet result = database.command("sql", + "CREATE TIMESERIES TYPE SensorData TIMESTAMP ts TAGS (sensor_id STRING) FIELDS (temperature DOUBLE)"); + + assertThat(result.hasNext()).isTrue(); + final Result row = result.next(); + assertThat((String) row.getProperty("operation")).isEqualTo("create timeseries type"); + assertThat((String) row.getProperty("typeName")).isEqualTo("SensorData"); + + assertThat(database.getSchema().existsType("SensorData")).isTrue(); + final DocumentType type = database.getSchema().getType("SensorData"); + assertThat(type).isInstanceOf(LocalTimeSeriesType.class); + + final LocalTimeSeriesType tsType = (LocalTimeSeriesType) type; + assertThat(tsType.getTimestampColumn()).isEqualTo("ts"); + assertThat(tsType.getTsColumns()).hasSize(3); // ts + sensor_id + temperature + } + + @Test + public void testCreateWithShardsAndRetention() { + database.command("sql", + "CREATE TIMESERIES TYPE Metrics TIMESTAMP ts TAGS (host STRING) FIELDS (cpu DOUBLE, mem LONG) SHARDS 4 RETENTION 90 DAYS"); + + final LocalTimeSeriesType tsType = (LocalTimeSeriesType) database.getSchema().getType("Metrics"); + assertThat(tsType.getShardCount()).isEqualTo(4); + assertThat(tsType.getRetentionMs()).isEqualTo(90L * 86400000L); + assertThat(tsType.getTsColumns()).hasSize(4); // ts + host + cpu + mem + } + + @Test + public void testCreateWithRetentionHours() { + database.command("sql", + "CREATE TIMESERIES TYPE HourlyData TIMESTAMP ts FIELDS (value DOUBLE) RETENTION 24 HOURS"); + + final LocalTimeSeriesType tsType = (LocalTimeSeriesType) database.getSchema().getType("HourlyData"); + assertThat(tsType.getRetentionMs()).isEqualTo(24L * 3600000L); + } + + @Test + public void testCreateWithMultipleTags() { + database.command("sql", + "CREATE TIMESERIES TYPE MultiTag TIMESTAMP ts TAGS 
(region STRING, zone INTEGER) FIELDS (temp DOUBLE)"); + + final LocalTimeSeriesType tsType = (LocalTimeSeriesType) database.getSchema().getType("MultiTag"); + assertThat(tsType.getTsColumns()).hasSize(4); // ts + region + zone + temp + + // Verify roles + assertThat(tsType.getTsColumns().get(0).getRole()).isEqualTo(ColumnDefinition.ColumnRole.TIMESTAMP); + assertThat(tsType.getTsColumns().get(1).getRole()).isEqualTo(ColumnDefinition.ColumnRole.TAG); + assertThat(tsType.getTsColumns().get(2).getRole()).isEqualTo(ColumnDefinition.ColumnRole.TAG); + assertThat(tsType.getTsColumns().get(3).getRole()).isEqualTo(ColumnDefinition.ColumnRole.FIELD); + } + + @Test + public void testCreateIfNotExists() { + database.command("sql", "CREATE TIMESERIES TYPE Existing TIMESTAMP ts FIELDS (value DOUBLE)"); + // Should not throw + database.command("sql", "CREATE TIMESERIES TYPE Existing IF NOT EXISTS TIMESTAMP ts FIELDS (value DOUBLE)"); + + assertThat(database.getSchema().existsType("Existing")).isTrue(); + } + + @Test + public void testCreateMinimal() { + database.command("sql", "CREATE TIMESERIES TYPE Minimal TIMESTAMP ts FIELDS (value DOUBLE)"); + + final LocalTimeSeriesType tsType = (LocalTimeSeriesType) database.getSchema().getType("Minimal"); + assertThat(tsType.getTimestampColumn()).isEqualTo("ts"); + final int expectedShards = database.getConfiguration().getValueAsInteger(GlobalConfiguration.ASYNC_WORKER_THREADS); + assertThat(tsType.getShardCount()).isEqualTo(expectedShards); + assertThat(tsType.getRetentionMs()).isEqualTo(0L); + } +} diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/DictionaryCodecOverflowTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/DictionaryCodecOverflowTest.java new file mode 100644 index 0000000000..d047d27082 --- /dev/null +++ b/engine/src/test/java/com/arcadedb/engine/timeseries/DictionaryCodecOverflowTest.java @@ -0,0 +1,115 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed 
under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +import com.arcadedb.TestHelper; +import com.arcadedb.database.DatabaseInternal; +import com.arcadedb.engine.timeseries.codec.DictionaryCodec; +import com.arcadedb.schema.Type; +import org.junit.jupiter.api.Test; + +import java.io.IOException; +import java.util.List; + +import static org.assertj.core.api.Assertions.assertThat; +import static org.assertj.core.api.Assertions.assertThatThrownBy; + +/** + * Tests that DictionaryCodec overflow (>65,535 distinct tag values) fails cleanly + * and leaves the shard in a consistent state. 
+ * + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +class DictionaryCodecOverflowTest extends TestHelper { + + @Test + void testCodecOverflowThrows() { + // Direct codec test: more than MAX_DICTIONARY_SIZE distinct values + final String[] values = new String[DictionaryCodec.MAX_DICTIONARY_SIZE + 1]; + for (int i = 0; i < values.length; i++) + values[i] = "tag_" + i; + + assertThatThrownBy(() -> DictionaryCodec.encode(values)) + .isInstanceOf(IllegalArgumentException.class) + .hasMessageContaining("Dictionary overflow"); + } + + @Test + void testCompactionAutoSplitsOnOverflow() throws Exception { + final List<ColumnDefinition> columns = List.of( + new ColumnDefinition("ts", Type.LONG, ColumnDefinition.ColumnRole.TIMESTAMP), + new ColumnDefinition("tag", Type.STRING, ColumnDefinition.ColumnRole.TAG), + new ColumnDefinition("value", Type.DOUBLE, ColumnDefinition.ColumnRole.FIELD) + ); + + database.begin(); + final TimeSeriesShard shard = new TimeSeriesShard( + (DatabaseInternal) database, "test_dict_overflow", 0, columns); + + // Insert data with more distinct tag values than the dictionary can handle in one block + final int overflowCount = DictionaryCodec.MAX_DICTIONARY_SIZE + 100; + final long[] timestamps = new long[overflowCount]; + final Object[] tags = new Object[overflowCount]; + final Object[] values = new Object[overflowCount]; + for (int i = 0; i < overflowCount; i++) { + timestamps[i] = i * 1000L; + tags[i] = "unique_tag_" + i; + values[i] = (double) i; + } + + shard.appendSamples(timestamps, tags, values); + database.commit(); + + // Compaction should succeed by auto-splitting into multiple blocks + shard.compact(); + + // Multiple sealed blocks should be created (at least 2) + assertThat(shard.getSealedStore().getBlockCount()).isGreaterThanOrEqualTo(2); + + // Mutable bucket should be empty after compaction + database.begin(); + assertThat(shard.getMutableBucket().getSampleCount()).isEqualTo(0); + database.commit(); + + // All data should be readable from sealed
store + database.begin(); + final List<?> results = shard.scanRange(0, Long.MAX_VALUE, null, null); + database.commit(); + + assertThat(results).hasSize(overflowCount); + + shard.close(); + } + + @Test + void testCodecAtExactLimit() throws IOException { + // Exactly MAX_DICTIONARY_SIZE distinct values should succeed + final String[] values = new String[DictionaryCodec.MAX_DICTIONARY_SIZE]; + for (int i = 0; i < values.length; i++) + values[i] = "tag_" + i; + + final byte[] encoded = DictionaryCodec.encode(values); + assertThat(encoded).isNotEmpty(); + + final String[] decoded = DictionaryCodec.decode(encoded); + assertThat(decoded).hasSize(values.length); + assertThat(decoded[0]).isEqualTo("tag_0"); + assertThat(decoded[values.length - 1]).isEqualTo("tag_" + (values.length - 1)); + } +} diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/LineProtocolParserTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/LineProtocolParserTest.java new file mode 100644 index 0000000000..2b5c9d7444 --- /dev/null +++ b/engine/src/test/java/com/arcadedb/engine/timeseries/LineProtocolParserTest.java @@ -0,0 +1,231 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License.
+ * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +import com.arcadedb.engine.timeseries.LineProtocolParser.Precision; +import com.arcadedb.engine.timeseries.LineProtocolParser.Sample; +import org.junit.jupiter.api.Test; + +import java.util.List; + +import static org.assertj.core.api.Assertions.assertThat; + +/** + * Tests for InfluxDB Line Protocol parser. + * + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +public class LineProtocolParserTest { + + @Test + public void testSingleLine() { + final List<Sample> samples = LineProtocolParser.parse( + "weather,location=us-midwest temperature=82 1465839830100400200", Precision.NANOSECONDS); + + assertThat(samples).hasSize(1); + final Sample s = samples.get(0); + assertThat(s.getMeasurement()).isEqualTo("weather"); + assertThat(s.getTags()).containsEntry("location", "us-midwest"); + assertThat(s.getFields()).containsEntry("temperature", 82.0); + assertThat(s.getTimestampMs()).isEqualTo(1465839830100L); // ns -> ms + } + + @Test + public void testMultipleLines() { + final String text = """ + cpu,host=serverA usage=55.3 1000000000 + cpu,host=serverB usage=72.1 2000000000 + cpu,host=serverC usage=91.0 3000000000 + """; + final List<Sample> samples = LineProtocolParser.parse(text, Precision.NANOSECONDS); + assertThat(samples).hasSize(3); + assertThat(samples.get(0).getTags().get("host")).isEqualTo("serverA"); + assertThat(samples.get(1).getTags().get("host")).isEqualTo("serverB"); + assertThat(samples.get(2).getTags().get("host")).isEqualTo("serverC"); + } + + @Test + public void testAllFieldTypes() { + final List<Sample> samples = LineProtocolParser.parse( + "test value_double=1.5,value_int=42i,value_str=\"hello\",value_bool=true 1000", Precision.MILLISECONDS); + + assertThat(samples).hasSize(1); + final Sample s = samples.get(0); + assertThat(s.getFields().get("value_double")).isEqualTo(1.5); +
assertThat(s.getFields().get("value_int")).isEqualTo(42L); + assertThat(s.getFields().get("value_str")).isEqualTo("hello"); + assertThat(s.getFields().get("value_bool")).isEqualTo(true); + } + + @Test + public void testMultipleTags() { + final List<Sample> samples = LineProtocolParser.parse( + "sensor,region=us-east,zone=1a,rack=42 temp=22.5 1000", Precision.MILLISECONDS); + + assertThat(samples).hasSize(1); + assertThat(samples.get(0).getTags()).hasSize(3); + assertThat(samples.get(0).getTags().get("region")).isEqualTo("us-east"); + assertThat(samples.get(0).getTags().get("zone")).isEqualTo("1a"); + assertThat(samples.get(0).getTags().get("rack")).isEqualTo("42"); + } + + @Test + public void testNoTags() { + final List<Sample> samples = LineProtocolParser.parse( + "metric value=100.0 5000", Precision.MILLISECONDS); + + assertThat(samples).hasSize(1); + assertThat(samples.get(0).getMeasurement()).isEqualTo("metric"); + assertThat(samples.get(0).getTags()).isEmpty(); + assertThat(samples.get(0).getFields().get("value")).isEqualTo(100.0); + assertThat(samples.get(0).getTimestampMs()).isEqualTo(5000L); + } + + @Test + public void testMissingTimestamp() { + final List<Sample> samples = LineProtocolParser.parse( + "metric value=42.0", Precision.MILLISECONDS); + + assertThat(samples).hasSize(1); + // Timestamp should be approximately "now" + assertThat(samples.get(0).getTimestampMs()).isGreaterThan(0L); + } + + @Test + public void testPrecisionConversion() { + // Nanoseconds + List<Sample> ns = LineProtocolParser.parse("m v=1.0 1000000000", Precision.NANOSECONDS); + assertThat(ns.get(0).getTimestampMs()).isEqualTo(1000L); // 1 second + + // Microseconds + List<Sample> us = LineProtocolParser.parse("m v=1.0 1000000", Precision.MICROSECONDS); + assertThat(us.get(0).getTimestampMs()).isEqualTo(1000L); // 1 second + + // Milliseconds + List<Sample> ms = LineProtocolParser.parse("m v=1.0 1000", Precision.MILLISECONDS); + assertThat(ms.get(0).getTimestampMs()).isEqualTo(1000L); // 1 second + + // Seconds + List<Sample> s =
LineProtocolParser.parse("m v=1.0 1", Precision.SECONDS); + assertThat(s.get(0).getTimestampMs()).isEqualTo(1000L); // 1 second + } + + @Test + public void testEmptyAndCommentLines() { + final String text = """ + # This is a comment + + metric value=1.0 1000 + # Another comment + metric value=2.0 2000 + """; + final List<Sample> samples = LineProtocolParser.parse(text, Precision.MILLISECONDS); + assertThat(samples).hasSize(2); + } + + @Test + public void testBooleanValues() { + final List<Sample> samples = LineProtocolParser.parse( + "test a=true,b=false,c=t,d=f 1000", Precision.MILLISECONDS); + + assertThat(samples.get(0).getFields().get("a")).isEqualTo(true); + assertThat(samples.get(0).getFields().get("b")).isEqualTo(false); + assertThat(samples.get(0).getFields().get("c")).isEqualTo(true); + assertThat(samples.get(0).getFields().get("d")).isEqualTo(false); + } + + @Test + public void testMultipleFields() { + final List<Sample> samples = LineProtocolParser.parse( + "system,host=server1 cpu=55.3,mem=8192i,disk=75.2 1000", Precision.MILLISECONDS); + + assertThat(samples.get(0).getFields()).hasSize(3); + assertThat(samples.get(0).getFields().get("cpu")).isEqualTo(55.3); + assertThat(samples.get(0).getFields().get("mem")).isEqualTo(8192L); + assertThat(samples.get(0).getFields().get("disk")).isEqualTo(75.2); + } + + @Test + public void testEmptyInput() { + assertThat(LineProtocolParser.parse("", Precision.MILLISECONDS)).isEmpty(); + assertThat(LineProtocolParser.parse(null, Precision.MILLISECONDS)).isEmpty(); + } + + /** + * Regression test: a single malformed line must not stop parsing of the remaining batch. + * Previously NumberFormatException would propagate and halt the entire parse.
+ */ + @Test + public void testMalformedLineDoesNotHaltBatch() { + final String text = "metric value=1.0 1000\n" + + "metric value=not_a_number 2000\n" + // malformed field value + "metric value=3.0 3000\n"; + final List samples = LineProtocolParser.parse(text, Precision.MILLISECONDS); + // Bad line is skipped; good lines are still parsed + assertThat(samples).hasSize(2); + assertThat(samples.get(0).getTimestampMs()).isEqualTo(1000L); + assertThat(samples.get(1).getTimestampMs()).isEqualTo(3000L); + } + + /** + * Regression test: a malformed timestamp must skip the line, not abort the batch. + */ + @Test + public void testMalformedTimestampDoesNotHaltBatch() { + final String text = "metric value=1.0 1000\n" + + "metric value=2.0 NOT_A_TIMESTAMP\n" + // malformed timestamp + "metric value=3.0 3000\n"; + final List samples = LineProtocolParser.parse(text, Precision.MILLISECONDS); + assertThat(samples).hasSize(2); + assertThat(samples.get(0).getTimestampMs()).isEqualTo(1000L); + assertThat(samples.get(1).getTimestampMs()).isEqualTo(3000L); + } + + /** + * Regression test: the maximum unsigned integer field value (18446744073709551615u, i.e. uint64 max) + * must be accepted and stored as the signed two's-complement bit-pattern -1L, per the InfluxDB + * line protocol, instead of overflowing signed long parsing and halting the batch. 
+ */ + @Test + public void testUnsignedIntegerMaxValueIsAccepted() { + // 18446744073709551615u = max uint64; stored as the signed bit-pattern -1L (correct per InfluxDB spec) + final String text = "metric value=1.0 1000\n" + + "metric overflow=18446744073709551615u 2000\n" + + "metric value=3.0 3000\n"; + final List samples = LineProtocolParser.parse(text, Precision.MILLISECONDS); + assertThat(samples).hasSize(3); + assertThat(samples.get(0).getTimestampMs()).isEqualTo(1000L); + assertThat(samples.get(1).getTimestampMs()).isEqualTo(2000L); + assertThat(samples.get(1).getFields().get("overflow")).isEqualTo(-1L); // max uint64 stored as signed bit-pattern + assertThat(samples.get(2).getTimestampMs()).isEqualTo(3000L); + } + + /** + * Regression: an unterminated quoted string should be rejected, not silently accepted + * with a wrong consumed-length. + */ + @Test + public void testUnterminatedQuotedStringIsRejected() { + // The quoted string for field2 is never closed — the line is malformed + final String text = "metric field1=1.0,field2=\"unterminated 1000\n"; + // Should skip the malformed line and return nothing (or throw) + final List samples = LineProtocolParser.parse(text, Precision.MILLISECONDS); + assertThat(samples).isEmpty(); + } +} diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/SQLFunctionTimeBucketTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/SQLFunctionTimeBucketTest.java new file mode 100644 index 0000000000..f25b3e635b --- /dev/null +++ b/engine/src/test/java/com/arcadedb/engine/timeseries/SQLFunctionTimeBucketTest.java @@ -0,0 +1,115 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +import com.arcadedb.function.sql.time.SQLFunctionTimeBucket; +import org.junit.jupiter.api.Test; + +import java.util.Date; + +import static org.assertj.core.api.Assertions.assertThat; +import static org.assertj.core.api.Assertions.assertThatThrownBy; + +/** + * Tests for the time_bucket() SQL function. + * + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +public class SQLFunctionTimeBucketTest { + + private final SQLFunctionTimeBucket fn = new SQLFunctionTimeBucket(); + + @Test + public void testHourBucket() { + // 2026-02-20T09:35:00Z -> should truncate to 2026-02-20T09:00:00Z + final long ts = 1771580100000L; // ~2026-02-20T09:35:00Z + final Date result = (Date) fn.execute(null, null, null, new Object[] { "1h", ts }, null); + + // Should be truncated down to the hour boundary + assertThat(result.getTime() % 3600000L).isEqualTo(0L); + assertThat(result.getTime()).isLessThanOrEqualTo(ts); + assertThat(result.getTime()).isGreaterThan(ts - 3600000L); + } + + @Test + public void testMinuteBucket() { + final long ts = 1771580100000L; // some timestamp + final Date result = (Date) fn.execute(null, null, null, new Object[] { "5m", ts }, null); + + // Should be truncated to 5-minute boundary + assertThat(result.getTime() % (5 * 60000L)).isEqualTo(0L); + assertThat(result.getTime()).isLessThanOrEqualTo(ts); + } + + @Test + public void testSecondBucket() { + final long ts = 1771580123456L; 
+ final Date result = (Date) fn.execute(null, null, null, new Object[] { "1s", ts }, null); + + assertThat(result.getTime() % 1000L).isEqualTo(0L); + assertThat(result.getTime()).isLessThanOrEqualTo(ts); + } + + @Test + public void testDayBucket() { + final long ts = 1771580100000L; + final Date result = (Date) fn.execute(null, null, null, new Object[] { "1d", ts }, null); + + assertThat(result.getTime() % 86400000L).isEqualTo(0L); + assertThat(result.getTime()).isLessThanOrEqualTo(ts); + } + + @Test + public void testWeekBucket() { + final long ts = 1771580100000L; + final Date result = (Date) fn.execute(null, null, null, new Object[] { "1w", ts }, null); + + assertThat(result.getTime() % (7 * 86400000L)).isEqualTo(0L); + assertThat(result.getTime()).isLessThanOrEqualTo(ts); + } + + @Test + public void testWithDateObject() { + final Date input = new Date(1771580100000L); + final Date result = (Date) fn.execute(null, null, null, new Object[] { "1h", input }, null); + + assertThat(result.getTime() % 3600000L).isEqualTo(0L); + } + + @Test + public void testExactBoundary() { + // Timestamp already on an hour boundary + final long ts = 3600000L * 5; // exactly 05:00:00 UTC epoch + final Date result = (Date) fn.execute(null, null, null, new Object[] { "1h", ts }, null); + + assertThat(result.getTime()).isEqualTo(ts); + } + + @Test + public void testInvalidInterval() { + assertThatThrownBy(() -> fn.execute(null, null, null, new Object[] { "1x", 12345L }, null)) + .isInstanceOf(IllegalArgumentException.class); + } + + @Test + public void testMissingParams() { + assertThatThrownBy(() -> fn.execute(null, null, null, new Object[] { "1h" }, null)) + .isInstanceOf(IllegalArgumentException.class); + } +} diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/TagFilterTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/TagFilterTest.java new file mode 100644 index 0000000000..c9b31286aa --- /dev/null +++ 
b/engine/src/test/java/com/arcadedb/engine/timeseries/TagFilterTest.java @@ -0,0 +1,99 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +import org.junit.jupiter.api.Test; + +import static org.assertj.core.api.Assertions.assertThat; + +/** + * Unit tests for {@link TagFilter}, including the {@code matchesMapped} method that + * handles subset column projections. 
+ * + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +class TagFilterTest { + + @Test + void testMatchesFullSchemaOrder() { + // row[0]=ts, row[1]=col0, row[2]=col1, row[3]=col2 + final Object[] row = { 1000L, "shard-a", 42.0, "region-eu" }; + final TagFilter filter = TagFilter.eq(0, "shard-a"); + assertThat(filter.matches(row)).isTrue(); + + final TagFilter noMatch = TagFilter.eq(0, "shard-b"); + assertThat(noMatch.matches(row)).isFalse(); + } + + @Test + void testMatchesMappedNullColumnIndicesFallsBackToMatches() { + // null columnIndices → full schema order, same semantics as matches() + final Object[] row = { 1000L, "shard-a", 42.0, "region-eu" }; + final TagFilter filter = TagFilter.eq(2, "region-eu"); + assertThat(filter.matchesMapped(row, null)).isTrue(); + assertThat(filter.matchesMapped(row, null)).isEqualTo(filter.matches(row)); + } + + @Test + void testMatchesMappedSubsetColumnsCorrectly() { + // Regression: column index 2 in a subset projection [0, 2] must resolve to row[2], not row[3]. + // Schema layout (non-ts): col0="shard", col1="value", col2="region" + // Projection selects col0 and col2 (columnIndices=[0,2]). 
+ // Row: row[0]=ts, row[1]=col0, row[2]=col2 (col1 omitted) + final Object[] row = { 1000L, "shard-a", "region-eu" }; + final int[] columnIndices = { 0, 2 }; + + // Filter on col2 ("region-eu"), schema index 2 → should map to row[2] + final TagFilter filter = TagFilter.eq(2, "region-eu"); + assertThat(filter.matchesMapped(row, columnIndices)).isTrue(); + + // Without the fix, matches() would check row[3] (out of bounds or wrong) — demonstrate the difference: + // matches(row) uses row[cond.columnIndex + 1] = row[3], which is out of range → returns false + assertThat(filter.matches(row)).isFalse(); // row[3] doesn't exist + + final TagFilter noMatch = TagFilter.eq(2, "region-us"); + assertThat(noMatch.matchesMapped(row, columnIndices)).isFalse(); + } + + @Test + void testMatchesMappedColumnNotInSubset() { + // If a tag column is not present in the selected subset, matchesMapped returns false + final Object[] row = { 1000L, "shard-a" }; // only col0 selected + final int[] columnIndices = { 0 }; + + // Filter on col2, which was not included in columnIndices=[0] + final TagFilter filter = TagFilter.eq(2, "region-eu"); + assertThat(filter.matchesMapped(row, columnIndices)).isFalse(); + } + + @Test + void testMatchesMappedMultipleConditions() { + // Two conditions on non-adjacent columns in a subset projection + // Schema: col0=shard, col1=value, col2=region, col3=env + // Projection: [0, 3] → row[1]=col0, row[2]=col3 + final Object[] row = { 2000L, "shard-b", "prod" }; + final int[] columnIndices = { 0, 3 }; + + final TagFilter filter = TagFilter.eq(0, "shard-b").and(3, "prod"); + assertThat(filter.matchesMapped(row, columnIndices)).isTrue(); + + final TagFilter partial = TagFilter.eq(0, "shard-b").and(3, "staging"); + assertThat(partial.matchesMapped(row, columnIndices)).isFalse(); + } +} diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesAccuracyTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesAccuracyTest.java new 
file mode 100644 index 0000000000..7aba82abe2 --- /dev/null +++ b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesAccuracyTest.java @@ -0,0 +1,299 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +import com.arcadedb.database.Database; +import com.arcadedb.database.DatabaseFactory; +import com.arcadedb.query.sql.executor.Result; +import com.arcadedb.query.sql.executor.ResultSet; +import com.arcadedb.schema.LocalTimeSeriesType; +import com.arcadedb.utility.FileUtils; +import org.junit.jupiter.api.AfterEach; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.Test; + +import java.io.File; +import java.util.ArrayList; +import java.util.Date; +import java.util.List; + +import static org.assertj.core.api.Assertions.assertThat; +import static org.assertj.core.api.Assertions.within; + +/** + * Deterministic accuracy test for the full TimeSeries pipeline: + * mutable insert → compaction → sealed blocks → query → aggregation. + *

+ * Uses 200,000 samples with pre-computable values (value = i + 1) so every + * COUNT, SUM, MIN, MAX, AVG can be verified exactly against closed-form formulas. + * Exercises multi-page mutable storage, multi-block sealed storage, compression + * codecs (DeltaOfDelta, Dictionary, GorillaXOR), and bucket-aligned compaction. + */ +class TimeSeriesAccuracyTest { + + private static final String DB_PATH = "target/databases/TimeSeriesAccuracyTest"; + private static final int TOTAL_SAMPLES = 200_000; + private static final long INTERVAL_MS = 54L; // 3h / 200K ≈ 54ms + private static final long HOUR_MS = 3_600_000L; + + // Per-bucket sample index ranges (value[i] = i + 1, timestamp[i] = i * 54) + // Bucket 0: [0, 3_600_000) → i in [0, 66666] → 66667 samples + // Bucket 1: [3_600_000, 7_200_000) → i in [66667, 133333] → 66667 samples + // Bucket 2: [7_200_000, ...) → i in [133334, 199999] → 66666 samples + private static final int BUCKET_0_START = 0; + private static final int BUCKET_0_END = 66_666; + private static final int BUCKET_1_START = 66_667; + private static final int BUCKET_1_END = 133_333; + private static final int BUCKET_2_START = 133_334; + private static final int BUCKET_2_END = 199_999; + + private Database database; + + @BeforeEach + void setUp() { + FileUtils.deleteRecursively(new File(DB_PATH)); + database = new DatabaseFactory(DB_PATH).create(); + } + + @AfterEach + void tearDown() { + if (database != null && database.isOpen()) + database.close(); + FileUtils.deleteRecursively(new File(DB_PATH)); + } + + @Test + void testTotalCountMatchesInserted() throws Exception { + final TimeSeriesEngine engine = createAndPopulate(); + + // Verify via direct API + assertThat(engine.countSamples()).isEqualTo(TOTAL_SAMPLES); + + // Verify via SQL + final ResultSet rs = database.query("sql", "SELECT count(*) AS cnt FROM Sensor"); + assertThat(rs.hasNext()).isTrue(); + final Result row = rs.next(); + assertThat(((Number) 
row.getProperty("cnt")).longValue()).isEqualTo(TOTAL_SAMPLES); + rs.close(); + } + + @Test + void testFullRangeScanReturnsAllSamples() throws Exception { + final TimeSeriesEngine engine = createAndPopulate(); + + final List<Object[]> rows = engine.query(Long.MIN_VALUE, Long.MAX_VALUE, null, null); + + assertThat(rows).hasSize(TOTAL_SAMPLES); + + // First sample: ts=0, sensor="s1", value=1.0 + assertThat((long) rows.get(0)[0]).isEqualTo(0L); + assertThat(rows.get(0)[1]).isEqualTo("s1"); + assertThat((double) rows.get(0)[2]).isEqualTo(1.0); + + // Last sample: ts=199999*54, sensor="s1", value=200000.0 + final Object[] last = rows.get(TOTAL_SAMPLES - 1); + assertThat((long) last[0]).isEqualTo((long) (TOTAL_SAMPLES - 1) * INTERVAL_MS); + assertThat(last[1]).isEqualTo("s1"); + assertThat((double) last[2]).isEqualTo((double) TOTAL_SAMPLES); + } + + @Test + void testPerBucketAggregationExact() throws Exception { + createAndPopulate(); + + final ResultSet rs = database.query("sql", + "SELECT ts.timeBucket('1h', ts) AS hour, count(*) AS cnt, sum(value) AS sum_val, " + + "min(value) AS min_val, max(value) AS max_val, avg(value) AS avg_val " + + "FROM Sensor GROUP BY hour ORDER BY hour"); + + final List<Result> results = collectResults(rs); + assertThat(results).hasSize(3); + + // Sort by hour to ensure deterministic order + results.sort((a, b) -> ((Date) a.getProperty("hour")).compareTo((Date) b.getProperty("hour"))); + + // Bucket 0 + assertBucketAggregates(results.get(0), BUCKET_0_START, BUCKET_0_END); + + // Bucket 1 + assertBucketAggregates(results.get(1), BUCKET_1_START, BUCKET_1_END); + + // Bucket 2 + assertBucketAggregates(results.get(2), BUCKET_2_START, BUCKET_2_END); + } + + @Test + void testGlobalAggregationExact() throws Exception { + createAndPopulate(); + + // Global: N=200000, values 1..200000 + // SUM = 200000 * 200001 / 2 = 20,000,100,000 + // MIN = 1.0, MAX = 200000.0, AVG = 100000.5, COUNT = 200000 + final ResultSet rs = database.query("sql", + "SELECT count(*) AS cnt, 
sum(value) AS sum_val, min(value) AS min_val, " + + "max(value) AS max_val, avg(value) AS avg_val FROM Sensor"); + + assertThat(rs.hasNext()).isTrue(); + final Result row = rs.next(); + rs.close(); + + assertThat(((Number) row.getProperty("cnt")).longValue()).isEqualTo(TOTAL_SAMPLES); + assertThat(((Number) row.getProperty("sum_val")).doubleValue()).isCloseTo(20_000_100_000.0, within(1.0)); + assertThat(((Number) row.getProperty("min_val")).doubleValue()).isEqualTo(1.0); + assertThat(((Number) row.getProperty("max_val")).doubleValue()).isEqualTo(200_000.0); + assertThat(((Number) row.getProperty("avg_val")).doubleValue()).isCloseTo(100_000.5, within(0.01)); + } + + @Test + void testDirectApiAggregationMatchesSQL() throws Exception { + final TimeSeriesEngine engine = createAndPopulate(); + + // Direct API aggregation (column index 2 = value, -1 = count) + final MultiColumnAggregationResult apiResult = engine.aggregateMulti( + Long.MIN_VALUE, Long.MAX_VALUE, + List.of( + new MultiColumnAggregationRequest(2, AggregationType.SUM, "sum_val"), + new MultiColumnAggregationRequest(2, AggregationType.MIN, "min_val"), + new MultiColumnAggregationRequest(2, AggregationType.MAX, "max_val"), + new MultiColumnAggregationRequest(2, AggregationType.AVG, "avg_val"), + new MultiColumnAggregationRequest(-1, AggregationType.COUNT, "cnt") + ), + HOUR_MS, null); + + // SQL aggregation per bucket + final ResultSet rs = database.query("sql", + "SELECT ts.timeBucket('1h', ts) AS hour, sum(value) AS sum_val, min(value) AS min_val, " + + "max(value) AS max_val, avg(value) AS avg_val, count(*) AS cnt " + + "FROM Sensor GROUP BY hour ORDER BY hour"); + final List<Result> sqlResults = collectResults(rs); + sqlResults.sort((a, b) -> ((Date) a.getProperty("hour")).compareTo((Date) b.getProperty("hour"))); + + // Compare API vs SQL for each bucket + final List<Long> bucketTimestamps = apiResult.getBucketTimestamps(); + assertThat(bucketTimestamps).hasSize(sqlResults.size()); + + for (int i = 0; i < 
bucketTimestamps.size(); i++) { + final long bucketTs = bucketTimestamps.get(i); + final Result sqlRow = sqlResults.get(i); + + assertThat(apiResult.getValue(bucketTs, 0)) + .as("SUM bucket %d", i) + .isCloseTo(((Number) sqlRow.getProperty("sum_val")).doubleValue(), within(1.0)); + assertThat(apiResult.getValue(bucketTs, 1)) + .as("MIN bucket %d", i) + .isCloseTo(((Number) sqlRow.getProperty("min_val")).doubleValue(), within(0.01)); + assertThat(apiResult.getValue(bucketTs, 2)) + .as("MAX bucket %d", i) + .isCloseTo(((Number) sqlRow.getProperty("max_val")).doubleValue(), within(0.01)); + assertThat(apiResult.getValue(bucketTs, 3)) + .as("AVG bucket %d", i) + .isCloseTo(((Number) sqlRow.getProperty("avg_val")).doubleValue(), within(0.01)); + assertThat((long) apiResult.getValue(bucketTs, 4)) + .as("COUNT bucket %d", i) + .isEqualTo(((Number) sqlRow.getProperty("cnt")).longValue()); + } + } + + @Test + void testRangeQueryAccuracy() throws Exception { + final TimeSeriesEngine engine = createAndPopulate(); + + // Query hour 1 only: timestamps [3_600_000, 7_200_000) + // Samples: i in [66667, 133333], values = 66668..133334 + final long fromTs = BUCKET_1_START * INTERVAL_MS; + final long toTs = BUCKET_1_END * INTERVAL_MS; + + final List rows = engine.query(fromTs, toTs, null, null); + + final int expectedCount = BUCKET_1_END - BUCKET_1_START + 1; + assertThat(rows).hasSize(expectedCount); + + // Verify SUM via direct API on the same range + final MultiColumnAggregationResult result = engine.aggregateMulti( + fromTs, toTs, + List.of( + new MultiColumnAggregationRequest(2, AggregationType.SUM, "sum_val"), + new MultiColumnAggregationRequest(-1, AggregationType.COUNT, "cnt") + ), + 0L, null); + + final long bucketTs = result.getBucketTimestamps().get(0); + final double expectedSum = rangeSum(BUCKET_1_START, BUCKET_1_END); + + assertThat((long) result.getValue(bucketTs, 1)).isEqualTo(expectedCount); + assertThat(result.getValue(bucketTs, 0)).isCloseTo(expectedSum, 
within(1.0)); + } + + // ---- helpers ---- + + private TimeSeriesEngine createAndPopulate() throws Exception { + database.command("sql", + "CREATE TIMESERIES TYPE Sensor TIMESTAMP ts TAGS (sensor STRING) FIELDS (value DOUBLE) " + + "SHARDS 1 COMPACTION_INTERVAL 1 HOURS"); + + final TimeSeriesEngine engine = ((LocalTimeSeriesType) database.getSchema().getType("Sensor")).getEngine(); + + final long[] timestamps = new long[TOTAL_SAMPLES]; + final Object[] sensors = new Object[TOTAL_SAMPLES]; + final Object[] values = new Object[TOTAL_SAMPLES]; + for (int i = 0; i < TOTAL_SAMPLES; i++) { + timestamps[i] = i * INTERVAL_MS; + sensors[i] = "s1"; + values[i] = (double) (i + 1); + } + + database.begin(); + engine.appendSamples(timestamps, sensors, values); + database.commit(); + + engine.compactAll(); + + return engine; + } + + /** Sum of values for sample indices [start, end] where value[i] = i + 1. */ + private static double rangeSum(final int start, final int end) { + // Sum of (start+1) + (start+2) + ... 
+ (end+1) + // = sum(1..end+1) - sum(1..start) + // = (end+1)*(end+2)/2 - start*(start+1)/2 + return (long) (end + 1) * (end + 2) / 2.0 - (long) start * (start + 1) / 2.0; + } + + private void assertBucketAggregates(final Result row, final int start, final int end) { + final int count = end - start + 1; + final double sum = rangeSum(start, end); + final double min = start + 1.0; + final double max = end + 1.0; + final double avg = sum / count; + + assertThat(((Number) row.getProperty("cnt")).longValue()).isEqualTo(count); + assertThat(((Number) row.getProperty("sum_val")).doubleValue()).isCloseTo(sum, within(1.0)); + assertThat(((Number) row.getProperty("min_val")).doubleValue()).isEqualTo(min); + assertThat(((Number) row.getProperty("max_val")).doubleValue()).isEqualTo(max); + assertThat(((Number) row.getProperty("avg_val")).doubleValue()).isCloseTo(avg, within(0.01)); + } + + private List<Result> collectResults(final ResultSet rs) { + final List<Result> results = new ArrayList<>(); + while (rs.hasNext()) + results.add(rs.next()); + rs.close(); + return results; + } +} diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesAggregationPushDownTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesAggregationPushDownTest.java new file mode 100644 index 0000000000..323cd8c7e4 --- /dev/null +++ b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesAggregationPushDownTest.java @@ -0,0 +1,227 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +import com.arcadedb.TestHelper; +import com.arcadedb.query.sql.executor.Result; +import com.arcadedb.query.sql.executor.ResultSet; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.Test; + +import java.util.ArrayList; +import java.util.Date; +import java.util.List; + +import static org.assertj.core.api.Assertions.assertThat; +import static org.assertj.core.api.Assertions.within; + +/** + * Tests for TimeSeries aggregation push-down optimization. + * Verifies that SQL aggregation queries with ts.timeBucket GROUP BY + * are pushed down into the engine for direct block-level processing. + */ +class TimeSeriesAggregationPushDownTest extends TestHelper { + + @BeforeEach + void setupData() { + database.command("sql", + "CREATE TIMESERIES TYPE SensorData TIMESTAMP ts FIELDS (temperature DOUBLE, humidity DOUBLE)"); + + database.transaction(() -> { + // Insert 12 samples across 3 hour-buckets (3600000ms = 1h) + // Bucket 0 (0ms): 10, 20, 30, 40 => avg=25, max=40, min=10, sum=100, count=4 + database.command("sql", "INSERT INTO SensorData SET ts = 0, temperature = 10.0, humidity = 50.0"); + database.command("sql", "INSERT INTO SensorData SET ts = 1000, temperature = 20.0, humidity = 55.0"); + database.command("sql", "INSERT INTO SensorData SET ts = 2000, temperature = 30.0, humidity = 60.0"); + database.command("sql", "INSERT INTO SensorData SET ts = 3000, temperature = 40.0, humidity = 65.0"); + + // Bucket 1 (3600000ms): 100, 200 => avg=150, max=200, min=100, sum=300, count=2 + database.command("sql", "INSERT INTO SensorData SET ts = 3600000, temperature = 100.0, humidity = 70.0"); + database.command("sql", "INSERT INTO SensorData SET ts = 3601000, temperature = 200.0, humidity 
= 80.0"); + + // Bucket 2 (7200000ms): 5, 15, 25, 35, 45, 55 => avg=30, max=55, min=5, sum=180, count=6 + database.command("sql", "INSERT INTO SensorData SET ts = 7200000, temperature = 5.0, humidity = 30.0"); + database.command("sql", "INSERT INTO SensorData SET ts = 7201000, temperature = 15.0, humidity = 35.0"); + database.command("sql", "INSERT INTO SensorData SET ts = 7202000, temperature = 25.0, humidity = 40.0"); + database.command("sql", "INSERT INTO SensorData SET ts = 7203000, temperature = 35.0, humidity = 45.0"); + database.command("sql", "INSERT INTO SensorData SET ts = 7204000, temperature = 45.0, humidity = 50.0"); + database.command("sql", "INSERT INTO SensorData SET ts = 7205000, temperature = 55.0, humidity = 55.0"); + }); + } + + @Test + void testBasicHourlyAvg() { + final ResultSet rs = database.query("sql", + "SELECT ts.timeBucket('1h', ts) AS hour, avg(temperature) AS avg_temp FROM SensorData GROUP BY hour"); + + final List<Result> results = collectResults(rs); + assertThat(results).hasSize(3); + + // Sort by hour to ensure deterministic order + results.sort((a, b) -> ((Date) a.getProperty("hour")).compareTo((Date) b.getProperty("hour"))); + + assertThat(((Number) results.get(0).getProperty("avg_temp")).doubleValue()).isCloseTo(25.0, within(0.01)); + assertThat(((Number) results.get(1).getProperty("avg_temp")).doubleValue()).isCloseTo(150.0, within(0.01)); + assertThat(((Number) results.get(2).getProperty("avg_temp")).doubleValue()).isCloseTo(30.0, within(0.01)); + } + + @Test + void testMultiColumnAggregation() { + final ResultSet rs = database.query("sql", + "SELECT ts.timeBucket('1h', ts) AS hour, avg(temperature) AS avg_temp, max(humidity) AS max_hum FROM SensorData GROUP BY hour"); + + final List<Result> results = collectResults(rs); + assertThat(results).hasSize(3); + + results.sort((a, b) -> ((Date) a.getProperty("hour")).compareTo((Date) b.getProperty("hour"))); + + // Bucket 0: avg(temp)=25, max(humidity)=65 + assertThat(((Number) 
results.get(0).getProperty("avg_temp")).doubleValue()).isCloseTo(25.0, within(0.01)); + assertThat(((Number) results.get(0).getProperty("max_hum")).doubleValue()).isCloseTo(65.0, within(0.01)); + + // Bucket 1: avg(temp)=150, max(humidity)=80 + assertThat(((Number) results.get(1).getProperty("avg_temp")).doubleValue()).isCloseTo(150.0, within(0.01)); + assertThat(((Number) results.get(1).getProperty("max_hum")).doubleValue()).isCloseTo(80.0, within(0.01)); + + // Bucket 2: avg(temp)=30, max(humidity)=55 + assertThat(((Number) results.get(2).getProperty("avg_temp")).doubleValue()).isCloseTo(30.0, within(0.01)); + assertThat(((Number) results.get(2).getProperty("max_hum")).doubleValue()).isCloseTo(55.0, within(0.01)); + } + + @Test + void testCountWithTimeBucket() { + final ResultSet rs = database.query("sql", + "SELECT ts.timeBucket('1h', ts) AS hour, count(*) AS cnt FROM SensorData GROUP BY hour"); + + final List<Result> results = collectResults(rs); + assertThat(results).hasSize(3); + + results.sort((a, b) -> ((Date) a.getProperty("hour")).compareTo((Date) b.getProperty("hour"))); + + assertThat(((Number) results.get(0).getProperty("cnt")).longValue()).isEqualTo(4); + assertThat(((Number) results.get(1).getProperty("cnt")).longValue()).isEqualTo(2); + assertThat(((Number) results.get(2).getProperty("cnt")).longValue()).isEqualTo(6); + } + + @Test + void testWithWhereBetween() { + // Only buckets 0 and 1 should be included (0 to 3601000) + final ResultSet rs = database.query("sql", + "SELECT ts.timeBucket('1h', ts) AS hour, avg(temperature) AS avg_temp FROM SensorData WHERE ts BETWEEN 0 AND 3601000 GROUP BY hour"); + + final List<Result> results = collectResults(rs); + assertThat(results).hasSize(2); + + results.sort((a, b) -> ((Date) a.getProperty("hour")).compareTo((Date) b.getProperty("hour"))); + + assertThat(((Number) results.get(0).getProperty("avg_temp")).doubleValue()).isCloseTo(25.0, within(0.01)); + assertThat(((Number) 
results.get(1).getProperty("avg_temp")).doubleValue()).isCloseTo(150.0, within(0.01)); + } + + @Test + void testSumAggregation() { + final ResultSet rs = database.query("sql", + "SELECT ts.timeBucket('1h', ts) AS hour, sum(temperature) AS sum_temp FROM SensorData GROUP BY hour"); + + final List<Result> results = collectResults(rs); + assertThat(results).hasSize(3); + + results.sort((a, b) -> ((Date) a.getProperty("hour")).compareTo((Date) b.getProperty("hour"))); + + assertThat(((Number) results.get(0).getProperty("sum_temp")).doubleValue()).isCloseTo(100.0, within(0.01)); + assertThat(((Number) results.get(1).getProperty("sum_temp")).doubleValue()).isCloseTo(300.0, within(0.01)); + assertThat(((Number) results.get(2).getProperty("sum_temp")).doubleValue()).isCloseTo(180.0, within(0.01)); + } + + @Test + void testMinAggregation() { + final ResultSet rs = database.query("sql", + "SELECT ts.timeBucket('1h', ts) AS hour, min(temperature) AS min_temp FROM SensorData GROUP BY hour"); + + final List<Result> results = collectResults(rs); + assertThat(results).hasSize(3); + + results.sort((a, b) -> ((Date) a.getProperty("hour")).compareTo((Date) b.getProperty("hour"))); + + assertThat(((Number) results.get(0).getProperty("min_temp")).doubleValue()).isCloseTo(10.0, within(0.01)); + assertThat(((Number) results.get(1).getProperty("min_temp")).doubleValue()).isCloseTo(100.0, within(0.01)); + assertThat(((Number) results.get(2).getProperty("min_temp")).doubleValue()).isCloseTo(5.0, within(0.01)); + } + + @Test + void testEmptyResultSet() { + // Query a range with no data + final ResultSet rs = database.query("sql", + "SELECT ts.timeBucket('1h', ts) AS hour, avg(temperature) AS avg_temp FROM SensorData WHERE ts BETWEEN 999999999 AND 999999999 GROUP BY hour"); + + final List<Result> results = collectResults(rs); + assertThat(results).isEmpty(); + } + + @Test + void testAllRowsInOneBucket() { + // Use a very large bucket interval (1 day) so all rows fall in one bucket + final ResultSet rs = 
database.query("sql", + "SELECT ts.timeBucket('1d', ts) AS day, avg(temperature) AS avg_temp, count(*) AS cnt FROM SensorData GROUP BY day"); + + final List<Result> results = collectResults(rs); + assertThat(results).hasSize(1); + + // Overall: (10+20+30+40+100+200+5+15+25+35+45+55) / 12 = 580/12 = 48.333... + assertThat(((Number) results.get(0).getProperty("avg_temp")).doubleValue()).isCloseTo(48.333, within(0.01)); + assertThat(((Number) results.get(0).getProperty("cnt")).longValue()).isEqualTo(12); + } + + @Test + void testFallbackWithDistinct() { + // DISTINCT should prevent push-down and fall through to normal execution + // This verifies the fallback path still works + final ResultSet rs = database.query("sql", + "SELECT DISTINCT temperature FROM SensorData"); + + final List<Result> results = collectResults(rs); + // All 12 temperatures are unique + assertThat(results).hasSize(12); + } + + @Test + void testEquivalenceWithFallback() { + // Push-down path: ts.timeBucket GROUP BY + final ResultSet rsPushDown = database.query("sql", + "SELECT ts.timeBucket('1h', ts) AS hour, avg(temperature) AS avg_temp, max(temperature) AS max_temp FROM SensorData GROUP BY hour"); + final List<Result> pushDownResults = collectResults(rsPushDown); + pushDownResults.sort((a, b) -> ((Date) a.getProperty("hour")).compareTo((Date) b.getProperty("hour"))); + + // Verify values match expected + assertThat(pushDownResults).hasSize(3); + assertThat(((Number) pushDownResults.get(0).getProperty("avg_temp")).doubleValue()).isCloseTo(25.0, within(0.01)); + assertThat(((Number) pushDownResults.get(0).getProperty("max_temp")).doubleValue()).isCloseTo(40.0, within(0.01)); + assertThat(((Number) pushDownResults.get(1).getProperty("avg_temp")).doubleValue()).isCloseTo(150.0, within(0.01)); + assertThat(((Number) pushDownResults.get(1).getProperty("max_temp")).doubleValue()).isCloseTo(200.0, within(0.01)); + assertThat(((Number) pushDownResults.get(2).getProperty("avg_temp")).doubleValue()).isCloseTo(30.0, within(0.01)); 
+    assertThat(((Number) pushDownResults.get(2).getProperty("max_temp")).doubleValue()).isCloseTo(55.0, within(0.01));
+  }
+
+  private List<Result> collectResults(final ResultSet rs) {
+    final List<Result> results = new ArrayList<>();
+    while (rs.hasNext())
+      results.add(rs.next());
+    return results;
+  }
+}
diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesBlockStatsTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesBlockStatsTest.java
new file mode 100644
index 0000000000..4eecc38abf
--- /dev/null
+++ b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesBlockStatsTest.java
@@ -0,0 +1,324 @@
+/*
+ * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com)
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +import com.arcadedb.engine.timeseries.codec.DeltaOfDeltaCodec; +import com.arcadedb.engine.timeseries.codec.GorillaXORCodec; +import com.arcadedb.engine.timeseries.codec.Simple8bCodec; +import com.arcadedb.schema.Type; +import com.arcadedb.utility.FileUtils; +import org.junit.jupiter.api.AfterEach; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.Test; + +import java.io.File; +import java.util.List; + +import static org.assertj.core.api.Assertions.assertThat; +import static org.assertj.core.api.Assertions.within; + +/** + * Tests for per-block statistics in sealed TimeSeries blocks. + * Covers: stats persistence/reload, aggregation fast path, + * boundary blocks (slow path), truncation preserving stats, + * and equivalence between stats-based and decompression-based results. + */ +class TimeSeriesBlockStatsTest { + + private static final String TEST_PATH = "target/databases/TimeSeriesBlockStatsTest/sealed"; + private List columns; + + @BeforeEach + void setUp() { + FileUtils.deleteRecursively(new File("target/databases/TimeSeriesBlockStatsTest")); + new File("target/databases/TimeSeriesBlockStatsTest").mkdirs(); + + columns = List.of( + new ColumnDefinition("ts", Type.LONG, ColumnDefinition.ColumnRole.TIMESTAMP), + new ColumnDefinition("temperature", Type.DOUBLE, ColumnDefinition.ColumnRole.FIELD), + new ColumnDefinition("count", Type.LONG, ColumnDefinition.ColumnRole.FIELD) + ); + } + + @AfterEach + void tearDown() { + FileUtils.deleteRecursively(new File("target/databases/TimeSeriesBlockStatsTest")); + } + + @Test + void testAppendBlockWithStatsAndReload() throws Exception { + final long[] timestamps = { 1000L, 2000L, 3000L, 4000L, 5000L }; + final double[] temperatures = { 10.0, 20.0, 30.0, 40.0, 50.0 }; + final long[] counts = { 1L, 2L, 3L, 4L, 5L }; + + final byte[][] 
compressed = { + DeltaOfDeltaCodec.encode(timestamps), + GorillaXORCodec.encode(temperatures), + Simple8bCodec.encode(counts) + }; + + final double[] mins = { Double.NaN, 10.0, 1.0 }; + final double[] maxs = { Double.NaN, 50.0, 5.0 }; + final double[] sums = { Double.NaN, 150.0, 15.0 }; + + // Write block with stats + try (final TimeSeriesSealedStore store = new TimeSeriesSealedStore(TEST_PATH, columns)) { + store.appendBlock(5, 1000L, 5000L, compressed, mins, maxs, sums, null); + assertThat(store.getBlockCount()).isEqualTo(1); + } + + // Reload and verify stats are preserved + try (final TimeSeriesSealedStore store = new TimeSeriesSealedStore(TEST_PATH, columns)) { + assertThat(store.getBlockCount()).isEqualTo(1); + assertThat(store.getGlobalMinTimestamp()).isEqualTo(1000L); + assertThat(store.getGlobalMaxTimestamp()).isEqualTo(5000L); + + // Data should still be readable + final List results = store.scanRange(1000L, 5000L, null, null); + assertThat(results).hasSize(5); + assertThat((double) results.get(0)[1]).isEqualTo(10.0); + assertThat((double) results.get(4)[1]).isEqualTo(50.0); + } + } + + @Test + void testAggregationUsesStatsFastPath() throws Exception { + // Block fits entirely within one 1-hour bucket (bucket interval = 3600000ms) + final long[] timestamps = { 0L, 1000L, 2000L, 3000L, 4000L }; + final double[] temperatures = { 10.0, 20.0, 30.0, 40.0, 50.0 }; + final long[] counts = { 2L, 4L, 6L, 8L, 10L }; + + final byte[][] compressed = { + DeltaOfDeltaCodec.encode(timestamps), + GorillaXORCodec.encode(temperatures), + Simple8bCodec.encode(counts) + }; + + final double[] mins = { Double.NaN, 10.0, 2.0 }; + final double[] maxs = { Double.NaN, 50.0, 10.0 }; + final double[] sums = { Double.NaN, 150.0, 30.0 }; + + try (final TimeSeriesSealedStore store = new TimeSeriesSealedStore(TEST_PATH, columns)) { + store.appendBlock(5, 0L, 4000L, compressed, mins, maxs, sums, null); + + final long bucketInterval = 3600000L; // 1 hour + + final List requests = List.of( 
+ new MultiColumnAggregationRequest(1, AggregationType.AVG, "avg_temp"), + new MultiColumnAggregationRequest(1, AggregationType.MIN, "min_temp"), + new MultiColumnAggregationRequest(1, AggregationType.MAX, "max_temp"), + new MultiColumnAggregationRequest(1, AggregationType.SUM, "sum_temp"), + new MultiColumnAggregationRequest(-1, AggregationType.COUNT, "cnt") + ); + + final MultiColumnAggregationResult result = new MultiColumnAggregationResult(requests); + store.aggregateMultiBlocks(0L, 4000L, requests, bucketInterval, result, null, null); + result.finalizeAvg(); + + assertThat(result.size()).isEqualTo(1); + final long bucket = result.getBucketTimestamps().get(0); + assertThat(bucket).isEqualTo(0L); + + // AVG = 150/5 = 30 + assertThat(result.getValue(bucket, 0)).isCloseTo(30.0, within(0.01)); + // MIN = 10 + assertThat(result.getValue(bucket, 1)).isCloseTo(10.0, within(0.01)); + // MAX = 50 + assertThat(result.getValue(bucket, 2)).isCloseTo(50.0, within(0.01)); + // SUM = 150 + assertThat(result.getValue(bucket, 3)).isCloseTo(150.0, within(0.01)); + // COUNT = 5 + assertThat(result.getValue(bucket, 4)).isCloseTo(5.0, within(0.01)); + } + } + + @Test + void testBoundaryBlockUsesSlowPath() throws Exception { + // Block spans two 1-second buckets: timestamps 500-1500ms + // bucket(500)=0, bucket(1500)=1000 → two buckets → slow path + final long[] timestamps = { 500L, 800L, 1200L, 1500L }; + final double[] temperatures = { 10.0, 20.0, 30.0, 40.0 }; + final long[] counts = { 1L, 2L, 3L, 4L }; + + final byte[][] compressed = { + DeltaOfDeltaCodec.encode(timestamps), + GorillaXORCodec.encode(temperatures), + Simple8bCodec.encode(counts) + }; + + final double[] mins = { Double.NaN, 10.0, 1.0 }; + final double[] maxs = { Double.NaN, 40.0, 4.0 }; + final double[] sums = { Double.NaN, 100.0, 10.0 }; + + try (final TimeSeriesSealedStore store = new TimeSeriesSealedStore(TEST_PATH, columns)) { + store.appendBlock(4, 500L, 1500L, compressed, mins, maxs, sums, null); + + final 
long bucketInterval = 1000L; + + final List requests = List.of( + new MultiColumnAggregationRequest(1, AggregationType.AVG, "avg_temp"), + new MultiColumnAggregationRequest(1, AggregationType.SUM, "sum_temp") + ); + + final MultiColumnAggregationResult result = new MultiColumnAggregationResult(requests); + store.aggregateMultiBlocks(500L, 1500L, requests, bucketInterval, result, null, null); + result.finalizeAvg(); + + // Should have 2 buckets: 0 and 1000 + assertThat(result.size()).isEqualTo(2); + + // Bucket 0 (500, 800): avg=(10+20)/2=15, sum=30 + assertThat(result.getValue(0L, 0)).isCloseTo(15.0, within(0.01)); + assertThat(result.getValue(0L, 1)).isCloseTo(30.0, within(0.01)); + + // Bucket 1000 (1200, 1500): avg=(30+40)/2=35, sum=70 + assertThat(result.getValue(1000L, 0)).isCloseTo(35.0, within(0.01)); + assertThat(result.getValue(1000L, 1)).isCloseTo(70.0, within(0.01)); + } + } + + @Test + void testMultipleBlocksAggregation() throws Exception { + final byte[][] block1 = { + DeltaOfDeltaCodec.encode(new long[] { 1000L, 2000L }), + GorillaXORCodec.encode(new double[] { 10.0, 20.0 }), + Simple8bCodec.encode(new long[] { 1L, 2L }) + }; + + final byte[][] block2 = { + DeltaOfDeltaCodec.encode(new long[] { 3000L, 4000L }), + GorillaXORCodec.encode(new double[] { 30.0, 40.0 }), + Simple8bCodec.encode(new long[] { 3L, 4L }) + }; + + try (final TimeSeriesSealedStore store = new TimeSeriesSealedStore(TEST_PATH, columns)) { + store.appendBlock(2, 1000L, 2000L, block1, + new double[] { Double.NaN, 10.0, 1.0 }, + new double[] { Double.NaN, 20.0, 2.0 }, + new double[] { Double.NaN, 30.0, 3.0 }, null); + + store.appendBlock(2, 3000L, 4000L, block2, + new double[] { Double.NaN, 30.0, 3.0 }, + new double[] { Double.NaN, 40.0, 4.0 }, + new double[] { Double.NaN, 70.0, 7.0 }, null); + + assertThat(store.getBlockCount()).isEqualTo(2); + + // Aggregation over both blocks (both fit in 1h bucket → fast path) + final List requests = List.of( + new MultiColumnAggregationRequest(1, 
AggregationType.SUM, "sum_temp"), + new MultiColumnAggregationRequest(-1, AggregationType.COUNT, "cnt") + ); + + final MultiColumnAggregationResult result = new MultiColumnAggregationResult(requests); + store.aggregateMultiBlocks(1000L, 4000L, requests, 3600000L, result, null, null); + + // SUM = 10+20+30+40 = 100 + final long bucket = result.getBucketTimestamps().get(0); + assertThat(result.getValue(bucket, 0)).isCloseTo(100.0, within(0.01)); + // COUNT = 4 + assertThat(result.getValue(bucket, 1)).isCloseTo(4.0, within(0.01)); + } + } + + @Test + void testTruncatePreservesStats() throws Exception { + final byte[][] block1 = { + DeltaOfDeltaCodec.encode(new long[] { 1000L, 2000L }), + GorillaXORCodec.encode(new double[] { 10.0, 20.0 }), + Simple8bCodec.encode(new long[] { 1L, 2L }) + }; + + final byte[][] block2 = { + DeltaOfDeltaCodec.encode(new long[] { 5000L, 6000L }), + GorillaXORCodec.encode(new double[] { 50.0, 60.0 }), + Simple8bCodec.encode(new long[] { 5L, 6L }) + }; + + try (final TimeSeriesSealedStore store = new TimeSeriesSealedStore(TEST_PATH, columns)) { + store.appendBlock(2, 1000L, 2000L, block1, + new double[] { Double.NaN, 10.0, 1.0 }, + new double[] { Double.NaN, 20.0, 2.0 }, + new double[] { Double.NaN, 30.0, 3.0 }, null); + + store.appendBlock(2, 5000L, 6000L, block2, + new double[] { Double.NaN, 50.0, 5.0 }, + new double[] { Double.NaN, 60.0, 6.0 }, + new double[] { Double.NaN, 110.0, 11.0 }, null); + + // Truncate: remove block 1 + store.truncateBefore(3000L); + assertThat(store.getBlockCount()).isEqualTo(1); + + // Verify aggregation still works with stats on the retained block + final List requests = List.of( + new MultiColumnAggregationRequest(1, AggregationType.SUM, "sum_temp"), + new MultiColumnAggregationRequest(1, AggregationType.MIN, "min_temp"), + new MultiColumnAggregationRequest(1, AggregationType.MAX, "max_temp") + ); + + final MultiColumnAggregationResult result = new MultiColumnAggregationResult(requests); + 
store.aggregateMultiBlocks(5000L, 6000L, requests, 3600000L, result, null, null); + + final long bucket = result.getBucketTimestamps().get(0); + assertThat(result.getValue(bucket, 0)).isCloseTo(110.0, within(0.01)); + assertThat(result.getValue(bucket, 1)).isCloseTo(50.0, within(0.01)); + assertThat(result.getValue(bucket, 2)).isCloseTo(60.0, within(0.01)); + } + } + + @Test + void testTruncatePreservesStatsAfterReload() throws Exception { + final byte[][] block1 = { + DeltaOfDeltaCodec.encode(new long[] { 1000L, 2000L }), + GorillaXORCodec.encode(new double[] { 10.0, 20.0 }), + Simple8bCodec.encode(new long[] { 1L, 2L }) + }; + + final byte[][] block2 = { + DeltaOfDeltaCodec.encode(new long[] { 5000L, 6000L }), + GorillaXORCodec.encode(new double[] { 50.0, 60.0 }), + Simple8bCodec.encode(new long[] { 5L, 6L }) + }; + + try (final TimeSeriesSealedStore store = new TimeSeriesSealedStore(TEST_PATH, columns)) { + store.appendBlock(2, 1000L, 2000L, block1, + new double[] { Double.NaN, 10.0, 1.0 }, + new double[] { Double.NaN, 20.0, 2.0 }, + new double[] { Double.NaN, 30.0, 3.0 }, null); + + store.appendBlock(2, 5000L, 6000L, block2, + new double[] { Double.NaN, 50.0, 5.0 }, + new double[] { Double.NaN, 60.0, 6.0 }, + new double[] { Double.NaN, 110.0, 11.0 }, null); + + store.truncateBefore(3000L); + } + + // Reload and verify the retained block is still intact + try (final TimeSeriesSealedStore store = new TimeSeriesSealedStore(TEST_PATH, columns)) { + assertThat(store.getBlockCount()).isEqualTo(1); + + final List results = store.scanRange(0L, 10000L, null, null); + assertThat(results).hasSize(2); + assertThat((double) results.get(0)[1]).isEqualTo(50.0); + assertThat((double) results.get(1)[1]).isEqualTo(60.0); + } + } +} diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesBucketTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesBucketTest.java new file mode 100644 index 0000000000..1971135d53 --- /dev/null +++ 
b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesBucketTest.java @@ -0,0 +1,205 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +import com.arcadedb.TestHelper; +import com.arcadedb.database.DatabaseInternal; +import com.arcadedb.schema.LocalSchema; +import com.arcadedb.schema.Type; +import org.junit.jupiter.api.Test; + +import java.io.IOException; +import java.util.List; + +import static org.assertj.core.api.Assertions.assertThat; + +/** + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +class TimeSeriesBucketTest extends TestHelper { + + private List createTestColumns() { + return List.of( + new ColumnDefinition("ts", Type.LONG, ColumnDefinition.ColumnRole.TIMESTAMP), + new ColumnDefinition("sensor_id", Type.STRING, ColumnDefinition.ColumnRole.TAG), + new ColumnDefinition("temperature", Type.DOUBLE, ColumnDefinition.ColumnRole.FIELD) + ); + } + + private TimeSeriesBucket createAndRegisterBucket(final String name, final List cols) throws IOException { + final DatabaseInternal db = (DatabaseInternal) database; + final TimeSeriesBucket bucket = new TimeSeriesBucket(db, name, db.getDatabasePath() + "/" + name, cols); + ((LocalSchema) db.getSchema()).registerFile(bucket); + bucket.initHeaderPage(); 
+ return bucket; + } + + @Test + void testCreateBucketAndAppend() throws Exception { + database.begin(); + final TimeSeriesBucket bucket = createAndRegisterBucket("test_ts_bucket", createTestColumns()); + + bucket.appendSamples( + new long[] { 1000L }, + new Object[] { "sensor_A" }, + new Object[] { 22.5 } + ); + database.commit(); + + database.begin(); + assertThat(bucket.getSampleCount()).isEqualTo(1); + assertThat(bucket.getMinTimestamp()).isEqualTo(1000L); + assertThat(bucket.getMaxTimestamp()).isEqualTo(1000L); + database.commit(); + } + + @Test + void testAppendMultipleSamples() throws Exception { + database.begin(); + final TimeSeriesBucket bucket = createAndRegisterBucket("test_ts_multi", createTestColumns()); + + final long[] timestamps = { 1000L, 2000L, 3000L, 4000L, 5000L }; + final Object[] sensorIds = { "A", "B", "A", "C", "B" }; + final Object[] temperatures = { 20.0, 21.5, 22.0, 19.5, 23.0 }; + + bucket.appendSamples(timestamps, sensorIds, temperatures); + database.commit(); + + database.begin(); + assertThat(bucket.getSampleCount()).isEqualTo(5); + assertThat(bucket.getMinTimestamp()).isEqualTo(1000L); + assertThat(bucket.getMaxTimestamp()).isEqualTo(5000L); + database.commit(); + } + + @Test + void testScanRange() throws Exception { + database.begin(); + final TimeSeriesBucket bucket = createAndRegisterBucket("test_ts_scan", createTestColumns()); + + final long[] timestamps = { 1000L, 2000L, 3000L, 4000L, 5000L }; + final Object[] sensorIds = { "A", "B", "A", "C", "B" }; + final Object[] temperatures = { 20.0, 21.5, 22.0, 19.5, 23.0 }; + + bucket.appendSamples(timestamps, sensorIds, temperatures); + database.commit(); + + database.begin(); + final List results = bucket.scanRange(2000L, 4000L, null); + assertThat(results).hasSize(3); + + assertThat((long) results.get(0)[0]).isEqualTo(2000L); + assertThat((String) results.get(0)[1]).isEqualTo("B"); + assertThat((double) results.get(0)[2]).isEqualTo(21.5); + + assertThat((long) 
results.get(2)[0]).isEqualTo(4000L);
+    assertThat((String) results.get(2)[1]).isEqualTo("C");
+    assertThat((double) results.get(2)[2]).isEqualTo(19.5);
+    database.commit();
+  }
+
+  @Test
+  void testScanRangeEmpty() throws Exception {
+    database.begin();
+    final TimeSeriesBucket bucket = createAndRegisterBucket("test_ts_empty", createTestColumns());
+
+    bucket.appendSamples(
+        new long[] { 1000L, 2000L },
+        new Object[] { "A", "B" },
+        new Object[] { 20.0, 21.0 }
+    );
+    database.commit();
+
+    database.begin();
+    final List<Object[]> results = bucket.scanRange(5000L, 6000L, null);
+    assertThat(results).isEmpty();
+    database.commit();
+  }
+
+  @Test
+  void testNumericOnlyColumns() throws Exception {
+    final List<ColumnDefinition> cols = List.of(
+        new ColumnDefinition("ts", Type.LONG, ColumnDefinition.ColumnRole.TIMESTAMP),
+        new ColumnDefinition("value", Type.DOUBLE, ColumnDefinition.ColumnRole.FIELD),
+        new ColumnDefinition("count", Type.INTEGER, ColumnDefinition.ColumnRole.FIELD)
+    );
+
+    database.begin();
+    final TimeSeriesBucket bucket = createAndRegisterBucket("test_ts_numeric", cols);
+
+    bucket.appendSamples(
+        new long[] { 100L, 200L, 300L },
+        new Object[] { 1.5, 2.5, 3.5 },
+        new Object[] { 10, 20, 30 }
+    );
+    database.commit();
+
+    database.begin();
+    final List<Object[]> results = bucket.scanRange(100L, 300L, null);
+    assertThat(results).hasSize(3);
+    assertThat((double) results.get(0)[1]).isEqualTo(1.5);
+    assertThat((int) results.get(0)[2]).isEqualTo(10);
+    assertThat((double) results.get(2)[1]).isEqualTo(3.5);
+    assertThat((int) results.get(2)[2]).isEqualTo(30);
+    database.commit();
+  }
+
+  @Test
+  void testCompactionFlag() throws Exception {
+    database.begin();
+    final TimeSeriesBucket bucket = createAndRegisterBucket("test_ts_compact", createTestColumns());
+
+    assertThat(bucket.isCompactionInProgress()).isFalse();
+    bucket.setCompactionInProgress(true);
+    assertThat(bucket.isCompactionInProgress()).isTrue();
+    bucket.setCompactionInProgress(false);
+
assertThat(bucket.isCompactionInProgress()).isFalse(); + database.commit(); + } + + @Test + void testReadAllForCompaction() throws Exception { + database.begin(); + final TimeSeriesBucket bucket = createAndRegisterBucket("test_ts_readall", createTestColumns()); + + bucket.appendSamples( + new long[] { 3000L, 1000L, 2000L }, + new Object[] { "C", "A", "B" }, + new Object[] { 30.0, 10.0, 20.0 } + ); + database.commit(); + + database.begin(); + final Object[] allData = bucket.readAllForCompaction(); + assertThat(allData).isNotNull(); + assertThat(allData).hasSize(3); // 3 columns + + final long[] ts = (long[]) allData[0]; + assertThat(ts).hasSize(3); + assertThat(ts[0]).isEqualTo(3000L); + assertThat(ts[1]).isEqualTo(1000L); + assertThat(ts[2]).isEqualTo(2000L); + database.commit(); + } + + @Override + protected boolean isCheckingDatabaseIntegrity() { + return false; + } +} diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesConcurrentAppendCompactTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesConcurrentAppendCompactTest.java new file mode 100644 index 0000000000..b7509a32cf --- /dev/null +++ b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesConcurrentAppendCompactTest.java @@ -0,0 +1,153 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +import com.arcadedb.TestHelper; +import com.arcadedb.database.DatabaseInternal; +import com.arcadedb.exception.ConcurrentModificationException; +import com.arcadedb.schema.Type; +import org.junit.jupiter.api.Test; + +import java.util.List; +import java.util.concurrent.CountDownLatch; +import java.util.concurrent.ExecutorService; +import java.util.concurrent.Executors; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.atomic.AtomicInteger; +import java.util.concurrent.atomic.AtomicReference; + +import static org.assertj.core.api.Assertions.assertThat; + +/** + * Tests concurrent append and compaction operations on the TimeSeries engine. + * + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +class TimeSeriesConcurrentAppendCompactTest extends TestHelper { + + @Test + void testConcurrentAppendDuringCompaction() throws Exception { + final List columns = List.of( + new ColumnDefinition("ts", Type.LONG, ColumnDefinition.ColumnRole.TIMESTAMP), + new ColumnDefinition("value", Type.DOUBLE, ColumnDefinition.ColumnRole.FIELD) + ); + + database.begin(); + final TimeSeriesEngine engine = new TimeSeriesEngine((DatabaseInternal) database, "test_concurrent", columns, 2); + + // Insert initial data + for (int i = 0; i < 100; i++) + engine.appendSamples(new long[] { i * 1000L }, new Object[] { (double) i }); + database.commit(); + + // Launch concurrent writers and a compactor + final int writerCount = 4; + final int samplesPerWriter = 50; + final CountDownLatch startLatch = new CountDownLatch(1); + final CountDownLatch doneLatch = new CountDownLatch(writerCount + 1); + final AtomicReference error = new AtomicReference<>(); + final AtomicInteger totalAppended = new AtomicInteger(0); + final AtomicInteger concurrentRetries = new AtomicInteger(0); + + final ExecutorService executor = 
Executors.newFixedThreadPool(writerCount + 1);
+
+    // Compactor thread — ConcurrentModificationException is expected under contention
+    executor.submit(() -> {
+      try {
+        startLatch.await();
+        for (int retry = 0; retry < 5; retry++) {
+          try {
+            engine.compactAll();
+            break;
+          } catch (final Exception e) {
+            if (hasConcurrentModification(e)) {
+              concurrentRetries.incrementAndGet();
+              Thread.sleep(50);
+            } else
+              throw e;
+          }
+        }
+      } catch (final Throwable t) {
+        error.compareAndSet(null, t);
+      } finally {
+        doneLatch.countDown();
+      }
+    });
+
+    // Writer threads
+    for (int w = 0; w < writerCount; w++) {
+      final int writerIdx = w;
+      executor.submit(() -> {
+        try {
+          startLatch.await();
+          for (int i = 0; i < samplesPerWriter; i++) {
+            final long ts = 100_000L + writerIdx * 100_000L + i * 1000L;
+            for (int retry = 0; retry < 10; retry++) {
+              try {
+                database.begin();
+                engine.appendSamples(new long[] { ts }, new Object[] { ts / 1000.0 });
+                database.commit();
+                totalAppended.incrementAndGet();
+                break;
+              } catch (final ConcurrentModificationException e) {
+                // Roll back the failed transaction before retrying: leaving it active
+                // would make the next begin() fail with a "transaction already begun" error
+                if (database.isTransactionActive())
+                  database.rollback();
+                concurrentRetries.incrementAndGet();
+              }
+            }
+          }
+        } catch (final Throwable t) {
+          error.compareAndSet(null, t);
+        } finally {
+          doneLatch.countDown();
+        }
+      });
+    }
+
+    // Start all threads simultaneously
+    startLatch.countDown();
+    assertThat(doneLatch.await(30, TimeUnit.SECONDS)).isTrue();
+    executor.shutdown();
+
+    if (error.get() != null)
+      throw new AssertionError("Concurrent operation failed", error.get());
+
+    // Verify all data is accessible
+    database.begin();
+    final List<Object[]> results = engine.query(0, Long.MAX_VALUE, null, null);
+    database.commit();
+
+    // At least the initial samples should be present; some appends may have failed after max retries
+    assertThat(results.size()).isGreaterThanOrEqualTo(100);
+
+    // Verify results are sorted by timestamp (query guarantees this)
+    for (int i = 1; i < results.size(); i++)
+      assertThat((long) results.get(i)[0]).isGreaterThanOrEqualTo((long) results.get(i - 1)[0]);
+ + engine.close(); + } + + private static boolean hasConcurrentModification(final Throwable t) { + Throwable cause = t; + while (cause != null) { + if (cause instanceof ConcurrentModificationException) + return true; + cause = cause.getCause(); + } + return false; + } +} diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesCrashRecoveryTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesCrashRecoveryTest.java new file mode 100644 index 0000000000..8dd704f8b4 --- /dev/null +++ b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesCrashRecoveryTest.java @@ -0,0 +1,375 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +import com.arcadedb.TestHelper; +import com.arcadedb.database.DatabaseInternal; +import com.arcadedb.engine.timeseries.codec.DeltaOfDeltaCodec; +import com.arcadedb.engine.timeseries.codec.DictionaryCodec; +import com.arcadedb.engine.timeseries.codec.GorillaXORCodec; +import com.arcadedb.schema.Type; +import org.junit.jupiter.api.Test; + +import java.io.File; +import java.io.RandomAccessFile; +import java.nio.ByteBuffer; +import java.util.Arrays; +import java.util.List; + +import static org.assertj.core.api.Assertions.assertThat; + +/** + * Tests crash recovery for compaction: simulates an interrupted compaction by leaving the + * compaction-in-progress flag set, then reopens the shard and verifies no duplicate samples. + * + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +class TimeSeriesCrashRecoveryTest extends TestHelper { + + private List createTestColumns() { + return List.of( + new ColumnDefinition("ts", Type.LONG, ColumnDefinition.ColumnRole.TIMESTAMP), + new ColumnDefinition("sensor", Type.STRING, ColumnDefinition.ColumnRole.TAG), + new ColumnDefinition("value", Type.DOUBLE, ColumnDefinition.ColumnRole.FIELD) + ); + } + + @Test + void testRecoveryAfterInterruptedCompaction() throws Exception { + final List columns = createTestColumns(); + + // Phase 1: Create shard, insert data, compact normally, then insert more data + database.begin(); + final TimeSeriesShard shard = new TimeSeriesShard( + (DatabaseInternal) database, "test_crash_recovery", 0, columns); + shard.appendSamples( + new long[] { 1000L, 2000L, 3000L }, + new Object[] { "A", "B", "A" }, + new Object[] { 10.0, 20.0, 30.0 } + ); + database.commit(); + + // Compact successfully first + shard.compact(); + + final long watermarkAfterFirstCompact = shard.getSealedStore().getBlockCount(); + 
assertThat(watermarkAfterFirstCompact).isGreaterThan(0);
+
+    // Insert more data (these remain in the mutable bucket)
+    database.begin();
+    shard.appendSamples(
+        new long[] { 4000L, 5000L },
+        new Object[] { "C", "D" },
+        new Object[] { 40.0, 50.0 }
+    );
+    database.commit();
+
+    // Phase 2: Simulate a crash mid-compaction by manually setting the flag.
+    // This mimics: compaction started (flag set, watermark saved), some sealed blocks
+    // were written, but the mutable pages were NOT cleared before the crash.
+    database.begin();
+    shard.getMutableBucket().setCompactionInProgress(true);
+    shard.getMutableBucket().setCompactionWatermark(watermarkAfterFirstCompact);
+    database.commit();
+
+    // A fuller simulation would also perform partial sealed writes by compacting again
+    // (the flag would stay set, because we re-set it above after compact() cleared it).
+    // For simplicity, this test only verifies that reopening with the flag set triggers recovery.
+
+    shard.close();
+
+    // Phase 3: Reopen the shard — constructor should detect the flag and recover
+    database.begin();
+    final TimeSeriesShard recoveredShard = new TimeSeriesShard(
+        (DatabaseInternal) database, "test_crash_recovery", 0, columns);
+
+    // The compaction flag should have been cleared by recovery
+    assertThat(recoveredShard.getMutableBucket().isCompactionInProgress()).isFalse();
+
+    // The sealed store should have been truncated back to the watermark
+    assertThat(recoveredShard.getSealedStore().getBlockCount()).isEqualTo(watermarkAfterFirstCompact);
+
+    // Query all data — there should be no duplicates
+    final List<Object[]> results = recoveredShard.scanRange(0, Long.MAX_VALUE, null, null);
+    database.commit();
+
+    // Original 3 samples (sealed) + 2 new samples (mutable) = 5 total, no duplicates
+    assertThat(results).hasSize(5);
+
+    // Verify all specific values are present
+    final double[] values = results.stream().mapToDouble(r -> (double) r[2]).sorted().toArray();
+    assertThat(values).containsExactly(10.0, 20.0, 30.0, 40.0, 50.0);
+
+
recoveredShard.close(); + } + + @Test + void testRecoveryWithCleanState() throws Exception { + final List columns = createTestColumns(); + + // Create, insert, compact normally + database.begin(); + final TimeSeriesShard shard = new TimeSeriesShard( + (DatabaseInternal) database, "test_clean_recovery", 0, columns); + shard.appendSamples( + new long[] { 1000L, 2000L }, + new Object[] { "A", "B" }, + new Object[] { 10.0, 20.0 } + ); + database.commit(); + + shard.compact(); + shard.close(); + + // Reopen without any crash flag — should work normally + database.begin(); + final TimeSeriesShard recovered = new TimeSeriesShard( + (DatabaseInternal) database, "test_clean_recovery", 0, columns); + + assertThat(recovered.getMutableBucket().isCompactionInProgress()).isFalse(); + + final List results = recovered.scanRange(0, Long.MAX_VALUE, null, null); + database.commit(); + assertThat(results).hasSize(2); + + recovered.close(); + } + + /** + * Simulates a crash during file swap where the original .sealed file was replaced by ATOMIC_MOVE + * but a stale .tmp file remains on disk (e.g., from a prior interrupted maintenance task). + * Verifies that the stale .tmp file is cleaned up on startup and the shard opens normally. 
+ */ + @Test + void testRecoveryWithStaleTmpFile() throws Exception { + final List columns = createTestColumns(); + + // Phase 1: Create shard, insert data, compact to create a valid .sealed file + database.begin(); + final TimeSeriesShard shard = new TimeSeriesShard( + (DatabaseInternal) database, "test_tmp_cleanup", 0, columns); + shard.appendSamples( + new long[] { 1000L, 2000L, 3000L }, + new Object[] { "A", "B", "A" }, + new Object[] { 10.0, 20.0, 30.0 } + ); + database.commit(); + shard.compact(); + + final long blockCount = shard.getSealedStore().getBlockCount(); + assertThat(blockCount).isGreaterThan(0); + shard.close(); + + // Phase 2: Create a stale .tmp file that simulates a leftover from an interrupted operation + final String shardPath = database.getDatabasePath() + "/test_tmp_cleanup_shard_0"; + final File tmpFile = new File(shardPath + ".ts.sealed.tmp"); + try (final RandomAccessFile tmp = new RandomAccessFile(tmpFile, "rw")) { + // Write some garbage data to simulate a partial temp file + tmp.write(new byte[] { 0x01, 0x02, 0x03, 0x04 }); + } + assertThat(tmpFile.exists()).isTrue(); + + // Phase 3: Reopen shard — constructor should clean up the stale .tmp file + database.begin(); + final TimeSeriesShard recovered = new TimeSeriesShard( + (DatabaseInternal) database, "test_tmp_cleanup", 0, columns); + + // Verify .tmp file was cleaned up + assertThat(tmpFile.exists()).isFalse(); + + // Verify sealed store is intact with the original blocks + assertThat(recovered.getSealedStore().getBlockCount()).isEqualTo(blockCount); + + // Verify data is correct + final List results = recovered.scanRange(0, Long.MAX_VALUE, null, null); + database.commit(); + assertThat(results).hasSize(3); + + final double[] values = results.stream().mapToDouble(r -> (double) r[2]).sorted().toArray(); + assertThat(values).containsExactly(10.0, 20.0, 30.0); + + recovered.close(); + } + + /** + * Simulates the most critical failure scenario for the old non-atomic file swap: the 
original + * .sealed file has been deleted but the .tmp file was never renamed (e.g., ENOSPC or OS crash + * between delete() and renameTo()). With ATOMIC_MOVE this path is no longer possible, but this + * test verifies that even in this degenerate state the sealed store recovers: the .tmp is cleaned + * up and a fresh empty sealed store is created. + */ + @Test + void testRecoveryWithMissingSealedAndOrphanedTmp() throws Exception { + final List columns = createTestColumns(); + + // Phase 1: Create shard, insert data, compact to produce sealed blocks + database.begin(); + final TimeSeriesShard shard = new TimeSeriesShard( + (DatabaseInternal) database, "test_orphan_tmp", 0, columns); + shard.appendSamples( + new long[] { 1000L, 2000L }, + new Object[] { "A", "B" }, + new Object[] { 10.0, 20.0 } + ); + database.commit(); + shard.compact(); + shard.close(); + + // Phase 2: Simulate the old non-atomic failure: delete .sealed, leave .tmp + final String shardPath = database.getDatabasePath() + "/test_orphan_tmp_shard_0"; + final File sealedFile = new File(shardPath + ".ts.sealed"); + final File tmpFile = new File(shardPath + ".ts.sealed.tmp"); + + // Copy the current sealed file to .tmp (simulating a temp file that was ready to be renamed) + java.nio.file.Files.copy(sealedFile.toPath(), tmpFile.toPath()); + // Delete the original sealed file (simulating delete() succeeded but renameTo() failed) + assertThat(sealedFile.delete()).isTrue(); + assertThat(sealedFile.exists()).isFalse(); + assertThat(tmpFile.exists()).isTrue(); + + // Phase 3: Reopen — sealed store constructor should: + // 1) Delete the stale .tmp file + // 2) Create a fresh empty .sealed file + database.begin(); + final TimeSeriesShard recovered = new TimeSeriesShard( + (DatabaseInternal) database, "test_orphan_tmp", 0, columns); + + // The .tmp should be cleaned up + assertThat(tmpFile.exists()).isFalse(); + // The .sealed should exist again (freshly created, empty) + 
assertThat(sealedFile.exists()).isTrue(); + + // The sealed store is empty (the old data was lost in the simulated crash), and the mutable + // bucket was already cleared during compaction before the close, so the sealed store should + // report 0 blocks and the scan returns no pre-crash data. + assertThat(recovered.getSealedStore().getBlockCount()).isEqualTo(0); + + final List results = recovered.scanRange(0, Long.MAX_VALUE, null, null); + database.commit(); + + // In this catastrophic failure scenario (which ATOMIC_MOVE now prevents), the sealed data + // is lost. The mutable bucket data from before compaction was already consumed by compact(). + // This test primarily verifies the system doesn't crash and can resume accepting writes. + assertThat(results).isNotNull(); + + // Verify the shard can accept new writes after recovery + database.begin(); + recovered.appendSamples( + new long[] { 5000L }, + new Object[] { "X" }, + new Object[] { 99.0 } + ); + database.commit(); + + database.begin(); + final List newResults = recovered.scanRange(0, Long.MAX_VALUE, null, null); + database.commit(); + assertThat(newResults.stream().anyMatch(r -> (double) r[2] == 99.0)).isTrue(); + + recovered.close(); + } + + /** + * Regression test: verifies that extra sealed blocks written during an interrupted compaction + * are truncated back to the watermark on restart. + *

+ * Previously the test only set the flag on an already-empty window, so the truncation was a + * no-op and the actual block-removal path was never exercised. + */ + @Test + void testRecoveryTruncatesExtraSealedBlocks() throws Exception { + final List columns = createTestColumns(); + + // Step 1: Insert 3 samples and compact → 1 sealed block + database.begin(); + final TimeSeriesShard shard = new TimeSeriesShard( + (DatabaseInternal) database, "test_truncate_extra", 0, columns); + shard.appendSamples( + new long[] { 1000L, 2000L, 3000L }, + new Object[] { "A", "B", "A" }, + new Object[] { 10.0, 20.0, 30.0 } + ); + database.commit(); + + shard.compact(); + final long watermark = shard.getSealedStore().getBlockCount(); + assertThat(watermark).isGreaterThan(0); + + // Step 2: Insert more data into the mutable bucket (will be retained after recovery) + database.begin(); + shard.appendSamples( + new long[] { 4000L, 5000L }, + new Object[] { "C", "D" }, + new Object[] { 40.0, 50.0 } + ); + database.commit(); + + // Step 3: Simulate partial compaction — write extra blocks directly to the sealed store, + // mimicking the state after compact() wrote blocks but crashed before clearing the mutable bucket. 
final double[] mins = new double[columns.size()]; + final double[] maxs = new double[columns.size()]; + final double[] sums = new double[columns.size()]; + Arrays.fill(mins, Double.NaN); + Arrays.fill(maxs, Double.NaN); + final long[] extraTs = { 4000L, 5000L }; + final byte[][] compressedExtra = new byte[columns.size()][]; + compressedExtra[0] = DeltaOfDeltaCodec.encode(extraTs); // ts column + compressedExtra[1] = DictionaryCodec.encode(new String[]{"C","D"}); // tag column + final double[] extraVals = { 40.0, 50.0 }; + compressedExtra[2] = GorillaXORCodec.encode(extraVals); // field column + mins[2] = 40.0; + maxs[2] = 50.0; + sums[2] = 90.0; + shard.getSealedStore().appendBlock(2, 4000L, 5000L, compressedExtra, mins, maxs, sums, null); + shard.getSealedStore().flushHeader(); + + final long blocksWithExtra = shard.getSealedStore().getBlockCount(); + assertThat(blocksWithExtra).isGreaterThan(watermark); // extra blocks are present + + // Step 4: Set compaction-in-progress flag with the watermark pointing to BEFORE the extra blocks + database.begin(); + shard.getMutableBucket().setCompactionInProgress(true); + shard.getMutableBucket().setCompactionWatermark(watermark); + database.commit(); + + shard.close(); + + // Step 5: Reopen — crash recovery must truncate the extra blocks and clear the flag + database.begin(); + final TimeSeriesShard recovered = new TimeSeriesShard( + (DatabaseInternal) database, "test_truncate_extra", 0, columns); + + assertThat(recovered.getMutableBucket().isCompactionInProgress()).isFalse(); + // Extra blocks should have been removed; sealed store must be back to the watermark + assertThat(recovered.getSealedStore().getBlockCount()).isEqualTo(watermark); + + // Mutable bucket still has the 2 samples that were not part of the partial compaction + final List results = recovered.scanRange(0, Long.MAX_VALUE, null, null); + database.commit(); + + // 3
sealed (original) + 2 mutable = 5 unique samples, no duplicates from the extra blocks + assertThat(results).hasSize(5); + final double[] values = results.stream().mapToDouble(r -> (double) r[2]).sorted().toArray(); + assertThat(values).containsExactly(10.0, 20.0, 30.0, 40.0, 50.0); + + recovered.close(); + } +} diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesDownsamplingTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesDownsamplingTest.java new file mode 100644 index 0000000000..e8f797aa6f --- /dev/null +++ b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesDownsamplingTest.java @@ -0,0 +1,376 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +import com.arcadedb.TestHelper; +import com.arcadedb.database.DatabaseInternal; +import com.arcadedb.query.sql.executor.ResultSet; +import com.arcadedb.schema.LocalTimeSeriesType; +import com.arcadedb.schema.Type; +import org.junit.jupiter.api.Test; + +import java.util.List; + +import static org.assertj.core.api.Assertions.assertThat; + +/** + * Tests for TimeSeries downsampling policies. 
+ */ +class TimeSeriesDownsamplingTest extends TestHelper { + + private List createTestColumns() { + return List.of( + new ColumnDefinition("ts", Type.LONG, ColumnDefinition.ColumnRole.TIMESTAMP), + new ColumnDefinition("sensor_id", Type.STRING, ColumnDefinition.ColumnRole.TAG), + new ColumnDefinition("temperature", Type.DOUBLE, ColumnDefinition.ColumnRole.FIELD) + ); + } + + @Test + void testDdlAddAndDropPolicy() throws Exception { + database.command("sql", + "CREATE TIMESERIES TYPE SensorDDL TIMESTAMP ts TAGS (sensor_id STRING) FIELDS (temperature DOUBLE)"); + + // Add downsampling policy + database.command("sql", + "ALTER TIMESERIES TYPE SensorDDL ADD DOWNSAMPLING POLICY AFTER 7 DAYS GRANULARITY 1 HOURS AFTER 30 DAYS GRANULARITY 1 DAYS"); + + final LocalTimeSeriesType type = (LocalTimeSeriesType) database.getSchema().getType("SensorDDL"); + assertThat(type.getDownsamplingTiers()).hasSize(2); + // Sorted by afterMs ascending + assertThat(type.getDownsamplingTiers().get(0).afterMs()).isEqualTo(7 * 86400000L); + assertThat(type.getDownsamplingTiers().get(0).granularityMs()).isEqualTo(3600000L); + assertThat(type.getDownsamplingTiers().get(1).afterMs()).isEqualTo(30 * 86400000L); + assertThat(type.getDownsamplingTiers().get(1).granularityMs()).isEqualTo(86400000L); + + // Verify persistence by closing and reopening + database.close(); + database = factory.open(); + + final LocalTimeSeriesType reopened = (LocalTimeSeriesType) database.getSchema().getType("SensorDDL"); + assertThat(reopened.getDownsamplingTiers()).hasSize(2); + assertThat(reopened.getDownsamplingTiers().get(0).afterMs()).isEqualTo(7 * 86400000L); + + // Drop downsampling policy + database.command("sql", "ALTER TIMESERIES TYPE SensorDDL DROP DOWNSAMPLING POLICY"); + final LocalTimeSeriesType afterDrop = (LocalTimeSeriesType) database.getSchema().getType("SensorDDL"); + assertThat(afterDrop.getDownsamplingTiers()).isEmpty(); + } + + @Test + void testSingleTierDownsamplingAccuracy() throws Exception { + 
final DatabaseInternal db = (DatabaseInternal) database; + final List columns = createTestColumns(); + + database.begin(); + final TimeSeriesEngine engine = new TimeSeriesEngine(db, "ds_accuracy", columns, 1); + + // Insert 60 samples at 1-second intervals (timestamps 0..59000) + // All with same sensor, temperature values 1.0, 2.0, ..., 60.0 + final long[] timestamps = new long[60]; + final Object[] sensors = new Object[60]; + final Object[] temps = new Object[60]; + for (int i = 0; i < 60; i++) { + timestamps[i] = i * 1000L; + sensors[i] = "sensor_A"; + temps[i] = (double) (i + 1); + } + engine.appendSamples(timestamps, sensors, temps); + database.commit(); + + try { + database.begin(); + engine.compactAll(); + database.commit(); + + assertThat(engine.getShard(0).getSealedStore().getBlockCount()).isEqualTo(1); + + // Downsample to 1-minute granularity. Set nowMs such that all data is old enough. + // afterMs = 1ms means everything older than (nowMs - 1) qualifies + final List tiers = List.of(new DownsamplingTier(1L, 60000L)); + engine.applyDownsampling(tiers, 60001L); + + // All 60 samples should be aggregated into 1 sample (bucket 0) + assertThat(engine.getShard(0).getSealedStore().getBlockCount()).isEqualTo(1); + + database.begin(); + final List result = engine.query(Long.MIN_VALUE, Long.MAX_VALUE, null, null); + database.commit(); + + assertThat(result).hasSize(1); + assertThat((long) result.get(0)[0]).isEqualTo(0L); // bucket timestamp + assertThat((String) result.get(0)[1]).isEqualTo("sensor_A"); + // AVG of 1..60 = 30.5 + assertThat((double) result.get(0)[2]).isCloseTo(30.5, org.assertj.core.data.Offset.offset(0.001)); + } finally { + engine.close(); + } + } + + @Test + void testMultiTierDownsampling() throws Exception { + final DatabaseInternal db = (DatabaseInternal) database; + final List columns = createTestColumns(); + + database.begin(); + final TimeSeriesEngine engine = new TimeSeriesEngine(db, "ds_multitier", columns, 1); + + // Insert samples 
spanning multiple time ranges + // "Old" data: 120 samples at 1-second intervals starting at t=0 (0..119s) + // "Recent" data: 60 samples at 1-second intervals starting at t=200000 (200s..259s) + final long[] timestamps = new long[180]; + final Object[] sensors = new Object[180]; + final Object[] temps = new Object[180]; + for (int i = 0; i < 120; i++) { + timestamps[i] = i * 1000L; + sensors[i] = "sensor_A"; + temps[i] = 10.0; + } + for (int i = 0; i < 60; i++) { + timestamps[120 + i] = 200000L + i * 1000L; + sensors[120 + i] = "sensor_A"; + temps[120 + i] = 20.0; + } + engine.appendSamples(timestamps, sensors, temps); + database.commit(); + + try { + database.begin(); + engine.compactAll(); + database.commit(); + + // Tier 1: after 100ms -> 1-minute granularity (affects data older than nowMs-100) + // Tier 2: after 200ms -> 2-minute granularity (affects data older than nowMs-200) + final long nowMs = 260000L; + final List tiers = List.of( + new DownsamplingTier(100L, 60000L), + new DownsamplingTier(200L, 120000L) + ); + engine.applyDownsampling(tiers, nowMs); + + database.begin(); + final List result = engine.query(Long.MIN_VALUE, Long.MAX_VALUE, null, null); + database.commit(); + + // After tier 1 (granularity 60s): old data (0-119s) -> 2 buckets (0, 60000) + // recent data (200s-259s) -> 2 buckets (180000, 240000) + // After tier 2 (granularity 120s, cutoff = nowMs - 200 = 259800): applies to data older than 259800 + // Data at 0 and 60000 qualifies for tier 2 -> downsampled to 120s buckets -> 1 bucket (0) + // All data values are constant per range, so AVG=10.0 for old, 20.0 for recent + assertThat(result).isNotEmpty(); + + // Verify all old data timestamps are aligned to at least 60s boundaries + for (final Object[] row : result) { + final long ts = (long) row[0]; + if (ts < 200000L) + assertThat(ts % 60000L).isEqualTo(0L); + } + } finally { + engine.close(); + } + } + + @Test + void testIdempotency() throws Exception { + final
DatabaseInternal db = (DatabaseInternal) database; + final List columns = createTestColumns(); + + database.begin(); + final TimeSeriesEngine engine = new TimeSeriesEngine(db, "ds_idempotent", columns, 1); + + // Insert 60 samples at 1-second intervals + final long[] timestamps = new long[60]; + final Object[] sensors = new Object[60]; + final Object[] temps = new Object[60]; + for (int i = 0; i < 60; i++) { + timestamps[i] = i * 1000L; + sensors[i] = "sensor_A"; + temps[i] = (double) (i + 1); + } + engine.appendSamples(timestamps, sensors, temps); + database.commit(); + + try { + database.begin(); + engine.compactAll(); + database.commit(); + + final List tiers = List.of(new DownsamplingTier(1L, 60000L)); + + // First downsampling + engine.applyDownsampling(tiers, 60001L); + + database.begin(); + final List firstResult = engine.query(Long.MIN_VALUE, Long.MAX_VALUE, null, null); + database.commit(); + + final int blockCountAfterFirst = engine.getShard(0).getSealedStore().getBlockCount(); + + // Second downsampling (should be a no-op due to density check) + engine.applyDownsampling(tiers, 60001L); + + database.begin(); + final List secondResult = engine.query(Long.MIN_VALUE, Long.MAX_VALUE, null, null); + database.commit(); + + assertThat(engine.getShard(0).getSealedStore().getBlockCount()).isEqualTo(blockCountAfterFirst); + assertThat(secondResult).hasSize(firstResult.size()); + for (int i = 0; i < firstResult.size(); i++) { + assertThat((long) secondResult.get(i)[0]).isEqualTo((long) firstResult.get(i)[0]); + assertThat((double) secondResult.get(i)[2]).isCloseTo((double) firstResult.get(i)[2], + org.assertj.core.data.Offset.offset(0.001)); + } + } finally { + engine.close(); + } + } + + @Test + void testMultiTagGrouping() throws Exception { + final DatabaseInternal db = (DatabaseInternal) database; + final List columns = createTestColumns(); + + database.begin(); + final TimeSeriesEngine engine = new TimeSeriesEngine(db, "ds_multitag", columns, 1); + + // Insert 
samples from two sensors in the same time bucket (0-59s) + final long[] timestamps = new long[6]; + final Object[] sensors = new Object[6]; + final Object[] temps = new Object[6]; + + // sensor_A: temps 10, 20, 30 -> avg 20 + timestamps[0] = 0; sensors[0] = "sensor_A"; temps[0] = 10.0; + timestamps[1] = 10000; sensors[1] = "sensor_A"; temps[1] = 20.0; + timestamps[2] = 20000; sensors[2] = "sensor_A"; temps[2] = 30.0; + // sensor_B: temps 100, 200, 300 -> avg 200 + timestamps[3] = 5000; sensors[3] = "sensor_B"; temps[3] = 100.0; + timestamps[4] = 15000; sensors[4] = "sensor_B"; temps[4] = 200.0; + timestamps[5] = 25000; sensors[5] = "sensor_B"; temps[5] = 300.0; + + engine.appendSamples(timestamps, sensors, temps); + database.commit(); + + try { + database.begin(); + engine.compactAll(); + database.commit(); + + final List tiers = List.of(new DownsamplingTier(1L, 60000L)); + engine.applyDownsampling(tiers, 60001L); + + database.begin(); + final List result = engine.query(Long.MIN_VALUE, Long.MAX_VALUE, null, null); + database.commit(); + + // Should produce 2 samples: one per sensor, both at bucket timestamp 0 + assertThat(result).hasSize(2); + + // Both at timestamp 0 + assertThat((long) result.get(0)[0]).isEqualTo(0L); + assertThat((long) result.get(1)[0]).isEqualTo(0L); + + // Find sensor_A and sensor_B results + double avgA = 0, avgB = 0; + for (final Object[] row : result) { + if ("sensor_A".equals(row[1])) + avgA = (double) row[2]; + else if ("sensor_B".equals(row[1])) + avgB = (double) row[2]; + } + assertThat(avgA).isCloseTo(20.0, org.assertj.core.data.Offset.offset(0.001)); + assertThat(avgB).isCloseTo(200.0, org.assertj.core.data.Offset.offset(0.001)); + } finally { + engine.close(); + } + } + + @Test + void testInteractionWithRetention() throws Exception { + final DatabaseInternal db = (DatabaseInternal) database; + final List columns = createTestColumns(); + + database.begin(); + final TimeSeriesEngine engine = new TimeSeriesEngine(db, "ds_retention", 
columns, 1); + + // Insert old data and recent data + engine.appendSamples( + new long[] { 1000, 2000, 3000 }, + new Object[] { "sensor_A", "sensor_A", "sensor_A" }, + new Object[] { 10.0, 20.0, 30.0 } + ); + database.commit(); + + try { + database.begin(); + engine.compactAll(); + database.commit(); + + database.begin(); + engine.appendSamples( + new long[] { 100000, 200000, 300000 }, + new Object[] { "sensor_A", "sensor_A", "sensor_A" }, + new Object[] { 100.0, 200.0, 300.0 } + ); + database.commit(); + + database.begin(); + engine.compactAll(); + database.commit(); + + // Apply retention first: remove blocks with maxTs < 50000 + engine.applyRetention(50000L); + assertThat(engine.getShard(0).getSealedStore().getBlockCount()).isEqualTo(1); + + // Apply downsampling on remaining data + final List tiers = List.of(new DownsamplingTier(1L, 200000L)); + engine.applyDownsampling(tiers, 400000L); + + database.begin(); + final List result = engine.query(Long.MIN_VALUE, Long.MAX_VALUE, null, null); + database.commit(); + + // Remaining data should be downsampled without errors + assertThat(result).isNotEmpty(); + } finally { + engine.close(); + } + } + + @Test + void testNoOpOnEmptyEngine() throws Exception { + final DatabaseInternal db = (DatabaseInternal) database; + final List columns = createTestColumns(); + + database.begin(); + final TimeSeriesEngine engine = new TimeSeriesEngine(db, "ds_empty", columns, 1); + database.commit(); + + try { + // Should not throw with empty data + engine.applyDownsampling(List.of(new DownsamplingTier(1L, 60000L)), 100000L); + assertThat(engine.getShard(0).getSealedStore().getBlockCount()).isEqualTo(0); + + // Should not throw with null/empty tier list + engine.applyDownsampling(null, 100000L); + engine.applyDownsampling(List.of(), 100000L); + } finally { + engine.close(); + } + } +} diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesEmbeddedBenchmark.java 
b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesEmbeddedBenchmark.java new file mode 100644 index 0000000000..dbf7848b19 --- /dev/null +++ b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesEmbeddedBenchmark.java @@ -0,0 +1,315 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +import com.arcadedb.database.Database; +import com.arcadedb.database.DatabaseFactory; +import com.arcadedb.log.LogManager; +import com.arcadedb.query.sql.executor.Result; +import com.arcadedb.query.sql.executor.ResultSet; +import com.arcadedb.schema.LocalTimeSeriesType; +import com.arcadedb.utility.FileUtils; +import org.junit.jupiter.api.Tag; +import org.junit.jupiter.api.Test; + +import java.io.File; +import java.util.List; +import java.util.concurrent.atomic.AtomicLong; +import java.util.logging.Level; + +/** + * Benchmark for TimeSeries ingestion using the embedded (LocalDatabase) API. + * Uses the async API for parallel ingestion and logs metrics every second. + *

+ * Run with: mvn test -pl engine -Dtest="com.arcadedb.engine.timeseries.TimeSeriesEmbeddedBenchmark#run" + * Or as a standalone main() method. + * + * @author Luca Garulli (l.garulli--(at)--arcadedata.com) + */ +@Tag("benchmark") +public class TimeSeriesEmbeddedBenchmark { + + private static final String DB_PATH = "target/databases/ts-benchmark-embedded"; + private static final int TOTAL_POINTS = Integer.getInteger("benchmark.totalPoints", 50_000_000); + private static final int BATCH_SIZE = Integer.getInteger("benchmark.batchSize", 20_000); + private static final int PARALLEL_LEVEL = Integer.getInteger("benchmark.parallelLevel", 3); + private static final int NUM_SENSORS = Integer.getInteger("benchmark.numSensors", 100); + public static final int ASYNCH_BACK_PRESSURE = 90; + public static final int ASYNC_COMMIT_EVERY = 5; + + public static void main(final String[] args) throws Exception { + new TimeSeriesEmbeddedBenchmark().run(); + } + + @Test + public void run() throws Exception { + // Clean up + FileUtils.deleteRecursively(new File(DB_PATH)); + + final DatabaseFactory factory = new DatabaseFactory(DB_PATH); + final Database database = factory.create(); + + try { + // Create TimeSeries type with enough shards to match the parallel level (avoids MVCC conflicts) + database.command("sql", + "CREATE TIMESERIES TYPE SensorData TIMESTAMP ts TAGS (sensor_id STRING) FIELDS (temperature DOUBLE, " + + "humidity DOUBLE) SHARDS " + PARALLEL_LEVEL); + + // Disable the auto-compaction scheduler so it doesn't interfere during inserts +// ((com.arcadedb.schema.LocalSchema) database.getSchema()) +// .getTimeSeriesMaintenanceScheduler().cancel("SensorData"); + + System.out.println("=== ArcadeDB TimeSeries Embedded Benchmark ==="); + System.out.printf("Total points: %,d | Batch size: %,d | Parallel level: %d | Sensors: %d%n", + TOTAL_POINTS, BATCH_SIZE, PARALLEL_LEVEL, NUM_SENSORS); + System.out.println("----------------------------------------------"); + + // Configure async + 
database.async().setParallelLevel(PARALLEL_LEVEL); + // Each task already writes BATCH_SIZE samples, so commit every few tasks (not every BATCH_SIZE tasks) + database.async().setCommitEvery(ASYNC_COMMIT_EVERY); + database.async().setBackPressure(ASYNCH_BACK_PRESSURE); + database.setReadYourWrites(false); + + final AtomicLong totalInserted = new AtomicLong(0); + final AtomicLong errors = new AtomicLong(0); + final long startTime = System.nanoTime(); + + database.async().onError(exception -> { + errors.incrementAndGet(); + LogManager.instance().log(TimeSeriesEmbeddedBenchmark.class, Level.SEVERE, + "Async error: %s", exception, exception.getMessage()); + }); + + // Start metrics reporter thread + final Thread metricsThread = new Thread(() -> { + long lastCount = 0; + long lastTime = System.nanoTime(); + while (!Thread.currentThread().isInterrupted()) { + try { + Thread.sleep(1000); + } catch (final InterruptedException e) { + break; + } + final long now = System.nanoTime(); + final long currentCount = totalInserted.get(); + final long deltaCount = currentCount - lastCount; + final double deltaSec = (now - lastTime) / 1_000_000_000.0; + final double instantRate = deltaCount / deltaSec; + final double elapsedSec = (now - startTime) / 1_000_000_000.0; + final double avgRate = currentCount / elapsedSec; + final double progress = (currentCount * 100.0) / TOTAL_POINTS; + + System.out.printf("[%6.1fs] Inserted: %,12d (%5.1f%%) | Instant: %,12.0f pts/s | Avg: %,12.0f pts/s | " + + "Errors: %d%n", + elapsedSec, currentCount, progress, instantRate, avgRate, errors.get()); + + lastCount = currentCount; + lastTime = now; + } + }, "metrics-reporter"); + metricsThread.setDaemon(true); + metricsThread.start(); + + // Insert data points using async appendSamples API (handles shard routing and transactions automatically) + final long baseTimestamp = System.currentTimeMillis() - (long) TOTAL_POINTS * 100; + final int batchCount = TOTAL_POINTS / BATCH_SIZE; + + for (int batch = 0; 
batch < batchCount; batch++) { + final long batchStart = baseTimestamp + (long) batch * BATCH_SIZE * 100; + final long[] timestamps = new long[BATCH_SIZE]; + final Object[] sensorIds = new Object[BATCH_SIZE]; + final Object[] temperatures = new Object[BATCH_SIZE]; + final Object[] humidities = new Object[BATCH_SIZE]; + + for (int i = 0; i < BATCH_SIZE; i++) { + timestamps[i] = batchStart + i * 100L; + sensorIds[i] = "sensor_" + (i % NUM_SENSORS); + temperatures[i] = 20.0 + (Math.random() * 15.0); + humidities[i] = 40.0 + (Math.random() * 40.0); + } + + database.async().appendSamples("SensorData", timestamps, sensorIds, temperatures, humidities); + totalInserted.addAndGet(BATCH_SIZE); + } + + // Wait for all async operations to complete + database.async().waitCompletion(); + final long endTime = System.nanoTime(); + + // Stop metrics thread + metricsThread.interrupt(); + metricsThread.join(2000); + + // Print final results + final double totalSec = (endTime - startTime) / 1_000_000_000.0; + final long finalCount = totalInserted.get(); + final double finalRate = finalCount / totalSec; + + System.out.println("=============================================="); + System.out.println(" FINAL RESULTS"); + System.out.println("=============================================="); + System.out.printf("Total points inserted: %,d%n", finalCount); + System.out.printf("Total time: %.2f seconds%n", totalSec); + System.out.printf("Average throughput: %,.0f points/second%n", finalRate); + System.out.printf("Errors: %d%n", errors.get()); + System.out.printf("Parallel level: %d%n", PARALLEL_LEVEL); + + // Compact mutable data into sealed columnar storage + System.out.println("\n--- Compaction ---"); + final long compactStart = System.nanoTime(); + ((LocalTimeSeriesType) database.getSchema().getType("SensorData")).getEngine().compactAll(); + final long compactTime = (System.nanoTime() - compactStart) / 1_000_000; + System.out.printf("Compaction time: %,d ms%n", compactTime); + + 
System.out.println("=============================================="); + + // Close database to flush everything from RAM — forces cold reads from disk + database.close(); + + // Reopen database — all queries below are truly cold (no page cache, no JIT warmup on query paths) + System.out.println("\n--- Cold Queries (after close/reopen, all data from disk) ---"); + final long midTs = baseTimestamp + (long) (TOTAL_POINTS / 2) * 100; + final Database coldDb = factory.open(); + try { + // Data distribution after cold open + final TimeSeriesEngine coldEngine = + ((LocalTimeSeriesType) coldDb.getSchema().getType("SensorData")).getEngine(); + System.out.println("\n--- Data Distribution ---"); + for (int s = 0; s < coldEngine.getShardCount(); s++) { + final TimeSeriesShard shard = coldEngine.getShard(s); + System.out.printf("Shard %d: sealed blocks=%d, mutable samples=%,d%n", + s, shard.getSealedStore().getBlockCount(), shard.getMutableBucket().getSampleCount()); + } + + // Count query + long queryStart = System.nanoTime(); + try (final ResultSet rs = coldDb.query("sql", "SELECT count(*) AS cnt FROM SensorData")) { + long count = 0; + if (rs.hasNext()) + count = ((Number) rs.next().getProperty("cnt")).longValue(); + long queryTime = (System.nanoTime() - queryStart) / 1_000_000; + System.out.printf("COUNT(*): %,d ms (result: %,d)%n", queryTime, count); + } + + // Range scan (1 hour window) + queryStart = System.nanoTime(); + long rangeScanCount = 0; + try (final ResultSet rs = coldDb.query("sql", "SELECT FROM SensorData WHERE ts BETWEEN ? 
AND ?", + midTs, midTs + 3_600_000L)) { + while (rs.hasNext()) { + rs.next(); + rangeScanCount++; + } + } + long queryTime = (System.nanoTime() - queryStart) / 1_000_000; + System.out.printf("1h range scan: %,d ms (rows: %,d)%n", queryTime, rangeScanCount); + + // Aggregation with time bucket + try { + queryStart = System.nanoTime(); + long aggRows = 0; + try (final ResultSet rs = coldDb.query("sql", + "SELECT ts.timeBucket('1h', ts) AS hour, avg(temperature) AS avg_temp, max(temperature) AS max_temp " + + "FROM SensorData GROUP BY hour")) { + while (rs.hasNext()) { + rs.next(); + aggRows++; + } + } + queryTime = (System.nanoTime() - queryStart) / 1_000_000; + System.out.printf("Hourly aggregation: %,d ms (buckets: %,d)%n", queryTime, aggRows); + } catch (final Exception e) { + System.out.printf("Hourly aggregation: SKIPPED (%s)%n", e.getMessage()); + } + + // Direct API test (bypasses SQL layer entirely) + queryStart = System.nanoTime(); + int directCount = 0; + final java.util.Iterator iter = coldEngine.iterateQuery(midTs, midTs + 3_600_000L, null, null); + while (iter.hasNext()) { + iter.next(); + directCount++; + } + queryTime = (System.nanoTime() - queryStart) / 1_000_000; + System.out.printf("Direct API 1h scan: %,d ms (rows: %,d)%n", queryTime, directCount); + + // Full scan — measure how long it takes to iterate ALL 50M points from disk + queryStart = System.nanoTime(); + long fullScanCount = 0; + final java.util.Iterator fullIter = coldEngine.iterateQuery(Long.MIN_VALUE, Long.MAX_VALUE, null, + null); + while (fullIter.hasNext()) { + fullIter.next(); + fullScanCount++; + } + queryTime = (System.nanoTime() - queryStart) / 1_000_000; + final double scanRate = fullScanCount / (queryTime / 1000.0); + System.out.printf("Full scan (all data): %,d ms (rows: %,d, rate: %,.0f rows/s)%n", + queryTime, fullScanCount, scanRate); + + // Direct API aggregation — bypasses SQL layer entirely + final AggregationMetrics aggMetrics = new AggregationMetrics(); + queryStart = 
System.nanoTime(); + final MultiColumnAggregationResult directAgg = coldEngine.aggregateMulti( + Long.MIN_VALUE, Long.MAX_VALUE, + List.of( + new MultiColumnAggregationRequest(2, AggregationType.AVG, "avg_temp"), + new MultiColumnAggregationRequest(2, AggregationType.MAX, "max_temp") + ), + 3_600_000L, null, aggMetrics); + queryTime = (System.nanoTime() - queryStart) / 1_000_000; + System.out.printf("Direct API agg: %,d ms (buckets: %,d)%n", queryTime, directAgg.size()); + System.out.println(" " + aggMetrics); + + // Profiled hourly aggregation — shows execution plan with push-down + System.out.println("\n--- PROFILE: Hourly aggregation ---"); + try (final ResultSet profileRs = coldDb.command("sql", + "PROFILE SELECT ts.timeBucket('1h', ts) AS hour, avg(temperature) AS avg_temp, max(temperature) AS " + + "max_temp " + + "FROM SensorData GROUP BY hour")) { + if (profileRs.hasNext()) { + final Result profile = profileRs.next(); + System.out.println((String) profile.getProperty("executionPlanAsString")); + } + } + + // Profiled range scan — shows cost breakdown per execution step + System.out.println("\n--- PROFILE: 1h range scan ---"); + try (final ResultSet profileRs = coldDb.command("sql", + "PROFILE SELECT FROM SensorData WHERE ts BETWEEN ? 
AND ?", midTs, midTs + 3_600_000L)) { + if (profileRs.hasNext()) { + final Result profile = profileRs.next(); + System.out.println((String) profile.getProperty("executionPlanAsString")); + } + } + + System.out.println("=============================================="); + } finally { + coldDb.close(); + } + + } finally { + if (database.isOpen()) + database.close(); + factory.close(); + FileUtils.deleteRecursively(new File(DB_PATH)); + } + } +} diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesEngineTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesEngineTest.java new file mode 100644 index 0000000000..c35d5548b0 --- /dev/null +++ b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesEngineTest.java @@ -0,0 +1,149 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +import com.arcadedb.TestHelper; +import com.arcadedb.database.DatabaseInternal; +import com.arcadedb.schema.Type; +import org.junit.jupiter.api.Test; + +import java.util.List; + +import static org.assertj.core.api.Assertions.assertThat; + +/** + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +class TimeSeriesEngineTest extends TestHelper { + + @Test + void testMultiShardWriteAndQuery() throws Exception { + final List<ColumnDefinition> cols = List.of( + new ColumnDefinition("ts", Type.LONG, ColumnDefinition.ColumnRole.TIMESTAMP), + new ColumnDefinition("value", Type.DOUBLE, ColumnDefinition.ColumnRole.FIELD) + ); + + database.begin(); + final TimeSeriesEngine engine = new TimeSeriesEngine((DatabaseInternal) database, "test_engine", cols, 2); + + // Write data — will go to shard based on current thread + engine.appendSamples(new long[] { 1000L, 2000L, 3000L }, new Object[] { 10.0, 20.0, 30.0 }); + database.commit(); + + database.begin(); + final List<Object[]> results = engine.query(1000L, 3000L, null, null); + assertThat(results).hasSize(3); + + // Results should be sorted by timestamp + assertThat((long) results.get(0)[0]).isEqualTo(1000L); + assertThat((long) results.get(1)[0]).isEqualTo(2000L); + assertThat((long) results.get(2)[0]).isEqualTo(3000L); + database.commit(); + + engine.close(); + } + + @Test + void testShardCount() throws Exception { + final List<ColumnDefinition> cols = List.of( + new ColumnDefinition("ts", Type.LONG, ColumnDefinition.ColumnRole.TIMESTAMP), + new ColumnDefinition("value", Type.DOUBLE, ColumnDefinition.ColumnRole.FIELD) + ); + + database.begin(); + final TimeSeriesEngine engine = new TimeSeriesEngine((DatabaseInternal) database, "test_shards", cols, 4); + assertThat(engine.getShardCount()).isEqualTo(4); + assertThat(engine.getColumns()).hasSize(2); +
assertThat(engine.getTypeName()).isEqualTo("test_shards"); + database.commit(); + + engine.close(); + } + + @Test + void testQueryWithTagFilter() throws Exception { + final List<ColumnDefinition> cols = List.of( + new ColumnDefinition("ts", Type.LONG, ColumnDefinition.ColumnRole.TIMESTAMP), + new ColumnDefinition("sensor", Type.STRING, ColumnDefinition.ColumnRole.TAG), + new ColumnDefinition("value", Type.DOUBLE, ColumnDefinition.ColumnRole.FIELD) + ); + + database.begin(); + final TimeSeriesEngine engine = new TimeSeriesEngine((DatabaseInternal) database, "test_filter", cols, 1); + + engine.appendSamples( + new long[] { 1000L, 2000L, 3000L, 4000L }, + new Object[] { "A", "B", "A", "B" }, + new Object[] { 10.0, 20.0, 30.0, 40.0 } + ); + database.commit(); + + database.begin(); + final TagFilter filter = TagFilter.eq(0, "B"); + final List<Object[]> results = engine.query(1000L, 4000L, null, filter); + assertThat(results).hasSize(2); + assertThat((String) results.get(0)[1]).isEqualTo("B"); + assertThat((String) results.get(1)[1]).isEqualTo("B"); + database.commit(); + + engine.close(); + } + + @Test + void testRoundRobinShardDistribution() throws Exception { + final List<ColumnDefinition> cols = List.of( + new ColumnDefinition("ts", Type.LONG, ColumnDefinition.ColumnRole.TIMESTAMP), + new ColumnDefinition("value", Type.DOUBLE, ColumnDefinition.ColumnRole.FIELD) + ); + + database.begin(); + final TimeSeriesEngine engine = new TimeSeriesEngine((DatabaseInternal) database, "test_roundrobin", cols, 4); + + // Append 4 batches — round-robin should put one in each shard + for (int i = 0; i < 4; i++) + engine.appendSamples(new long[] { (i + 1) * 1000L }, new Object[] { (double) i }); + + // Each shard should have exactly 1 sample + for (int s = 0; s < 4; s++) + assertThat(engine.getShard(s).getMutableBucket().getSampleCount()).isEqualTo(1); + + // Append 4 more — second round-robin cycle + for (int i = 4; i < 8; i++) + engine.appendSamples(new long[] { (i + 1) * 1000L }, new Object[] { (double) i }); + + // Each shard
should now have exactly 2 samples + for (int s = 0; s < 4; s++) + assertThat(engine.getShard(s).getMutableBucket().getSampleCount()).isEqualTo(2); + + database.commit(); + + // All 8 samples should be queryable + database.begin(); + final List<Object[]> results = engine.query(Long.MIN_VALUE, Long.MAX_VALUE, null, null); + assertThat(results).hasSize(8); + database.commit(); + + engine.close(); + } + + @Override + protected boolean isCheckingDatabaseIntegrity() { + return false; + } +} diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesFormatVersionTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesFormatVersionTest.java new file mode 100644 index 0000000000..b2d36847db --- /dev/null +++ b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesFormatVersionTest.java @@ -0,0 +1,222 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License.
+ * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +import com.arcadedb.engine.timeseries.codec.DeltaOfDeltaCodec; +import com.arcadedb.engine.timeseries.codec.GorillaXORCodec; +import com.arcadedb.schema.LocalTimeSeriesType; +import com.arcadedb.schema.Type; +import com.arcadedb.serializer.json.JSONObject; +import com.arcadedb.utility.FileUtils; +import org.junit.jupiter.api.AfterEach; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.Test; + +import java.io.File; +import java.io.IOException; +import java.io.RandomAccessFile; +import java.util.List; + +import static org.assertj.core.api.Assertions.assertThat; +import static org.assertj.core.api.Assertions.assertThatThrownBy; + +/** + * Tests for TimeSeries disk format versioning and CRC32 integrity checks. + */ +class TimeSeriesFormatVersionTest { + + private static final String TEST_DIR = "target/databases/TimeSeriesFormatVersionTest"; + private static final String TEST_PATH = TEST_DIR + "/sealed"; + + private List<ColumnDefinition> columns; + + @BeforeEach + void setUp() { + FileUtils.deleteRecursively(new File(TEST_DIR)); + new File(TEST_DIR).mkdirs(); + + columns = List.of( + new ColumnDefinition("ts", Type.LONG, ColumnDefinition.ColumnRole.TIMESTAMP), + new ColumnDefinition("value", Type.DOUBLE, ColumnDefinition.ColumnRole.FIELD) + ); + } + + @AfterEach + void tearDown() { + FileUtils.deleteRecursively(new File(TEST_DIR)); + } + + @Test + void testSealedStoreHeaderHasVersionByte() throws Exception { + final long[] timestamps = { 1000L, 2000L, 3000L }; + final double[] values = { 10.0, 20.0, 30.0 }; + + try (final TimeSeriesSealedStore store = new TimeSeriesSealedStore(TEST_PATH, columns)) { + store.appendBlock(3, 1000L, 3000L, new byte[][] { + DeltaOfDeltaCodec.encode(timestamps), + GorillaXORCodec.encode(values) + }, new double[] { Double.NaN, 10.0 }, new double[] { Double.NaN, 30.0 },
new double[] { Double.NaN, 60.0 }, null); + } + + // Read raw file bytes and verify version byte at offset 4 + try (final RandomAccessFile raf = new RandomAccessFile(TEST_PATH + ".ts.sealed", "r")) { + // Magic: bytes 0-3 + final int magic = raf.readInt(); + assertThat(magic).isEqualTo(0x54534958); // "TSIX" + + // Format version: byte 4 + final byte version = raf.readByte(); + assertThat(version).isEqualTo((byte) TimeSeriesSealedStore.CURRENT_VERSION); + + // Column count: bytes 5-6 + final short colCount = raf.readShort(); + assertThat(colCount).isEqualTo((short) 2); + + // Block count: bytes 7-10 + final int blockCount = raf.readInt(); + assertThat(blockCount).isEqualTo(1); + } + } + + @Test + void testSealedStoreRejectsNewerVersion() throws Exception { + // Create a valid file first + try (final TimeSeriesSealedStore store = new TimeSeriesSealedStore(TEST_PATH, columns)) { + store.appendBlock(1, 1000L, 1000L, new byte[][] { + DeltaOfDeltaCodec.encode(new long[] { 1000L }), + GorillaXORCodec.encode(new double[] { 10.0 }) + }, new double[] { Double.NaN, 10.0 }, new double[] { Double.NaN, 10.0 }, new double[] { Double.NaN, 10.0 }, null); + } + + // Corrupt the version byte to 99 + try (final RandomAccessFile raf = new RandomAccessFile(TEST_PATH + ".ts.sealed", "rw")) { + raf.seek(4); // version byte offset + raf.writeByte(99); + } + + // Opening should fail + assertThatThrownBy(() -> new TimeSeriesSealedStore(TEST_PATH, columns)) + .isInstanceOf(IOException.class) + .hasMessageContaining("version"); + } + + @Test + void testBlockCRC32DetectsCorruption() throws Exception { + final long[] timestamps = { 1000L, 2000L, 3000L }; + final double[] values = { 10.0, 20.0, 30.0 }; + + try (final TimeSeriesSealedStore store = new TimeSeriesSealedStore(TEST_PATH, columns)) { + store.appendBlock(3, 1000L, 3000L, new byte[][] { + DeltaOfDeltaCodec.encode(timestamps), + GorillaXORCodec.encode(values) + }, new double[] { Double.NaN, 10.0 }, new double[] { Double.NaN, 30.0 }, new 
double[] { Double.NaN, 60.0 }, null); + } + + // Flip a byte in the compressed data region (somewhere after the header + block meta) + final File sealedFile = new File(TEST_PATH + ".ts.sealed"); + final long fileLen = sealedFile.length(); + try (final RandomAccessFile raf = new RandomAccessFile(sealedFile, "rw")) { + // The CRC is the last 4 bytes of the file. Corrupt a byte just before it. + final long corruptOffset = fileLen - 8; // well inside compressed data, before CRC + raf.seek(corruptOffset); + final byte original = raf.readByte(); + raf.seek(corruptOffset); + raf.writeByte(original ^ 0xFF); + } + + // First read should fail with CRC mismatch (CRC validated lazily on block access) + assertThatThrownBy(() -> { + try (final TimeSeriesSealedStore store = new TimeSeriesSealedStore(TEST_PATH, columns)) { + store.scanRange(1000L, 3000L, null, null); + } + }).isInstanceOf(IOException.class) + .hasMessageContaining("CRC"); + } + + @Test + void testStatsWithoutColIndex() throws Exception { + final long[] timestamps = { 1000L, 2000L, 3000L }; + final double[] values = { 10.0, 20.0, 30.0 }; + + final double[] mins = { Double.NaN, 10.0 }; + final double[] maxs = { Double.NaN, 30.0 }; + final double[] sums = { Double.NaN, 60.0 }; + + // Write and read back — stats should round-trip without colIdx + try (final TimeSeriesSealedStore store = new TimeSeriesSealedStore(TEST_PATH, columns)) { + store.appendBlock(3, 1000L, 3000L, new byte[][] { + DeltaOfDeltaCodec.encode(timestamps), + GorillaXORCodec.encode(values) + }, mins, maxs, sums, null); + } + + // Reload and verify data + stats-based aggregation both work + try (final TimeSeriesSealedStore store = new TimeSeriesSealedStore(TEST_PATH, columns)) { + assertThat(store.getBlockCount()).isEqualTo(1); + + final List<Object[]> results = store.scanRange(1000L, 3000L, null, null); + assertThat(results).hasSize(3); + assertThat((double) results.get(0)[1]).isEqualTo(10.0); + assertThat((double) results.get(2)[1]).isEqualTo(30.0); + + //
Verify aggregation still uses block stats correctly + final List<MultiColumnAggregationRequest> requests = List.of( + new MultiColumnAggregationRequest(1, AggregationType.SUM, "sum_val"), + new MultiColumnAggregationRequest(1, AggregationType.MIN, "min_val"), + new MultiColumnAggregationRequest(1, AggregationType.MAX, "max_val") + ); + + final MultiColumnAggregationResult result = new MultiColumnAggregationResult(requests); + store.aggregateMultiBlocks(1000L, 3000L, requests, 3600000L, result, null, null); + + final long bucket = result.getBucketTimestamps().get(0); + assertThat(result.getValue(bucket, 0)).isEqualTo(60.0); // SUM + assertThat(result.getValue(bucket, 1)).isEqualTo(10.0); // MIN + assertThat(result.getValue(bucket, 2)).isEqualTo(30.0); // MAX + } + } + + @Test + void testSchemaJsonFormatVersionRoundTrip() { + final JSONObject json = new JSONObject(); + json.put("timestampColumn", "ts"); + json.put("shardCount", 1); + json.put("retentionMs", 0L); + json.put("sealedFormatVersion", 0); + json.put("mutableFormatVersion", 0); + json.put("tsColumns", new com.arcadedb.serializer.json.JSONArray()); + + // Simulate fromJSON + final int sealedVersion = json.getInt("sealedFormatVersion", 0); + final int mutableVersion = json.getInt("mutableFormatVersion", 0); + + assertThat(sealedVersion).isEqualTo(0); + assertThat(mutableVersion).isEqualTo(0); + + // Verify a JSON without the fields defaults to 0 + final JSONObject legacyJson = new JSONObject(); + legacyJson.put("timestampColumn", "ts"); + legacyJson.put("shardCount", 1); + legacyJson.put("retentionMs", 0L); + legacyJson.put("tsColumns", new com.arcadedb.serializer.json.JSONArray()); + + assertThat(legacyJson.getInt("sealedFormatVersion", 0)).isEqualTo(0); + assertThat(legacyJson.getInt("mutableFormatVersion", 0)).isEqualTo(0); + } +} diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesFunctionCorrelateTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesFunctionCorrelateTest.java new file mode
100644 index 0000000000..ec31e6f066 --- /dev/null +++ b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesFunctionCorrelateTest.java @@ -0,0 +1,110 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +import com.arcadedb.TestHelper; +import com.arcadedb.query.sql.executor.Result; +import com.arcadedb.query.sql.executor.ResultSet; +import org.junit.jupiter.api.Test; + +import static org.assertj.core.api.Assertions.assertThat; +import static org.assertj.core.api.Assertions.within; + +public class TimeSeriesFunctionCorrelateTest extends TestHelper { + + @Test + public void testPerfectPositiveCorrelation() { + database.command("sql", + "CREATE TIMESERIES TYPE CorrSensor TIMESTAMP ts FIELDS (a DOUBLE, b DOUBLE)"); + + database.transaction(() -> { + for (int i = 1; i <= 10; i++) + database.command("sql", + "INSERT INTO CorrSensor SET ts = " + (i * 1000) + ", a = " + (double) i + ", b = " + (double) i); + }); + + final ResultSet rs = database.query("sql", "SELECT ts.correlate(a, b) AS corr FROM CorrSensor"); + assertThat(rs.hasNext()).isTrue(); + assertThat(((Number) rs.next().getProperty("corr")).doubleValue()).isCloseTo(1.0, within(0.001)); + } + + @Test + public void testPerfectNegativeCorrelation() { + 
database.command("sql", + "CREATE TIMESERIES TYPE NegCorrSensor TIMESTAMP ts FIELDS (a DOUBLE, b DOUBLE)"); + + database.transaction(() -> { + for (int i = 1; i <= 10; i++) + database.command("sql", + "INSERT INTO NegCorrSensor SET ts = " + (i * 1000) + ", a = " + (double) i + ", b = " + (double) (-i)); + }); + + final ResultSet rs = database.query("sql", "SELECT ts.correlate(a, b) AS corr FROM NegCorrSensor"); + assertThat(((Number) rs.next().getProperty("corr")).doubleValue()).isCloseTo(-1.0, within(0.001)); + } + + @Test + public void testUncorrelated() { + database.command("sql", + "CREATE TIMESERIES TYPE UncorrSensor TIMESTAMP ts FIELDS (a DOUBLE, b DOUBLE)"); + + // a increases, b alternates — near zero correlation + database.transaction(() -> { + database.command("sql", "INSERT INTO UncorrSensor SET ts = 1000, a = 1.0, b = 1.0"); + database.command("sql", "INSERT INTO UncorrSensor SET ts = 2000, a = 2.0, b = -1.0"); + database.command("sql", "INSERT INTO UncorrSensor SET ts = 3000, a = 3.0, b = 1.0"); + database.command("sql", "INSERT INTO UncorrSensor SET ts = 4000, a = 4.0, b = -1.0"); + database.command("sql", "INSERT INTO UncorrSensor SET ts = 5000, a = 5.0, b = 1.0"); + database.command("sql", "INSERT INTO UncorrSensor SET ts = 6000, a = 6.0, b = -1.0"); + }); + + final ResultSet rs = database.query("sql", "SELECT ts.correlate(a, b) AS corr FROM UncorrSensor"); + final double corr = ((Number) rs.next().getProperty("corr")).doubleValue(); + assertThat(Math.abs(corr)).isLessThan(0.3); + } + + @Test + public void testSingleSample() { + database.command("sql", + "CREATE TIMESERIES TYPE SingleCorr TIMESTAMP ts FIELDS (a DOUBLE, b DOUBLE)"); + + database.transaction(() -> { + database.command("sql", "INSERT INTO SingleCorr SET ts = 1000, a = 5.0, b = 10.0"); + }); + + final ResultSet rs = database.query("sql", "SELECT ts.correlate(a, b) AS corr FROM SingleCorr"); + assertThat(rs.hasNext()).isTrue(); + assertThat((Object) 
rs.next().getProperty("corr")).isNull(); + } + + @Test + public void testConstantSeries() { + database.command("sql", + "CREATE TIMESERIES TYPE ConstCorr TIMESTAMP ts FIELDS (a DOUBLE, b DOUBLE)"); + + database.transaction(() -> { + for (int i = 1; i <= 5; i++) + database.command("sql", + "INSERT INTO ConstCorr SET ts = " + (i * 1000) + ", a = 42.0, b = " + (double) i); + }); + + final ResultSet rs = database.query("sql", "SELECT ts.correlate(a, b) AS corr FROM ConstCorr"); + assertThat((Object) rs.next().getProperty("corr")).isNull(); + } +} diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesFunctionDeltaTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesFunctionDeltaTest.java new file mode 100644 index 0000000000..a6fabb1ca3 --- /dev/null +++ b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesFunctionDeltaTest.java @@ -0,0 +1,98 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +import com.arcadedb.TestHelper; +import com.arcadedb.query.sql.executor.Result; +import com.arcadedb.query.sql.executor.ResultSet; +import org.junit.jupiter.api.Test; + +import java.util.ArrayList; +import java.util.List; + +import static org.assertj.core.api.Assertions.assertThat; + +public class TimeSeriesFunctionDeltaTest extends TestHelper { + + @Test + public void testIncreasingCounter() { + database.command("sql", + "CREATE TIMESERIES TYPE DeltaSensor TIMESTAMP ts FIELDS (value DOUBLE)"); + + database.transaction(() -> { + database.command("sql", "INSERT INTO DeltaSensor SET ts = 1000, value = 100.0"); + database.command("sql", "INSERT INTO DeltaSensor SET ts = 2000, value = 150.0"); + database.command("sql", "INSERT INTO DeltaSensor SET ts = 3000, value = 250.0"); + }); + + final ResultSet rs = database.query("sql", "SELECT ts.delta(value, ts) AS d FROM DeltaSensor"); + assertThat(rs.hasNext()).isTrue(); + assertThat(((Number) rs.next().getProperty("d")).doubleValue()).isEqualTo(150.0); + } + + @Test + public void testNegativeDelta() { + database.command("sql", + "CREATE TIMESERIES TYPE NegDeltaSensor TIMESTAMP ts FIELDS (value DOUBLE)"); + + database.transaction(() -> { + database.command("sql", "INSERT INTO NegDeltaSensor SET ts = 1000, value = 100.0"); + database.command("sql", "INSERT INTO NegDeltaSensor SET ts = 2000, value = 40.0"); + }); + + final ResultSet rs = database.query("sql", "SELECT ts.delta(value, ts) AS d FROM NegDeltaSensor"); + assertThat(((Number) rs.next().getProperty("d")).doubleValue()).isEqualTo(-60.0); + } + + @Test + public void testSingleSample() { + database.command("sql", + "CREATE TIMESERIES TYPE SingleDelta TIMESTAMP ts FIELDS (value DOUBLE)"); + + database.transaction(() -> { + database.command("sql", "INSERT INTO SingleDelta SET ts = 1000, value = 42.0"); 
+ }); + + final ResultSet rs = database.query("sql", "SELECT ts.delta(value, ts) AS d FROM SingleDelta"); + assertThat(((Number) rs.next().getProperty("d")).doubleValue()).isEqualTo(0.0); + } + + @Test + public void testWithGroupBy() { + database.command("sql", + "CREATE TIMESERIES TYPE GroupedDelta TIMESTAMP ts TAGS (sensor STRING) FIELDS (value DOUBLE)"); + + database.transaction(() -> { + database.command("sql", "INSERT INTO GroupedDelta SET ts = 1000, sensor = 'A', value = 10.0"); + database.command("sql", "INSERT INTO GroupedDelta SET ts = 3000, sensor = 'A', value = 50.0"); + database.command("sql", "INSERT INTO GroupedDelta SET ts = 1000, sensor = 'B', value = 100.0"); + database.command("sql", "INSERT INTO GroupedDelta SET ts = 3000, sensor = 'B', value = 80.0"); + }); + + final ResultSet rs = database.query("sql", + "SELECT sensor, ts.delta(value, ts) AS d FROM GroupedDelta GROUP BY sensor ORDER BY sensor"); + final List<Result> results = new ArrayList<>(); + while (rs.hasNext()) + results.add(rs.next()); + + assertThat(results).hasSize(2); + assertThat(((Number) results.get(0).getProperty("d")).doubleValue()).isEqualTo(40.0); + assertThat(((Number) results.get(1).getProperty("d")).doubleValue()).isEqualTo(-20.0); + } +} diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesFunctionFirstLastTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesFunctionFirstLastTest.java new file mode 100644 index 0000000000..b9e1072d1a --- /dev/null +++ b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesFunctionFirstLastTest.java @@ -0,0 +1,132 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +import com.arcadedb.TestHelper; +import com.arcadedb.query.sql.executor.Result; +import com.arcadedb.query.sql.executor.ResultSet; +import org.junit.jupiter.api.Test; + +import java.util.ArrayList; +import java.util.List; + +import static org.assertj.core.api.Assertions.assertThat; + +public class TimeSeriesFunctionFirstLastTest extends TestHelper { + + @Test + public void testBasicFirstLast() { + database.command("sql", + "CREATE TIMESERIES TYPE Sensor TIMESTAMP ts FIELDS (value DOUBLE)"); + + database.transaction(() -> { + database.command("sql", "INSERT INTO Sensor SET ts = 3000, value = 30.0"); + database.command("sql", "INSERT INTO Sensor SET ts = 1000, value = 10.0"); + database.command("sql", "INSERT INTO Sensor SET ts = 5000, value = 50.0"); + database.command("sql", "INSERT INTO Sensor SET ts = 2000, value = 20.0"); + database.command("sql", "INSERT INTO Sensor SET ts = 4000, value = 40.0"); + }); + + final ResultSet rs = database.query("sql", + "SELECT ts.first(value, ts) AS first_val, ts.last(value, ts) AS last_val FROM Sensor"); + assertThat(rs.hasNext()).isTrue(); + final Result row = rs.next(); + assertThat(((Number) row.getProperty("first_val")).doubleValue()).isEqualTo(10.0); + assertThat(((Number) row.getProperty("last_val")).doubleValue()).isEqualTo(50.0); + } + + @Test + public void testUnsortedInput() { + database.command("sql", + "CREATE TIMESERIES TYPE 
UnsortedSensor TIMESTAMP ts FIELDS (value DOUBLE)"); + + database.transaction(() -> { + database.command("sql", "INSERT INTO UnsortedSensor SET ts = 5000, value = 50.0"); + database.command("sql", "INSERT INTO UnsortedSensor SET ts = 1000, value = 10.0"); + database.command("sql", "INSERT INTO UnsortedSensor SET ts = 3000, value = 30.0"); + }); + + final ResultSet rs = database.query("sql", + "SELECT ts.first(value, ts) AS first_val, ts.last(value, ts) AS last_val FROM UnsortedSensor"); + assertThat(rs.hasNext()).isTrue(); + final Result row = rs.next(); + assertThat(((Number) row.getProperty("first_val")).doubleValue()).isEqualTo(10.0); + assertThat(((Number) row.getProperty("last_val")).doubleValue()).isEqualTo(50.0); + } + + @Test + public void testWithGroupBy() { + database.command("sql", + "CREATE TIMESERIES TYPE GroupedSensor TIMESTAMP ts TAGS (sensor_id STRING) FIELDS (value DOUBLE)"); + + database.transaction(() -> { + database.command("sql", "INSERT INTO GroupedSensor SET ts = 1000, sensor_id = 'A', value = 10.0"); + database.command("sql", "INSERT INTO GroupedSensor SET ts = 3000, sensor_id = 'A', value = 30.0"); + database.command("sql", "INSERT INTO GroupedSensor SET ts = 2000, sensor_id = 'B', value = 200.0"); + database.command("sql", "INSERT INTO GroupedSensor SET ts = 4000, sensor_id = 'B', value = 400.0"); + }); + + final ResultSet rs = database.query("sql", + "SELECT sensor_id, ts.first(value, ts) AS first_val, ts.last(value, ts) AS last_val FROM GroupedSensor GROUP BY sensor_id ORDER BY sensor_id"); + final List<Result> results = new ArrayList<>(); + while (rs.hasNext()) + results.add(rs.next()); + + assertThat(results).hasSize(2); + assertThat(((Number) results.get(0).getProperty("first_val")).doubleValue()).isEqualTo(10.0); + assertThat(((Number) results.get(0).getProperty("last_val")).doubleValue()).isEqualTo(30.0); + assertThat(((Number) results.get(1).getProperty("first_val")).doubleValue()).isEqualTo(200.0); + assertThat(((Number)
results.get(1).getProperty("last_val")).doubleValue()).isEqualTo(400.0); + } + + @Test + public void testSingleRow() { + database.command("sql", + "CREATE TIMESERIES TYPE SingleSensor TIMESTAMP ts FIELDS (value DOUBLE)"); + + database.transaction(() -> { + database.command("sql", "INSERT INTO SingleSensor SET ts = 1000, value = 42.0"); + }); + + final ResultSet rs = database.query("sql", + "SELECT ts.first(value, ts) AS first_val, ts.last(value, ts) AS last_val FROM SingleSensor"); + assertThat(rs.hasNext()).isTrue(); + final Result row = rs.next(); + assertThat(((Number) row.getProperty("first_val")).doubleValue()).isEqualTo(42.0); + assertThat(((Number) row.getProperty("last_val")).doubleValue()).isEqualTo(42.0); + } + + @Test + public void testNullHandling() { + database.command("sql", + "CREATE TIMESERIES TYPE NullSensor TIMESTAMP ts FIELDS (value DOUBLE)"); + + database.transaction(() -> { + database.command("sql", "INSERT INTO NullSensor SET ts = 1000, value = 10.0"); + database.command("sql", "INSERT INTO NullSensor SET ts = 3000, value = 30.0"); + }); + + final ResultSet rs = database.query("sql", + "SELECT ts.first(value, ts) AS first_val, ts.last(value, ts) AS last_val FROM NullSensor"); + assertThat(rs.hasNext()).isTrue(); + final Result row = rs.next(); + assertThat(((Number) row.getProperty("first_val")).doubleValue()).isEqualTo(10.0); + assertThat(((Number) row.getProperty("last_val")).doubleValue()).isEqualTo(30.0); + } +} diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesFunctionInterpolateTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesFunctionInterpolateTest.java new file mode 100644 index 0000000000..d72df53693 --- /dev/null +++ b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesFunctionInterpolateTest.java @@ -0,0 +1,121 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not 
use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +import com.arcadedb.TestHelper; +import com.arcadedb.query.sql.executor.ResultSet; +import org.junit.jupiter.api.Test; + +import java.util.List; + +import static org.assertj.core.api.Assertions.assertThat; + +public class TimeSeriesFunctionInterpolateTest extends TestHelper { + + @Test + public void testNoNulls() { + database.command("sql", + "CREATE TIMESERIES TYPE InterpSensor TIMESTAMP ts FIELDS (value DOUBLE)"); + + database.transaction(() -> { + database.command("sql", "INSERT INTO InterpSensor SET ts = 1000, value = 1.0"); + database.command("sql", "INSERT INTO InterpSensor SET ts = 2000, value = 2.0"); + database.command("sql", "INSERT INTO InterpSensor SET ts = 3000, value = 3.0"); + }); + + final ResultSet rs = database.query("sql", "SELECT ts.interpolate(value, 'zero') AS filled FROM InterpSensor"); + @SuppressWarnings("unchecked") + final List<Object> filled = (List<Object>) rs.next().getProperty("filled"); + + assertThat(filled).hasSize(3); + assertThat(((Number) filled.get(0)).doubleValue()).isEqualTo(1.0); + assertThat(((Number) filled.get(1)).doubleValue()).isEqualTo(2.0); + assertThat(((Number) filled.get(2)).doubleValue()).isEqualTo(3.0); + } + + @Test + public void testZeroMethod() { + database.command("sql", "CREATE DOCUMENT TYPE ZeroInterp"); + database.command("sql", "CREATE PROPERTY ZeroInterp.ts LONG"); +
database.command("sql", "CREATE PROPERTY ZeroInterp.value DOUBLE"); + + database.transaction(() -> { + database.command("sql", "INSERT INTO ZeroInterp SET ts = 1000, value = 10.0"); + database.command("sql", "INSERT INTO ZeroInterp SET ts = 2000"); // null value + database.command("sql", "INSERT INTO ZeroInterp SET ts = 3000, value = 30.0"); + }); + + final ResultSet rs = database.query("sql", + "SELECT ts.interpolate(value, 'zero') AS filled FROM ZeroInterp ORDER BY ts"); + @SuppressWarnings("unchecked") + final List filled = (List) rs.next().getProperty("filled"); + + assertThat(filled).hasSize(3); + assertThat(((Number) filled.get(0)).doubleValue()).isEqualTo(10.0); + assertThat(((Number) filled.get(1)).doubleValue()).isEqualTo(0.0); + assertThat(((Number) filled.get(2)).doubleValue()).isEqualTo(30.0); + } + + @Test + public void testPrevMethodWithDocumentType() { + // Use a regular document type where nulls are properly preserved + database.command("sql", "CREATE DOCUMENT TYPE PrevInterp"); + database.command("sql", "CREATE PROPERTY PrevInterp.ts LONG"); + database.command("sql", "CREATE PROPERTY PrevInterp.value DOUBLE"); + + database.transaction(() -> { + database.command("sql", "INSERT INTO PrevInterp SET ts = 1000, value = 10.0"); + database.command("sql", "INSERT INTO PrevInterp SET ts = 2000"); // null value + database.command("sql", "INSERT INTO PrevInterp SET ts = 3000"); // null value + database.command("sql", "INSERT INTO PrevInterp SET ts = 4000, value = 40.0"); + }); + + final ResultSet rs = database.query("sql", + "SELECT ts.interpolate(value, 'prev') AS filled FROM PrevInterp ORDER BY ts"); + @SuppressWarnings("unchecked") + final List filled = (List) rs.next().getProperty("filled"); + + assertThat(filled).hasSize(4); + assertThat(((Number) filled.get(0)).doubleValue()).isEqualTo(10.0); + assertThat(((Number) filled.get(1)).doubleValue()).isEqualTo(10.0); + assertThat(((Number) filled.get(2)).doubleValue()).isEqualTo(10.0); + assertThat(((Number) 
filled.get(3)).doubleValue()).isEqualTo(40.0); + } + + @Test + public void testAllNullsWithZero() { + database.command("sql", "CREATE DOCUMENT TYPE AllNullInterp"); + database.command("sql", "CREATE PROPERTY AllNullInterp.ts LONG"); + database.command("sql", "CREATE PROPERTY AllNullInterp.value DOUBLE"); + + database.transaction(() -> { + database.command("sql", "INSERT INTO AllNullInterp SET ts = 1000"); + database.command("sql", "INSERT INTO AllNullInterp SET ts = 2000"); + }); + + final ResultSet rs = database.query("sql", + "SELECT ts.interpolate(value, 'zero') AS filled FROM AllNullInterp ORDER BY ts"); + @SuppressWarnings("unchecked") + final List filled = (List) rs.next().getProperty("filled"); + + assertThat(filled).hasSize(2); + assertThat(((Number) filled.get(0)).doubleValue()).isEqualTo(0.0); + assertThat(((Number) filled.get(1)).doubleValue()).isEqualTo(0.0); + } +} diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesFunctionMovingAvgTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesFunctionMovingAvgTest.java new file mode 100644 index 0000000000..f972bd0b75 --- /dev/null +++ b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesFunctionMovingAvgTest.java @@ -0,0 +1,118 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +import com.arcadedb.TestHelper; +import com.arcadedb.query.sql.executor.Result; +import com.arcadedb.query.sql.executor.ResultSet; +import org.junit.jupiter.api.Test; + +import java.util.List; + +import static org.assertj.core.api.Assertions.assertThat; +import static org.assertj.core.api.Assertions.within; + +public class TimeSeriesFunctionMovingAvgTest extends TestHelper { + + @Test + public void testWindowOf3On5Values() { + database.command("sql", + "CREATE TIMESERIES TYPE MaSensor TIMESTAMP ts FIELDS (value DOUBLE)"); + + database.transaction(() -> { + database.command("sql", "INSERT INTO MaSensor SET ts = 1000, value = 1.0"); + database.command("sql", "INSERT INTO MaSensor SET ts = 2000, value = 2.0"); + database.command("sql", "INSERT INTO MaSensor SET ts = 3000, value = 3.0"); + database.command("sql", "INSERT INTO MaSensor SET ts = 4000, value = 4.0"); + database.command("sql", "INSERT INTO MaSensor SET ts = 5000, value = 5.0"); + }); + + final ResultSet rs = database.query("sql", "SELECT ts.movingAvg(value, 3) AS ma FROM MaSensor"); + assertThat(rs.hasNext()).isTrue(); + @SuppressWarnings("unchecked") + final List<Double> ma = (List<Double>) rs.next().getProperty("ma"); + + assertThat(ma).hasSize(5); + // Position 0: avg(1) = 1.0 + assertThat(ma.get(0)).isCloseTo(1.0, within(0.001)); + // Position 1: avg(1,2) = 1.5 + assertThat(ma.get(1)).isCloseTo(1.5, within(0.001)); + // Position 2: avg(1,2,3) = 2.0 + assertThat(ma.get(2)).isCloseTo(2.0, within(0.001)); + // Position 3: avg(2,3,4) = 3.0 + assertThat(ma.get(3)).isCloseTo(3.0, within(0.001)); + // Position 4: avg(3,4,5) = 4.0 + assertThat(ma.get(4)).isCloseTo(4.0, within(0.001)); + } + + @Test + public void testWindowOf1() { + database.command("sql", + "CREATE TIMESERIES TYPE Ma1Sensor TIMESTAMP ts FIELDS (value DOUBLE)"); + + database.transaction(() 
-> { + database.command("sql", "INSERT INTO Ma1Sensor SET ts = 1000, value = 10.0"); + database.command("sql", "INSERT INTO Ma1Sensor SET ts = 2000, value = 20.0"); + database.command("sql", "INSERT INTO Ma1Sensor SET ts = 3000, value = 30.0"); + }); + + final ResultSet rs = database.query("sql", "SELECT ts.movingAvg(value, 1) AS ma FROM Ma1Sensor"); + @SuppressWarnings("unchecked") + final List<Double> ma = (List<Double>) rs.next().getProperty("ma"); + + assertThat(ma).hasSize(3); + assertThat(ma.get(0)).isCloseTo(10.0, within(0.001)); + assertThat(ma.get(1)).isCloseTo(20.0, within(0.001)); + assertThat(ma.get(2)).isCloseTo(30.0, within(0.001)); + } + + @Test + public void testWindowLargerThanData() { + database.command("sql", + "CREATE TIMESERIES TYPE MaBigWindow TIMESTAMP ts FIELDS (value DOUBLE)"); + + database.transaction(() -> { + database.command("sql", "INSERT INTO MaBigWindow SET ts = 1000, value = 4.0"); + database.command("sql", "INSERT INTO MaBigWindow SET ts = 2000, value = 8.0"); + }); + + final ResultSet rs = database.query("sql", "SELECT ts.movingAvg(value, 10) AS ma FROM MaBigWindow"); + @SuppressWarnings("unchecked") + final List<Double> ma = (List<Double>) rs.next().getProperty("ma"); + + assertThat(ma).hasSize(2); + assertThat(ma.get(0)).isCloseTo(4.0, within(0.001)); + assertThat(ma.get(1)).isCloseTo(6.0, within(0.001)); + } + + @Test + public void testEmptyInput() { + database.command("sql", + "CREATE TIMESERIES TYPE MaEmpty TIMESTAMP ts FIELDS (value DOUBLE)"); + + final ResultSet rs = database.query("sql", "SELECT ts.movingAvg(value, 3) AS ma FROM MaEmpty"); + // Empty result set means no rows returned or null result + if (rs.hasNext()) { + final Result row = rs.next(); + final Object ma = row.getProperty("ma"); + if (ma instanceof List) + assertThat((List<?>) ma).isEmpty(); + } + } +} diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesFunctionRateTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesFunctionRateTest.java new file 
mode 100644 index 0000000000..0e8402856f --- /dev/null +++ b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesFunctionRateTest.java @@ -0,0 +1,112 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +import com.arcadedb.TestHelper; +import com.arcadedb.query.sql.executor.Result; +import com.arcadedb.query.sql.executor.ResultSet; +import org.junit.jupiter.api.Test; + +import static org.assertj.core.api.Assertions.assertThat; + +public class TimeSeriesFunctionRateTest extends TestHelper { + + @Test + public void testLinearIncrease() { + database.command("sql", + "CREATE TIMESERIES TYPE RateSensor TIMESTAMP ts FIELDS (value DOUBLE)"); + + database.transaction(() -> { + // 10 values, 1 second apart, value = ts/1000 (so rate = 1.0 per second) + for (int i = 0; i < 10; i++) + database.command("sql", "INSERT INTO RateSensor SET ts = " + (i * 1000) + ", value = " + (double) i); + }); + + final ResultSet rs = database.query("sql", "SELECT ts.rate(value, ts) AS r FROM RateSensor"); + assertThat(rs.hasNext()).isTrue(); + final Result row = rs.next(); + assertThat(((Number) row.getProperty("r")).doubleValue()).isEqualTo(1.0); + } + + @Test + public void testConstantValues() { + database.command("sql", + "CREATE 
TIMESERIES TYPE ConstSensor TIMESTAMP ts FIELDS (value DOUBLE)"); + + database.transaction(() -> { + for (int i = 0; i < 5; i++) + database.command("sql", "INSERT INTO ConstSensor SET ts = " + (i * 1000) + ", value = 42.0"); + }); + + final ResultSet rs = database.query("sql", "SELECT ts.rate(value, ts) AS r FROM ConstSensor"); + assertThat(rs.hasNext()).isTrue(); + assertThat(((Number) rs.next().getProperty("r")).doubleValue()).isEqualTo(0.0); + } + + @Test + public void testDecreasing() { + database.command("sql", + "CREATE TIMESERIES TYPE DecSensor TIMESTAMP ts FIELDS (value DOUBLE)"); + + database.transaction(() -> { + database.command("sql", "INSERT INTO DecSensor SET ts = 0, value = 100.0"); + database.command("sql", "INSERT INTO DecSensor SET ts = 2000, value = 80.0"); + }); + + final ResultSet rs = database.query("sql", "SELECT ts.rate(value, ts) AS r FROM DecSensor"); + assertThat(((Number) rs.next().getProperty("r")).doubleValue()).isEqualTo(-10.0); + } + + @Test + public void testSingleSample() { + database.command("sql", + "CREATE TIMESERIES TYPE SingleRateSensor TIMESTAMP ts FIELDS (value DOUBLE)"); + + database.transaction(() -> { + database.command("sql", "INSERT INTO SingleRateSensor SET ts = 1000, value = 5.0"); + }); + + final ResultSet rs = database.query("sql", "SELECT ts.rate(value, ts) AS r FROM SingleRateSensor"); + assertThat(rs.hasNext()).isTrue(); + assertThat((Object) rs.next().getProperty("r")).isNull(); + } + + @Test + public void testWithTimeBucketGroupBy() { + database.command("sql", + "CREATE TIMESERIES TYPE BucketRateSensor TIMESTAMP ts FIELDS (value DOUBLE)"); + + database.transaction(() -> { + // Two 1-minute buckets: 0-59s and 60-119s + // Bucket 1: value goes 0 -> 10 over 10 seconds => rate 1.0/s + database.command("sql", "INSERT INTO BucketRateSensor SET ts = 0, value = 0.0"); + database.command("sql", "INSERT INTO BucketRateSensor SET ts = 10000, value = 10.0"); + // Bucket 2: value goes 100 -> 120 over 10 seconds => rate 
2.0/s + database.command("sql", "INSERT INTO BucketRateSensor SET ts = 60000, value = 100.0"); + database.command("sql", "INSERT INTO BucketRateSensor SET ts = 70000, value = 120.0"); + }); + + final ResultSet rs = database.query("sql", + "SELECT ts.timeBucket('1m', ts) AS minute, ts.rate(value, ts) AS r FROM BucketRateSensor GROUP BY minute ORDER BY minute"); + final Result r1 = rs.next(); + final Result r2 = rs.next(); + assertThat(((Number) r1.getProperty("r")).doubleValue()).isEqualTo(1.0); + assertThat(((Number) r2.getProperty("r")).doubleValue()).isEqualTo(2.0); + } +} diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesGapAnalysisTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesGapAnalysisTest.java new file mode 100644 index 0000000000..1535555a0e --- /dev/null +++ b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesGapAnalysisTest.java @@ -0,0 +1,539 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +import com.arcadedb.TestHelper; +import com.arcadedb.query.sql.executor.Result; +import com.arcadedb.query.sql.executor.ResultSet; +import com.arcadedb.schema.LocalTimeSeriesType; +import org.junit.jupiter.api.Test; + +import java.util.ArrayList; +import java.util.List; +import java.util.Set; + +import static org.assertj.core.api.Assertions.assertThat; +import static org.assertj.core.api.Assertions.within; + +/** + * Tests for gap analysis features: counter reset in ts.rate(), time range operators, + * multi-tag filtering, linear interpolation, ts.percentile. + * + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +class TimeSeriesGapAnalysisTest extends TestHelper { + + // ===== Counter Reset Handling in ts.rate() ===== + + @Test + void testRateWithCounterReset() { + database.command("sql", + "CREATE TIMESERIES TYPE ResetCounter TIMESTAMP ts FIELDS (value DOUBLE)"); + + database.transaction(() -> { + // Counter goes 0 -> 100 then resets to 0 -> 50 + database.command("sql", "INSERT INTO ResetCounter SET ts = 0, value = 0.0"); + database.command("sql", "INSERT INTO ResetCounter SET ts = 1000, value = 50.0"); + database.command("sql", "INSERT INTO ResetCounter SET ts = 2000, value = 100.0"); + // Counter reset here + database.command("sql", "INSERT INTO ResetCounter SET ts = 3000, value = 10.0"); + database.command("sql", "INSERT INTO ResetCounter SET ts = 4000, value = 50.0"); + }); + + final ResultSet rs = database.query("sql", "SELECT ts.rate(value, ts, true) AS r FROM ResetCounter"); + assertThat(rs.hasNext()).isTrue(); + final double rate = ((Number) rs.next().getProperty("r")).doubleValue(); + // Total increase: 50 + 50 (0->50->100) + 10 + 40 (reset: 10 from 0, then +40) = 150 + // Over 4 seconds => 150/4 = 37.5/s + assertThat(rate).isCloseTo(37.5, within(0.01)); + } + + @Test + void 
testRateWithMultipleResets() { + database.command("sql", + "CREATE TIMESERIES TYPE MultiResetCounter TIMESTAMP ts FIELDS (value DOUBLE)"); + + database.transaction(() -> { + database.command("sql", "INSERT INTO MultiResetCounter SET ts = 0, value = 0.0"); + database.command("sql", "INSERT INTO MultiResetCounter SET ts = 1000, value = 100.0"); + // Reset 1 + database.command("sql", "INSERT INTO MultiResetCounter SET ts = 2000, value = 20.0"); + // Reset 2 + database.command("sql", "INSERT INTO MultiResetCounter SET ts = 3000, value = 10.0"); + }); + + final ResultSet rs = database.query("sql", "SELECT ts.rate(value, ts, true) AS r FROM MultiResetCounter"); + assertThat(rs.hasNext()).isTrue(); + final double rate = ((Number) rs.next().getProperty("r")).doubleValue(); + // Total increase: 100 (0->100) + 20 (reset, +20) + 10 (reset, +10) = 130 + // Over 3 seconds => 130/3 ≈ 43.33/s + assertThat(rate).isCloseTo(130.0 / 3.0, within(0.01)); + } + + @Test + void testRateWithoutResetDetection() { + database.command("sql", + "CREATE TIMESERIES TYPE NoResetCounter TIMESTAMP ts FIELDS (value DOUBLE)"); + + database.transaction(() -> { + database.command("sql", "INSERT INTO NoResetCounter SET ts = 0, value = 0.0"); + database.command("sql", "INSERT INTO NoResetCounter SET ts = 1000, value = 10.0"); + database.command("sql", "INSERT INTO NoResetCounter SET ts = 2000, value = 20.0"); + }); + + // Without counter reset detection (default), simple rate = (last - first) / time + final ResultSet rs = database.query("sql", "SELECT ts.rate(value, ts) AS r FROM NoResetCounter"); + assertThat(rs.hasNext()).isTrue(); + assertThat(((Number) rs.next().getProperty("r")).doubleValue()).isEqualTo(10.0); + } + + @Test + void testRateDecreasingWithoutResetDetection() { + database.command("sql", + "CREATE TIMESERIES TYPE DecGauge TIMESTAMP ts FIELDS (value DOUBLE)"); + + database.transaction(() -> { + database.command("sql", "INSERT INTO DecGauge SET ts = 0, value = 100.0"); + 
database.command("sql", "INSERT INTO DecGauge SET ts = 2000, value = 80.0"); + }); + + // Without counter reset detection, decreasing values produce negative rate + final ResultSet rs = database.query("sql", "SELECT ts.rate(value, ts) AS r FROM DecGauge"); + assertThat(((Number) rs.next().getProperty("r")).doubleValue()).isEqualTo(-10.0); + } + + // ===== Time Range Operators ===== + + @Test + void testGreaterThanOperator() { + database.command("sql", + "CREATE TIMESERIES TYPE GtSensor TIMESTAMP ts FIELDS (value DOUBLE)"); + + database.transaction(() -> { + for (int i = 0; i < 10; i++) + database.command("sql", "INSERT INTO GtSensor SET ts = " + (i * 1000) + ", value = " + (double) i); + }); + + // ts > 5000 should return timestamps 6000, 7000, 8000, 9000 + final ResultSet rs = database.query("sql", "SELECT value FROM GtSensor WHERE ts > 5000"); + final List<Result> results = new ArrayList<>(); + while (rs.hasNext()) + results.add(rs.next()); + assertThat(results).hasSize(4); + } + + @Test + void testGreaterThanOrEqualOperator() { + database.command("sql", + "CREATE TIMESERIES TYPE GeSensor TIMESTAMP ts FIELDS (value DOUBLE)"); + + database.transaction(() -> { + for (int i = 0; i < 10; i++) + database.command("sql", "INSERT INTO GeSensor SET ts = " + (i * 1000) + ", value = " + (double) i); + }); + + // ts >= 5000 should return timestamps 5000, 6000, 7000, 8000, 9000 + final ResultSet rs = database.query("sql", "SELECT value FROM GeSensor WHERE ts >= 5000"); + final List<Result> results = new ArrayList<>(); + while (rs.hasNext()) + results.add(rs.next()); + assertThat(results).hasSize(5); + } + + @Test + void testLessThanOperator() { + database.command("sql", + "CREATE TIMESERIES TYPE LtSensor TIMESTAMP ts FIELDS (value DOUBLE)"); + + database.transaction(() -> { + for (int i = 0; i < 10; i++) + database.command("sql", "INSERT INTO LtSensor SET ts = " + (i * 1000) + ", value = " + (double) i); + }); + + // ts < 3000 should return timestamps 0, 1000, 2000 + final ResultSet rs = 
database.query("sql", "SELECT value FROM LtSensor WHERE ts < 3000"); + final List<Result> results = new ArrayList<>(); + while (rs.hasNext()) + results.add(rs.next()); + assertThat(results).hasSize(3); + } + + @Test + void testCombinedRangeOperators() { + database.command("sql", + "CREATE TIMESERIES TYPE CombinedSensor TIMESTAMP ts FIELDS (value DOUBLE)"); + + database.transaction(() -> { + for (int i = 0; i < 10; i++) + database.command("sql", "INSERT INTO CombinedSensor SET ts = " + (i * 1000) + ", value = " + (double) i); + }); + + // ts >= 2000 AND ts <= 5000 should return timestamps 2000, 3000, 4000, 5000 + final ResultSet rs = database.query("sql", "SELECT value FROM CombinedSensor WHERE ts >= 2000 AND ts <= 5000"); + final List<Result> results = new ArrayList<>(); + while (rs.hasNext()) + results.add(rs.next()); + assertThat(results).hasSize(4); + } + + // ===== Multi-Tag Filtering ===== + + @Test + void testMultiTagFilter() { + final TagFilter filter = TagFilter.eq(0, "us-east") + .and(1, "prod"); + + // Row: [timestamp, tag1, tag2, value] + final Object[] matchingRow = { 1000L, "us-east", "prod", 42.0 }; + final Object[] wrongRegion = { 1000L, "eu-west", "prod", 42.0 }; + final Object[] wrongEnv = { 1000L, "us-east", "dev", 42.0 }; + + assertThat(filter.matches(matchingRow)).isTrue(); + assertThat(filter.matches(wrongRegion)).isFalse(); + assertThat(filter.matches(wrongEnv)).isFalse(); + } + + @Test + void testMultiTagFilterWithIn() { + final TagFilter filter = TagFilter.in(0, Set.of("us-east", "us-west")) + .and(1, "prod"); + + final Object[] row1 = { 1000L, "us-east", "prod", 42.0 }; + final Object[] row2 = { 1000L, "us-west", "prod", 42.0 }; + final Object[] row3 = { 1000L, "eu-west", "prod", 42.0 }; + + assertThat(filter.matches(row1)).isTrue(); + assertThat(filter.matches(row2)).isTrue(); + assertThat(filter.matches(row3)).isFalse(); + } + + @Test + void testSingleTagFilterBackwardCompatibility() { + final TagFilter filter = TagFilter.eq(0, "sensor-1"); + 
assertThat(filter.getColumnIndex()).isEqualTo(0); + assertThat(filter.getConditionCount()).isEqualTo(1); + + final Object[] match = { 1000L, "sensor-1", 42.0 }; + final Object[] noMatch = { 1000L, "sensor-2", 42.0 }; + assertThat(filter.matches(match)).isTrue(); + assertThat(filter.matches(noMatch)).isFalse(); + } + + // ===== Linear Interpolation ===== + + @Test + void testLinearInterpolation() { + database.command("sql", "CREATE DOCUMENT TYPE LinearInterp"); + database.command("sql", "CREATE PROPERTY LinearInterp.ts LONG"); + database.command("sql", "CREATE PROPERTY LinearInterp.value DOUBLE"); + + database.transaction(() -> { + database.command("sql", "INSERT INTO LinearInterp SET ts = 1000, value = 10.0"); + database.command("sql", "INSERT INTO LinearInterp SET ts = 2000"); // null value + database.command("sql", "INSERT INTO LinearInterp SET ts = 3000, value = 30.0"); + }); + + final ResultSet rs = database.query("sql", + "SELECT ts.interpolate(value, 'linear', ts) AS filled FROM LinearInterp ORDER BY ts"); + @SuppressWarnings("unchecked") + final List filled = (List) rs.next().getProperty("filled"); + + assertThat(filled).hasSize(3); + assertThat(((Number) filled.get(0)).doubleValue()).isEqualTo(10.0); + assertThat(((Number) filled.get(1)).doubleValue()).isCloseTo(20.0, within(0.01)); // linearly interpolated + assertThat(((Number) filled.get(2)).doubleValue()).isEqualTo(30.0); + } + + @Test + void testLinearInterpolationMultipleGaps() { + database.command("sql", "CREATE DOCUMENT TYPE LinearMultiGap"); + database.command("sql", "CREATE PROPERTY LinearMultiGap.ts LONG"); + database.command("sql", "CREATE PROPERTY LinearMultiGap.value DOUBLE"); + + database.transaction(() -> { + database.command("sql", "INSERT INTO LinearMultiGap SET ts = 0, value = 0.0"); + database.command("sql", "INSERT INTO LinearMultiGap SET ts = 1000"); // null + database.command("sql", "INSERT INTO LinearMultiGap SET ts = 2000"); // null + database.command("sql", "INSERT INTO 
LinearMultiGap SET ts = 3000, value = 30.0"); + }); + + final ResultSet rs = database.query("sql", + "SELECT ts.interpolate(value, 'linear', ts) AS filled FROM LinearMultiGap ORDER BY ts"); + @SuppressWarnings("unchecked") + final List filled = (List) rs.next().getProperty("filled"); + + assertThat(filled).hasSize(4); + assertThat(((Number) filled.get(0)).doubleValue()).isEqualTo(0.0); + assertThat(((Number) filled.get(1)).doubleValue()).isCloseTo(10.0, within(0.01)); + assertThat(((Number) filled.get(2)).doubleValue()).isCloseTo(20.0, within(0.01)); + assertThat(((Number) filled.get(3)).doubleValue()).isEqualTo(30.0); + } + + @Test + void testLinearInterpolationNoGaps() { + database.command("sql", + "CREATE TIMESERIES TYPE LinearNoGap TIMESTAMP ts FIELDS (value DOUBLE)"); + + database.transaction(() -> { + database.command("sql", "INSERT INTO LinearNoGap SET ts = 1000, value = 1.0"); + database.command("sql", "INSERT INTO LinearNoGap SET ts = 2000, value = 2.0"); + database.command("sql", "INSERT INTO LinearNoGap SET ts = 3000, value = 3.0"); + }); + + final ResultSet rs = database.query("sql", + "SELECT ts.interpolate(value, 'linear', ts) AS filled FROM LinearNoGap"); + @SuppressWarnings("unchecked") + final List filled = (List) rs.next().getProperty("filled"); + + assertThat(filled).hasSize(3); + assertThat(((Number) filled.get(0)).doubleValue()).isEqualTo(1.0); + assertThat(((Number) filled.get(1)).doubleValue()).isEqualTo(2.0); + assertThat(((Number) filled.get(2)).doubleValue()).isEqualTo(3.0); + } + + // ===== ts.percentile() ===== + + @Test + void testPercentileP50() { + database.command("sql", + "CREATE TIMESERIES TYPE PercSensor TIMESTAMP ts FIELDS (value DOUBLE)"); + + database.transaction(() -> { + for (int i = 1; i <= 100; i++) + database.command("sql", "INSERT INTO PercSensor SET ts = " + (i * 1000) + ", value = " + (double) i); + }); + + final ResultSet rs = database.query("sql", "SELECT ts.percentile(value, 0.5) AS p50 FROM PercSensor"); + 
assertThat(rs.hasNext()).isTrue(); + final double p50 = ((Number) rs.next().getProperty("p50")).doubleValue(); + assertThat(p50).isCloseTo(50.5, within(0.5)); + } + + @Test + void testPercentileP95() { + database.command("sql", + "CREATE TIMESERIES TYPE Perc95Sensor TIMESTAMP ts FIELDS (value DOUBLE)"); + + database.transaction(() -> { + for (int i = 1; i <= 100; i++) + database.command("sql", "INSERT INTO Perc95Sensor SET ts = " + (i * 1000) + ", value = " + (double) i); + }); + + final ResultSet rs = database.query("sql", "SELECT ts.percentile(value, 0.95) AS p95 FROM Perc95Sensor"); + assertThat(rs.hasNext()).isTrue(); + final double p95 = ((Number) rs.next().getProperty("p95")).doubleValue(); + assertThat(p95).isCloseTo(95.0, within(1.0)); + } + + @Test + void testPercentileP99() { + database.command("sql", + "CREATE TIMESERIES TYPE Perc99Sensor TIMESTAMP ts FIELDS (value DOUBLE)"); + + database.transaction(() -> { + for (int i = 1; i <= 1000; i++) + database.command("sql", "INSERT INTO Perc99Sensor SET ts = " + (i * 100) + ", value = " + (double) i); + }); + + final ResultSet rs = database.query("sql", "SELECT ts.percentile(value, 0.99) AS p99 FROM Perc99Sensor"); + assertThat(rs.hasNext()).isTrue(); + final double p99 = ((Number) rs.next().getProperty("p99")).doubleValue(); + assertThat(p99).isCloseTo(990.0, within(2.0)); + } + + @Test + void testPercentileWithGroupBy() { + database.command("sql", + "CREATE TIMESERIES TYPE PercGroupSensor TIMESTAMP ts FIELDS (value DOUBLE)"); + + database.transaction(() -> { + // Bucket 1: values 1-10 + for (int i = 1; i <= 10; i++) + database.command("sql", "INSERT INTO PercGroupSensor SET ts = " + (i * 1000) + ", value = " + (double) i); + // Bucket 2: values 100-110 + for (int i = 1; i <= 10; i++) + database.command("sql", "INSERT INTO PercGroupSensor SET ts = " + (60000 + i * 1000) + ", value = " + (double) (100 + i)); + }); + + final ResultSet rs = database.query("sql", + "SELECT ts.timeBucket('1m', ts) AS minute, 
ts.percentile(value, 0.5) AS p50 FROM PercGroupSensor GROUP BY minute ORDER BY minute"); + + final Result r1 = rs.next(); + final Result r2 = rs.next(); + assertThat(((Number) r1.getProperty("p50")).doubleValue()).isCloseTo(5.5, within(0.5)); + assertThat(((Number) r2.getProperty("p50")).doubleValue()).isCloseTo(105.5, within(0.5)); + } + + // ===== Automatic Scheduler ===== + + @Test + void testMaintenanceSchedulerCreation() { + final TimeSeriesMaintenanceScheduler scheduler = new TimeSeriesMaintenanceScheduler(); + // Should not throw + scheduler.cancel("nonexistent"); + scheduler.shutdown(); + } + + @Test + void testMaintenanceSchedulerAlwaysSchedulesEvenWithoutPolicies() { + // Regression: scheduler previously skipped scheduling when no retention/downsampling + // policy was set, leaving the mutable bucket growing unboundedly. + database.command("sql", "CREATE TIMESERIES TYPE NoPolicySeries TIMESTAMP ts FIELDS (value DOUBLE)"); + + final LocalTimeSeriesType tsType = (LocalTimeSeriesType) database.getSchema().getType("NoPolicySeries"); + assertThat(tsType.getRetentionMs()).isZero(); + assertThat(tsType.getDownsamplingTiers()).isEmpty(); + + final TimeSeriesMaintenanceScheduler scheduler = new TimeSeriesMaintenanceScheduler(); + try { + // schedule() must register a task even though no policies are configured + scheduler.schedule(database, tsType); + // Verify a task was registered: cancelling it should not throw + scheduler.cancel("NoPolicySeries"); + } finally { + scheduler.shutdown(); + } + } + + // ===== Tag Filter on Aggregation Queries ===== + + @Test + void testTagFilterOnAggregation() { + database.command("sql", + "CREATE TIMESERIES TYPE TagAggStocks TIMESTAMP ts TAGS (symbol STRING) FIELDS (price DOUBLE)"); + + database.transaction(() -> { + // Insert data for two symbols + for (int i = 0; i < 10; i++) { + database.command("sql", "INSERT INTO TagAggStocks SET ts = " + (i * 1000) + ", symbol = 'TSLA', price = " + (100.0 + i)); + database.command("sql", 
"INSERT INTO TagAggStocks SET ts = " + (i * 1000) + ", symbol = 'AAPL', price = " + (200.0 + i)); + } + }); + + // Query for TSLA only — should get average around 104.5 (100..109) + final ResultSet rs = database.query("sql", + "SELECT avg(price) AS avg_price FROM TagAggStocks WHERE symbol = 'TSLA'"); + assertThat(rs.hasNext()).isTrue(); + final Result row = rs.next(); + final double avgPrice = ((Number) row.getProperty("avg_price")).doubleValue(); + // TSLA prices: 100, 101, ..., 109 => avg = 104.5 + assertThat(avgPrice).isCloseTo(104.5, within(0.01)); + + // Query for AAPL only — should get average around 204.5 (200..209) + final ResultSet rs2 = database.query("sql", + "SELECT avg(price) AS avg_price FROM TagAggStocks WHERE symbol = 'AAPL'"); + assertThat(rs2.hasNext()).isTrue(); + final double avgPrice2 = ((Number) rs2.next().getProperty("avg_price")).doubleValue(); + assertThat(avgPrice2).isCloseTo(204.5, within(0.01)); + + // Query for nonexistent symbol — should get no results + final ResultSet rs3 = database.query("sql", + "SELECT avg(price) AS avg_price FROM TagAggStocks WHERE symbol = 'MSFT'"); + if (rs3.hasNext()) { + final Object val = rs3.next().getProperty("avg_price"); + // Should be null or empty + assertThat(val).isNull(); + } + } + + // ===== Block-Level Tag Filter Tests (after compaction) ===== + + @Test + void testTagFilterBlockSkipping() throws Exception { + database.command("sql", + "CREATE TIMESERIES TYPE BlockSkipStocks TIMESTAMP ts TAGS (symbol STRING) FIELDS (price DOUBLE)"); + + database.transaction(() -> { + for (int i = 0; i < 20; i++) { + database.command("sql", "INSERT INTO BlockSkipStocks SET ts = " + (i * 1000) + ", symbol = 'TSLA', price = " + (100.0 + i)); + database.command("sql", "INSERT INTO BlockSkipStocks SET ts = " + (i * 1000 + 1) + ", symbol = 'AAPL', price = " + (200.0 + i)); + } + }); + + // Compact to create sealed blocks with tag metadata + ((LocalTimeSeriesType) 
database.getSchema().getType("BlockSkipStocks")).getEngine().compactAll(); + + // Query for TSLA only — verify correct results + final ResultSet rs = database.query("sql", + "SELECT price FROM BlockSkipStocks WHERE symbol = 'TSLA' ORDER BY ts"); + final List<Result> results = new ArrayList<>(); + while (rs.hasNext()) + results.add(rs.next()); + assertThat(results).hasSize(20); + assertThat(((Number) results.getFirst().getProperty("price")).doubleValue()).isCloseTo(100.0, within(0.01)); + assertThat(((Number) results.getLast().getProperty("price")).doubleValue()).isCloseTo(119.0, within(0.01)); + } + + @Test + void testTagFilterAggregationAfterCompaction() throws Exception { + database.command("sql", + "CREATE TIMESERIES TYPE AggCompactStocks TIMESTAMP ts TAGS (symbol STRING) FIELDS (price DOUBLE)"); + + database.transaction(() -> { + for (int i = 0; i < 10; i++) { + database.command("sql", "INSERT INTO AggCompactStocks SET ts = " + (i * 1000) + ", symbol = 'TSLA', price = " + (100.0 + i)); + database.command("sql", "INSERT INTO AggCompactStocks SET ts = " + (i * 1000 + 1) + ", symbol = 'AAPL', price = " + (200.0 + i)); + } + }); + + // Compact to create sealed blocks with tag metadata + ((LocalTimeSeriesType) database.getSchema().getType("AggCompactStocks")).getEngine().compactAll(); + + // AVG for TSLA only — should be 104.5 (100..109) + final ResultSet rs = database.query("sql", + "SELECT avg(price) AS avg_price FROM AggCompactStocks WHERE symbol = 'TSLA'"); + assertThat(rs.hasNext()).isTrue(); + final double avgPrice = ((Number) rs.next().getProperty("avg_price")).doubleValue(); + assertThat(avgPrice).isCloseTo(104.5, within(0.01)); + + // SUM for AAPL only — should be 2045 (200+201+...+209) + final ResultSet rs2 = database.query("sql", + "SELECT sum(price) AS sum_price FROM AggCompactStocks WHERE symbol = 'AAPL'"); + assertThat(rs2.hasNext()).isTrue(); + final double sumPrice = ((Number) rs2.next().getProperty("sum_price")).doubleValue(); + 
assertThat(sumPrice).isCloseTo(2045.0, within(0.01)); + } + + @Test + void testTagFilterNonexistentTag() throws Exception { + database.command("sql", + "CREATE TIMESERIES TYPE NonExistTagStocks TIMESTAMP ts TAGS (symbol STRING) FIELDS (price DOUBLE)"); + + database.transaction(() -> { + for (int i = 0; i < 5; i++) + database.command("sql", "INSERT INTO NonExistTagStocks SET ts = " + (i * 1000) + ", symbol = 'TSLA', price = " + (100.0 + i)); + }); + + // Compact to create sealed blocks with tag metadata + ((LocalTimeSeriesType) database.getSchema().getType("NonExistTagStocks")).getEngine().compactAll(); + + // Query for MSFT which doesn't exist — should return empty or null + final ResultSet rs = database.query("sql", + "SELECT avg(price) AS avg_price FROM NonExistTagStocks WHERE symbol = 'MSFT'"); + if (rs.hasNext()) { + final Object val = rs.next().getProperty("avg_price"); + assertThat(val).isNull(); + } + + // Also verify raw query returns no rows + final ResultSet rs2 = database.query("sql", + "SELECT price FROM NonExistTagStocks WHERE symbol = 'MSFT'"); + assertThat(rs2.hasNext()).isFalse(); + } +} diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesNamespaceTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesNamespaceTest.java new file mode 100644 index 0000000000..b33d663a71 --- /dev/null +++ b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesNamespaceTest.java @@ -0,0 +1,110 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +import com.arcadedb.TestHelper; +import com.arcadedb.query.sql.executor.Result; +import com.arcadedb.query.sql.executor.ResultSet; +import org.junit.jupiter.api.Test; + +import static org.assertj.core.api.Assertions.assertThat; + +/** + * Tests that ts.* namespaced functions are correctly resolved by the SQL parser. + */ +public class TimeSeriesNamespaceTest extends TestHelper { + + @Test + public void testTsFirstNamespace() { + database.command("sql", "CREATE TIMESERIES TYPE NsSensor TIMESTAMP ts FIELDS (value DOUBLE)"); + database.transaction(() -> { + database.command("sql", "INSERT INTO NsSensor SET ts = 1000, value = 10.0"); + database.command("sql", "INSERT INTO NsSensor SET ts = 2000, value = 20.0"); + database.command("sql", "INSERT INTO NsSensor SET ts = 3000, value = 30.0"); + }); + + final ResultSet rs = database.query("sql", + "SELECT ts.first(value, ts) AS first_val FROM NsSensor"); + assertThat(rs.hasNext()).isTrue(); + final Result row = rs.next(); + assertThat(((Number) row.getProperty("first_val")).doubleValue()).isEqualTo(10.0); + rs.close(); + } + + @Test + public void testTsTimeBucketNamespace() { + database.command("sql", "CREATE TIMESERIES TYPE BucketNs TIMESTAMP ts FIELDS (value DOUBLE)"); + database.transaction(() -> { + for (int i = 0; i < 10; i++) + database.command("sql", "INSERT INTO BucketNs SET ts = " + (i * 1000) + ", value = " + (i * 1.0)); + }); + + final ResultSet rs = database.query("sql", + "SELECT ts.timeBucket('5s', ts) AS tb, avg(value) AS avg_val FROM BucketNs GROUP BY tb ORDER BY tb"); + assertThat(rs.hasNext()).isTrue(); + int count = 0; + while (rs.hasNext()) { + rs.next(); + count++; + } + assertThat(count).isEqualTo(2); + rs.close(); + } + + @Test 
+ public void testTsRateWithTimeBucket() { + database.command("sql", "CREATE TIMESERIES TYPE RateNs TIMESTAMP ts FIELDS (value DOUBLE)"); + database.transaction(() -> { + database.command("sql", "INSERT INTO RateNs SET ts = 1000, value = 10.0"); + database.command("sql", "INSERT INTO RateNs SET ts = 2000, value = 20.0"); + database.command("sql", "INSERT INTO RateNs SET ts = 3000, value = 30.0"); + database.command("sql", "INSERT INTO RateNs SET ts = 4000, value = 40.0"); + }); + + final ResultSet rs = database.query("sql", + "SELECT ts.timeBucket('2s', ts) AS tb, ts.rate(value, ts) AS r FROM RateNs GROUP BY tb ORDER BY tb"); + assertThat(rs.hasNext()).isTrue(); + int count = 0; + while (rs.hasNext()) { + rs.next(); + count++; + } + assertThat(count).isGreaterThan(0); + rs.close(); + } + + @Test + public void testMixedNamespacedAndRegularFunctions() { + database.command("sql", "CREATE TIMESERIES TYPE MixedNs TIMESTAMP ts FIELDS (value DOUBLE)"); + database.transaction(() -> { + database.command("sql", "INSERT INTO MixedNs SET ts = 1000, value = 10.0"); + database.command("sql", "INSERT INTO MixedNs SET ts = 2000, value = 20.0"); + database.command("sql", "INSERT INTO MixedNs SET ts = 3000, value = 30.0"); + }); + + final ResultSet rs = database.query("sql", + "SELECT ts.first(value, ts) AS first_val, avg(value) AS avg_val, count(*) AS cnt FROM MixedNs"); + assertThat(rs.hasNext()).isTrue(); + final Result row = rs.next(); + assertThat(((Number) row.getProperty("first_val")).doubleValue()).isEqualTo(10.0); + assertThat(((Number) row.getProperty("avg_val")).doubleValue()).isEqualTo(20.0); + assertThat(((Number) row.getProperty("cnt")).longValue()).isEqualTo(3L); + rs.close(); + } +} diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesPhase2SQLTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesPhase2SQLTest.java new file mode 100644 index 0000000000..a58028bc30 --- /dev/null +++ 
b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesPhase2SQLTest.java @@ -0,0 +1,173 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +import com.arcadedb.TestHelper; +import com.arcadedb.query.sql.executor.Result; +import com.arcadedb.query.sql.executor.ResultSet; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.Test; + +import java.util.ArrayList; +import java.util.List; + +import static org.assertj.core.api.Assertions.assertThat; +import static org.assertj.core.api.Assertions.within; + +/** + * End-to-end SQL integration tests for all Phase 2 TimeSeries functions. 
+ */
+public class TimeSeriesPhase2SQLTest extends TestHelper {
+
+  @BeforeEach
+  public void setupData() {
+    database.command("sql",
+        "CREATE TIMESERIES TYPE SensorData TIMESTAMP ts TAGS (sensor STRING) FIELDS (value DOUBLE)");
+
+    database.transaction(() -> {
+      // Sensor A: linear increase 0..9 over 10 seconds
+      for (int i = 0; i < 10; i++)
+        database.command("sql",
+            "INSERT INTO SensorData SET ts = " + (i * 1000) + ", sensor = 'A', value = " + (double) i);
+
+      // Sensor B: decreasing 100..91 over 10 seconds
+      for (int i = 0; i < 10; i++)
+        database.command("sql",
+            "INSERT INTO SensorData SET ts = " + (i * 1000) + ", sensor = 'B', value = " + (100.0 - i));
+    });
+  }
+
+  @Test
+  public void testTsFirstTsLastGroupBySensor() {
+    final ResultSet rs = database.query("sql",
+        "SELECT sensor, ts.first(value, ts) AS first_val, ts.last(value, ts) AS last_val " +
+            "FROM SensorData GROUP BY sensor ORDER BY sensor");
+
+    final List<Result> results = new ArrayList<>();
+    while (rs.hasNext())
+      results.add(rs.next());
+
+    assertThat(results).hasSize(2);
+    // Sensor A: first=0, last=9
+    assertThat(((Number) results.get(0).getProperty("first_val")).doubleValue()).isEqualTo(0.0);
+    assertThat(((Number) results.get(0).getProperty("last_val")).doubleValue()).isEqualTo(9.0);
+    // Sensor B: first=100, last=91
+    assertThat(((Number) results.get(1).getProperty("first_val")).doubleValue()).isEqualTo(100.0);
+    assertThat(((Number) results.get(1).getProperty("last_val")).doubleValue()).isEqualTo(91.0);
+  }
+
+  @Test
+  public void testRateGroupBySensor() {
+    final ResultSet rs = database.query("sql",
+        "SELECT sensor, ts.rate(value, ts) AS r FROM SensorData GROUP BY sensor ORDER BY sensor");
+
+    final List<Result> results = new ArrayList<>();
+    while (rs.hasNext())
+      results.add(rs.next());
+
+    assertThat(results).hasSize(2);
+    // Sensor A: (9-0)/(9000ms) * 1000 = 1.0 per second
+    assertThat(((Number) results.get(0).getProperty("r")).doubleValue()).isCloseTo(1.0, within(0.001));
+    // Sensor B: (91-100)/(9000ms) * 1000 = -1.0 per second
+    assertThat(((Number) results.get(1).getProperty("r")).doubleValue()).isCloseTo(-1.0, within(0.001));
+  }
+
+  @Test
+  public void testDeltaGroupBySensor() {
+    final ResultSet rs = database.query("sql",
+        "SELECT sensor, ts.delta(value, ts) AS d FROM SensorData GROUP BY sensor ORDER BY sensor");
+
+    final List<Result> results = new ArrayList<>();
+    while (rs.hasNext())
+      results.add(rs.next());
+
+    assertThat(results).hasSize(2);
+    assertThat(((Number) results.get(0).getProperty("d")).doubleValue()).isEqualTo(9.0);
+    assertThat(((Number) results.get(1).getProperty("d")).doubleValue()).isEqualTo(-9.0);
+  }
+
+  @Test
+  public void testCorrelateAcrossSensors() {
+    // Create a joined view with both sensors' values
+    database.command("sql",
+        "CREATE TIMESERIES TYPE JoinedData TIMESTAMP ts FIELDS (a DOUBLE, b DOUBLE)");
+
+    database.transaction(() -> {
+      for (int i = 0; i < 10; i++)
+        database.command("sql",
+            "INSERT INTO JoinedData SET ts = " + (i * 1000) + ", a = " + (double) i + ", b = " + (100.0 - i));
+    });
+
+    final ResultSet rs = database.query("sql", "SELECT ts.correlate(a, b) AS corr FROM JoinedData");
+    assertThat(rs.hasNext()).isTrue();
+    // Perfect negative correlation
+    assertThat(((Number) rs.next().getProperty("corr")).doubleValue()).isCloseTo(-1.0, within(0.001));
+  }
+
+  @Test
+  public void testMovingAvgOnSensor() {
+    final ResultSet rs = database.query("sql",
+        "SELECT ts.movingAvg(value, 3) AS ma FROM SensorData WHERE sensor = 'A'");
+
+    assertThat(rs.hasNext()).isTrue();
+    @SuppressWarnings("unchecked")
+    final List<Double> ma = (List<Double>) rs.next().getProperty("ma");
+    assertThat(ma).hasSize(10);
+    // Position 2: avg(0,1,2) = 1.0
+    assertThat(ma.get(2)).isCloseTo(1.0, within(0.001));
+    // Position 9: avg(7,8,9) = 8.0
+    assertThat(ma.get(9)).isCloseTo(8.0, within(0.001));
+  }
+
+  @Test
+  public void testRateWithTimeBucket() {
+    final ResultSet rs = database.query("sql",
+        "SELECT ts.timeBucket('5s', ts) AS tb, ts.rate(value, ts) AS r " +
+            "FROM SensorData WHERE sensor = 'A' GROUP BY tb ORDER BY tb");
+
+    final List<Result> results = new ArrayList<>();
+    while (rs.hasNext())
+      results.add(rs.next());
+
+    assertThat(results).hasSize(2);
+    // Both buckets should have rate of 1.0/s
+    assertThat(((Number) results.get(0).getProperty("r")).doubleValue()).isCloseTo(1.0, within(0.001));
+    assertThat(((Number) results.get(1).getProperty("r")).doubleValue()).isCloseTo(1.0, within(0.001));
+  }
+
+  @Test
+  public void testAllFunctionsTogether() {
+    // Single query using ts_first, ts_last, rate, and delta
+    final ResultSet rs = database.query("sql",
+        "SELECT sensor, ts.first(value, ts) AS first_val, ts.last(value, ts) AS last_val, " +
+            "ts.rate(value, ts) AS r, ts.delta(value, ts) AS d " +
+            "FROM SensorData GROUP BY sensor ORDER BY sensor");
+
+    final List<Result> results = new ArrayList<>();
+    while (rs.hasNext())
+      results.add(rs.next());
+
+    assertThat(results).hasSize(2);
+    final Result a = results.get(0);
+    assertThat(((Number) a.getProperty("first_val")).doubleValue()).isEqualTo(0.0);
+    assertThat(((Number) a.getProperty("last_val")).doubleValue()).isEqualTo(9.0);
+    assertThat(((Number) a.getProperty("r")).doubleValue()).isCloseTo(1.0, within(0.001));
+    assertThat(((Number) a.getProperty("d")).doubleValue()).isEqualTo(9.0);
+  }
+}
diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesRetentionTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesRetentionTest.java
new file mode 100644
index 0000000000..d4636e7b7f
--- /dev/null
+++ b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesRetentionTest.java
@@ -0,0 +1,298 @@
+/*
+ * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com)
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com)
+ * SPDX-License-Identifier: Apache-2.0
+ */
+package com.arcadedb.engine.timeseries;
+
+import com.arcadedb.TestHelper;
+import com.arcadedb.database.DatabaseInternal;
+import com.arcadedb.schema.Type;
+import org.junit.jupiter.api.Test;
+
+import java.util.List;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+/**
+ * Tests for TimeSeries retention policy execution.
+ * Retention is block-granular: entire blocks are removed when their maxTimestamp < cutoff.
+ *
+ * @author Luca Garulli (l.garulli@arcadedata.com)
+ */
+class TimeSeriesRetentionTest extends TestHelper {
+
+  private List<ColumnDefinition> createTestColumns() {
+    return List.of(
+        new ColumnDefinition("ts", Type.LONG, ColumnDefinition.ColumnRole.TIMESTAMP),
+        new ColumnDefinition("sensor_id", Type.STRING, ColumnDefinition.ColumnRole.TAG),
+        new ColumnDefinition("temperature", Type.DOUBLE, ColumnDefinition.ColumnRole.FIELD)
+    );
+  }
+
+  @Test
+  void testRetentionRemovesOldBlocks() throws Exception {
+    final DatabaseInternal db = (DatabaseInternal) database;
+    final List<ColumnDefinition> columns = createTestColumns();
+
+    database.begin();
+    final TimeSeriesEngine engine = new TimeSeriesEngine(db, "retention_test", columns, 1);
+
+    // Insert first batch (timestamps 1000-2000) and compact → block 1
+    engine.appendSamples(
+        new long[] { 1000, 2000 },
+        new Object[] { "sensor_A", "sensor_A" },
+        new Object[] { 10.0, 20.0 }
+    );
+    database.commit();
+
+    try {
+      database.begin();
+      engine.compactAll();
+      database.commit();
+
+      // Insert second batch (timestamps 3000-4000) and compact → block 2
+      database.begin();
+      engine.appendSamples(
+          new long[] { 3000, 4000 },
+          new Object[] { "sensor_A", "sensor_A" },
+          new Object[] { 30.0, 40.0 }
+      );
+      database.commit();
+
+      database.begin();
+      engine.compactAll();
+      database.commit();
+
+      // Insert third batch (timestamps 5000-6000) and compact → block 3
+      database.begin();
+      engine.appendSamples(
+          new long[] { 5000, 6000 },
+          new Object[] { "sensor_A", "sensor_A" },
+          new Object[] { 50.0, 60.0 }
+      );
+      database.commit();
+
+      database.begin();
+      engine.compactAll();
+      database.commit();
+
+      // Verify 3 sealed blocks and 6 total samples
+      assertThat(engine.getShard(0).getSealedStore().getBlockCount()).isEqualTo(3);
+
+      database.begin();
+      final List<Object[]> allBefore = engine.query(Long.MIN_VALUE, Long.MAX_VALUE, null, null);
+      assertThat(allBefore).hasSize(6);
+      database.commit();
+
+      // Apply retention: remove blocks with maxTimestamp < 2500
+      // This removes block 1 (maxTs=2000), keeps blocks 2 and 3
+      engine.applyRetention(2500L);
+
+      assertThat(engine.getShard(0).getSealedStore().getBlockCount()).isEqualTo(2);
+
+      database.begin();
+      final List<Object[]> allAfter = engine.query(Long.MIN_VALUE, Long.MAX_VALUE, null, null);
+      assertThat(allAfter).hasSize(4);
+      assertThat((long) allAfter.get(0)[0]).isEqualTo(3000L);
+      database.commit();
+    } finally {
+      engine.close();
+    }
+  }
+
+  @Test
+  void testRetentionWithNoDataToRemove() throws Exception {
+    final DatabaseInternal db = (DatabaseInternal) database;
+    final List<ColumnDefinition> columns = createTestColumns();
+
+    database.begin();
+    final TimeSeriesEngine engine = new TimeSeriesEngine(db, "retention_noop_test", columns, 1);
+
+    engine.appendSamples(
+        new long[] { 1000, 2000, 3000 },
+        new Object[] { "sensor_B", "sensor_B", "sensor_B" },
+        new Object[] { 10.0, 20.0, 30.0 }
+    );
+    database.commit();
+
+    try {
+      database.begin();
+      engine.compactAll();
+      database.commit();
+
+      // Apply retention with a cutoff older than all data
+      engine.applyRetention(500L);
+
+      // All data should remain (block maxTs=3000 >= 500)
+      assertThat(engine.getShard(0).getSealedStore().getBlockCount()).isEqualTo(1);
+
+      database.begin();
+      final List<Object[]> allData = engine.query(Long.MIN_VALUE, Long.MAX_VALUE, null, null);
+      assertThat(allData).hasSize(3);
+      database.commit();
+    } finally {
+      engine.close();
+    }
+  }
+
+  @Test
+  void testRetentionRemovesAllBlocks() throws Exception {
+    final DatabaseInternal db = (DatabaseInternal) database;
+    final List<ColumnDefinition> columns = createTestColumns();
+
+    database.begin();
+    final TimeSeriesEngine engine = new TimeSeriesEngine(db, "retention_all_test", columns, 1);
+
+    engine.appendSamples(
+        new long[] { 1000, 2000, 3000 },
+        new Object[] { "sensor_C", "sensor_C", "sensor_C" },
+        new Object[] { 10.0, 20.0, 30.0 }
+    );
+    database.commit();
+
+    try {
+      database.begin();
+      engine.compactAll();
+      database.commit();
+
+      // Apply retention with cutoff newer than all data
+      engine.applyRetention(10000L);
+
+      // All blocks removed
+      assertThat(engine.getShard(0).getSealedStore().getBlockCount()).isEqualTo(0);
+
+      database.begin();
+      final List<Object[]> allData = engine.query(Long.MIN_VALUE, Long.MAX_VALUE, null, null);
+      assertThat(allData).isEmpty();
+      database.commit();
+    } finally {
+      engine.close();
+    }
+  }
+
+  @Test
+  void testRetentionWithMultipleShards() throws Exception {
+    final DatabaseInternal db = (DatabaseInternal) database;
+    final List<ColumnDefinition> columns = createTestColumns();
+
+    database.begin();
+    final TimeSeriesEngine engine = new TimeSeriesEngine(db, "retention_shards_test", columns, 2);
+
+    // Insert old data into shard 0 and compact → block with maxTs=2000
+    engine.getShard(0).appendSamples(
+        new long[] { 1000, 2000 },
+        new Object[] { "sensor_1", "sensor_1" },
+        new Object[] { 10.0, 20.0 }
+    );
+    database.commit();
+
+    try {
+      database.begin();
+      engine.getShard(0).compact();
+      database.commit();
+
+      // Insert recent data into shard 0 and compact → block with maxTs=4000
+      database.begin();
+      engine.getShard(0).appendSamples(
+          new long[] { 3000, 4000 },
+          new Object[] { "sensor_1", "sensor_1" },
+          new Object[] { 30.0, 40.0 }
+      );
+      database.commit();
+
+      database.begin();
+      engine.getShard(0).compact();
+      database.commit();
+
+      // Insert old data into shard 1 and compact → block with maxTs=1500
+      database.begin();
+      engine.getShard(1).appendSamples(
+          new long[] { 500, 1500 },
+          new Object[] { "sensor_2", "sensor_2" },
+          new Object[] { 5.0, 15.0 }
+      );
+      database.commit();
+
+      database.begin();
+      engine.getShard(1).compact();
+      database.commit();
+
+      // Insert recent data into shard 1 and compact → block with maxTs=5000
+      database.begin();
+      engine.getShard(1).appendSamples(
+          new long[] { 4500, 5000 },
+          new Object[] { "sensor_2", "sensor_2" },
+          new Object[] { 45.0, 50.0 }
+      );
+      database.commit();
+
+      database.begin();
+      engine.getShard(1).compact();
+      database.commit();
+
+      // Verify: 2 blocks in each shard, 8 total samples
+      assertThat(engine.getShard(0).getSealedStore().getBlockCount()).isEqualTo(2);
+      assertThat(engine.getShard(1).getSealedStore().getBlockCount()).isEqualTo(2);
+
+      database.begin();
+      final List<Object[]> allBefore = engine.query(Long.MIN_VALUE, Long.MAX_VALUE, null, null);
+      assertThat(allBefore).hasSize(8);
+      database.commit();
+
+      // Apply retention: remove blocks with maxTs < 2500
+      // Shard 0: removes block(maxTs=2000), keeps block(maxTs=4000)
+      // Shard 1: removes block(maxTs=1500), keeps block(maxTs=5000)
+      engine.applyRetention(2500L);
+
+      assertThat(engine.getShard(0).getSealedStore().getBlockCount()).isEqualTo(1);
+      assertThat(engine.getShard(1).getSealedStore().getBlockCount()).isEqualTo(1);
+
+      database.begin();
+      final List<Object[]> allAfter = engine.query(Long.MIN_VALUE, Long.MAX_VALUE, null, null);
+      assertThat(allAfter).hasSize(4);
+      for (final Object[] row : allAfter)
+        assertThat((long) row[0]).isGreaterThanOrEqualTo(3000L);
+      database.commit();
+    } finally {
+      engine.close();
+    }
+  }
+
+  @Test
+  void testRetentionOnEmptyEngine() throws Exception {
+    final DatabaseInternal db = (DatabaseInternal) database;
+    final List<ColumnDefinition> columns = createTestColumns();
+
+    database.begin();
+    final TimeSeriesEngine engine = new TimeSeriesEngine(db, "retention_empty_test", columns, 1);
+    database.commit();
+
+    try {
+      // Apply retention on empty engine — should not throw
+      engine.applyRetention(5000L);
+
+      assertThat(engine.getShard(0).getSealedStore().getBlockCount()).isEqualTo(0);
+
+      database.begin();
+      final List<Object[]> allData = engine.query(Long.MIN_VALUE, Long.MAX_VALUE, null, null);
+      assertThat(allData).isEmpty();
+      database.commit();
+    } finally {
+      engine.close();
+    }
+  }
+}
diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesSQLTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesSQLTest.java
new file mode 100644
index 0000000000..8f647906a5
--- /dev/null
+++ b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesSQLTest.java
@@ -0,0 +1,120 @@
+/*
+ * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com)
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com)
+ * SPDX-License-Identifier: Apache-2.0
+ */
+package com.arcadedb.engine.timeseries;
+
+import com.arcadedb.TestHelper;
+import com.arcadedb.query.sql.executor.Result;
+import com.arcadedb.query.sql.executor.ResultSet;
+import org.junit.jupiter.api.Test;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+/**
+ * End-to-end SQL tests for TimeSeries INSERT and SELECT.
+ *
+ * @author Luca Garulli (l.garulli@arcadedata.com)
+ */
+public class TimeSeriesSQLTest extends TestHelper {
+
+  @Test
+  public void testInsertAndSelectAll() {
+    database.command("sql",
+        "CREATE TIMESERIES TYPE SensorReading TIMESTAMP ts TAGS (sensor_id STRING) FIELDS (temperature DOUBLE)");
+
+    database.transaction(() -> {
+      database.command("sql",
+          "INSERT INTO SensorReading SET ts = 1000, sensor_id = 'A', temperature = 22.5");
+      database.command("sql",
+          "INSERT INTO SensorReading SET ts = 2000, sensor_id = 'B', temperature = 23.1");
+      database.command("sql",
+          "INSERT INTO SensorReading SET ts = 3000, sensor_id = 'A', temperature = 21.8");
+    });
+
+    final ResultSet rs = database.query("sql", "SELECT FROM SensorReading");
+    final List<Result> results = new ArrayList<>();
+    while (rs.hasNext())
+      results.add(rs.next());
+
+    assertThat(results).hasSize(3);
+  }
+
+  @Test
+  public void testSelectWithBetween() {
+    database.command("sql",
+        "CREATE TIMESERIES TYPE TempData TIMESTAMP ts FIELDS (value DOUBLE)");
+
+    database.transaction(() -> {
+      database.command("sql", "INSERT INTO TempData SET ts = 1000, value = 10.0");
+      database.command("sql", "INSERT INTO TempData SET ts = 2000, value = 20.0");
+      database.command("sql", "INSERT INTO TempData SET ts = 3000, value = 30.0");
+      database.command("sql", "INSERT INTO TempData SET ts = 4000, value = 40.0");
+      database.command("sql", "INSERT INTO TempData SET ts = 5000, value = 50.0");
+    });
+
+    final ResultSet rs = database.query("sql", "SELECT FROM TempData WHERE ts BETWEEN 2000 AND 4000");
+    final List<Result> results = new ArrayList<>();
+    while (rs.hasNext())
+      results.add(rs.next());
+
+    assertThat(results).hasSize(3); // 2000, 3000, 4000
+  }
+
+  @Test
+  public void testInsertWithSET() {
+    database.command("sql",
+        "CREATE TIMESERIES TYPE DeviceMetrics TIMESTAMP ts TAGS (device STRING) FIELDS (cpu DOUBLE, mem LONG)");
+
+    database.transaction(() -> {
+      database.command("sql",
+          "INSERT INTO DeviceMetrics SET ts = 1000, device = 'server1', cpu = 75.5, mem = 8192");
+    });
+
+    final ResultSet rs = database.query("sql", "SELECT FROM DeviceMetrics");
+    assertThat(rs.hasNext()).isTrue();
+    final Result row = rs.next();
+    assertThat((String) row.getProperty("device")).isEqualTo("server1");
+    assertThat(((Number) row.getProperty("cpu")).doubleValue()).isEqualTo(75.5);
+  }
+
+  @Test
+  public void testTimeBucketFunction() {
+    database.command("sql",
+        "CREATE TIMESERIES TYPE HourlyMetrics TIMESTAMP ts FIELDS (value DOUBLE)");
+
+    database.transaction(() -> {
+      // Insert samples at different times within the same hour bucket (3600000ms = 1 hour)
+      database.command("sql", "INSERT INTO HourlyMetrics SET ts = 3600000, value = 10.0");
+      database.command("sql", "INSERT INTO HourlyMetrics SET ts = 3601000, value = 20.0");
+      database.command("sql", "INSERT INTO HourlyMetrics SET ts = 3602000, value = 30.0");
+    });
+
+    // Query using time_bucket function - this is a standard SQL function call
+    final ResultSet rs = database.query("sql",
+        "SELECT ts.timeBucket('1h', ts) AS hour, avg(value) AS avg_val FROM HourlyMetrics GROUP BY hour");
+    final List<Result> results = new ArrayList<>();
+    while (rs.hasNext())
+      results.add(rs.next());
+
+    assertThat(results).hasSize(1); // All in same hour bucket
+    assertThat(((Number) results.get(0).getProperty("avg_val")).doubleValue()).isEqualTo(20.0);
+  }
+}
diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesSealedStoreTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesSealedStoreTest.java
new file mode 100644
index 0000000000..4c1df91e9d
--- /dev/null
+++ b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesSealedStoreTest.java
@@ -0,0 +1,375 @@
+/*
+ * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com)
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com)
+ * SPDX-License-Identifier: Apache-2.0
+ */
+package com.arcadedb.engine.timeseries;
+
+import com.arcadedb.engine.timeseries.codec.DeltaOfDeltaCodec;
+import com.arcadedb.engine.timeseries.codec.DictionaryCodec;
+import com.arcadedb.engine.timeseries.codec.GorillaXORCodec;
+import com.arcadedb.schema.Type;
+import com.arcadedb.utility.FileUtils;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.Iterator;
+import java.util.List;
+
+import static org.assertj.core.api.Assertions.assertThat;
+import static org.assertj.core.api.Assertions.assertThatThrownBy;
+
+/**
+ * @author Luca Garulli (l.garulli@arcadedata.com)
+ */
+class TimeSeriesSealedStoreTest {
+
+  private static final String TEST_PATH = "target/databases/TimeSeriesSealedStoreTest/sealed";
+  private List<ColumnDefinition> columns;
+
+  // Stats arrays: ts(NaN), sensor_id(NaN), temperature(has stats)
+  private static final double[] NO_MINS = { Double.NaN, Double.NaN, Double.NaN };
+  private static final double[] NO_MAXS = { Double.NaN, Double.NaN, Double.NaN };
+  private static final double[] NO_SUMS = { Double.NaN, Double.NaN, Double.NaN };
+
+  @BeforeEach
+  void setUp() {
+    FileUtils.deleteRecursively(new File("target/databases/TimeSeriesSealedStoreTest"));
+    new File("target/databases/TimeSeriesSealedStoreTest").mkdirs();
+
+    columns = List.of(
+        new ColumnDefinition("ts", Type.LONG, ColumnDefinition.ColumnRole.TIMESTAMP),
+        new ColumnDefinition("sensor_id", Type.STRING, ColumnDefinition.ColumnRole.TAG),
+        new ColumnDefinition("temperature", Type.DOUBLE, ColumnDefinition.ColumnRole.FIELD)
+    );
+  }
+
+  @AfterEach
+  void tearDown() {
+    FileUtils.deleteRecursively(new File("target/databases/TimeSeriesSealedStoreTest"));
+  }
+
+  @Test
+  void testCreateEmptyStore() throws Exception {
+    try (final TimeSeriesSealedStore store = new TimeSeriesSealedStore(TEST_PATH, columns)) {
+      assertThat(store.getBlockCount()).isEqualTo(0);
+    }
+  }
+
+  @Test
+  void testAppendAndReadBlock() throws Exception {
+    try (final TimeSeriesSealedStore store = new TimeSeriesSealedStore(TEST_PATH, columns)) {
+      final long[] timestamps = { 1000L, 2000L, 3000L, 4000L, 5000L };
+      final String[] sensorIds = { "A", "B", "A", "C", "B" };
+      final double[] temperatures = { 20.0, 21.5, 22.0, 19.5, 23.0 };
+
+      final byte[][] compressed = {
+          DeltaOfDeltaCodec.encode(timestamps),
+          DictionaryCodec.encode(sensorIds),
+          GorillaXORCodec.encode(temperatures)
+      };
+
+      store.appendBlock(5, 1000L, 5000L, compressed,
+          new double[] { Double.NaN, Double.NaN, 19.5 },
+          new double[] { Double.NaN, Double.NaN, 23.0 },
+          new double[] { Double.NaN, Double.NaN, 106.0 }, null);
+
+      assertThat(store.getBlockCount()).isEqualTo(1);
+      assertThat(store.getGlobalMinTimestamp()).isEqualTo(1000L);
+      assertThat(store.getGlobalMaxTimestamp()).isEqualTo(5000L);
+
+      // Read back
+      final List<Object[]> results = store.scanRange(1000L, 5000L, null, null);
+      assertThat(results).hasSize(5);
+
+      assertThat((long) results.get(0)[0]).isEqualTo(1000L);
+      assertThat((String) results.get(0)[1]).isEqualTo("A");
+      assertThat((double) results.get(0)[2]).isEqualTo(20.0);
+
+      assertThat((long) results.get(4)[0]).isEqualTo(5000L);
+      assertThat((String) results.get(4)[1]).isEqualTo("B");
+      assertThat((double) results.get(4)[2]).isEqualTo(23.0);
+    }
+  }
+
+  @Test
+  void testRangeFilter() throws Exception {
+    try (final TimeSeriesSealedStore store = new TimeSeriesSealedStore(TEST_PATH, columns)) {
+      final long[] timestamps = { 1000L, 2000L, 3000L, 4000L, 5000L };
+      final String[] sensorIds = { "A", "B", "A", "C", "B" };
+      final double[] temperatures = { 20.0, 21.5, 22.0, 19.5, 23.0 };
+
+      final byte[][] compressed = {
+          DeltaOfDeltaCodec.encode(timestamps),
+          DictionaryCodec.encode(sensorIds),
+          GorillaXORCodec.encode(temperatures)
+      };
+      store.appendBlock(5, 1000L, 5000L, compressed, NO_MINS, NO_MAXS, NO_SUMS, null);
+
+      // Query subset
+      final List<Object[]> results = store.scanRange(2000L, 4000L, null, null);
+      assertThat(results).hasSize(3);
+      assertThat((long) results.get(0)[0]).isEqualTo(2000L);
+      assertThat((long) results.get(2)[0]).isEqualTo(4000L);
+    }
+  }
+
+  @Test
+  void testMultipleBlocks() throws Exception {
+    try (final TimeSeriesSealedStore store = new TimeSeriesSealedStore(TEST_PATH, columns)) {
+      // Block 1: timestamps 1000-3000
+      store.appendBlock(3, 1000L, 3000L, new byte[][] {
+          DeltaOfDeltaCodec.encode(new long[] { 1000L, 2000L, 3000L }),
+          DictionaryCodec.encode(new String[] { "A", "A", "A" }),
+          GorillaXORCodec.encode(new double[] { 10.0, 11.0, 12.0 })
+      }, NO_MINS, NO_MAXS, NO_SUMS, null);
+
+      // Block 2: timestamps 4000-6000
+      store.appendBlock(3, 4000L, 6000L, new byte[][] {
+          DeltaOfDeltaCodec.encode(new long[] { 4000L, 5000L, 6000L }),
+          DictionaryCodec.encode(new String[] { "B", "B", "B" }),
+          GorillaXORCodec.encode(new double[] { 20.0, 21.0, 22.0 })
+      }, NO_MINS, NO_MAXS, NO_SUMS, null);
+
+      assertThat(store.getBlockCount()).isEqualTo(2);
+      assertThat(store.getGlobalMinTimestamp()).isEqualTo(1000L);
+      assertThat(store.getGlobalMaxTimestamp()).isEqualTo(6000L);
+
+      // Query spanning both blocks
+      final List<Object[]> results = store.scanRange(2000L, 5000L, null, null);
+      assertThat(results).hasSize(4);
+    }
+  }
+
+  @Test
+  void testBlockSkipping() throws Exception {
+    try (final TimeSeriesSealedStore store = new TimeSeriesSealedStore(TEST_PATH, columns)) {
+      store.appendBlock(2, 1000L, 2000L, new byte[][] {
+          DeltaOfDeltaCodec.encode(new long[] { 1000L, 2000L }),
+          DictionaryCodec.encode(new String[] { "A", "A" }),
+          GorillaXORCodec.encode(new double[] { 10.0, 11.0 })
+      }, NO_MINS, NO_MAXS, NO_SUMS, null);
+
+      store.appendBlock(2, 5000L, 6000L, new byte[][] {
+          DeltaOfDeltaCodec.encode(new long[] { 5000L, 6000L }),
+          DictionaryCodec.encode(new String[] { "B", "B" }),
+          GorillaXORCodec.encode(new double[] { 20.0, 21.0 })
+      }, NO_MINS, NO_MAXS, NO_SUMS, null);
+
+      // Query only block 2
+      final List<Object[]> results = store.scanRange(5000L, 6000L, null, null);
+      assertThat(results).hasSize(2);
+      assertThat((String) results.get(0)[1]).isEqualTo("B");
+    }
+  }
+
+  /**
+   * Regression test: scanRange must apply per-row tag filtering for SLOW_PATH blocks
+   * (blocks containing multiple distinct tag values where only some match the filter).
+   */
+  @Test
+  void testTagFilterSlowPathScanRange() throws Exception {
+    try (final TimeSeriesSealedStore store = new TimeSeriesSealedStore(TEST_PATH, columns)) {
+      // Single block with mixed tag values — triggers SLOW_PATH in blockMatchesTagFilter
+      final long[] timestamps = { 1000L, 2000L, 3000L, 4000L, 5000L };
+      final String[] sensorIds = { "A", "B", "A", "C", "B" };
+      final double[] temperatures = { 20.0, 21.5, 22.0, 19.5, 23.0 };
+
+      final String[][] tagDV = new String[3][];
+      tagDV[1] = new String[] { "A", "B", "C" }; // mixed → SLOW_PATH
+
+      final byte[][] compressed = {
+          DeltaOfDeltaCodec.encode(timestamps),
+          DictionaryCodec.encode(sensorIds),
+          GorillaXORCodec.encode(temperatures)
+      };
+      store.appendBlock(5, 1000L, 5000L, compressed,
+          new double[] { Double.NaN, Double.NaN, 19.5 },
+          new double[] { Double.NaN, Double.NaN, 23.0 },
+          new double[] { Double.NaN, Double.NaN, 106.0 }, tagDV);
+
+      // Filter for sensor_id == "A" only — should return rows at t=1000 and t=3000
+      final TagFilter filterA = TagFilter.eq(0, "A");
+      final List<Object[]> results = store.scanRange(1000L, 5000L, null, filterA);
+      assertThat(results).hasSize(2);
+      assertThat((long) results.get(0)[0]).isEqualTo(1000L);
+      assertThat((String) results.get(0)[1]).isEqualTo("A");
+      assertThat((long) results.get(1)[0]).isEqualTo(3000L);
+      assertThat((String) results.get(1)[1]).isEqualTo("A");
+    }
+  }
+
+  /**
+   * Regression test: iterateRange must apply per-row tag filtering for SLOW_PATH blocks
+ */ + @Test + void testTagFilterSlowPathIterateRange() throws Exception { + try (final TimeSeriesSealedStore store = new TimeSeriesSealedStore(TEST_PATH, columns)) { + final long[] timestamps = { 1000L, 2000L, 3000L }; + final String[] sensorIds = { "X", "Y", "X" }; + final double[] temperatures = { 10.0, 20.0, 30.0 }; + + final String[][] tagDV = new String[3][]; + tagDV[1] = new String[] { "X", "Y" }; // mixed → SLOW_PATH + + store.appendBlock(3, 1000L, 3000L, new byte[][] { + DeltaOfDeltaCodec.encode(timestamps), + DictionaryCodec.encode(sensorIds), + GorillaXORCodec.encode(temperatures) + }, new double[] { Double.NaN, Double.NaN, 10.0 }, + new double[] { Double.NaN, Double.NaN, 30.0 }, + new double[] { Double.NaN, Double.NaN, 60.0 }, tagDV); + + final TagFilter filterX = TagFilter.eq(0, "X"); + final java.util.Iterator iter = store.iterateRange(1000L, 3000L, null, filterX); + final List results = new java.util.ArrayList<>(); + while (iter.hasNext()) + results.add(iter.next()); + + assertThat(results).hasSize(2); + assertThat((String) results.get(0)[1]).isEqualTo("X"); + assertThat((String) results.get(1)[1]).isEqualTo("X"); + } + } + + /** + * Regression: buildTagMetadata must throw when a tag value's UTF-8 encoding exceeds 32767 bytes. + * Previously it silently truncated the value via (short) val.length, causing data corruption. 
+   */
+  @Test
+  void testTagValueTooLongRejected() throws Exception {
+    try (final TimeSeriesSealedStore store = new TimeSeriesSealedStore(TEST_PATH, columns)) {
+      // 'ß' encodes to 2 UTF-8 bytes, so 16384 repetitions = 32768 bytes > 32767 limit
+      final String longValue = "ß".repeat(16384);
+      final String[][] tagDV = new String[3][];
+      tagDV[1] = new String[] { longValue };
+
+      assertThatThrownBy(() -> store.appendBlock(1, 1000L, 1000L, new byte[][] {
+          DeltaOfDeltaCodec.encode(new long[] { 1000L }),
+          DictionaryCodec.encode(new String[] { longValue }),
+          GorillaXORCodec.encode(new double[] { 1.0 })
+      }, NO_MINS, NO_MAXS, NO_SUMS, tagDV))
+          .isInstanceOf(IllegalArgumentException.class)
+          .hasMessageContaining("too long");
+    }
+  }
+
+  /**
+   * Regression: loadDirectory must throw when the file's column count does not match the schema.
+   * Previously the mismatch was silently ignored, potentially causing incorrect reads.
+   */
+  @Test
+  void testColumnCountMismatchOnReopen() throws Exception {
+    // Write a store with 3 columns
+    try (final TimeSeriesSealedStore store = new TimeSeriesSealedStore(TEST_PATH, columns)) {
+      store.appendBlock(1, 1000L, 1000L, new byte[][] {
+          DeltaOfDeltaCodec.encode(new long[] { 1000L }),
+          DictionaryCodec.encode(new String[] { "A" }),
+          GorillaXORCodec.encode(new double[] { 1.0 })
+      }, NO_MINS, NO_MAXS, NO_SUMS, null);
+    }
+
+    // Try to reopen with a different schema (2 columns instead of 3)
+    final List<ColumnDefinition> wrongSchema = List.of(
+        new ColumnDefinition("ts", Type.LONG, ColumnDefinition.ColumnRole.TIMESTAMP),
+        new ColumnDefinition("temperature", Type.DOUBLE, ColumnDefinition.ColumnRole.FIELD)
+    );
+    assertThatThrownBy(() -> new TimeSeriesSealedStore(TEST_PATH, wrongSchema))
+        .isInstanceOf(IOException.class)
+        .hasMessageContaining("Column count mismatch");
+  }
+
+  /**
+   * Regression: rewriteWithBlocks must write blocks in ascending minTimestamp order on disk.
+   * Previously retained (newer) blocks were written first, then downsampled (older) blocks second.
+   * After a restart, loadDirectory reads in file order — if the file is not sorted,
+   * binary search in iterateRange fails to find blocks.
+   */
+  @Test
+  void testDownsamplePreservesAscendingOrderOnDisk() throws Exception {
+    final List<ColumnDefinition> numericCols = List.of(
+        new ColumnDefinition("ts", Type.LONG, ColumnDefinition.ColumnRole.TIMESTAMP),
+        new ColumnDefinition("value", Type.DOUBLE, ColumnDefinition.ColumnRole.FIELD)
+    );
+    final double[] noMins2 = { Double.NaN, Double.NaN };
+    final double[] noMaxs2 = { Double.NaN, Double.NaN };
+    final double[] noSums2 = { Double.NaN, Double.NaN };
+
+    try (final TimeSeriesSealedStore store = new TimeSeriesSealedStore(TEST_PATH, numericCols)) {
+      // Older block: t=6000..8000 — will be downsampled; bucket = (6000/5000)*5000 = 5000
+      store.appendBlock(3, 6_000L, 8_000L, new byte[][] {
+          DeltaOfDeltaCodec.encode(new long[] { 6_000L, 7_000L, 8_000L }),
+          GorillaXORCodec.encode(new double[] { 1.0, 2.0, 3.0 })
+      }, new double[] { Double.NaN, 1.0 }, new double[] { Double.NaN, 3.0 }, new double[] { Double.NaN, 6.0 }, null);
+
+      // Newer block: t=100_000..102_000 — retained (beyond cutoff)
+      store.appendBlock(3, 100_000L, 102_000L, new byte[][] {
+          DeltaOfDeltaCodec.encode(new long[] { 100_000L, 101_000L, 102_000L }),
+          GorillaXORCodec.encode(new double[] { 10.0, 20.0, 30.0 })
+      }, new double[] { Double.NaN, 10.0 }, new double[] { Double.NaN, 30.0 }, new double[] { Double.NaN, 60.0 }, null);
+
+      // Downsample blocks older than t=10_000 to 5_000ms granularity
+      store.downsampleBlocks(10_000L, 5_000L, 0,
+          List.of(), // no tag columns
+          List.of(1)); // numeric column at index 1
+    }
+
+    // Reopen to force loadDirectory — verifies on-disk ordering is correct
+    try (final TimeSeriesSealedStore store = new TimeSeriesSealedStore(TEST_PATH, numericCols)) {
+      // iterateRange uses binary search — requires ascending on-disk block order.
+      // Downsampled block lands at bucket t=5000; query 5000..9000 to find it.
+      final Iterator<Object[]> iter = store.iterateRange(5_000L, 9_000L, null, null);
+      final List<Object[]> old = new java.util.ArrayList<>();
+      while (iter.hasNext())
+        old.add(iter.next());
+
+      // Key assertion: iterateRange must find rows even after reopen (binary search correctness)
+      assertThat(old).isNotEmpty();
+
+      // Newer block should also be retrievable
+      final Iterator<Object[]> iterNew = store.iterateRange(100_000L, 102_000L, null, null);
+      assertThat(iterNew.hasNext()).isTrue();
+    }
+  }
+
+  @Test
+  void testTruncateBefore() throws Exception {
+    try (final TimeSeriesSealedStore store = new TimeSeriesSealedStore(TEST_PATH, columns)) {
+      store.appendBlock(2, 1000L, 2000L, new byte[][] {
+          DeltaOfDeltaCodec.encode(new long[] { 1000L, 2000L }),
+          DictionaryCodec.encode(new String[] { "A", "A" }),
+          GorillaXORCodec.encode(new double[] { 10.0, 11.0 })
+      }, NO_MINS, NO_MAXS, NO_SUMS, null);
+
+      store.appendBlock(2, 5000L, 6000L, new byte[][] {
+          DeltaOfDeltaCodec.encode(new long[] { 5000L, 6000L }),
+          DictionaryCodec.encode(new String[] { "B", "B" }),
+          GorillaXORCodec.encode(new double[] { 20.0, 21.0 })
+      }, NO_MINS, NO_MAXS, NO_SUMS, null);
+
+      // Truncate old data
+      store.truncateBefore(3000L);
+      assertThat(store.getBlockCount()).isEqualTo(1);
+
+      final List<Object[]> results = store.scanRange(0L, 10000L, null, null);
+      assertThat(results).hasSize(2);
+      assertThat((long) results.get(0)[0]).isEqualTo(5000L);
+    }
+  }
+}
diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesShardTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesShardTest.java
new file mode 100644
index 0000000000..a01836e375
--- /dev/null
+++ b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesShardTest.java
@@ -0,0 +1,126 @@
+/*
+ * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com)
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com)
+ * SPDX-License-Identifier: Apache-2.0
+ */
+package com.arcadedb.engine.timeseries;
+
+import com.arcadedb.TestHelper;
+import com.arcadedb.database.DatabaseInternal;
+import com.arcadedb.schema.Type;
+import org.junit.jupiter.api.Test;
+
+import java.util.List;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+/**
+ * @author Luca Garulli (l.garulli@arcadedata.com)
+ */
+class TimeSeriesShardTest extends TestHelper {
+
+  private List<ColumnDefinition> createTestColumns() {
+    return List.of(
+        new ColumnDefinition("ts", Type.LONG, ColumnDefinition.ColumnRole.TIMESTAMP),
+        new ColumnDefinition("sensor_id", Type.STRING, ColumnDefinition.ColumnRole.TAG),
+        new ColumnDefinition("temperature", Type.DOUBLE, ColumnDefinition.ColumnRole.FIELD)
+    );
+  }
+
+  @Test
+  void testAppendAndScan() throws Exception {
+    database.begin();
+    final TimeSeriesShard shard = new TimeSeriesShard(
+        (DatabaseInternal) database, "test_shard", 0, createTestColumns());
+
+    shard.appendSamples(
+        new long[] { 1000L, 2000L, 3000L },
+        new Object[] { "A", "B", "A" },
+        new Object[] { 20.0, 21.5, 22.0 }
+    );
+    database.commit();
+
+    database.begin();
+    final List<Object[]> results = shard.scanRange(1000L, 3000L, null, null);
+    assertThat(results).hasSize(3);
+    assertThat((long) results.get(0)[0]).isEqualTo(1000L);
+    assertThat((double) results.get(0)[2]).isEqualTo(20.0);
+    database.commit();
+
+    shard.close();
+  }
+
+  @Test
+  void testCompaction() throws Exception {
+    database.begin();
+    final TimeSeriesShard shard = new TimeSeriesShard(
+        (DatabaseInternal) database, "test_compact_shard", 0, createTestColumns());
+
+    // Insert out-of-order data
+    shard.appendSamples(
+        new long[] { 3000L, 1000L, 2000L },
+        new Object[] { "C", "A", "B" },
+        new Object[] { 30.0, 10.0, 20.0 }
+    );
+    database.commit();
+
+    // Compact
+    shard.compact();
+
+    // Verify sealed data is readable and sorted
+    database.begin();
+    assertThat(shard.getSealedStore().getBlockCount()).isEqualTo(1);
+
+    final List<Object[]> results = shard.scanRange(1000L, 3000L, null, null);
+    assertThat(results).hasSize(3);
+    // Sealed results should be sorted
+    assertThat((long) results.get(0)[0]).isEqualTo(1000L);
+    assertThat((long) results.get(1)[0]).isEqualTo(2000L);
+    assertThat((long) results.get(2)[0]).isEqualTo(3000L);
+    database.commit();
+
+    shard.close();
+  }
+
+  @Test
+  void testTagFilter() throws Exception {
+    database.begin();
+    final TimeSeriesShard shard = new TimeSeriesShard(
+        (DatabaseInternal) database, "test_filter_shard", 0, createTestColumns());
+
+    shard.appendSamples(
+        new long[] { 1000L, 2000L, 3000L, 4000L },
+        new Object[] { "A", "B", "A", "B" },
+        new Object[] { 20.0, 21.0, 22.0, 23.0 }
+    );
+    database.commit();
+
+    database.begin();
+    final TagFilter filter = TagFilter.eq(0, "A");
+    final List<Object[]> results = shard.scanRange(1000L, 4000L, null, filter);
+    assertThat(results).hasSize(2);
+    assertThat((String) results.get(0)[1]).isEqualTo("A");
+    assertThat((String) results.get(1)[1]).isEqualTo("A");
+    database.commit();
+
+    shard.close();
+  }
+
+  @Override
+  protected boolean isCheckingDatabaseIntegrity() {
+    return false;
+  }
+}
diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesTypeTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesTypeTest.java
new file mode 100644
index 0000000000..eb226c32b8
--- /dev/null
+++ b/engine/src/test/java/com/arcadedb/engine/timeseries/TimeSeriesTypeTest.java
@@ -0,0 +1,166 @@
+/*
+ * 
Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries; + +import com.arcadedb.GlobalConfiguration; +import com.arcadedb.TestHelper; +import com.arcadedb.database.DatabaseFactory; +import com.arcadedb.schema.DocumentType; +import com.arcadedb.schema.LocalTimeSeriesType; +import com.arcadedb.schema.Type; +import org.junit.jupiter.api.Test; + +import static org.assertj.core.api.Assertions.assertThat; + +/** + * Tests for TimeSeries schema type integration. 
+ * + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +public class TimeSeriesTypeTest extends TestHelper { + + @Test + public void testCreateTimeSeriesType() { + final LocalTimeSeriesType type = database.getSchema().buildTimeSeriesType() + .withName("SensorData") + .withTimestamp("ts") + .withTag("sensor_id", Type.STRING) + .withField("temperature", Type.DOUBLE) + .withShards(2) + .withRetention(86400000L) + .create(); + + assertThat(type).isNotNull(); + assertThat(type.getName()).isEqualTo("SensorData"); + assertThat(type.getTimestampColumn()).isEqualTo("ts"); + assertThat(type.getShardCount()).isEqualTo(2); + assertThat(type.getRetentionMs()).isEqualTo(86400000L); + assertThat(type.getTsColumns()).hasSize(3); + assertThat(type.getEngine()).isNotNull(); + + // Verify properties registered + assertThat(type.existsProperty("ts")).isTrue(); + assertThat(type.existsProperty("sensor_id")).isTrue(); + assertThat(type.existsProperty("temperature")).isTrue(); + + // Verify type is in schema + assertThat(database.getSchema().existsType("SensorData")).isTrue(); + final DocumentType fromSchema = database.getSchema().getType("SensorData"); + assertThat(fromSchema).isInstanceOf(LocalTimeSeriesType.class); + } + + @Test + public void testTimeSeriesTypeJSON() { + final LocalTimeSeriesType type = database.getSchema().buildTimeSeriesType() + .withName("Metrics") + .withTimestamp("ts") + .withTag("host", Type.STRING) + .withField("cpu", Type.DOUBLE) + .withField("mem", Type.LONG) + .withShards(1) + .create(); + + final var json = type.toJSON(); + assertThat(json.getString("type")).isEqualTo("t"); + assertThat(json.getString("timestampColumn")).isEqualTo("ts"); + assertThat(json.getInt("shardCount")).isEqualTo(1); + assertThat(json.getJSONArray("tsColumns").length()).isEqualTo(4); + } + + @Test + public void testTimeSeriesTypePersistence() { + database.getSchema().buildTimeSeriesType() + .withName("PersistentTS") + .withTimestamp("ts") + .withTag("device", Type.STRING) + 
.withField("value", Type.DOUBLE) + .withShards(2) + .withRetention(3600000L) + .create(); + + // Close and reopen + final String dbPath = database.getDatabasePath(); + database.close(); + + database = new DatabaseFactory(dbPath).open(); + + assertThat(database.getSchema().existsType("PersistentTS")).isTrue(); + final DocumentType reloaded = database.getSchema().getType("PersistentTS"); + assertThat(reloaded).isInstanceOf(LocalTimeSeriesType.class); + + final LocalTimeSeriesType tsType = (LocalTimeSeriesType) reloaded; + assertThat(tsType.getTimestampColumn()).isEqualTo("ts"); + assertThat(tsType.getShardCount()).isEqualTo(2); + assertThat(tsType.getRetentionMs()).isEqualTo(3600000L); + assertThat(tsType.getTsColumns()).hasSize(3); + + // Verify column roles restored correctly + final ColumnDefinition tsCol = tsType.getTsColumns().get(0); + assertThat(tsCol.getName()).isEqualTo("ts"); + assertThat(tsCol.getRole()).isEqualTo(ColumnDefinition.ColumnRole.TIMESTAMP); + + final ColumnDefinition tagCol = tsType.getTsColumns().get(1); + assertThat(tagCol.getName()).isEqualTo("device"); + assertThat(tagCol.getRole()).isEqualTo(ColumnDefinition.ColumnRole.TAG); + + final ColumnDefinition fieldCol = tsType.getTsColumns().get(2); + assertThat(fieldCol.getName()).isEqualTo("value"); + assertThat(fieldCol.getRole()).isEqualTo(ColumnDefinition.ColumnRole.FIELD); + } + + @Test + public void testColumnDefinitions() { + final LocalTimeSeriesType type = database.getSchema().buildTimeSeriesType() + .withName("AllTypes") + .withTimestamp("ts") + .withTag("region", Type.STRING) + .withTag("zone", Type.INTEGER) + .withField("temp", Type.DOUBLE) + .withField("count", Type.LONG) + .create(); + + assertThat(type.getTsColumns()).hasSize(5); + + // Verify timestamp column + assertThat(type.getTsColumns().get(0).getRole()).isEqualTo(ColumnDefinition.ColumnRole.TIMESTAMP); + assertThat(type.getTsColumns().get(0).getDataType()).isEqualTo(Type.LONG); + + // Verify tags + 
assertThat(type.getTsColumns().get(1).getRole()).isEqualTo(ColumnDefinition.ColumnRole.TAG); + assertThat(type.getTsColumns().get(2).getRole()).isEqualTo(ColumnDefinition.ColumnRole.TAG); + + // Verify fields + assertThat(type.getTsColumns().get(3).getRole()).isEqualTo(ColumnDefinition.ColumnRole.FIELD); + assertThat(type.getTsColumns().get(4).getRole()).isEqualTo(ColumnDefinition.ColumnRole.FIELD); + } + + @Test + public void testDefaultShardCountMatchesAsyncWorkerThreads() { + final int expectedShards = database.getConfiguration().getValueAsInteger(GlobalConfiguration.ASYNC_WORKER_THREADS); + + final LocalTimeSeriesType type = database.getSchema().buildTimeSeriesType() + .withName("DefaultShards") + .withTimestamp("ts") + .withField("value", Type.DOUBLE) + .create(); + + assertThat(type.getShardCount()).isEqualTo(expectedShards); + assertThat(type.getEngine().getShardCount()).isEqualTo(expectedShards); + } +} diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/WindowFunctionTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/WindowFunctionTest.java new file mode 100644 index 0000000000..81862172c8 --- /dev/null +++ b/engine/src/test/java/com/arcadedb/engine/timeseries/WindowFunctionTest.java @@ -0,0 +1,312 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ *
+ * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com)
+ * SPDX-License-Identifier: Apache-2.0
+ */
+package com.arcadedb.engine.timeseries;
+
+import com.arcadedb.TestHelper;
+import com.arcadedb.query.sql.executor.Result;
+import com.arcadedb.query.sql.executor.ResultSet;
+import org.junit.jupiter.api.Test;
+
+import java.util.List;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+public class WindowFunctionTest extends TestHelper {
+
+  @Test
+  public void testLagBasic() {
+    database.command("sql", "CREATE TIMESERIES TYPE LagSensor TIMESTAMP ts FIELDS (temperature DOUBLE)");
+
+    database.transaction(() -> {
+      database.command("sql", "INSERT INTO LagSensor SET ts = 1000, temperature = 10.0");
+      database.command("sql", "INSERT INTO LagSensor SET ts = 2000, temperature = 20.0");
+      database.command("sql", "INSERT INTO LagSensor SET ts = 3000, temperature = 30.0");
+      database.command("sql", "INSERT INTO LagSensor SET ts = 4000, temperature = 40.0");
+      database.command("sql", "INSERT INTO LagSensor SET ts = 5000, temperature = 50.0");
+    });
+
+    final ResultSet rs = database.query("sql", "SELECT ts.lag(temperature, 1, ts) AS prev FROM LagSensor");
+    assertThat(rs.hasNext()).isTrue();
+    @SuppressWarnings("unchecked")
+    final List<Object> prev = (List<Object>) rs.next().getProperty("prev");
+
+    assertThat(prev).hasSize(5);
+    assertThat(prev.get(0)).isNull();
+    assertThat(prev.get(1)).isEqualTo(10.0);
+    assertThat(prev.get(2)).isEqualTo(20.0);
+    assertThat(prev.get(3)).isEqualTo(30.0);
+    assertThat(prev.get(4)).isEqualTo(40.0);
+  }
+
+  @Test
+  public void testLagWithOffset2() {
+    database.command("sql", "CREATE TIMESERIES TYPE LagOff2 TIMESTAMP ts FIELDS (temperature DOUBLE)");
+
+    database.transaction(() -> {
+      database.command("sql", "INSERT INTO LagOff2 SET ts = 1000, temperature = 10.0");
+      database.command("sql", "INSERT INTO LagOff2 SET ts = 2000, temperature = 20.0");
+      database.command("sql", "INSERT INTO LagOff2 SET ts = 3000, temperature = 30.0");
+      database.command("sql", "INSERT INTO LagOff2 SET ts = 4000, temperature = 40.0");
+      database.command("sql", "INSERT INTO LagOff2 SET ts = 5000, temperature = 50.0");
+    });
+
+    final ResultSet rs = database.query("sql", "SELECT ts.lag(temperature, 2, ts) AS prev FROM LagOff2");
+    assertThat(rs.hasNext()).isTrue();
+    @SuppressWarnings("unchecked")
+    final List<Object> prev = (List<Object>) rs.next().getProperty("prev");
+
+    assertThat(prev).hasSize(5);
+    assertThat(prev.get(0)).isNull();
+    assertThat(prev.get(1)).isNull();
+    assertThat(prev.get(2)).isEqualTo(10.0);
+    assertThat(prev.get(3)).isEqualTo(20.0);
+    assertThat(prev.get(4)).isEqualTo(30.0);
+  }
+
+  @Test
+  public void testLagWithDefault() {
+    database.command("sql", "CREATE TIMESERIES TYPE LagDef TIMESTAMP ts FIELDS (temperature DOUBLE)");
+
+    database.transaction(() -> {
+      database.command("sql", "INSERT INTO LagDef SET ts = 1000, temperature = 10.0");
+      database.command("sql", "INSERT INTO LagDef SET ts = 2000, temperature = 20.0");
+      database.command("sql", "INSERT INTO LagDef SET ts = 3000, temperature = 30.0");
+      database.command("sql", "INSERT INTO LagDef SET ts = 4000, temperature = 40.0");
+      database.command("sql", "INSERT INTO LagDef SET ts = 5000, temperature = 50.0");
+    });
+
+    final ResultSet rs = database.query("sql", "SELECT ts.lag(temperature, 1, ts, -1) AS prev FROM LagDef");
+    assertThat(rs.hasNext()).isTrue();
+    @SuppressWarnings("unchecked")
+    final List<Object> prev = (List<Object>) rs.next().getProperty("prev");
+
+    assertThat(prev).hasSize(5);
+    assertThat(prev.get(0)).isEqualTo(-1);
+    assertThat(prev.get(1)).isEqualTo(10.0);
+    assertThat(prev.get(2)).isEqualTo(20.0);
+    assertThat(prev.get(3)).isEqualTo(30.0);
+    assertThat(prev.get(4)).isEqualTo(40.0);
+  }
+
+  @Test
+  public void testLeadBasic() {
+    database.command("sql", "CREATE TIMESERIES TYPE LeadSensor TIMESTAMP ts FIELDS (temperature DOUBLE)");
+
+    database.transaction(() -> {
+      database.command("sql", "INSERT INTO LeadSensor SET ts = 1000, temperature = 10.0");
+      database.command("sql", "INSERT INTO LeadSensor SET ts = 2000, temperature = 20.0");
+      database.command("sql", "INSERT INTO LeadSensor SET ts = 3000, temperature = 30.0");
+      database.command("sql", "INSERT INTO LeadSensor SET ts = 4000, temperature = 40.0");
+      database.command("sql", "INSERT INTO LeadSensor SET ts = 5000, temperature = 50.0");
+    });
+
+    final ResultSet rs = database.query("sql", "SELECT ts.lead(temperature, 1, ts) AS next FROM LeadSensor");
+    assertThat(rs.hasNext()).isTrue();
+    @SuppressWarnings("unchecked")
+    final List<Object> next = (List<Object>) rs.next().getProperty("next");
+
+    assertThat(next).hasSize(5);
+    assertThat(next.get(0)).isEqualTo(20.0);
+    assertThat(next.get(1)).isEqualTo(30.0);
+    assertThat(next.get(2)).isEqualTo(40.0);
+    assertThat(next.get(3)).isEqualTo(50.0);
+    assertThat(next.get(4)).isNull();
+  }
+
+  @Test
+  public void testLeadWithDefault() {
+    database.command("sql", "CREATE TIMESERIES TYPE LeadDef TIMESTAMP ts FIELDS (temperature DOUBLE)");
+
+    database.transaction(() -> {
+      database.command("sql", "INSERT INTO LeadDef SET ts = 1000, temperature = 10.0");
+      database.command("sql", "INSERT INTO LeadDef SET ts = 2000, temperature = 20.0");
+      database.command("sql", "INSERT INTO LeadDef SET ts = 3000, temperature = 30.0");
+      database.command("sql", "INSERT INTO LeadDef SET ts = 4000, temperature = 40.0");
+      database.command("sql", "INSERT INTO LeadDef SET ts = 5000, temperature = 50.0");
+    });
+
+    final ResultSet rs = database.query("sql", "SELECT ts.lead(temperature, 1, ts, -1) AS next FROM LeadDef");
+    assertThat(rs.hasNext()).isTrue();
+    @SuppressWarnings("unchecked")
+    final List<Object> next = (List<Object>) rs.next().getProperty("next");
+
+    assertThat(next).hasSize(5);
+    assertThat(next.get(0)).isEqualTo(20.0);
+    assertThat(next.get(1)).isEqualTo(30.0);
+    assertThat(next.get(2)).isEqualTo(40.0);
+    assertThat(next.get(3)).isEqualTo(50.0);
+    assertThat(next.get(4)).isEqualTo(-1);
+  }
+
+  @Test
+  public void testRowNumber() {
+    database.command("sql", "CREATE TIMESERIES TYPE RnSensor TIMESTAMP ts FIELDS (temperature DOUBLE)");
+
+    database.transaction(() -> {
+      database.command("sql", "INSERT INTO RnSensor SET ts = 1000, temperature = 10.0");
+      database.command("sql", "INSERT INTO RnSensor SET ts = 2000, temperature = 20.0");
+      database.command("sql", "INSERT INTO RnSensor SET ts = 3000, temperature = 30.0");
+      database.command("sql", "INSERT INTO RnSensor SET ts = 4000, temperature = 40.0");
+      database.command("sql", "INSERT INTO RnSensor SET ts = 5000, temperature = 50.0");
+    });
+
+    final ResultSet rs = database.query("sql", "SELECT ts.rowNumber(ts) AS rn FROM RnSensor");
+    assertThat(rs.hasNext()).isTrue();
+    @SuppressWarnings("unchecked")
+    final List<Object> rn = (List<Object>) rs.next().getProperty("rn");
+
+    assertThat(rn).hasSize(5);
+    assertThat(rn).containsExactly(1, 2, 3, 4, 5);
+  }
+
+  @Test
+  public void testRankWithTies() {
+    database.command("sql", "CREATE TIMESERIES TYPE RankTies TIMESTAMP ts FIELDS (temperature DOUBLE)");
+
+    database.transaction(() -> {
+      database.command("sql", "INSERT INTO RankTies SET ts = 1000, temperature = 10.0");
+      database.command("sql", "INSERT INTO RankTies SET ts = 2000, temperature = 20.0");
+      database.command("sql", "INSERT INTO RankTies SET ts = 3000, temperature = 20.0");
+      database.command("sql", "INSERT INTO RankTies SET ts = 4000, temperature = 30.0");
+    });
+
+    final ResultSet rs = database.query("sql", "SELECT ts.rank(temperature, ts) AS rnk FROM RankTies");
+    assertThat(rs.hasNext()).isTrue();
+    @SuppressWarnings("unchecked")
+    final List<Object> rnk = (List<Object>) rs.next().getProperty("rnk");
+
+    assertThat(rnk).hasSize(4);
+    assertThat(rnk).containsExactly(1, 2, 2, 4);
+  }
+
+  @Test
+  public void testRankNoDuplicates() {
+    database.command("sql", "CREATE TIMESERIES TYPE RankUniq TIMESTAMP ts FIELDS (temperature DOUBLE)");
+
+    database.transaction(() -> {
+      database.command("sql", "INSERT INTO RankUniq SET ts = 1000, temperature = 10.0");
+      database.command("sql", "INSERT INTO RankUniq SET ts = 2000, temperature = 20.0");
+      database.command("sql", "INSERT INTO RankUniq SET ts = 3000, temperature = 30.0");
+      database.command("sql", "INSERT INTO RankUniq SET ts = 4000, temperature = 40.0");
+      database.command("sql", "INSERT INTO RankUniq SET ts = 5000, temperature = 50.0");
+    });
+
+    final ResultSet rs = database.query("sql", "SELECT ts.rank(temperature, ts) AS rnk FROM RankUniq");
+    assertThat(rs.hasNext()).isTrue();
+    @SuppressWarnings("unchecked")
+    final List<Object> rnk = (List<Object>) rs.next().getProperty("rnk");
+
+    assertThat(rnk).hasSize(5);
+    assertThat(rnk).containsExactly(1, 2, 3, 4, 5);
+  }
+
+  @Test
+  public void testLagWithGroupBy() {
+    database.command("sql", "CREATE TIMESERIES TYPE LagBucket TIMESTAMP ts FIELDS (temperature DOUBLE)");
+
+    database.transaction(() -> {
+      database.command("sql", "INSERT INTO LagBucket SET ts = 1000, temperature = 10.0");
+      database.command("sql", "INSERT INTO LagBucket SET ts = 2000, temperature = 20.0");
+      database.command("sql", "INSERT INTO LagBucket SET ts = 3000, temperature = 30.0");
+      database.command("sql", "INSERT INTO LagBucket SET ts = 60000, temperature = 40.0");
+      database.command("sql", "INSERT INTO LagBucket SET ts = 61000, temperature = 50.0");
+      database.command("sql", "INSERT INTO LagBucket SET ts = 62000, temperature = 60.0");
+    });
+
+    final ResultSet rs = database.query("sql",
+        "SELECT ts.timeBucket('60s', ts) AS tb, ts.lag(temperature, 1, ts) AS prev " +
+        "FROM LagBucket GROUP BY tb ORDER BY tb");
+
+    int count = 0;
+    while (rs.hasNext()) {
+      final Result row = rs.next();
+      @SuppressWarnings("unchecked")
+      final List<Object> prev = (List<Object>) row.getProperty("prev");
+      assertThat(prev).isNotNull();
+      assertThat(prev).hasSize(3);
+      // First element in each bucket should be null (no previous)
+      assertThat(prev.get(0)).isNull();
+      count++;
+    }
+    assertThat(count).isEqualTo(2);
+  }
+
+  @Test
+  public void testWindowFunctionsOnEmptyType() {
+    database.command("sql", "CREATE TIMESERIES TYPE EmptyWin TIMESTAMP ts FIELDS (temperature DOUBLE)");
+
+    final ResultSet lagRs = database.query("sql", "SELECT ts.lag(temperature, 1, ts) AS prev FROM EmptyWin");
+    if (lagRs.hasNext()) {
+      final Object prev = lagRs.next().getProperty("prev");
+      if (prev instanceof List)
+        assertThat((List<?>) prev).isEmpty();
+    }
+
+    final ResultSet leadRs = database.query("sql", "SELECT ts.lead(temperature, 1, ts) AS next FROM EmptyWin");
+    if (leadRs.hasNext()) {
+      final Object next = leadRs.next().getProperty("next");
+      if (next instanceof List)
+        assertThat((List<?>) next).isEmpty();
+    }
+
+    final ResultSet rnRs = database.query("sql", "SELECT ts.rowNumber(ts) AS rn FROM EmptyWin");
+    if (rnRs.hasNext()) {
+      final Object rn = rnRs.next().getProperty("rn");
+      if (rn instanceof List)
+        assertThat((List<?>) rn).isEmpty();
+    }
+
+    final ResultSet rankRs = database.query("sql", "SELECT ts.rank(temperature, ts) AS rnk FROM EmptyWin");
+    if (rankRs.hasNext()) {
+      final Object rnk = rankRs.next().getProperty("rnk");
+      if (rnk instanceof List)
+        assertThat((List<?>) rnk).isEmpty();
+    }
+  }
+
+  @Test
+  public void testLagLeadWithTimeSeries() {
+    database.command("sql", "CREATE TIMESERIES TYPE TsWinSensor TIMESTAMP ts FIELDS (value DOUBLE)");
+
+    database.transaction(() -> {
+      for (int i = 1; i <= 10; i++)
+        database.command("sql", "INSERT INTO TsWinSensor SET ts = " + (i * 1000) + ", value = " + (i * 10.0));
+    });
+
+    // Verify lag
+    final ResultSet lagRs = database.query("sql", "SELECT ts.lag(value, 1, ts) AS prev FROM TsWinSensor");
+    assertThat(lagRs.hasNext()).isTrue();
+    @SuppressWarnings("unchecked")
+    final List<Object> prev = (List<Object>) lagRs.next().getProperty("prev");
+    assertThat(prev).hasSize(10);
+    assertThat(prev.get(0)).isNull();
+    assertThat(prev.get(1)).isEqualTo(10.0);
+    assertThat(prev.get(9)).isEqualTo(90.0);
+
+    // Verify lead
+    final ResultSet leadRs = database.query("sql", "SELECT ts.lead(value, 1, ts) AS next FROM TsWinSensor");
+    assertThat(leadRs.hasNext()).isTrue();
+    @SuppressWarnings("unchecked")
+    final List<Object> next = (List<Object>) leadRs.next().getProperty("next");
+    assertThat(next).hasSize(10);
+    assertThat(next.get(0)).isEqualTo(20.0);
+    assertThat(next.get(8)).isEqualTo(100.0);
+    assertThat(next.get(9)).isNull();
+  }
+}
diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/codec/DeltaOfDeltaCodecTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/codec/DeltaOfDeltaCodecTest.java
new file mode 100644
index 0000000000..d18c5d0690
--- /dev/null
+++ b/engine/src/test/java/com/arcadedb/engine/timeseries/codec/DeltaOfDeltaCodecTest.java
@@ -0,0 +1,133 @@
+/*
+ * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com)
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries.codec; + +import org.junit.jupiter.api.Test; + +import java.util.Random; + +import static org.assertj.core.api.Assertions.assertThat; + +/** + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +class DeltaOfDeltaCodecTest { + + @Test + void testEmpty() { + assertThat(DeltaOfDeltaCodec.decode(DeltaOfDeltaCodec.encode(new long[0]))).isEmpty(); + assertThat(DeltaOfDeltaCodec.decode(DeltaOfDeltaCodec.encode(null))).isEmpty(); + } + + @Test + void testSingleValue() { + final long[] input = { 1000000000L }; + final byte[] encoded = DeltaOfDeltaCodec.encode(input); + assertThat(DeltaOfDeltaCodec.decode(encoded)).containsExactly(input); + } + + @Test + void testRegularIntervals() { + // Regular 10-second intervals — all delta-of-deltas are 0 + final long[] input = new long[1000]; + for (int i = 0; i < input.length; i++) + input[i] = 1_000_000_000L + i * 10_000_000_000L; + + final byte[] encoded = DeltaOfDeltaCodec.encode(input); + assertThat(DeltaOfDeltaCodec.decode(encoded)).containsExactly(input); + + // Should compress well: regular intervals encode to ~1 bit per sample after first two + assertThat(encoded.length).isLessThan(input.length * 8 / 4); + } + + @Test + void testMonotonicIncreasing() { + final long[] input = { 100, 200, 300, 400, 500, 600 }; + final byte[] encoded = DeltaOfDeltaCodec.encode(input); + assertThat(DeltaOfDeltaCodec.decode(encoded)).containsExactly(input); + } + + @Test + void testNonMonotonic() { + final long[] input = { 100, 300, 250, 400, 350, 500 }; + final byte[] encoded = DeltaOfDeltaCodec.encode(input); + assertThat(DeltaOfDeltaCodec.decode(encoded)).containsExactly(input); + } + + @Test + void testRandomTimestamps() { + final Random rng = new Random(42); + final long[] input = new long[500]; + input[0] = Math.abs(rng.nextLong() % 1_000_000_000_000L); + for (int i = 1; i 
< input.length; i++) + input[i] = input[i - 1] + Math.abs(rng.nextInt(10000)) + 1; + + final byte[] encoded = DeltaOfDeltaCodec.encode(input); + assertThat(DeltaOfDeltaCodec.decode(encoded)).containsExactly(input); + } + + @Test + void testTwoValues() { + final long[] input = { 100, 200 }; + final byte[] encoded = DeltaOfDeltaCodec.encode(input); + assertThat(DeltaOfDeltaCodec.decode(encoded)).containsExactly(input); + } + + @Test + void testLargeDeltaOfDelta() { + // Large jumps that require 64-bit encoding + final long[] input = { 0, 1_000_000_000_000L, 1_000_000_000_001L, 5_000_000_000_000L }; + final byte[] encoded = DeltaOfDeltaCodec.encode(input); + assertThat(DeltaOfDeltaCodec.decode(encoded)).containsExactly(input); + } + + @Test + void testZigZagEncoding() { + assertThat(DeltaOfDeltaCodec.zigZagEncode(0)).isEqualTo(0); + assertThat(DeltaOfDeltaCodec.zigZagEncode(-1)).isEqualTo(1); + assertThat(DeltaOfDeltaCodec.zigZagEncode(1)).isEqualTo(2); + assertThat(DeltaOfDeltaCodec.zigZagEncode(-2)).isEqualTo(3); + assertThat(DeltaOfDeltaCodec.zigZagDecode(DeltaOfDeltaCodec.zigZagEncode(63))).isEqualTo(63); + assertThat(DeltaOfDeltaCodec.zigZagDecode(DeltaOfDeltaCodec.zigZagEncode(-63))).isEqualTo(-63); + } + + @Test + void testAllSameTimestamp() { + final long[] input = { 42, 42, 42, 42, 42 }; + final byte[] encoded = DeltaOfDeltaCodec.encode(input); + assertThat(DeltaOfDeltaCodec.decode(encoded)).containsExactly(input); + } + + /** + * Regression test: dod == -64 must use the compact 7-bit ZigZag bucket, not the 64-bit fallback. + * ZigZag(-64) = 127 which fits in 7 bits. Previously the range check excluded -64. 
+ */ + @Test + void testDodMinusSixtyFourUsesCompactEncoding() { + // Construct timestamps where dod == -64: delta[1]=100, delta[2]=36 → dod = 36 - 100 = -64 + final long[] input = { 1000L, 1100L, 1136L }; + final byte[] encoded = DeltaOfDeltaCodec.encode(input); + assertThat(DeltaOfDeltaCodec.decode(encoded)).containsExactly(input); + + // Verify it uses the compact bucket (encoded should be shorter than if using 64-bit fallback) + // 64-bit fallback: 4(count) + 8(first) + 8(firstDelta) + (4 + 64) / 8 = 29 bytes + // 7-bit bucket: 4(count) + 8(first) + 8(firstDelta) + (2 + 7 + 7) / 8 = ~22 bytes + assertThat(encoded.length).isLessThan(29); + } +} diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/codec/DictionaryCodecTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/codec/DictionaryCodecTest.java new file mode 100644 index 0000000000..125477cdf3 --- /dev/null +++ b/engine/src/test/java/com/arcadedb/engine/timeseries/codec/DictionaryCodecTest.java @@ -0,0 +1,120 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries.codec; + +import org.junit.jupiter.api.Test; + +import java.io.IOException; + +import static org.assertj.core.api.Assertions.assertThat; +import static org.assertj.core.api.Assertions.assertThatThrownBy; + +/** + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +class DictionaryCodecTest { + + @Test + void testEmpty() throws IOException { + assertThat(DictionaryCodec.decode(DictionaryCodec.encode(new String[0]))).isEmpty(); + assertThat(DictionaryCodec.decode(DictionaryCodec.encode(null))).isEmpty(); + } + + @Test + void testSingleValue() throws IOException { + final String[] input = { "sensor_a" }; + assertThat(DictionaryCodec.decode(DictionaryCodec.encode(input))).containsExactly(input); + } + + @Test + void testSingleUniqueRepeated() throws IOException { + final String[] input = { "host1", "host1", "host1", "host1", "host1" }; + final byte[] encoded = DictionaryCodec.encode(input); + assertThat(DictionaryCodec.decode(encoded)).containsExactly(input); + + // Very compact: 1 dict entry + 5 × 2-byte indices + assertThat(encoded.length).isLessThan(input.length * 10); + } + + @Test + void testMultipleUnique() throws IOException { + final String[] input = new String[100]; + for (int i = 0; i < input.length; i++) + input[i] = "sensor_" + (i % 10); + + assertThat(DictionaryCodec.decode(DictionaryCodec.encode(input))).containsExactly(input); + } + + @Test + void testEmptyStrings() throws IOException { + final String[] input = { "", "", "a", "", "b" }; + assertThat(DictionaryCodec.decode(DictionaryCodec.encode(input))).containsExactly(input); + } + + @Test + void testUnicodeStrings() throws IOException { + final String[] input = { "温度", "湿度", "温度", "气压", "湿度" }; + assertThat(DictionaryCodec.decode(DictionaryCodec.encode(input))).containsExactly(input); + } + + @Test + void testManyUniqueValues() 
throws IOException { + final String[] input = new String[1000]; + for (int i = 0; i < input.length; i++) + input[i] = "unique_tag_" + i; + + assertThat(DictionaryCodec.decode(DictionaryCodec.encode(input))).containsExactly(input); + } + + @Test + void testPreservesOrder() throws IOException { + final String[] input = { "c", "a", "b", "a", "c", "b" }; + assertThat(DictionaryCodec.decode(DictionaryCodec.encode(input))).containsExactly(input); + } + + @Test + void testMalformedDataThrowsIOException() { + final byte[] malformed = new byte[] { 0, 0, 0, 5 }; // count=5 but no data follows + assertThatThrownBy(() -> DictionaryCodec.decode(malformed)) + .isInstanceOf(IOException.class) + .hasMessageContaining("malformed"); + } + + @Test + void testUtf8LengthGuard() { + // A string whose UTF-8 encoding exceeds 65535 bytes must be rejected with a clear error. + // Each '豆' character encodes to 3 UTF-8 bytes, so 21845 repetitions = 65535 bytes. + final String longEntry = "豆".repeat(21846); // 21846 × 3 = 65538 bytes > 65535 + final String[] input = { longEntry }; + assertThatThrownBy(() -> DictionaryCodec.encode(input)) + .isInstanceOf(IllegalArgumentException.class) + .hasMessageContaining("too long"); + } + + @Test + void testDictionaryOverflow() { + // More than MAX_DICTIONARY_SIZE distinct values must be rejected. 
+ final String[] input = new String[DictionaryCodec.MAX_DICTIONARY_SIZE + 1]; + for (int i = 0; i <= DictionaryCodec.MAX_DICTIONARY_SIZE; i++) + input[i] = "v_" + i; + assertThatThrownBy(() -> DictionaryCodec.encode(input)) + .isInstanceOf(IllegalArgumentException.class) + .hasMessageContaining("Dictionary overflow"); + } +} diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/codec/GorillaXORCodecTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/codec/GorillaXORCodecTest.java new file mode 100644 index 0000000000..261e60db03 --- /dev/null +++ b/engine/src/test/java/com/arcadedb/engine/timeseries/codec/GorillaXORCodecTest.java @@ -0,0 +1,135 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries.codec; + +import org.junit.jupiter.api.Test; + +import java.util.Random; + +import static org.assertj.core.api.Assertions.assertThat; + +/** + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +class GorillaXORCodecTest { + + @Test + void testEmpty() { + assertThat(GorillaXORCodec.decode(GorillaXORCodec.encode(new double[0]))).isEmpty(); + assertThat(GorillaXORCodec.decode(GorillaXORCodec.encode(null))).isEmpty(); + } + + @Test + void testSingleValue() { + final double[] input = { 22.5 }; + final byte[] encoded = GorillaXORCodec.encode(input); + assertThat(GorillaXORCodec.decode(encoded)).containsExactly(input); + } + + @Test + void testConstantValues() { + final double[] input = new double[100]; + java.util.Arrays.fill(input, 42.0); + + final byte[] encoded = GorillaXORCodec.encode(input); + assertThat(GorillaXORCodec.decode(encoded)).containsExactly(input); + + // Constant values should compress extremely well (1 bit per sample after first) + assertThat(encoded.length).isLessThan(input.length); + } + + @Test + void testSlowlyChangingValues() { + // Temperature-like data: small increments + final double[] input = new double[500]; + input[0] = 20.0; + for (int i = 1; i < input.length; i++) + input[i] = input[i - 1] + 0.1; + + final byte[] encoded = GorillaXORCodec.encode(input); + assertThat(GorillaXORCodec.decode(encoded)).containsExactly(input); + } + + @Test + void testRandomDoubles() { + final Random rng = new Random(42); + final double[] input = new double[300]; + for (int i = 0; i < input.length; i++) + input[i] = rng.nextDouble() * 1000.0; + + final byte[] encoded = GorillaXORCodec.encode(input); + assertThat(GorillaXORCodec.decode(encoded)).containsExactly(input); + } + + @Test + void testSpecialValues() { + final double[] input = { 0.0, -0.0, Double.NaN, Double.POSITIVE_INFINITY, 
Double.NEGATIVE_INFINITY, Double.MAX_VALUE, + Double.MIN_VALUE }; + final byte[] encoded = GorillaXORCodec.encode(input); + final double[] decoded = GorillaXORCodec.decode(encoded); + + assertThat(decoded.length).isEqualTo(input.length); + assertThat(decoded[0]).isEqualTo(0.0); + assertThat(Double.doubleToRawLongBits(decoded[1])).isEqualTo(Double.doubleToRawLongBits(-0.0)); + assertThat(decoded[2]).isNaN(); + assertThat(decoded[3]).isEqualTo(Double.POSITIVE_INFINITY); + assertThat(decoded[4]).isEqualTo(Double.NEGATIVE_INFINITY); + assertThat(decoded[5]).isEqualTo(Double.MAX_VALUE); + assertThat(decoded[6]).isEqualTo(Double.MIN_VALUE); + } + + @Test + void testTwoValues() { + final double[] input = { 1.0, 2.0 }; + final byte[] encoded = GorillaXORCodec.encode(input); + assertThat(GorillaXORCodec.decode(encoded)).containsExactly(input); + } + + @Test + void testNegativeValues() { + final double[] input = { -100.5, -100.3, -100.1, -99.9, -99.7 }; + final byte[] encoded = GorillaXORCodec.encode(input); + assertThat(GorillaXORCodec.decode(encoded)).containsExactly(input); + } + + /** + * Regression test: decoder must initialise prevLeading to Integer.MAX_VALUE, not 0. + * If prevLeading starts at 0, the '10' path (reuse block) may fire on the second pair + * without a prior '11' header having been written, producing wrong values. + * The encoder always emits '11' first, so the decoder must start with MAX_VALUE + * so that leading >= prevLeading is false and the '11' path is taken correctly. + */ + @Test + void testDecoderPrevLeadingInitialisedToMaxValue() { + // Construct two values whose XOR has the same leading/trailing zeros as + // "no prior block" — use a constant array where the third differs. + // 1.0 XOR 2.0 has many leading zeros; encoding must round-trip correctly. 
+ final double[] input = { 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0 }; + assertThat(GorillaXORCodec.decode(GorillaXORCodec.encode(input))).containsExactly(input); + } + + @Test + void testDecodeBufferVariant() { + final double[] input = { 1.0, 2.0, 3.0, 4.0 }; + final byte[] encoded = GorillaXORCodec.encode(input); + final double[] output = new double[input.length]; + GorillaXORCodec.decode(encoded, output); + assertThat(output).containsExactly(input); + } +} diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/codec/Simple8bCodecTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/codec/Simple8bCodecTest.java new file mode 100644 index 0000000000..4f4906c624 --- /dev/null +++ b/engine/src/test/java/com/arcadedb/engine/timeseries/codec/Simple8bCodecTest.java @@ -0,0 +1,162 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries.codec; + +import org.junit.jupiter.api.Test; + +import java.io.IOException; +import java.util.Random; + +import static org.assertj.core.api.Assertions.assertThat; +import static org.assertj.core.api.Assertions.assertThatThrownBy; + +/** + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +class Simple8bCodecTest { + + @Test + void testEmpty() throws IOException { + assertThat(Simple8bCodec.decode(Simple8bCodec.encode(new long[0]))).isEmpty(); + assertThat(Simple8bCodec.decode(Simple8bCodec.encode(null))).isEmpty(); + } + + @Test + void testSingleValue() throws IOException { + final long[] input = { 42 }; + assertThat(Simple8bCodec.decode(Simple8bCodec.encode(input))).containsExactly(input); + } + + @Test + void testAllZeros() throws IOException { + final long[] input = new long[240]; + final byte[] encoded = Simple8bCodec.encode(input); + assertThat(Simple8bCodec.decode(encoded)).containsExactly(input); + + // 240 zeros should fit in a single 8-byte word + 4 bytes header + assertThat(encoded.length).isEqualTo(12); + } + + @Test + void testSmallInts() throws IOException { + // Values 0-1 (1 bit each) → 60 per word + final long[] input = new long[60]; + for (int i = 0; i < input.length; i++) + input[i] = i % 2; + + assertThat(Simple8bCodec.decode(Simple8bCodec.encode(input))).containsExactly(input); + } + + @Test + void testMediumInts() throws IOException { + // Values 0-255 (8 bits each) → 7 per word + final long[] input = new long[100]; + for (int i = 0; i < input.length; i++) + input[i] = i % 256; + + assertThat(Simple8bCodec.decode(Simple8bCodec.encode(input))).containsExactly(input); + } + + @Test + void testLargeInts() throws IOException { + // Values that need 30 bits → 2 per word + final long[] input = { 500_000_000L, 700_000_000L, 100_000_000L, 999_999_999L }; + 
assertThat(Simple8bCodec.decode(Simple8bCodec.encode(input))).containsExactly(input); + } + + @Test + void testVeryLargeInts() throws IOException { + // After zigzag encoding, max positive value that fits in 60 bits is (1L << 59) - 1 + final long[] input = { (1L << 59) - 1, (1L << 58) + 1, (1L << 58) - 1 }; + assertThat(Simple8bCodec.decode(Simple8bCodec.encode(input))).containsExactly(input); + } + + @Test + void testNegativeValues() throws IOException { + // Zigzag encoding allows negative values + final long[] input = { -1, -100, -1000, 0, 42, -42 }; + assertThat(Simple8bCodec.decode(Simple8bCodec.encode(input))).containsExactly(input); + } + + @Test + void testLargeNegativeValues() throws IOException { + // Values within the zigzag-encodable range: max zigzag output must fit in 60 bits + final long[] input = { -(1L << 58), -(1L << 57), -999_999_999L, 999_999_999L }; + assertThat(Simple8bCodec.decode(Simple8bCodec.encode(input))).containsExactly(input); + } + + @Test + void testMixedSizes() throws IOException { + final long[] input = { 0, 1, 255, 1000, 0, 0, 0, 50000, 1 }; + assertThat(Simple8bCodec.decode(Simple8bCodec.encode(input))).containsExactly(input); + } + + @Test + void testAllSameNonZero() throws IOException { + final long[] input = new long[100]; + java.util.Arrays.fill(input, 7L); + assertThat(Simple8bCodec.decode(Simple8bCodec.encode(input))).containsExactly(input); + } + + @Test + void testRandomValues() throws IOException { + final Random rng = new Random(42); + final long[] input = new long[200]; + for (int i = 0; i < input.length; i++) + input[i] = Math.abs(rng.nextInt(10000)); + + assertThat(Simple8bCodec.decode(Simple8bCodec.encode(input))).containsExactly(input); + } + + @Test + void testMalformedDataThrowsIOException() { + final byte[] malformed = new byte[] { 0, 0, 0, 5 }; // count=5 but no words follow + assertThatThrownBy(() -> Simple8bCodec.decode(malformed)) + .isInstanceOf(IOException.class) + .hasMessageContaining("malformed"); + } + 
+ /** + * Regression test: values with |v| >= 2^59 must throw IllegalArgumentException rather than + * silently truncating via 60-bit ZigZag overflow. Previously encode() had no bounds check. + */ + @Test + void testOutOfRangeValueThrows() { + // ZigZag(-(2^59) - 1) exceeds 60 bits and must be rejected + final long outOfRange = -(1L << 59) - 1; + assertThatThrownBy(() -> Simple8bCodec.encode(new long[] { outOfRange })) + .isInstanceOf(IllegalArgumentException.class) + .hasMessageContaining("Simple-8b supported range"); + } + + @Test + void testBoundaryValueAccepted() throws IOException { + // ZigZag(-(2^59)) = (1L<<60)-1 which is exactly MAX_ZIGZAG_VALUE — must be accepted + final long boundary = -(1L << 59); + final long[] input = { boundary }; + assertThat(Simple8bCodec.decode(Simple8bCodec.encode(input))).containsExactly(input); + } + + @Test + void testMaxPositiveBoundaryAccepted() throws IOException { + // (2^59)-1 is the largest positive value: ZigZag((2^59)-1) = (1L<<60)-2 < MAX_ZIGZAG_VALUE + final long[] input = { (1L << 59) - 1 }; + assertThat(Simple8bCodec.decode(Simple8bCodec.encode(input))).containsExactly(input); + } +} diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/codec/SlidingBitReaderTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/codec/SlidingBitReaderTest.java new file mode 100644 index 0000000000..a72fe45f05 --- /dev/null +++ b/engine/src/test/java/com/arcadedb/engine/timeseries/codec/SlidingBitReaderTest.java @@ -0,0 +1,243 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries.codec; + +import org.junit.jupiter.api.Test; + +import java.util.Random; + +import static org.assertj.core.api.Assertions.assertThat; + +/** + * Tests for the sliding-window BitReader implementation. + * Validates correctness of bit-level reads, refill boundaries, and round-trip + * compatibility with GorillaXOR and DeltaOfDelta codecs. + */ +class SlidingBitReaderTest { + + @Test + void testReadBitAndReadBits1Match() { + // Write alternating 0s and 1s, read them back via readBit() and readBits(1) + final DeltaOfDeltaCodec.BitWriter writer = new DeltaOfDeltaCodec.BitWriter(16); + writer.writeBit(1); + writer.writeBit(0); + writer.writeBit(1); + writer.writeBit(1); + writer.writeBit(0); + + final byte[] data = writer.toByteArray(); + final DeltaOfDeltaCodec.BitReader reader = new DeltaOfDeltaCodec.BitReader(data); + + assertThat(reader.readBit()).isEqualTo(1); + assertThat((int) reader.readBits(1)).isEqualTo(0); + assertThat(reader.readBit()).isEqualTo(1); + assertThat((int) reader.readBits(1)).isEqualTo(1); + assertThat(reader.readBit()).isEqualTo(0); + } + + @Test + void testReadZeroBits() { + final DeltaOfDeltaCodec.BitWriter writer = new DeltaOfDeltaCodec.BitWriter(16); + writer.writeBits(0xAB, 8); + final byte[] data = writer.toByteArray(); + final DeltaOfDeltaCodec.BitReader reader = new DeltaOfDeltaCodec.BitReader(data); + + assertThat(reader.readBits(0)).isEqualTo(0); + 
assertThat(reader.readBits(8)).isEqualTo(0xAB); + } + + @Test + void testRead64Bits() { + // 64-bit reads are used for the header (count, first value, first delta) + final long value = 0x123456789ABCDEF0L; + final DeltaOfDeltaCodec.BitWriter writer = new DeltaOfDeltaCodec.BitWriter(16); + writer.writeBits(value, 64); + final byte[] data = writer.toByteArray(); + + final DeltaOfDeltaCodec.BitReader reader = new DeltaOfDeltaCodec.BitReader(data); + assertThat(reader.readBits(64)).isEqualTo(value); + } + + @Test + void testMultiple64BitReads() { + // Gorilla XOR header: 32-bit count + 64-bit first value + final DeltaOfDeltaCodec.BitWriter writer = new DeltaOfDeltaCodec.BitWriter(32); + writer.writeBits(42, 32); + writer.writeBits(Double.doubleToRawLongBits(3.14), 64); + writer.writeBits(0xFFL, 8); + + final byte[] data = writer.toByteArray(); + final DeltaOfDeltaCodec.BitReader reader = new DeltaOfDeltaCodec.BitReader(data); + + assertThat(reader.readBits(32)).isEqualTo(42); + assertThat(Double.longBitsToDouble(reader.readBits(64))).isEqualTo(3.14); + assertThat(reader.readBits(8)).isEqualTo(0xFF); + } + + @Test + void testRefillBoundary() { + // Write enough bits to force multiple refills + final DeltaOfDeltaCodec.BitWriter writer = new DeltaOfDeltaCodec.BitWriter(64); + // Write 57 bits then another 57 bits — this forces a refill mid-stream + final long val1 = (1L << 57) - 1; // all 1s in 57 bits + final long val2 = 0x1234567890ABCL; // arbitrary 49-bit value + writer.writeBits(val1, 57); + writer.writeBits(val2, 49); + + final byte[] data = writer.toByteArray(); + final DeltaOfDeltaCodec.BitReader reader = new DeltaOfDeltaCodec.BitReader(data); + + assertThat(reader.readBits(57)).isEqualTo(val1); + assertThat(reader.readBits(49)).isEqualTo(val2); + } + + @Test + void testVeryShortData() { + // 1 byte = 8 bits + final DeltaOfDeltaCodec.BitWriter writer = new DeltaOfDeltaCodec.BitWriter(4); + writer.writeBits(0b10110, 5); + final byte[] data = writer.toByteArray(); + 
+ final DeltaOfDeltaCodec.BitReader reader = new DeltaOfDeltaCodec.BitReader(data); + assertThat(reader.readBits(5)).isEqualTo(0b10110); + } + + @Test + void testMixedBitAndBitsReads() { + // Simulates the Gorilla XOR decode pattern: readBit + readBit + readBits(N) + final DeltaOfDeltaCodec.BitWriter writer = new DeltaOfDeltaCodec.BitWriter(32); + writer.writeBit(1); // control bit + writer.writeBit(0); // case '10' indicator + writer.writeBits(0x1234567890AL, 45); // payload + + writer.writeBit(1); // control bit + writer.writeBit(1); // case '11' indicator + writer.writeBits(12, 6); // leading zeros + writer.writeBits(50, 6); // block size - 1 = 50, so blockSize = 51 + writer.writeBits(0x7FFFFFFFFFFFFL, 51); // XOR payload + + final byte[] data = writer.toByteArray(); + final DeltaOfDeltaCodec.BitReader reader = new DeltaOfDeltaCodec.BitReader(data); + + // Case '10' + assertThat(reader.readBit()).isEqualTo(1); + assertThat(reader.readBit()).isEqualTo(0); + assertThat(reader.readBits(45)).isEqualTo(0x1234567890AL); + + // Case '11' + assertThat(reader.readBit()).isEqualTo(1); + assertThat(reader.readBit()).isEqualTo(1); + assertThat(reader.readBits(6)).isEqualTo(12); + assertThat(reader.readBits(6)).isEqualTo(50); + assertThat(reader.readBits(51)).isEqualTo(0x7FFFFFFFFFFFFL); + } + + @Test + void testGorillaXorRoundTripLargeRandom() { + // Simulates benchmark data pattern: 20.0 + random * 15.0 + final Random rng = new Random(12345); + final double[] input = new double[65536]; + for (int i = 0; i < input.length; i++) + input[i] = 20.0 + rng.nextDouble() * 15.0; + + final byte[] encoded = GorillaXORCodec.encode(input); + final double[] decoded = GorillaXORCodec.decode(encoded); + assertThat(decoded).containsExactly(input); + + // Also test buffer-reuse variant + final double[] buf = new double[65536]; + final int count = GorillaXORCodec.decode(encoded, buf); + assertThat(count).isEqualTo(input.length); + for (int i = 0; i < count; i++) + 
assertThat(buf[i]).isEqualTo(input[i]); + } + + @Test + void testDeltaOfDeltaRoundTripLargeMonotonic() { + // Monotonically increasing timestamps at ~100ms intervals with jitter + final Random rng = new Random(54321); + final long[] input = new long[65536]; + input[0] = System.currentTimeMillis(); + for (int i = 1; i < input.length; i++) + input[i] = input[i - 1] + 100 + rng.nextInt(10) - 5; + + final byte[] encoded = DeltaOfDeltaCodec.encode(input); + final long[] decoded = DeltaOfDeltaCodec.decode(encoded); + assertThat(decoded).containsExactly(input); + + // Also test buffer-reuse variant + final long[] buf = new long[65536]; + final int count = DeltaOfDeltaCodec.decode(encoded, buf); + assertThat(count).isEqualTo(input.length); + for (int i = 0; i < count; i++) + assertThat(buf[i]).isEqualTo(input[i]); + } + + @Test + void testGorillaXorRoundTripSpecialValues() { + final double[] input = { + 0.0, -0.0, Double.NaN, Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY, + Double.MAX_VALUE, Double.MIN_VALUE, Math.PI, Math.E, + 1.0, 1.0, 1.0, // consecutive identical values (zero XOR path) + -1.0, 2.0, -2.0 // sign changes (large XOR) + }; + + final byte[] encoded = GorillaXORCodec.encode(input); + final double[] decoded = GorillaXORCodec.decode(encoded); + + assertThat(decoded.length).isEqualTo(input.length); + for (int i = 0; i < input.length; i++) + assertThat(Double.doubleToRawLongBits(decoded[i])).isEqualTo(Double.doubleToRawLongBits(input[i])); + } + + @Test + void testDeltaOfDeltaRoundTripConstantDelta() { + // Perfectly regular timestamps — all delta-of-deltas are 0 + final long[] input = new long[10000]; + for (int i = 0; i < input.length; i++) + input[i] = 1000000L + i * 100L; + + final byte[] encoded = DeltaOfDeltaCodec.encode(input); + final long[] decoded = DeltaOfDeltaCodec.decode(encoded); + assertThat(decoded).containsExactly(input); + } + + @Test + void testDeltaOfDeltaRoundTripAllBuckets() { + // Exercise all encoding buckets: dod=0, |dod|<=63, 
|dod|<=255, |dod|<=2047, else + final long[] input = new long[20]; + input[0] = 1000; + input[1] = 1100; // delta=100 + input[2] = 1200; // delta=100, dod=0 + input[3] = 1310; // delta=110, dod=10 (bucket: |dod|<=63) + input[4] = 1410; // delta=100, dod=-10 + input[5] = 1710; // delta=300, dod=200 (bucket: |dod|<=255) + input[6] = 1810; // delta=100, dod=-200 + input[7] = 3810; // delta=2000, dod=1900 (bucket: |dod|<=2047) + input[8] = 3910; // delta=100, dod=-1900 + input[9] = 53910; // delta=50000, dod=49900 (bucket: raw 64-bit) + // Fill rest with regular increments + for (int i = 10; i < input.length; i++) + input[i] = input[i - 1] + 100; + + final byte[] encoded = DeltaOfDeltaCodec.encode(input); + final long[] decoded = DeltaOfDeltaCodec.decode(encoded); + assertThat(decoded).containsExactly(input); + } +} diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/promql/PromQLEvaluatorIntegrationTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/promql/PromQLEvaluatorIntegrationTest.java new file mode 100644 index 0000000000..2918597fc9 --- /dev/null +++ b/engine/src/test/java/com/arcadedb/engine/timeseries/promql/PromQLEvaluatorIntegrationTest.java @@ -0,0 +1,284 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries.promql; + +import com.arcadedb.TestHelper; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr; +import com.arcadedb.query.sql.executor.ResultSet; +import org.junit.jupiter.api.Test; + +import static org.assertj.core.api.Assertions.assertThat; +import static org.assertj.core.api.Assertions.assertThatThrownBy; + +/** + * End-to-end PromQL integration tests: parse → evaluate against live TimeSeries data. + * + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +class PromQLEvaluatorIntegrationTest extends TestHelper { + + @Test + void testInstantVectorSelector() { + createTypeAndInsertData("cpu_usage"); + + final PromQLEvaluator evaluator = new PromQLEvaluator(getDatabaseInternal()); + final PromQLExpr expr = new PromQLParser("cpu_usage").parse(); + final PromQLResult result = evaluator.evaluateInstant(expr, 6000L); + + assertThat(result).isInstanceOf(PromQLResult.InstantVector.class); + final PromQLResult.InstantVector iv = (PromQLResult.InstantVector) result; + // Should find at least the latest sample within the 5-minute lookback window + assertThat(iv.samples()).isNotEmpty(); + } + + @Test + void testRateFunction() { + createTypeAndInsertData("http_requests_total"); + + final PromQLEvaluator evaluator = new PromQLEvaluator(getDatabaseInternal()); + final PromQLExpr expr = new PromQLParser("rate(http_requests_total[5m])").parse(); + final PromQLResult result = evaluator.evaluateInstant(expr, 6000L); + + assertThat(result).isInstanceOf(PromQLResult.InstantVector.class); + final PromQLResult.InstantVector iv = (PromQLResult.InstantVector) result; + // rate() over a counter-like series should produce non-negative values + for (final PromQLResult.VectorSample sample : iv.samples()) + assertThat(sample.value()).isGreaterThanOrEqualTo(0.0); + } + + @Test + void 
testBinaryExpressionWithScalar() { + createTypeAndInsertData("metric_a"); + + final PromQLEvaluator evaluator = new PromQLEvaluator(getDatabaseInternal()); + final PromQLExpr expr = new PromQLParser("metric_a * 2").parse(); + final PromQLResult result = evaluator.evaluateInstant(expr, 6000L); + + assertThat(result).isInstanceOf(PromQLResult.InstantVector.class); + final PromQLResult.InstantVector iv = (PromQLResult.InstantVector) result; + for (final PromQLResult.VectorSample sample : iv.samples()) + assertThat(sample.value()).isNotNaN(); + } + + @Test + void testSumAggregation() { + createTypeWithTags("tagged_metric"); + + final PromQLEvaluator evaluator = new PromQLEvaluator(getDatabaseInternal()); + final PromQLExpr expr = new PromQLParser("sum(tagged_metric)").parse(); + final PromQLResult result = evaluator.evaluateInstant(expr, 6000L); + + assertThat(result).isInstanceOf(PromQLResult.InstantVector.class); + final PromQLResult.InstantVector iv = (PromQLResult.InstantVector) result; + assertThat(iv.samples()).hasSize(1); + // Sum of latest samples per label combination + assertThat(iv.samples().getFirst().value()).isGreaterThan(0.0); + } + + @Test + void testSumByAggregation() { + createTypeWithTags("group_metric"); + + final PromQLEvaluator evaluator = new PromQLEvaluator(getDatabaseInternal()); + final PromQLExpr expr = new PromQLParser("sum by (host) (group_metric)").parse(); + final PromQLResult result = evaluator.evaluateInstant(expr, 6000L); + + assertThat(result).isInstanceOf(PromQLResult.InstantVector.class); + final PromQLResult.InstantVector iv = (PromQLResult.InstantVector) result; + // Should have one result per distinct host + assertThat(iv.samples()).hasSizeGreaterThanOrEqualTo(1); + } + + @Test + void testRangeQueryWithStep() { + createTypeAndInsertData("range_metric"); + + final PromQLEvaluator evaluator = new PromQLEvaluator(getDatabaseInternal()); + final PromQLExpr expr = new PromQLParser("range_metric").parse(); + final PromQLResult result = 
evaluator.evaluateRange(expr, 1000L, 5000L, 1000L); + + assertThat(result).isInstanceOf(PromQLResult.MatrixResult.class); + final PromQLResult.MatrixResult mr = (PromQLResult.MatrixResult) result; + assertThat(mr.series()).isNotEmpty(); + } + + @Test + void testEmptyResultForNonExistentType() { + final PromQLEvaluator evaluator = new PromQLEvaluator(getDatabaseInternal()); + final PromQLExpr expr = new PromQLParser("nonexistent_metric").parse(); + final PromQLResult result = evaluator.evaluateInstant(expr, 1000L); + + assertThat(result).isInstanceOf(PromQLResult.InstantVector.class); + final PromQLResult.InstantVector iv = (PromQLResult.InstantVector) result; + assertThat(iv.samples()).isEmpty(); + } + + @Test + void testEvaluateRangeStepZero() { + final PromQLEvaluator evaluator = new PromQLEvaluator(getDatabaseInternal()); + final PromQLExpr expr = new PromQLParser("42").parse(); + + assertThatThrownBy(() -> evaluator.evaluateRange(expr, 1000L, 5000L, 0L)) + .isInstanceOf(IllegalArgumentException.class) + .hasMessageContaining("stepMs must be positive"); + } + + @Test + void testEvaluateRangeInvertedBounds() { + // Regression: endMs < startMs previously returned empty results silently + final PromQLEvaluator evaluator = new PromQLEvaluator(getDatabaseInternal()); + final PromQLExpr expr = new PromQLParser("42").parse(); + + assertThatThrownBy(() -> evaluator.evaluateRange(expr, 5000L, 1000L, 1000L)) + .isInstanceOf(IllegalArgumentException.class) + .hasMessageContaining("endMs") + .hasMessageContaining("startMs"); + } + + @Test + void testReDoSPatternRejected() { + // Security: regex patterns with nested quantifiers must be rejected to prevent ReDoS attacks + createTypeWithTags("redos_metric"); + + final PromQLEvaluator evaluator = new PromQLEvaluator(getDatabaseInternal()); + // (a+)+ is the classic ReDoS pattern + final PromQLExpr expr = new PromQLParser("redos_metric{host=~\"(a+)+\"}").parse(); + + assertThatThrownBy(() -> evaluator.evaluateInstant(expr, 
6000L)) + .isInstanceOf(IllegalArgumentException.class) + .hasMessageContaining("ReDoS"); + } + + @Test + void testScalarArithmetic() { + final PromQLEvaluator evaluator = new PromQLEvaluator(getDatabaseInternal()); + final PromQLExpr expr = new PromQLParser("2 + 3 * 4").parse(); + final PromQLResult result = evaluator.evaluateInstant(expr, 1000L); + + assertThat(result).isInstanceOf(PromQLResult.ScalarResult.class); + assertThat(((PromQLResult.ScalarResult) result).value()).isEqualTo(14.0); + } + + @Test + void testExtractLabelsWithTagBeforeTimestamp() { + // Regression: extractLabels / extractValue must work correctly even when the + // TIMESTAMP column is not at schema position 0. + // Schema: TAG(host) at index 0, TIMESTAMP(ts) at index 1, FIELD(value) at index 2. + // Row format from engine: [ts, host, value] — TIMESTAMP is always row[0]. + // Previously the code used row[schemaIndex] directly, so host would read the timestamp. + final String typeName = "promql_tag_first"; + // Use the builder API to create a type with TAG before TIMESTAMP + new com.arcadedb.schema.TimeSeriesTypeBuilder(getDatabaseInternal()) + .withName(typeName) + .withTag("host", com.arcadedb.schema.Type.STRING) + .withTimestamp("ts") + .withField("value", com.arcadedb.schema.Type.DOUBLE) + .withShards(1) + .create(); + + database.transaction(() -> { + database.command("sql", "INSERT INTO " + typeName + " SET ts = 1000, host = 'srv1', value = 42.0"); + database.command("sql", "INSERT INTO " + typeName + " SET ts = 2000, host = 'srv2', value = 84.0"); + }); + + final PromQLEvaluator evaluator = new PromQLEvaluator(getDatabaseInternal()); + final PromQLExpr expr = new PromQLParser(typeName + "{host=\"srv1\"}").parse(); + // eval at 6000ms so the 5-minute lookback window covers ts=1000 and ts=2000 + final PromQLResult result = evaluator.evaluateInstant(expr, 6000L); + + assertThat(result).isInstanceOf(PromQLResult.InstantVector.class); + final PromQLResult.InstantVector iv = 
(PromQLResult.InstantVector) result;
+    assertThat(iv.samples()).isNotEmpty();
+    // The label "host" must resolve to "srv1", not to a timestamp number
+    assertThat(iv.samples().getFirst().labels()).containsEntry("host", "srv1");
+    // The value must be a numeric double, not NaN
+    assertThat(iv.samples().getFirst().value()).isEqualTo(42.0);
+  }
+
+  @Test
+  void testQueryUsesIterateQueryPath() {
+    // Verify that evaluateVectorSelector uses the lazy iterator path (iterateQuery)
+    // rather than the eager-loading query() path. We verify this indirectly by
+    // confirming that the inserted data is evaluated correctly through that path.
+    createTypeAndInsertData("promql_iter_test");
+
+    final PromQLEvaluator evaluator = new PromQLEvaluator(getDatabaseInternal());
+    // eval at 6000ms so the 5-minute lookback window covers the inserted data
+    final PromQLResult result = evaluator.evaluateInstant(
+        new PromQLParser("promql_iter_test").parse(), 6000L);
+
+    assertThat(result).isInstanceOf(PromQLResult.InstantVector.class);
+    assertThat(((PromQLResult.InstantVector) result).samples()).isNotEmpty();
+  }
+
+  @Test
+  void testPromQLSqlFunction() {
+    createTypeAndInsertData("promql_sql_test");
+
+    // RETURN with a List unwraps each map entry into a separate result row.
+    // Each row has __value__ and any label properties as direct row properties.
+    try (final ResultSet rs = database.command("sql", "RETURN promql('promql_sql_test', 6000)")) {
+      assertThat(rs.hasNext()).isTrue();
+      // First row should have a numeric __value__
+      final Object sampleValue = rs.next().getProperty("__value__");
+      assertThat(sampleValue).isNotNull().isInstanceOf(Double.class);
+    }
+  }
+
+  @Test
+  void testPromQLSqlFunctionUsesCurrentTimeWhenNoArgument() {
+    createTypeAndInsertData("promql_sql_notime_test");
+
+    // Called without evalTimeMs — uses System.currentTimeMillis() internally.
+ // The data was inserted with timestamps 1000-5000 ms which are in the far past + // relative to current time, so results will be empty (lookback window is 5 minutes). + // RETURN of an empty list produces no rows — verify the query executes without error. + database.command("sql", "RETURN promql('promql_sql_notime_test')").close(); + } + + // --- Helper methods --- + + private com.arcadedb.database.DatabaseInternal getDatabaseInternal() { + return (com.arcadedb.database.DatabaseInternal) database; + } + + private void createTypeAndInsertData(final String typeName) { + database.command("sql", + "CREATE TIMESERIES TYPE " + typeName + " TIMESTAMP ts FIELDS (value DOUBLE)"); + + database.transaction(() -> { + for (int i = 1; i <= 5; i++) + database.command("sql", + "INSERT INTO " + typeName + " SET ts = " + (i * 1000) + ", value = " + (i * 10.0)); + }); + } + + private void createTypeWithTags(final String typeName) { + database.command("sql", + "CREATE TIMESERIES TYPE " + typeName + " TIMESTAMP ts TAGS (host STRING) FIELDS (value DOUBLE)"); + + database.transaction(() -> { + database.command("sql", "INSERT INTO " + typeName + " SET ts = 1000, host = 'a', value = 10.0"); + database.command("sql", "INSERT INTO " + typeName + " SET ts = 2000, host = 'b', value = 20.0"); + database.command("sql", "INSERT INTO " + typeName + " SET ts = 3000, host = 'a', value = 30.0"); + database.command("sql", "INSERT INTO " + typeName + " SET ts = 4000, host = 'b', value = 40.0"); + database.command("sql", "INSERT INTO " + typeName + " SET ts = 5000, host = 'a', value = 50.0"); + }); + } +} diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/promql/PromQLParserTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/promql/PromQLParserTest.java new file mode 100644 index 0000000000..7ec071fa45 --- /dev/null +++ b/engine/src/test/java/com/arcadedb/engine/timeseries/promql/PromQLParserTest.java @@ -0,0 +1,307 @@ +/* + * Copyright © 2021-present Arcade Data Ltd 
(info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.engine.timeseries.promql; + +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr.AggOp; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr.AggregationExpr; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr.BinaryExpr; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr.BinaryOp; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr.FunctionCallExpr; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr.LabelMatcher; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr.MatchOp; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr.MatrixSelector; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr.NumberLiteral; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr.StringLiteral; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr.UnaryExpr; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr.VectorSelector; +import org.junit.jupiter.api.Test; + +import java.util.List; + +import static org.assertj.core.api.Assertions.assertThat; +import static org.assertj.core.api.Assertions.assertThatThrownBy; + +/** + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +class PromQLParserTest { 
+ + @Test + void testSimpleSelector() { + final PromQLExpr expr = new PromQLParser("cpu_usage").parse(); + assertThat(expr).isInstanceOf(VectorSelector.class); + final VectorSelector vs = (VectorSelector) expr; + assertThat(vs.metricName()).isEqualTo("cpu_usage"); + assertThat(vs.matchers()).isEmpty(); + assertThat(vs.offsetMs()).isZero(); + } + + @Test + void testSelectorWithMatchers() { + final PromQLExpr expr = new PromQLParser("http_requests{job=\"api\",status!=\"500\"}").parse(); + assertThat(expr).isInstanceOf(VectorSelector.class); + final VectorSelector vs = (VectorSelector) expr; + assertThat(vs.metricName()).isEqualTo("http_requests"); + assertThat(vs.matchers()).hasSize(2); + assertThat(vs.matchers().get(0)).isEqualTo(new LabelMatcher("job", MatchOp.EQ, "api")); + assertThat(vs.matchers().get(1)).isEqualTo(new LabelMatcher("status", MatchOp.NEQ, "500")); + } + + @Test + void testRegexMatcher() { + final PromQLExpr expr = new PromQLParser("http_requests{job=~\"api.*\"}").parse(); + assertThat(expr).isInstanceOf(VectorSelector.class); + final VectorSelector vs = (VectorSelector) expr; + assertThat(vs.matchers()).hasSize(1); + assertThat(vs.matchers().getFirst().op()).isEqualTo(MatchOp.RE); + assertThat(vs.matchers().getFirst().value()).isEqualTo("api.*"); + } + + @Test + void testNegativeRegexMatcher() { + final PromQLExpr expr = new PromQLParser("http_requests{job!~\"test.*\"}").parse(); + final VectorSelector vs = (VectorSelector) expr; + assertThat(vs.matchers().getFirst().op()).isEqualTo(MatchOp.NRE); + } + + @Test + void testRangeVector() { + final PromQLExpr expr = new PromQLParser("http_requests[5m]").parse(); + assertThat(expr).isInstanceOf(MatrixSelector.class); + final MatrixSelector ms = (MatrixSelector) expr; + assertThat(ms.selector().metricName()).isEqualTo("http_requests"); + assertThat(ms.rangeMs()).isEqualTo(300_000); + } + + @Test + void testAggregationByBefore() { + final PromQLExpr expr = new PromQLParser("sum by (job) 
(http_requests)").parse(); + assertThat(expr).isInstanceOf(AggregationExpr.class); + final AggregationExpr agg = (AggregationExpr) expr; + assertThat(agg.op()).isEqualTo(AggOp.SUM); + assertThat(agg.groupLabels()).containsExactly("job"); + assertThat(agg.without()).isFalse(); + assertThat(agg.expr()).isInstanceOf(VectorSelector.class); + } + + @Test + void testAggregationByAfter() { + final PromQLExpr expr = new PromQLParser("sum(http_requests) by (job)").parse(); + assertThat(expr).isInstanceOf(AggregationExpr.class); + final AggregationExpr agg = (AggregationExpr) expr; + assertThat(agg.op()).isEqualTo(AggOp.SUM); + assertThat(agg.groupLabels()).containsExactly("job"); + } + + @Test + void testAggregationWithout() { + final PromQLExpr expr = new PromQLParser("avg without (instance) (cpu_usage)").parse(); + final AggregationExpr agg = (AggregationExpr) expr; + assertThat(agg.op()).isEqualTo(AggOp.AVG); + assertThat(agg.without()).isTrue(); + assertThat(agg.groupLabels()).containsExactly("instance"); + } + + @Test + void testFunctionCall() { + final PromQLExpr expr = new PromQLParser("rate(http_requests[5m])").parse(); + assertThat(expr).isInstanceOf(FunctionCallExpr.class); + final FunctionCallExpr fn = (FunctionCallExpr) expr; + assertThat(fn.name()).isEqualTo("rate"); + assertThat(fn.args()).hasSize(1); + assertThat(fn.args().getFirst()).isInstanceOf(MatrixSelector.class); + } + + @Test + void testBinaryExpression() { + final PromQLExpr expr = new PromQLParser("cpu_usage * 100").parse(); + assertThat(expr).isInstanceOf(BinaryExpr.class); + final BinaryExpr bin = (BinaryExpr) expr; + assertThat(bin.op()).isEqualTo(BinaryOp.MUL); + assertThat(bin.left()).isInstanceOf(VectorSelector.class); + assertThat(bin.right()).isInstanceOf(NumberLiteral.class); + assertThat(((NumberLiteral) bin.right()).value()).isEqualTo(100.0); + } + + @Test + void testNestedAggregationAndFunction() { + final PromQLExpr expr = new PromQLParser("sum(rate(http_requests_total[5m])) by 
(job)").parse(); + assertThat(expr).isInstanceOf(AggregationExpr.class); + final AggregationExpr agg = (AggregationExpr) expr; + assertThat(agg.op()).isEqualTo(AggOp.SUM); + assertThat(agg.groupLabels()).containsExactly("job"); + assertThat(agg.expr()).isInstanceOf(FunctionCallExpr.class); + } + + @Test + void testOffset() { + final PromQLExpr expr = new PromQLParser("http_requests offset 5m").parse(); + assertThat(expr).isInstanceOf(VectorSelector.class); + final VectorSelector vs = (VectorSelector) expr; + assertThat(vs.offsetMs()).isEqualTo(300_000); + } + + @Test + void testRangeWithOffset() { + final PromQLExpr expr = new PromQLParser("http_requests[5m] offset 1h").parse(); + assertThat(expr).isInstanceOf(MatrixSelector.class); + final MatrixSelector ms = (MatrixSelector) expr; + assertThat(ms.rangeMs()).isEqualTo(300_000); + assertThat(ms.selector().offsetMs()).isEqualTo(3_600_000); + } + + @Test + void testDurationParsing() { + assertThat(PromQLParser.parseDuration("5m")).isEqualTo(300_000); + assertThat(PromQLParser.parseDuration("1h30m")).isEqualTo(5_400_000); + assertThat(PromQLParser.parseDuration("2d")).isEqualTo(172_800_000); + assertThat(PromQLParser.parseDuration("1w")).isEqualTo(604_800_000); + assertThat(PromQLParser.parseDuration("30s")).isEqualTo(30_000); + } + + @Test + void testDurationParsingMilliseconds() { + // Regression: '500ms' was previously mis-parsed as 500 minutes (30,000,000 ms) instead of 500 ms + assertThat(PromQLParser.parseDuration("500ms")).isEqualTo(500); + assertThat(PromQLParser.parseDuration("1ms")).isEqualTo(1); + assertThat(PromQLParser.parseDuration("100ms")).isEqualTo(100); + // Combined: 1s + 500ms + assertThat(PromQLParser.parseDuration("1s500ms")).isEqualTo(1_500); + // 'm' followed by something other than 's' should still be minutes + assertThat(PromQLParser.parseDuration("2m")).isEqualTo(120_000); + assertThat(PromQLParser.parseDuration("2m30s")).isEqualTo(150_000); + } + + @Test + void testDurationParsingOverflow() 
{ + // Values that overflow long when multiplied by the unit multiplier should throw. + // 300,000,000 years overflows: 300_000_000 * 31_536_000_000 > Long.MAX_VALUE + assertThatThrownBy(() -> PromQLParser.parseDuration("300000000y")) + .isInstanceOf(IllegalArgumentException.class) + .hasMessageContaining("too large"); + } + + @Test + void testRangeVectorWithMilliseconds() { + // Regression: metric[500ms] should have rangeMs = 500, not 30,000,000 + final PromQLExpr expr = new PromQLParser("http_requests[500ms]").parse(); + assertThat(expr).isInstanceOf(MatrixSelector.class); + assertThat(((MatrixSelector) expr).rangeMs()).isEqualTo(500); + } + + @Test + void testOperatorPrecedence() { + // 1 + 2 * 3 should be 1 + (2 * 3) + final PromQLExpr expr = new PromQLParser("1 + 2 * 3").parse(); + assertThat(expr).isInstanceOf(BinaryExpr.class); + final BinaryExpr bin = (BinaryExpr) expr; + assertThat(bin.op()).isEqualTo(BinaryOp.ADD); + assertThat(bin.left()).isInstanceOf(NumberLiteral.class); + assertThat(bin.right()).isInstanceOf(BinaryExpr.class); + assertThat(((BinaryExpr) bin.right()).op()).isEqualTo(BinaryOp.MUL); + } + + @Test + void testComparisonOperator() { + final PromQLExpr expr = new PromQLParser("cpu_usage > 80").parse(); + assertThat(expr).isInstanceOf(BinaryExpr.class); + final BinaryExpr bin = (BinaryExpr) expr; + assertThat(bin.op()).isEqualTo(BinaryOp.GT); + } + + @Test + void testUnaryNegation() { + final PromQLExpr expr = new PromQLParser("-cpu_usage").parse(); + assertThat(expr).isInstanceOf(UnaryExpr.class); + final UnaryExpr un = (UnaryExpr) expr; + assertThat(un.op()).isEqualTo('-'); + assertThat(un.expr()).isInstanceOf(VectorSelector.class); + } + + @Test + void testTopk() { + final PromQLExpr expr = new PromQLParser("topk(5, http_requests)").parse(); + assertThat(expr).isInstanceOf(AggregationExpr.class); + final AggregationExpr agg = (AggregationExpr) expr; + assertThat(agg.op()).isEqualTo(AggOp.TOPK); + 
assertThat(agg.param()).isInstanceOf(NumberLiteral.class); + assertThat(((NumberLiteral) agg.param()).value()).isEqualTo(5.0); + } + + @Test + void testStringLiteral() { + final PromQLExpr expr = new PromQLParser("\"hello world\"").parse(); + assertThat(expr).isInstanceOf(StringLiteral.class); + assertThat(((StringLiteral) expr).value()).isEqualTo("hello world"); + } + + @Test + void testNumberLiteral() { + final PromQLExpr expr = new PromQLParser("42.5").parse(); + assertThat(expr).isInstanceOf(NumberLiteral.class); + assertThat(((NumberLiteral) expr).value()).isEqualTo(42.5); + } + + @Test + void testParenthesizedExpression() { + final PromQLExpr expr = new PromQLParser("(cpu_usage + mem_usage) / 2").parse(); + assertThat(expr).isInstanceOf(BinaryExpr.class); + final BinaryExpr bin = (BinaryExpr) expr; + assertThat(bin.op()).isEqualTo(BinaryOp.DIV); + assertThat(bin.left()).isInstanceOf(BinaryExpr.class); + } + + @Test + void testMalformedExpression() { + assertThatThrownBy(() -> new PromQLParser("sum(").parse()) + .isInstanceOf(IllegalArgumentException.class); + } + + @Test + void testEmptyExpression() { + assertThatThrownBy(() -> new PromQLParser("").parse()) + .isInstanceOf(IllegalArgumentException.class); + } + + @Test + void testInvalidDuration() { + assertThatThrownBy(() -> PromQLParser.parseDuration("5")) + .isInstanceOf(IllegalArgumentException.class); + } + + @Test + void testMultipleFunctionArgs() { + final PromQLExpr expr = new PromQLParser("round(cpu_usage, 0.5)").parse(); + assertThat(expr).isInstanceOf(FunctionCallExpr.class); + final FunctionCallExpr fn = (FunctionCallExpr) expr; + assertThat(fn.name()).isEqualTo("round"); + assertThat(fn.args()).hasSize(2); + } + + @Test + void testSelectorWithMatchersAndRange() { + final PromQLExpr expr = new PromQLParser("http_requests{job=\"api\"}[5m]").parse(); + assertThat(expr).isInstanceOf(MatrixSelector.class); + final MatrixSelector ms = (MatrixSelector) expr; + 
assertThat(ms.selector().metricName()).isEqualTo("http_requests");
+    assertThat(ms.selector().matchers()).hasSize(1);
+    assertThat(ms.rangeMs()).isEqualTo(300_000);
+  }
+}
diff --git a/engine/src/test/java/com/arcadedb/engine/timeseries/simd/TimeSeriesVectorOpsTest.java b/engine/src/test/java/com/arcadedb/engine/timeseries/simd/TimeSeriesVectorOpsTest.java
new file mode 100644
index 0000000000..0a469a03df
--- /dev/null
+++ b/engine/src/test/java/com/arcadedb/engine/timeseries/simd/TimeSeriesVectorOpsTest.java
@@ -0,0 +1,214 @@
+/*
+ * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com)
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com)
+ * SPDX-License-Identifier: Apache-2.0
+ */
+package com.arcadedb.engine.timeseries.simd;
+
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.MethodSource;
+
+import java.util.Random;
+import java.util.stream.Stream;
+
+import static org.assertj.core.api.Assertions.assertThat;
+import static org.assertj.core.api.Assertions.within;
+
+/**
+ * Tests that the scalar and SIMD implementations produce identical results.
+ * + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +class TimeSeriesVectorOpsTest { + + static Stream implementations() { + return Stream.of(new ScalarTimeSeriesVectorOps(), new SimdTimeSeriesVectorOps()); + } + + @ParameterizedTest + @MethodSource("implementations") + void testSumDouble(final TimeSeriesVectorOps ops) { + final double[] data = { 1.0, 2.0, 3.0, 4.0, 5.0 }; + assertThat(ops.sum(data, 0, 5)).isCloseTo(15.0, within(1e-10)); + assertThat(ops.sum(data, 1, 3)).isCloseTo(9.0, within(1e-10)); + } + + @ParameterizedTest + @MethodSource("implementations") + void testMinMaxDouble(final TimeSeriesVectorOps ops) { + final double[] data = { 5.0, 1.0, 3.0, -2.0, 4.0, 0.0, 7.0 }; + assertThat(ops.min(data, 0, 7)).isEqualTo(-2.0); + assertThat(ops.max(data, 0, 7)).isEqualTo(7.0); + assertThat(ops.min(data, 2, 3)).isEqualTo(-2.0); + assertThat(ops.max(data, 2, 3)).isEqualTo(4.0); + } + + @ParameterizedTest + @MethodSource("implementations") + void testSumLong(final TimeSeriesVectorOps ops) { + final long[] data = { 10, 20, 30, 40, 50 }; + assertThat(ops.sumLong(data, 0, 5)).isEqualTo(150); + assertThat(ops.sumLong(data, 2, 2)).isEqualTo(70); + } + + @ParameterizedTest + @MethodSource("implementations") + void testMinMaxLong(final TimeSeriesVectorOps ops) { + final long[] data = { 50, 10, 30, -20, 40, 0, 70 }; + assertThat(ops.minLong(data, 0, 7)).isEqualTo(-20); + assertThat(ops.maxLong(data, 0, 7)).isEqualTo(70); + } + + @ParameterizedTest + @MethodSource("implementations") + void testSingleElement(final TimeSeriesVectorOps ops) { + final double[] data = { 42.0 }; + assertThat(ops.sum(data, 0, 1)).isEqualTo(42.0); + assertThat(ops.min(data, 0, 1)).isEqualTo(42.0); + assertThat(ops.max(data, 0, 1)).isEqualTo(42.0); + } + + @ParameterizedTest + @MethodSource("implementations") + void testNonAlignedLength(final TimeSeriesVectorOps ops) { + // Length not a multiple of SIMD lane width + final double[] data = new double[17]; + for (int i = 0; i < data.length; 
i++) + data[i] = i + 1; + + assertThat(ops.sum(data, 0, 17)).isCloseTo(153.0, within(1e-10)); + assertThat(ops.min(data, 0, 17)).isEqualTo(1.0); + assertThat(ops.max(data, 0, 17)).isEqualTo(17.0); + } + + @ParameterizedTest + @MethodSource("implementations") + void testFilteredSum(final TimeSeriesVectorOps ops) { + final double[] data = { 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0 }; + // Bitmask: bits 0,2,4,6 set → select 1.0, 3.0, 5.0, 7.0 + final long[] bitmask = { 0b01010101L }; + assertThat(ops.sumFiltered(data, bitmask, 0, 8)).isCloseTo(16.0, within(1e-10)); + } + + @ParameterizedTest + @MethodSource("implementations") + void testCountFiltered(final TimeSeriesVectorOps ops) { + final long[] bitmask = { 0b01010101L }; + assertThat(ops.countFiltered(bitmask, 0, 8)).isEqualTo(4); + } + + @ParameterizedTest + @MethodSource("implementations") + void testGreaterThan(final TimeSeriesVectorOps ops) { + final double[] data = { 1.0, 5.0, 2.0, 8.0, 3.0, 6.0, 0.5, 4.0 }; + final long[] out = new long[1]; + ops.greaterThan(data, 3.0, out, 0, 8); + + // Elements > 3.0 at indices 1,3,5,7 → bits 1,3,5,7 + assertThat(out[0] & (1L << 1)).isNotZero(); + assertThat(out[0] & (1L << 3)).isNotZero(); + assertThat(out[0] & (1L << 5)).isNotZero(); + assertThat(out[0] & (1L << 7)).isNotZero(); + assertThat(out[0] & (1L << 0)).isZero(); + assertThat(out[0] & (1L << 2)).isZero(); + } + + @ParameterizedTest + @MethodSource("implementations") + void testBitmaskAndOr(final TimeSeriesVectorOps ops) { + final long[] a = { 0b1100L }; + final long[] b = { 0b1010L }; + final long[] andOut = new long[1]; + final long[] orOut = new long[1]; + + ops.bitmaskAnd(a, b, andOut, 1); + ops.bitmaskOr(a, b, orOut, 1); + + assertThat(andOut[0]).isEqualTo(0b1000L); + assertThat(orOut[0]).isEqualTo(0b1110L); + } + + @Test + void testScalarAndSimdProduceIdenticalResults() { + final ScalarTimeSeriesVectorOps scalar = new ScalarTimeSeriesVectorOps(); + final SimdTimeSeriesVectorOps simd = new 
SimdTimeSeriesVectorOps(); + + final Random rng = new Random(42); + final int size = 1000; + final double[] dblData = new double[size]; + final long[] longData = new long[size]; + for (int i = 0; i < size; i++) { + dblData[i] = rng.nextDouble() * 1000 - 500; + longData[i] = rng.nextLong(); + } + + // Basic aggregations + assertThat(simd.sum(dblData, 0, size)).isCloseTo(scalar.sum(dblData, 0, size), within(1e-6)); + assertThat(simd.min(dblData, 0, size)).isEqualTo(scalar.min(dblData, 0, size)); + assertThat(simd.max(dblData, 0, size)).isEqualTo(scalar.max(dblData, 0, size)); + assertThat(simd.sumLong(longData, 0, size)).isEqualTo(scalar.sumLong(longData, 0, size)); + assertThat(simd.minLong(longData, 0, size)).isEqualTo(scalar.minLong(longData, 0, size)); + assertThat(simd.maxLong(longData, 0, size)).isEqualTo(scalar.maxLong(longData, 0, size)); + + // With offset + assertThat(simd.sum(dblData, 100, 500)).isCloseTo(scalar.sum(dblData, 100, 500), within(1e-6)); + assertThat(simd.min(dblData, 100, 500)).isEqualTo(scalar.min(dblData, 100, 500)); + assertThat(simd.max(dblData, 100, 500)).isEqualTo(scalar.max(dblData, 100, 500)); + assertThat(simd.sumLong(longData, 100, 500)).isEqualTo(scalar.sumLong(longData, 100, 500)); + assertThat(simd.minLong(longData, 100, 500)).isEqualTo(scalar.minLong(longData, 100, 500)); + assertThat(simd.maxLong(longData, 100, 500)).isEqualTo(scalar.maxLong(longData, 100, 500)); + + // greaterThan parity + final int bitmaskWords = (size + 63) / 64; + final long[] scalarGt = new long[bitmaskWords]; + final long[] simdGt = new long[bitmaskWords]; + scalar.greaterThan(dblData, 0.0, scalarGt, 0, size); + simd.greaterThan(dblData, 0.0, simdGt, 0, size); + assertThat(simdGt).isEqualTo(scalarGt); + + // sumFiltered parity (using the greaterThan output as bitmask) + assertThat(simd.sumFiltered(dblData, scalarGt, 0, size)) + .isCloseTo(scalar.sumFiltered(dblData, scalarGt, 0, size), within(1e-6)); + + // countFiltered parity + 
assertThat(simd.countFiltered(scalarGt, 0, size)).isEqualTo(scalar.countFiltered(scalarGt, 0, size)); + + // bitmaskAnd / bitmaskOr parity + final long[] secondMask = new long[bitmaskWords]; + scalar.greaterThan(dblData, -100.0, secondMask, 0, size); + + final long[] scalarAnd = new long[bitmaskWords]; + final long[] simdAnd = new long[bitmaskWords]; + scalar.bitmaskAnd(scalarGt, secondMask, scalarAnd, bitmaskWords); + simd.bitmaskAnd(scalarGt, secondMask, simdAnd, bitmaskWords); + assertThat(simdAnd).isEqualTo(scalarAnd); + + final long[] scalarOr = new long[bitmaskWords]; + final long[] simdOr = new long[bitmaskWords]; + scalar.bitmaskOr(scalarGt, secondMask, scalarOr, bitmaskWords); + simd.bitmaskOr(scalarGt, secondMask, simdOr, bitmaskWords); + assertThat(simdOr).isEqualTo(scalarOr); + } + + @Test + void testProviderReturnsInstance() { + final TimeSeriesVectorOps ops = TimeSeriesVectorOpsProvider.getInstance(); + assertThat(ops).isNotNull(); + // Smoke test + assertThat(ops.sum(new double[] { 1.0, 2.0, 3.0 }, 0, 3)).isCloseTo(6.0, within(1e-10)); + } +} diff --git a/engine/src/test/java/com/arcadedb/schema/BuildFilteredQueryTest.java b/engine/src/test/java/com/arcadedb/schema/BuildFilteredQueryTest.java new file mode 100644 index 0000000000..e55fa258e1 --- /dev/null +++ b/engine/src/test/java/com/arcadedb/schema/BuildFilteredQueryTest.java @@ -0,0 +1,155 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.schema; + +import org.junit.jupiter.api.Test; + +import static org.assertj.core.api.Assertions.assertThat; +import static org.assertj.core.api.Assertions.assertThatThrownBy; + +/** + * Unit tests for {@link ContinuousAggregateRefresher#buildFilteredQuery}. + * + * @author Luca Garulli (l.garulli@arcadedata.com) + */ +class BuildFilteredQueryTest { + + @Test + void testWithGroupBy() { + final ContinuousAggregateImpl ca = buildCA( + "SELECT sensor_id, avg(temp) FROM SensorReading GROUP BY sensor_id"); + final String result = ContinuousAggregateRefresher.buildFilteredQuery(ca, 1000); + assertThat(result).isEqualTo( + "SELECT sensor_id, avg(temp) FROM SensorReading WHERE `ts` >= 1000 GROUP BY sensor_id"); + } + + @Test + void testWithOrderByNoGroupBy() { + final ContinuousAggregateImpl ca = buildCA( + "SELECT sensor_id, temp FROM SensorReading ORDER BY sensor_id"); + final String result = ContinuousAggregateRefresher.buildFilteredQuery(ca, 1000); + assertThat(result).isEqualTo( + "SELECT sensor_id, temp FROM SensorReading WHERE `ts` >= 1000 ORDER BY sensor_id"); + } + + @Test + void testWithOrderByAndGroupBy() { + final ContinuousAggregateImpl ca = buildCA( + "SELECT sensor_id, avg(temp) FROM SensorReading GROUP BY sensor_id ORDER BY sensor_id"); + final String result = ContinuousAggregateRefresher.buildFilteredQuery(ca, 1000); + // WHERE should be inserted before GROUP BY + assertThat(result).isEqualTo( + "SELECT sensor_id, avg(temp) FROM SensorReading WHERE `ts` >= 1000 GROUP BY sensor_id ORDER BY sensor_id"); + } + + @Test + void testWithExistingWhere() { + final ContinuousAggregateImpl ca = buildCA( + "SELECT sensor_id, avg(temp) FROM SensorReading WHERE active = true GROUP BY sensor_id"); + final String result = 
ContinuousAggregateRefresher.buildFilteredQuery(ca, 1000); + assertThat(result).isEqualTo( + "SELECT sensor_id, avg(temp) FROM SensorReading WHERE `ts` >= 1000 AND active = true GROUP BY sensor_id"); + } + + @Test + void testNoKeywordsAppendsAtEnd() { + final ContinuousAggregateImpl ca = buildCA( + "SELECT sensor_id, avg(temp) FROM SensorReading"); + final String result = ContinuousAggregateRefresher.buildFilteredQuery(ca, 1000); + assertThat(result).isEqualTo( + "SELECT sensor_id, avg(temp) FROM SensorReading WHERE `ts` >= 1000"); + } + + @Test + void testWithLimitNoGroupByNoOrderBy() { + final ContinuousAggregateImpl ca = buildCA( + "SELECT sensor_id, temp FROM SensorReading LIMIT 100"); + final String result = ContinuousAggregateRefresher.buildFilteredQuery(ca, 1000); + assertThat(result).isEqualTo( + "SELECT sensor_id, temp FROM SensorReading WHERE `ts` >= 1000 LIMIT 100"); + } + + @Test + void testWhereConditionStartsWithParenthesis() { + // Regression: WHERE(condition) without a space after WHERE caused "AND(condition)" — missing space + final ContinuousAggregateImpl ca = buildCA( + "SELECT sensor_id, avg(temp) FROM SensorReading WHERE(active = true) GROUP BY sensor_id"); + final String result = ContinuousAggregateRefresher.buildFilteredQuery(ca, 1000); + assertThat(result).isEqualTo( + "SELECT sensor_id, avg(temp) FROM SensorReading WHERE `ts` >= 1000 AND (active = true) GROUP BY sensor_id"); + } + + @Test + void testWatermarkZeroReturnsOriginal() { + final ContinuousAggregateImpl ca = buildCA( + "SELECT sensor_id FROM SensorReading ORDER BY sensor_id"); + final String result = ContinuousAggregateRefresher.buildFilteredQuery(ca, 0); + assertThat(result).isEqualTo("SELECT sensor_id FROM SensorReading ORDER BY sensor_id"); + } + + @Test + void testBlockCommentContainingWhereIsIgnored() { + // Regression: block comment containing WHERE must not be matched as the top-level WHERE + final ContinuousAggregateImpl ca = buildCA( + "SELECT sensor_id, avg(temp) /* 
WHERE not here */ FROM SensorReading GROUP BY sensor_id"); + final String result = ContinuousAggregateRefresher.buildFilteredQuery(ca, 1000); + assertThat(result).isEqualTo( + "SELECT sensor_id, avg(temp) /* WHERE not here */ FROM SensorReading WHERE `ts` >= 1000 GROUP BY sensor_id"); + } + + @Test + void testLineCommentContainingWhereIsIgnored() { + // Regression: line comment containing WHERE must not be matched as the top-level WHERE + final ContinuousAggregateImpl ca = buildCA( + "SELECT sensor_id, avg(temp) FROM SensorReading -- no WHERE needed\nGROUP BY sensor_id"); + final String result = ContinuousAggregateRefresher.buildFilteredQuery(ca, 1000); + assertThat(result).isEqualTo( + "SELECT sensor_id, avg(temp) FROM SensorReading -- no WHERE needed\nWHERE `ts` >= 1000 GROUP BY sensor_id"); + } + + @Test + void testLineCommentWithWhereKeywordIsNotMatched() { + // A -- comment containing WHERE should not be treated as a top-level WHERE clause + final ContinuousAggregateImpl ca = buildCA( + "SELECT avg(temp) FROM SensorReading -- WHERE clause not needed\nGROUP BY sensor_id"); + final String result = ContinuousAggregateRefresher.buildFilteredQuery(ca, 1000); + // Should insert before GROUP BY, not after comment's WHERE + assertThat(result).isEqualTo( + "SELECT avg(temp) FROM SensorReading -- WHERE clause not needed\nWHERE `ts` >= 1000 GROUP BY sensor_id"); + } + + @Test + void testDotInTimestampColumnIsRejected() { + // Regression: SAFE_COLUMN_NAME must not allow dots in column names (could allow injection) + final ContinuousAggregateImpl ca = new ContinuousAggregateImpl(null, "test_ca", + "SELECT avg(temp) FROM SensorReading GROUP BY sensor_id", + "test_backing", + "SensorReading", 3_600_000L, "hour", + "outer.inner"); // dot in timestamp column name + + assertThatThrownBy(() -> ContinuousAggregateRefresher.buildFilteredQuery(ca, 1000)) + .isInstanceOf(IllegalArgumentException.class) + .hasMessageContaining("Unsafe timestamp column name"); + } + + private static 
ContinuousAggregateImpl buildCA(final String query) { + return new ContinuousAggregateImpl(null, "test_ca", query, + "test_backing", "SensorReading", 3_600_000L, "hour", "ts"); + } +} diff --git a/network/src/main/java/com/arcadedb/remote/RemoteSchema.java b/network/src/main/java/com/arcadedb/remote/RemoteSchema.java index 4f234d9140..299f2ae07a 100644 --- a/network/src/main/java/com/arcadedb/remote/RemoteSchema.java +++ b/network/src/main/java/com/arcadedb/remote/RemoteSchema.java @@ -169,6 +169,36 @@ public MaterializedViewBuilder buildMaterializedView() { "buildMaterializedView() is not supported remotely. Use SQL CREATE MATERIALIZED VIEW instead."); } + @Override + public boolean existsContinuousAggregate(final String name) { + final ResultSet result = remoteDatabase.command("sql", + "SELECT FROM schema:continuousaggregates WHERE name = :name", Map.of("name", name)); + return result.hasNext(); + } + + @Override + public ContinuousAggregate getContinuousAggregate(final String name) { + throw new UnsupportedOperationException( + "getContinuousAggregate() is not supported remotely. Use SQL SELECT FROM schema:continuousaggregates instead."); + } + + @Override + public ContinuousAggregate[] getContinuousAggregates() { + throw new UnsupportedOperationException( + "getContinuousAggregates() is not supported remotely. Use SQL SELECT FROM schema:continuousaggregates instead."); + } + + @Override + public void dropContinuousAggregate(final String name) { + remoteDatabase.command("sql", "DROP CONTINUOUS AGGREGATE `" + name + "`"); + } + + @Override + public ContinuousAggregateBuilder buildContinuousAggregate() { + throw new UnsupportedOperationException( + "buildContinuousAggregate() is not supported remotely. 
Use SQL CREATE CONTINUOUS AGGREGATE instead."); + } + @Override public Bucket createBucket(final String bucketName) { final ResultSet result = remoteDatabase.command("sql", "create bucket `" + bucketName + "`"); @@ -353,6 +383,11 @@ public TypeBuilder buildEdgeType() { throw new UnsupportedOperationException(); } + @Override + public TimeSeriesTypeBuilder buildTimeSeriesType() { + throw new UnsupportedOperationException(); + } + @Deprecated @Override public DocumentType createDocumentType(String typeName, List buckets) { diff --git a/pom.xml b/pom.xml index f453adf8fd..fc2d06f1e9 100644 --- a/pom.xml +++ b/pom.xml @@ -174,6 +174,10 @@ --add-exports java.management/sun.management=ALL-UNNAMED + + --add-modules + jdk.incubator.vector @@ -215,7 +219,7 @@ --add-exports java.management/sun.management=ALL-UNNAMED --add-opens java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens java.base/java.nio.channels.spi=ALL-UNNAMED - --add-modules jdk.incubator.vector + --add-modules jdk.incubator.vector ${skipTests} diff --git a/server/pom.xml b/server/pom.xml index 34a01f1193..a97be12652 100644 --- a/server/pom.xml +++ b/server/pom.xml @@ -20,112 +20,118 @@ - 4.0.0 + 4.0.0 - - com.arcadedb - arcadedb-parent - 26.3.1-SNAPSHOT - ../pom.xml - + + com.arcadedb + arcadedb-parent + 26.3.1-SNAPSHOT + ../pom.xml + - arcadedb-server - jar - ArcadeDB Server + arcadedb-server + jar + ArcadeDB Server - - 4.2.19 - 1.16.3 - 2.2.43 - 2.1.38 - + + 4.2.19 + 1.16.3 + 2.2.43 + 2.1.38 + 1.1.10.8 + - - - noServerTest - - true - - - + + + noServerTest + + true + + + - - - - - org.apache.maven.plugins - maven-jar-plugin - - - default-test-jar - none - - - - - + + + + + org.apache.maven.plugins + maven-jar-plugin + + + default-test-jar + none + + + + + - - - com.arcadedb - arcadedb-engine - ${project.parent.version} - compile - - - com.arcadedb - arcadedb-network - ${project.parent.version} - compile - - - io.undertow - undertow-core - ${undertow-core.version} - - - org.slf4j - slf4j-api - 
${slf4j.version} - compile - - - org.slf4j - slf4j-jdk14 - ${slf4j.version} - - - io.micrometer - micrometer-core - ${micrometer.version} - - - - io.swagger.core.v3 - swagger-core - ${swagger.version} - - - io.swagger.core.v3 - swagger-annotations - ${swagger.version} - - - io.swagger.core.v3 - swagger-models - ${swagger.version} - - - io.swagger.parser.v3 - swagger-parser - ${swagger-parser.version} - test - - - com.arcadedb - arcadedb-integration - ${project.parent.version} - test - - + + + com.arcadedb + arcadedb-engine + ${project.parent.version} + compile + + + com.arcadedb + arcadedb-network + ${project.parent.version} + compile + + + io.undertow + undertow-core + ${undertow-core.version} + + + org.slf4j + slf4j-api + ${slf4j.version} + compile + + + org.slf4j + slf4j-jdk14 + ${slf4j.version} + + + io.micrometer + micrometer-core + ${micrometer.version} + + + + io.swagger.core.v3 + swagger-core + ${swagger.version} + + + io.swagger.core.v3 + swagger-annotations + ${swagger.version} + + + io.swagger.core.v3 + swagger-models + ${swagger.version} + + + io.swagger.parser.v3 + swagger-parser + ${swagger-parser.version} + test + + + org.xerial.snappy + snappy-java + ${snappy.version} + + + com.arcadedb + arcadedb-integration + ${project.parent.version} + test + + diff --git a/server/src/main/java/com/arcadedb/server/http/HttpServer.java b/server/src/main/java/com/arcadedb/server/http/HttpServer.java index 3ce5dd5cae..57e674c2e3 100644 --- a/server/src/main/java/com/arcadedb/server/http/HttpServer.java +++ b/server/src/main/java/com/arcadedb/server/http/HttpServer.java @@ -52,6 +52,19 @@ import com.arcadedb.server.http.handler.PostQueryHandler; import com.arcadedb.server.http.handler.PostRollbackHandler; import com.arcadedb.server.http.handler.PostServerCommandHandler; +import com.arcadedb.server.http.handler.PostTimeSeriesQueryHandler; +import com.arcadedb.server.http.handler.PostTimeSeriesWriteHandler; +import 
com.arcadedb.server.http.handler.GetTimeSeriesLatestHandler; +import com.arcadedb.server.http.handler.GetPromQLQueryHandler; +import com.arcadedb.server.http.handler.GetPromQLQueryRangeHandler; +import com.arcadedb.server.http.handler.GetPromQLLabelsHandler; +import com.arcadedb.server.http.handler.GetPromQLLabelValuesHandler; +import com.arcadedb.server.http.handler.GetPromQLSeriesHandler; +import com.arcadedb.server.http.handler.GetGrafanaHealthHandler; +import com.arcadedb.server.http.handler.GetGrafanaMetadataHandler; +import com.arcadedb.server.http.handler.PostGrafanaQueryHandler; +import com.arcadedb.server.http.handler.PostPrometheusWriteHandler; +import com.arcadedb.server.http.handler.PostPrometheusReadHandler; import com.arcadedb.server.http.ssl.SslUtils; import com.arcadedb.server.http.ssl.TlsProtocol; import com.arcadedb.server.http.ws.WebSocketConnectionHandler; @@ -194,6 +207,19 @@ private PathHandler setupRoutes() { .get("/server/groups", new GetGroupsHandler(this)) .post("/server/groups", new PostGroupHandler(this)) .delete("/server/groups", new DeleteGroupHandler(this)) + .post("/ts/{database}/write", new PostTimeSeriesWriteHandler(this)) + .post("/ts/{database}/query", new PostTimeSeriesQueryHandler(this)) + .get("/ts/{database}/latest", new GetTimeSeriesLatestHandler(this)) + .get("/ts/{database}/grafana/health", new GetGrafanaHealthHandler(this)) + .get("/ts/{database}/grafana/metadata", new GetGrafanaMetadataHandler(this)) + .post("/ts/{database}/grafana/query", new PostGrafanaQueryHandler(this)) + .post("/ts/{database}/prom/write", new PostPrometheusWriteHandler(this)) + .post("/ts/{database}/prom/read", new PostPrometheusReadHandler(this)) + .get("/ts/{database}/prom/api/v1/query", new GetPromQLQueryHandler(this)) + .get("/ts/{database}/prom/api/v1/query_range", new GetPromQLQueryRangeHandler(this)) + .get("/ts/{database}/prom/api/v1/labels", new GetPromQLLabelsHandler(this)) + .get("/ts/{database}/prom/api/v1/label/{name}/values", new 
GetPromQLLabelValuesHandler(this)) + .get("/ts/{database}/prom/api/v1/series", new GetPromQLSeriesHandler(this)) ); // MCP routes are always registered; the handler checks isEnabled() at request time to support runtime toggling diff --git a/server/src/main/java/com/arcadedb/server/http/handler/AbstractBinaryHttpHandler.java b/server/src/main/java/com/arcadedb/server/http/handler/AbstractBinaryHttpHandler.java new file mode 100644 index 0000000000..f079216cb5 --- /dev/null +++ b/server/src/main/java/com/arcadedb/server/http/handler/AbstractBinaryHttpHandler.java @@ -0,0 +1,69 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.server.http.handler; + +import com.arcadedb.log.LogManager; +import com.arcadedb.server.http.HttpServer; +import io.undertow.server.HttpServerExchange; +import io.undertow.util.StatusCodes; + +import java.util.concurrent.atomic.AtomicReference; +import java.util.logging.Level; + +/** + * Base handler for endpoints that receive binary (non-JSON) request bodies. + * Captures raw bytes from the request instead of interpreting them as a string. 
+ *
+ * @author Luca Garulli (l.garulli@arcadedata.com)
+ */
+public abstract class AbstractBinaryHttpHandler extends AbstractServerHttpHandler {
+
+  protected byte[] rawBytes;
+
+  public AbstractBinaryHttpHandler(final HttpServer httpServer) {
+    super(httpServer);
+  }
+
+  @Override
+  protected boolean mustExecuteOnWorkerThread() {
+    return true;
+  }
+
+  @Override
+  protected boolean requiresJsonPayload() {
+    return false;
+  }
+
+  @Override
+  protected String parseRequestPayload(final HttpServerExchange e) {
+    if (!e.isInIoThread() && !e.isBlocking())
+      e.startBlocking();
+
+    final AtomicReference<byte[]> result = new AtomicReference<>();
+    e.getRequestReceiver().receiveFullBytes(
+        (exchange, data) -> result.set(data),
+        (exchange, err) -> {
+          LogManager.instance().log(this, Level.SEVERE, "receiveFullBytes completed with an error: %s", err, err.getMessage());
+          exchange.setStatusCode(StatusCodes.INTERNAL_SERVER_ERROR);
+          exchange.getResponseSender().send("Invalid Request");
+        });
+    rawBytes = result.get();
+    return null; // no string payload needed
+  }
+}
diff --git a/server/src/main/java/com/arcadedb/server/http/handler/AbstractServerHttpHandler.java b/server/src/main/java/com/arcadedb/server/http/handler/AbstractServerHttpHandler.java
index f41524d8f0..983810c9c8 100644
--- a/server/src/main/java/com/arcadedb/server/http/handler/AbstractServerHttpHandler.java
+++ b/server/src/main/java/com/arcadedb/server/http/handler/AbstractServerHttpHandler.java
@@ -153,7 +153,7 @@ public void handleRequest(final HttpServerExchange exchange) {
     JSONObject payload = null;
     if (mustExecuteOnWorkerThread()) {
       final String payloadAsString = parseRequestPayload(exchange);
-      if (payloadAsString != null && !payloadAsString.isBlank())
+      if (requiresJsonPayload() && payloadAsString != null && !payloadAsString.isBlank())
         try {
           payload = new JSONObject(payloadAsString.trim());
         } catch (Exception e) {
@@ -297,6 +297,10 @@ protected boolean mustExecuteOnWorkerThread() {
     return false;
   }
 
+  protected
boolean requiresJsonPayload() { + return true; + } + protected String encodeError(final String message) { return message.replace("\\\\", " ").replace('\n', ' '); } diff --git a/server/src/main/java/com/arcadedb/server/http/handler/GetGrafanaHealthHandler.java b/server/src/main/java/com/arcadedb/server/http/handler/GetGrafanaHealthHandler.java new file mode 100644 index 0000000000..82875d1e0f --- /dev/null +++ b/server/src/main/java/com/arcadedb/server/http/handler/GetGrafanaHealthHandler.java @@ -0,0 +1,57 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.server.http.handler; + +import com.arcadedb.serializer.json.JSONObject; +import com.arcadedb.server.http.HttpServer; +import com.arcadedb.server.security.ServerSecurityUser; +import io.undertow.server.HttpServerExchange; + +import java.util.Deque; + +/** + * Grafana health-check endpoint. 
+ * Endpoint: GET /api/v1/ts/{database}/grafana/health
+ */
+public class GetGrafanaHealthHandler extends AbstractServerHttpHandler {
+
+  public GetGrafanaHealthHandler(final HttpServer httpServer) {
+    super(httpServer);
+  }
+
+  @Override
+  protected ExecutionResponse execute(final HttpServerExchange exchange, final ServerSecurityUser user,
+      final JSONObject payload) throws Exception {
+
+    final Deque<String> databaseParam = exchange.getQueryParameters().get("database");
+    if (databaseParam == null || databaseParam.isEmpty())
+      return new ExecutionResponse(400, "{ \"error\" : \"Database parameter is required\"}");
+
+    final String databaseName = databaseParam.getFirst();
+
+    // Verify the database exists (will throw if not)
+    httpServer.getServer().getDatabase(databaseName, false, false);
+
+    final JSONObject result = new JSONObject();
+    result.put("status", "ok");
+    result.put("database", databaseName);
+
+    return new ExecutionResponse(200, result.toString());
+  }
+}
diff --git a/server/src/main/java/com/arcadedb/server/http/handler/GetGrafanaMetadataHandler.java b/server/src/main/java/com/arcadedb/server/http/handler/GetGrafanaMetadataHandler.java
new file mode 100644
index 0000000000..95c299df65
--- /dev/null
+++ b/server/src/main/java/com/arcadedb/server/http/handler/GetGrafanaMetadataHandler.java
@@ -0,0 +1,95 @@
+/*
+ * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com)
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com)
+ * SPDX-License-Identifier: Apache-2.0
+ */
+package com.arcadedb.server.http.handler;
+
+import com.arcadedb.database.DatabaseInternal;
+import com.arcadedb.engine.timeseries.AggregationType;
+import com.arcadedb.engine.timeseries.ColumnDefinition;
+import com.arcadedb.schema.DocumentType;
+import com.arcadedb.schema.LocalTimeSeriesType;
+import com.arcadedb.serializer.json.JSONArray;
+import com.arcadedb.serializer.json.JSONObject;
+import com.arcadedb.server.http.HttpServer;
+import com.arcadedb.server.security.ServerSecurityUser;
+import io.undertow.server.HttpServerExchange;
+
+import java.util.Deque;
+
+/**
+ * Grafana metadata endpoint — discovers TimeSeries types, fields, and tags.
+ * Endpoint: GET /api/v1/ts/{database}/grafana/metadata
+ */
+public class GetGrafanaMetadataHandler extends AbstractServerHttpHandler {
+
+  public GetGrafanaMetadataHandler(final HttpServer httpServer) {
+    super(httpServer);
+  }
+
+  @Override
+  protected ExecutionResponse execute(final HttpServerExchange exchange, final ServerSecurityUser user,
+      final JSONObject payload) throws Exception {
+
+    final Deque<String> databaseParam = exchange.getQueryParameters().get("database");
+    if (databaseParam == null || databaseParam.isEmpty())
+      return new ExecutionResponse(400, "{ \"error\" : \"Database parameter is required\"}");
+
+    final DatabaseInternal database = httpServer.getServer().getDatabase(databaseParam.getFirst(), false, false);
+
+    final JSONArray typesArray = new JSONArray();
+
+    for (final DocumentType docType : database.getSchema().getTypes()) {
+      if (!(docType instanceof LocalTimeSeriesType tsType) || tsType.getEngine() == null)
+        continue;
+
+      final JSONObject typeObj = new JSONObject();
+      typeObj.put("name", tsType.getName());
+
+      final JSONArray fieldsArray = new JSONArray();
+      final JSONArray tagsArray = new JSONArray();
+
+      for (final ColumnDefinition col : tsType.getTsColumns()) {
+        if
(col.getRole() == ColumnDefinition.ColumnRole.TIMESTAMP) + continue; + + final JSONObject colObj = new JSONObject(); + colObj.put("name", col.getName()); + colObj.put("dataType", col.getDataType().name()); + + if (col.getRole() == ColumnDefinition.ColumnRole.TAG) + tagsArray.put(colObj); + else + fieldsArray.put(colObj); + } + + typeObj.put("fields", fieldsArray); + typeObj.put("tags", tagsArray); + typesArray.put(typeObj); + } + + final JSONArray aggTypes = new JSONArray(); + for (final AggregationType at : AggregationType.values()) + aggTypes.put(at.name()); + + final JSONObject result = new JSONObject(); + result.put("types", typesArray); + result.put("aggregationTypes", aggTypes); + + return new ExecutionResponse(200, result.toString()); + } +} diff --git a/server/src/main/java/com/arcadedb/server/http/handler/GetPromQLLabelValuesHandler.java b/server/src/main/java/com/arcadedb/server/http/handler/GetPromQLLabelValuesHandler.java new file mode 100644 index 0000000000..bab5a68d45 --- /dev/null +++ b/server/src/main/java/com/arcadedb/server/http/handler/GetPromQLLabelValuesHandler.java @@ -0,0 +1,101 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ *
+ * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com)
+ * SPDX-License-Identifier: Apache-2.0
+ */
+package com.arcadedb.server.http.handler;
+
+import com.arcadedb.database.DatabaseInternal;
+import com.arcadedb.engine.timeseries.ColumnDefinition;
+import com.arcadedb.engine.timeseries.TimeSeriesEngine;
+import com.arcadedb.schema.DocumentType;
+import com.arcadedb.schema.LocalTimeSeriesType;
+import com.arcadedb.serializer.json.JSONObject;
+import com.arcadedb.server.http.HttpServer;
+import com.arcadedb.server.security.ServerSecurityUser;
+import io.undertow.server.HttpServerExchange;
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.Deque;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Set;
+
+/**
+ * HTTP handler for listing PromQL label values.
+ * Endpoint: GET /api/v1/ts/{database}/prom/api/v1/label/{name}/values
+ * @author Luca Garulli (l.garulli@arcadedata.com)
+ */
+public class GetPromQLLabelValuesHandler extends AbstractServerHttpHandler {
+
+  public GetPromQLLabelValuesHandler(final HttpServer httpServer) {
+    super(httpServer);
+  }
+
+  @Override
+  protected ExecutionResponse execute(final HttpServerExchange exchange, final ServerSecurityUser user,
+      final JSONObject payload) throws Exception {
+
+    final Deque<String> databaseParam = exchange.getQueryParameters().get("database");
+    if (databaseParam == null || databaseParam.isEmpty())
+      return new ExecutionResponse(400, PromQLResponseFormatter.formatError("bad_data", "Database parameter is required"));
+
+    final Deque<String> nameParam = exchange.getQueryParameters().get("name");
+    if (nameParam == null || nameParam.isEmpty())
+      return new ExecutionResponse(400, PromQLResponseFormatter.formatError("bad_data", "Label name parameter is required"));
+
+    final String labelName = nameParam.getFirst();
+    final DatabaseInternal database = httpServer.getServer().getDatabase(databaseParam.getFirst(), false, false);
+
+    final Set<String> values = new LinkedHashSet<>();
+
+    if ("__name__".equals(labelName)) {
+      // Return all TimeSeries type names
+      for (final DocumentType type : database.getSchema().getTypes())
+        if (type instanceof LocalTimeSeriesType tsType && tsType.getEngine() != null)
+          values.add(type.getName());
+    } else {
+      // Scan types that have this TAG column, query distinct values
+      for (final DocumentType type : database.getSchema().getTypes()) {
+        if (!(type instanceof LocalTimeSeriesType tsType) || tsType.getEngine() == null)
+          continue;
+        final List<ColumnDefinition> columns = tsType.getTsColumns();
+        final int colIdx = findColumnIndex(labelName, columns);
+        if (colIdx < 0)
+          continue;
+
+        final TimeSeriesEngine engine = tsType.getEngine();
+        final List<Object[]> rows = engine.query(Long.MIN_VALUE, Long.MAX_VALUE, null, null);
+        for (final Object[] row : rows) {
+          if (colIdx < row.length && row[colIdx] != null)
+            values.add(row[colIdx].toString());
+        }
+      }
+    }
+
+    final List<String> sorted = new ArrayList<>(values);
+    Collections.sort(sorted);
+    return new ExecutionResponse(200, PromQLResponseFormatter.formatLabelsResponse(sorted));
+  }
+
+  private int findColumnIndex(final String name, final List<ColumnDefinition> columns) {
+    for (int i = 0; i < columns.size(); i++)
+      if (columns.get(i).getRole() == ColumnDefinition.ColumnRole.TAG && columns.get(i).getName().equals(name))
+        return i;
+    return -1;
+  }
+}
diff --git a/server/src/main/java/com/arcadedb/server/http/handler/GetPromQLLabelsHandler.java b/server/src/main/java/com/arcadedb/server/http/handler/GetPromQLLabelsHandler.java
new file mode 100644
index 0000000000..70132e1378
--- /dev/null
+++ b/server/src/main/java/com/arcadedb/server/http/handler/GetPromQLLabelsHandler.java
@@ -0,0 +1,72 @@
+/*
+ * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com)
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.server.http.handler; + +import com.arcadedb.database.DatabaseInternal; +import com.arcadedb.engine.timeseries.ColumnDefinition; +import com.arcadedb.schema.DocumentType; +import com.arcadedb.schema.LocalTimeSeriesType; +import com.arcadedb.serializer.json.JSONObject; +import com.arcadedb.server.http.HttpServer; +import com.arcadedb.server.security.ServerSecurityUser; +import io.undertow.server.HttpServerExchange; + +import java.util.ArrayList; +import java.util.Collections; +import java.util.Deque; +import java.util.LinkedHashSet; +import java.util.List; +import java.util.Set; + +/** + * HTTP handler for listing PromQL label names. 
+ * Endpoint: GET /api/v1/ts/{database}/prom/api/v1/labels
+ * @author Luca Garulli (l.garulli@arcadedata.com)
+ */
+public class GetPromQLLabelsHandler extends AbstractServerHttpHandler {
+
+  public GetPromQLLabelsHandler(final HttpServer httpServer) {
+    super(httpServer);
+  }
+
+  @Override
+  protected ExecutionResponse execute(final HttpServerExchange exchange, final ServerSecurityUser user,
+      final JSONObject payload) throws Exception {
+
+    final Deque<String> databaseParam = exchange.getQueryParameters().get("database");
+    if (databaseParam == null || databaseParam.isEmpty())
+      return new ExecutionResponse(400, PromQLResponseFormatter.formatError("bad_data", "Database parameter is required"));
+
+    final DatabaseInternal database = httpServer.getServer().getDatabase(databaseParam.getFirst(), false, false);
+    final Set<String> labelNames = new LinkedHashSet<>();
+    labelNames.add("__name__");
+
+    for (final DocumentType type : database.getSchema().getTypes()) {
+      if (!(type instanceof LocalTimeSeriesType tsType) || tsType.getEngine() == null)
+        continue;
+      for (final ColumnDefinition col : tsType.getTsColumns())
+        if (col.getRole() == ColumnDefinition.ColumnRole.TAG)
+          labelNames.add(col.getName());
+    }
+
+    final List<String> sorted = new ArrayList<>(labelNames);
+    Collections.sort(sorted);
+    return new ExecutionResponse(200, PromQLResponseFormatter.formatLabelsResponse(sorted));
+  }
+}
diff --git a/server/src/main/java/com/arcadedb/server/http/handler/GetPromQLQueryHandler.java b/server/src/main/java/com/arcadedb/server/http/handler/GetPromQLQueryHandler.java
new file mode 100644
index 0000000000..3fe85939f1
--- /dev/null
+++ b/server/src/main/java/com/arcadedb/server/http/handler/GetPromQLQueryHandler.java
@@ -0,0 +1,77 @@
+/*
+ * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com)
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.server.http.handler; + +import com.arcadedb.database.DatabaseInternal; +import com.arcadedb.engine.timeseries.promql.PromQLEvaluator; +import com.arcadedb.engine.timeseries.promql.PromQLParser; +import com.arcadedb.engine.timeseries.promql.PromQLResult; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr; +import com.arcadedb.serializer.json.JSONObject; +import com.arcadedb.server.http.HttpServer; +import com.arcadedb.server.security.ServerSecurityUser; +import io.undertow.server.HttpServerExchange; + +import java.util.Deque; + +/** + * HTTP handler for PromQL instant queries. 
+ * Endpoint: GET /api/v1/ts/{database}/prom/api/v1/query
+ * @author Luca Garulli (l.garulli@arcadedata.com)
+ */
+public class GetPromQLQueryHandler extends AbstractServerHttpHandler {
+
+  public GetPromQLQueryHandler(final HttpServer httpServer) {
+    super(httpServer);
+  }
+
+  @Override
+  protected ExecutionResponse execute(final HttpServerExchange exchange, final ServerSecurityUser user,
+      final JSONObject payload) throws Exception {
+
+    final Deque<String> databaseParam = exchange.getQueryParameters().get("database");
+    if (databaseParam == null || databaseParam.isEmpty())
+      return new ExecutionResponse(400, PromQLResponseFormatter.formatError("bad_data", "Database parameter is required"));
+
+    final String query = getQueryParameter(exchange, "query");
+    if (query == null || query.isBlank())
+      return new ExecutionResponse(400, PromQLResponseFormatter.formatError("bad_data", "Missing required parameter: query"));
+
+    final String timeStr = getQueryParameter(exchange, "time");
+    final long evalTimeMs;
+    if (timeStr != null && !timeStr.isBlank())
+      evalTimeMs = (long) (Double.parseDouble(timeStr) * 1000);
+    else
+      evalTimeMs = System.currentTimeMillis();
+
+    final DatabaseInternal database = httpServer.getServer().getDatabase(databaseParam.getFirst(), false, false);
+
+    try {
+      final PromQLExpr expr = new PromQLParser(query).parse();
+      final String lookbackStr = getQueryParameter(exchange, "lookback_delta");
+      final PromQLEvaluator evaluator = lookbackStr != null && !lookbackStr.isBlank()
+          ?
new PromQLEvaluator(database, PromQLParser.parseDuration(lookbackStr)) + : new PromQLEvaluator(database); + final PromQLResult result = evaluator.evaluateInstant(expr, evalTimeMs); + return new ExecutionResponse(200, PromQLResponseFormatter.formatSuccess(result)); + } catch (final IllegalArgumentException e) { + return new ExecutionResponse(400, PromQLResponseFormatter.formatError("bad_data", e.getMessage())); + } + } +} diff --git a/server/src/main/java/com/arcadedb/server/http/handler/GetPromQLQueryRangeHandler.java b/server/src/main/java/com/arcadedb/server/http/handler/GetPromQLQueryRangeHandler.java new file mode 100644 index 0000000000..b264882d82 --- /dev/null +++ b/server/src/main/java/com/arcadedb/server/http/handler/GetPromQLQueryRangeHandler.java @@ -0,0 +1,95 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ *
+ * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com)
+ * SPDX-License-Identifier: Apache-2.0
+ */
+package com.arcadedb.server.http.handler;
+
+import com.arcadedb.database.DatabaseInternal;
+import com.arcadedb.engine.timeseries.promql.PromQLEvaluator;
+import com.arcadedb.engine.timeseries.promql.PromQLParser;
+import com.arcadedb.engine.timeseries.promql.PromQLResult;
+import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr;
+import com.arcadedb.serializer.json.JSONObject;
+import com.arcadedb.server.http.HttpServer;
+import com.arcadedb.server.security.ServerSecurityUser;
+import io.undertow.server.HttpServerExchange;
+
+import java.util.Deque;
+
+/**
+ * HTTP handler for PromQL range queries.
+ * Endpoint: GET /api/v1/ts/{database}/prom/api/v1/query_range
+ * @author Luca Garulli (l.garulli@arcadedata.com)
+ */
+public class GetPromQLQueryRangeHandler extends AbstractServerHttpHandler {
+
+  public GetPromQLQueryRangeHandler(final HttpServer httpServer) {
+    super(httpServer);
+  }
+
+  @Override
+  protected ExecutionResponse execute(final HttpServerExchange exchange, final ServerSecurityUser user,
+      final JSONObject payload) throws Exception {
+
+    final Deque<String> databaseParam = exchange.getQueryParameters().get("database");
+    if (databaseParam == null || databaseParam.isEmpty())
+      return new ExecutionResponse(400, PromQLResponseFormatter.formatError("bad_data", "Database parameter is required"));
+
+    final String query = getQueryParameter(exchange, "query");
+    if (query == null || query.isBlank())
+      return new ExecutionResponse(400, PromQLResponseFormatter.formatError("bad_data", "Missing required parameter: query"));
+
+    final String startStr = getQueryParameter(exchange, "start");
+    final String endStr = getQueryParameter(exchange, "end");
+    final String stepStr = getQueryParameter(exchange, "step");
+
+    if (startStr == null || endStr == null || stepStr == null)
+      return new ExecutionResponse(400,
+
PromQLResponseFormatter.formatError("bad_data", "Missing required parameters: start, end, step")); + + final long startMs = (long) (Double.parseDouble(startStr) * 1000); + final long endMs = (long) (Double.parseDouble(endStr) * 1000); + final long stepMs = parseStep(stepStr); + + if (stepMs <= 0) + return new ExecutionResponse(400, PromQLResponseFormatter.formatError("bad_data", "Step must be positive")); + + final DatabaseInternal database = httpServer.getServer().getDatabase(databaseParam.getFirst(), false, false); + + try { + final PromQLExpr expr = new PromQLParser(query).parse(); + final String lookbackStr = getQueryParameter(exchange, "lookback_delta"); + final PromQLEvaluator evaluator = lookbackStr != null && !lookbackStr.isBlank() + ? new PromQLEvaluator(database, PromQLParser.parseDuration(lookbackStr)) + : new PromQLEvaluator(database); + final PromQLResult result = evaluator.evaluateRange(expr, startMs, endMs, stepMs); + return new ExecutionResponse(200, PromQLResponseFormatter.formatSuccess(result)); + } catch (final IllegalArgumentException e) { + return new ExecutionResponse(400, PromQLResponseFormatter.formatError("bad_data", e.getMessage())); + } + } + + private long parseStep(final String step) { + try { + // Try as plain seconds (e.g. "60") + return (long) (Double.parseDouble(step) * 1000); + } catch (final NumberFormatException e) { + // Try as duration (e.g. 
"1m") + return PromQLParser.parseDuration(step); + } + } +} diff --git a/server/src/main/java/com/arcadedb/server/http/handler/GetPromQLSeriesHandler.java b/server/src/main/java/com/arcadedb/server/http/handler/GetPromQLSeriesHandler.java new file mode 100644 index 0000000000..a45dc2dd33 --- /dev/null +++ b/server/src/main/java/com/arcadedb/server/http/handler/GetPromQLSeriesHandler.java @@ -0,0 +1,115 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.server.http.handler; + +import com.arcadedb.database.DatabaseInternal; +import com.arcadedb.engine.timeseries.ColumnDefinition; +import com.arcadedb.engine.timeseries.TimeSeriesEngine; +import com.arcadedb.engine.timeseries.promql.PromQLEvaluator; +import com.arcadedb.engine.timeseries.promql.PromQLParser; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr.LabelMatcher; +import com.arcadedb.engine.timeseries.promql.ast.PromQLExpr.VectorSelector; +import com.arcadedb.schema.DocumentType; +import com.arcadedb.schema.LocalTimeSeriesType; +import com.arcadedb.serializer.json.JSONObject; +import com.arcadedb.server.http.HttpServer; +import com.arcadedb.server.security.ServerSecurityUser; +import io.undertow.server.HttpServerExchange; + +import java.util.ArrayList; +import java.util.Deque; +import java.util.LinkedHashMap; +import java.util.LinkedHashSet; +import java.util.List; +import java.util.Map; +import java.util.Set; + +/** + * HTTP handler for PromQL series lookup. 
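Not part of the patch: the series handler below deduplicates label sets by rendering each insertion-ordered map to a string key and keeping only unseen keys. A self-contained sketch of that strategy (the class name `SeriesDedupSketch` is invented for illustration; the string-key trick assumes labels are always built in the same column order, which holds here because they are emitted column-by-column):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch only (not part of the patch): dedup label sets via a stable string key.
public final class SeriesDedupSketch {
  public static List<Map<String, String>> dedup(final List<Map<String, String>> labelSets) {
    final Set<String> seen = new LinkedHashSet<>();
    final List<Map<String, String>> out = new ArrayList<>();
    for (final Map<String, String> labels : labelSets)
      if (seen.add(labels.toString())) // LinkedHashMap preserves insertion order, so toString() is stable
        out.add(labels);
    return out;
  }

  public static void main(final String[] args) {
    final Map<String, String> a = new LinkedHashMap<>();
    a.put("__name__", "cpu_usage");
    a.put("host", "db1");
    final Map<String, String> b = new LinkedHashMap<>(a); // duplicate label set
    if (dedup(List.of(a, b, a)).size() != 1)
      throw new AssertionError();
  }
}
```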
+ * Endpoint: GET /api/v1/ts/{database}/prom/api/v1/series
+ * @author Luca Garulli (l.garulli@arcadedata.com)
+ */
+public class GetPromQLSeriesHandler extends AbstractServerHttpHandler {
+
+  public GetPromQLSeriesHandler(final HttpServer httpServer) {
+    super(httpServer);
+  }
+
+  @Override
+  protected ExecutionResponse execute(final HttpServerExchange exchange, final ServerSecurityUser user,
+      final JSONObject payload) throws Exception {
+
+    final Deque<String> databaseParam = exchange.getQueryParameters().get("database");
+    if (databaseParam == null || databaseParam.isEmpty())
+      return new ExecutionResponse(400, PromQLResponseFormatter.formatError("bad_data", "Database parameter is required"));
+
+    final Deque<String> matchParams = exchange.getQueryParameters().get("match[]");
+    if (matchParams == null || matchParams.isEmpty())
+      return new ExecutionResponse(400,
+          PromQLResponseFormatter.formatError("bad_data", "Missing required parameter: match[]"));
+
+    final String startStr = getQueryParameter(exchange, "start");
+    final String endStr = getQueryParameter(exchange, "end");
+    final long startMs = startStr != null ? (long) (Double.parseDouble(startStr) * 1000) : Long.MIN_VALUE;
+    final long endMs = endStr != null ? (long) (Double.parseDouble(endStr) * 1000) : Long.MAX_VALUE;
+
+    final DatabaseInternal database = httpServer.getServer().getDatabase(databaseParam.getFirst(), false, false);
+    final Set<String> seenKeys = new LinkedHashSet<>();
+    final List<Map<String, String>> seriesList = new ArrayList<>();
+
+    for (final String matchStr : matchParams) {
+      try {
+        final PromQLExpr expr = new PromQLParser(matchStr).parse();
+        if (!(expr instanceof VectorSelector vs))
+          continue;
+
+        final String typeName = PromQLEvaluator.sanitizeTypeName(vs.metricName());
+        if (!database.getSchema().existsType(typeName))
+          continue;
+
+        final DocumentType docType = database.getSchema().getType(typeName);
+        if (!(docType instanceof LocalTimeSeriesType tsType) || tsType.getEngine() == null)
+          continue;
+
+        final TimeSeriesEngine engine = tsType.getEngine();
+        final List<ColumnDefinition> columns = tsType.getTsColumns();
+        final List<Object[]> rows = engine.query(startMs, endMs, null, null);
+
+        for (final Object[] row : rows) {
+          final Map<String, String> labels = new LinkedHashMap<>();
+          labels.put("__name__", vs.metricName());
+          for (int i = 0; i < columns.size(); i++) {
+            final ColumnDefinition col = columns.get(i);
+            if (col.getRole() == ColumnDefinition.ColumnRole.TAG && row[i] != null)
+              labels.put(col.getName(), row[i].toString());
+          }
+
+          final String key = labels.toString();
+          if (seenKeys.add(key))
+            seriesList.add(labels);
+        }
+      } catch (final IllegalArgumentException ignored) {
+        // Skip malformed match patterns
+      }
+    }
+
+    return new ExecutionResponse(200, PromQLResponseFormatter.formatSeriesResponse(seriesList));
+  }
+}
diff --git a/server/src/main/java/com/arcadedb/server/http/handler/GetTimeSeriesLatestHandler.java b/server/src/main/java/com/arcadedb/server/http/handler/GetTimeSeriesLatestHandler.java
new file mode 100644
index 0000000000..c5f82146a1
--- /dev/null
+++ b/server/src/main/java/com/arcadedb/server/http/handler/GetTimeSeriesLatestHandler.java
@@ -0,0 +1,122 @@
+/*
+ * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com)
+ *
+ *
Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.server.http.handler; + +import com.arcadedb.database.DatabaseInternal; +import com.arcadedb.engine.timeseries.ColumnDefinition; +import com.arcadedb.engine.timeseries.TagFilter; +import com.arcadedb.engine.timeseries.TimeSeriesEngine; +import com.arcadedb.schema.DocumentType; +import com.arcadedb.schema.LocalTimeSeriesType; +import com.arcadedb.serializer.json.JSONArray; +import com.arcadedb.serializer.json.JSONObject; +import com.arcadedb.server.http.HttpServer; +import com.arcadedb.server.security.ServerSecurityUser; +import io.undertow.server.HttpServerExchange; + +import java.util.Deque; +import java.util.List; + +/** + * HTTP handler for retrieving the latest TimeSeries value. 
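Not part of the patch: the latest-value endpoint below takes a `tag` query parameter in `name:value` form, split on the first colon. A self-contained sketch of that parsing rule (the class name `TagParamSketch` and the `Parsed` record are invented for illustration):

```java
// Sketch only (not part of the patch): parse the "tag" query parameter of the
// /latest endpoint, which uses a "name:value" form split on the first colon.
public final class TagParamSketch {
  public record Parsed(String name, String value) {}

  public static Parsed parse(final String tagParam) {
    final int colonIdx = tagParam.indexOf(':');
    if (colonIdx <= 0)
      return null; // malformed or empty name -> no filter applied
    return new Parsed(tagParam.substring(0, colonIdx), tagParam.substring(colonIdx + 1));
  }

  public static void main(final String[] args) {
    final Parsed p = parse("location:us-east");
    if (!p.name().equals("location") || !p.value().equals("us-east"))
      throw new AssertionError();
    if (parse("noColon") != null)
      throw new AssertionError();
  }
}
```

Splitting on the *first* colon means tag values may themselves contain colons, e.g. `host:10.0.0.1:8080` yields the value `10.0.0.1:8080`.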
+ * Endpoint: GET /api/v1/ts/{database}/latest?type=weather&tag=location:us-east
+ */
+public class GetTimeSeriesLatestHandler extends AbstractServerHttpHandler {
+
+  public GetTimeSeriesLatestHandler(final HttpServer httpServer) {
+    super(httpServer);
+  }
+
+  @Override
+  protected ExecutionResponse execute(final HttpServerExchange exchange, final ServerSecurityUser user,
+      final JSONObject payload) throws Exception {
+
+    final Deque<String> databaseParam = exchange.getQueryParameters().get("database");
+    if (databaseParam == null || databaseParam.isEmpty())
+      return new ExecutionResponse(400, "{ \"error\" : \"Database parameter is required\"}");
+
+    final String typeName = getQueryParameter(exchange, "type");
+    if (typeName == null || typeName.isBlank())
+      return new ExecutionResponse(400, "{ \"error\" : \"'type' query parameter is required\"}");
+
+    final DatabaseInternal database = httpServer.getServer().getDatabase(databaseParam.getFirst(), false, false);
+
+    if (!database.getSchema().existsType(typeName))
+      return new ExecutionResponse(400, "{ \"error\" : \"Type '" + typeName + "' does not exist\"}");
+
+    final DocumentType docType = database.getSchema().getType(typeName);
+    if (!(docType instanceof LocalTimeSeriesType tsType) || tsType.getEngine() == null)
+      return new ExecutionResponse(400, "{ \"error\" : \"Type '" + typeName + "' is not a TimeSeries type\"}");
+
+    final TimeSeriesEngine engine = tsType.getEngine();
+    final List<ColumnDefinition> columns = tsType.getTsColumns();
+
+    // Build tag filter from query param
+    final TagFilter tagFilter = buildTagFilter(exchange, columns);
+
+    // Query full range and take last element
+    final List<Object[]> rows = engine.query(Long.MIN_VALUE, Long.MAX_VALUE, null, tagFilter);
+
+    // Build column names
+    final JSONArray colNames = new JSONArray();
+    for (final ColumnDefinition col : columns)
+      colNames.put(col.getName());
+
+    final JSONObject result = new JSONObject();
+    result.put("type", typeName);
+    result.put("columns", colNames);
+
+    if (rows.isEmpty()) {
+      result.put("latest", JSONObject.NULL);
+    } else {
+      final Object[] lastRow = rows.get(rows.size() - 1);
+      final JSONArray latestArray = new JSONArray();
+      for (final Object val : lastRow)
+        latestArray.put(val);
+      result.put("latest", latestArray);
+    }
+
+    return new ExecutionResponse(200, result.toString());
+  }
+
+  private TagFilter buildTagFilter(final HttpServerExchange exchange, final List<ColumnDefinition> columns) {
+    final String tagParam = getQueryParameter(exchange, "tag");
+    if (tagParam == null || tagParam.isBlank())
+      return null;
+
+    final int colonIdx = tagParam.indexOf(':');
+    if (colonIdx <= 0)
+      return null;
+
+    final String tagName = tagParam.substring(0, colonIdx);
+    final String tagValue = tagParam.substring(colonIdx + 1);
+
+    // columnIndex for TagFilter is among non-timestamp columns (0-based)
+    int nonTsIdx = 0;
+    for (final ColumnDefinition col : columns) {
+      if (col.getRole() == ColumnDefinition.ColumnRole.TIMESTAMP)
+        continue;
+      if (col.getRole() == ColumnDefinition.ColumnRole.TAG && col.getName().equals(tagName))
+        return TagFilter.eq(nonTsIdx, tagValue);
+      nonTsIdx++;
+    }
+
+    return null;
+  }
+}
diff --git a/server/src/main/java/com/arcadedb/server/http/handler/PostGrafanaQueryHandler.java b/server/src/main/java/com/arcadedb/server/http/handler/PostGrafanaQueryHandler.java
new file mode 100644
index 0000000000..cfacb3e8f9
--- /dev/null
+++ b/server/src/main/java/com/arcadedb/server/http/handler/PostGrafanaQueryHandler.java
@@ -0,0 +1,272 @@
+/*
+ * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com)
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.server.http.handler; + +import com.arcadedb.database.DatabaseInternal; +import com.arcadedb.engine.timeseries.AggregationType; +import com.arcadedb.engine.timeseries.ColumnDefinition; +import com.arcadedb.engine.timeseries.MultiColumnAggregationRequest; +import com.arcadedb.engine.timeseries.MultiColumnAggregationResult; +import com.arcadedb.engine.timeseries.TagFilter; +import com.arcadedb.engine.timeseries.TimeSeriesEngine; +import com.arcadedb.schema.DocumentType; +import com.arcadedb.schema.LocalTimeSeriesType; +import com.arcadedb.schema.Type; +import com.arcadedb.serializer.json.JSONArray; +import com.arcadedb.serializer.json.JSONObject; +import com.arcadedb.server.http.HttpServer; +import com.arcadedb.server.security.ServerSecurityUser; +import io.undertow.server.HttpServerExchange; + +import java.util.ArrayList; +import java.util.Deque; +import java.util.List; + +/** + * Grafana DataFrame query endpoint. + * Endpoint: POST /api/v1/ts/{database}/grafana/query + * + * Accepts multi-target queries and returns Grafana DataFrame wire format (columnar arrays with schema metadata). 
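Not part of the patch: the DataFrame wire format is columnar, so this handler transposes row-major query results into one array per column. A self-contained sketch of that transposition (the class name `TransposeSketch` is invented for illustration; the real code fills `JSONArray`s instead of lists):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only (not part of the patch): transpose row-major rows into
// column-major arrays, as the Grafana DataFrame "values" field requires.
public final class TransposeSketch {
  public static List<List<Object>> transpose(final List<Object[]> rows, final int numCols) {
    final List<List<Object>> columns = new ArrayList<>();
    for (int c = 0; c < numCols; c++)
      columns.add(new ArrayList<>());
    for (final Object[] row : rows)
      for (int c = 0; c < numCols; c++)
        columns.get(c).add(row[c]); // append each cell to its column
    return columns;
  }

  public static void main(final String[] args) {
    final List<List<Object>> cols = transpose(
        List.of(new Object[] { 1700000000000L, 21.5 }, new Object[] { 1700000060000L, 22.0 }), 2);
    if (!cols.get(0).equals(List.of(1700000000000L, 1700000060000L)))
      throw new AssertionError();
    if (!cols.get(1).equals(List.of(21.5, 22.0)))
      throw new AssertionError();
  }
}
```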
+ */
+public class PostGrafanaQueryHandler extends AbstractServerHttpHandler {
+
+  public PostGrafanaQueryHandler(final HttpServer httpServer) {
+    super(httpServer);
+  }
+
+  @Override
+  protected boolean mustExecuteOnWorkerThread() {
+    return true;
+  }
+
+  @Override
+  protected ExecutionResponse execute(final HttpServerExchange exchange, final ServerSecurityUser user,
+      final JSONObject payload) throws Exception {
+
+    final Deque<String> databaseParam = exchange.getQueryParameters().get("database");
+    if (databaseParam == null || databaseParam.isEmpty())
+      return new ExecutionResponse(400, "{ \"error\" : \"Database parameter is required\"}");
+
+    if (payload == null || !payload.has("targets"))
+      return new ExecutionResponse(400, "{ \"error\" : \"'targets' array is required\"}");
+
+    final DatabaseInternal database = httpServer.getServer().getDatabase(databaseParam.getFirst(), false, false);
+
+    final long fromTs = payload.getLong("from", Long.MIN_VALUE);
+    final long toTs = payload.getLong("to", Long.MAX_VALUE);
+    final int maxDataPoints = payload.getInt("maxDataPoints", 0);
+
+    final JSONArray targets = payload.getJSONArray("targets");
+    final JSONObject results = new JSONObject();
+
+    for (int t = 0; t < targets.length(); t++) {
+      final JSONObject target = targets.getJSONObject(t);
+      final String refId = target.getString("refId", "A");
+      final String typeName = target.getString("type");
+
+      if (!database.getSchema().existsType(typeName)) {
+        results.put(refId, buildErrorFrame("Type '" + typeName + "' does not exist"));
+        continue;
+      }
+
+      final DocumentType docType = database.getSchema().getType(typeName);
+      if (!(docType instanceof LocalTimeSeriesType tsType) || tsType.getEngine() == null) {
+        results.put(refId, buildErrorFrame("Type '" + typeName + "' is not a TimeSeries type"));
+        continue;
+      }
+
+      final TimeSeriesEngine engine = tsType.getEngine();
+      final List<ColumnDefinition> columns = tsType.getTsColumns();
+
+      // Build tag filter
+      final TagFilter tagFilter = target.has("tags")
+          ? TimeSeriesHandlerUtils.buildTagFilter(target.getJSONObject("tags"), columns)
+          : null;
+
+      final JSONObject frameResult;
+      if (target.has("aggregation"))
+        frameResult = executeAggregation(target, engine, columns, fromTs, toTs, maxDataPoints, tagFilter);
+      else
+        frameResult = executeRawQuery(target, engine, columns, fromTs, toTs, tagFilter);
+
+      results.put(refId, frameResult);
+    }
+
+    final JSONObject response = new JSONObject();
+    response.put("results", results);
+    return new ExecutionResponse(200, response.toString());
+  }
+
+  private JSONObject executeRawQuery(final JSONObject target, final TimeSeriesEngine engine,
+      final List<ColumnDefinition> columns, final long fromTs, final long toTs,
+      final TagFilter tagFilter) throws Exception {
+
+    final int[] columnIndices = target.has("fields")
+        ? TimeSeriesHandlerUtils.resolveColumnIndices(target.getJSONArray("fields"), columns)
+        : null;
+
+    final List<Object[]> rows = engine.query(fromTs, toTs, columnIndices, tagFilter);
+
+    // Build schema fields and columnar data
+    final List<ColumnDefinition> selectedColumns = new ArrayList<>();
+    if (columnIndices == null) {
+      selectedColumns.addAll(columns);
+    } else {
+      for (final int idx : columnIndices)
+        selectedColumns.add(columns.get(idx));
+    }
+
+    final JSONArray schemaFields = new JSONArray();
+    for (final ColumnDefinition col : selectedColumns) {
+      final JSONObject field = new JSONObject();
+      field.put("name", col.getName());
+      field.put("type", grafanaFieldType(col));
+      schemaFields.put(field);
+    }
+
+    // Transpose rows to columnar format
+    final int numCols = selectedColumns.size();
+    final JSONArray[] columnArrays = new JSONArray[numCols];
+    for (int c = 0; c < numCols; c++)
+      columnArrays[c] = new JSONArray();
+
+    for (final Object[] row : rows) {
+      for (int c = 0; c < numCols; c++)
+        columnArrays[c].put(row[c]);
+    }
+
+    final JSONArray valuesArray = new JSONArray();
+    for (final JSONArray col : columnArrays)
+      valuesArray.put(col);
+
+    return buildFrame(schemaFields, valuesArray);
+  }
+
+  private JSONObject executeAggregation(final JSONObject target, final TimeSeriesEngine engine,
+      final List<ColumnDefinition> columns, final long fromTs, final long toTs,
+      final int maxDataPoints, final TagFilter tagFilter) throws Exception {
+
+    final JSONObject aggJson = target.getJSONObject("aggregation");
+    final JSONArray requestsJson = aggJson.getJSONArray("requests");
+
+    // Determine bucket interval: explicit or auto-calculated from maxDataPoints
+    long bucketInterval = aggJson.getLong("bucketInterval", 0);
+    if (bucketInterval <= 0 && maxDataPoints > 0 && fromTs != Long.MIN_VALUE && toTs != Long.MAX_VALUE)
+      bucketInterval = Math.max(1, (toTs - fromTs) / maxDataPoints);
+    if (bucketInterval <= 0)
+      bucketInterval = 60000; // fallback: 1 minute
+
+    final List<MultiColumnAggregationRequest> requests = new ArrayList<>();
+    final List<String> aliases = new ArrayList<>();
+
+    for (int i = 0; i < requestsJson.length(); i++) {
+      final JSONObject req = requestsJson.getJSONObject(i);
+      final String fieldName = req.getString("field");
+      final AggregationType aggType = AggregationType.valueOf(req.getString("type"));
+      final String alias = req.getString("alias", fieldName + "_" + aggType.name().toLowerCase());
+
+      final int colIndex = TimeSeriesHandlerUtils.findColumnIndex(fieldName, columns);
+      if (colIndex < 0)
+        return buildErrorFrame("Field '" + fieldName + "' not found in type");
+
+      requests.add(new MultiColumnAggregationRequest(colIndex, aggType, alias));
+      aliases.add(alias);
+    }
+
+    final MultiColumnAggregationResult aggResult = engine.aggregateMulti(fromTs, toTs, requests, bucketInterval,
+        tagFilter);
+
+    final List<Long> timestamps = aggResult.getBucketTimestamps();
+
+    // Schema: time + one field per aggregation
+    final JSONArray schemaFields = new JSONArray();
+    final JSONObject timeField = new JSONObject();
+    timeField.put("name", "time");
+    timeField.put("type", "time");
+    schemaFields.put(timeField);
+
+    for (final String alias : aliases) {
+      final JSONObject field = new JSONObject();
+      field.put("name", alias);
field.put("type", "number"); + schemaFields.put(field); + } + + // Columnar data: timestamps column + one column per aggregation + final JSONArray timeValues = new JSONArray(); + for (final long ts : timestamps) + timeValues.put(ts); + + final JSONArray[] aggColumns = new JSONArray[aliases.size()]; + for (int r = 0; r < aliases.size(); r++) + aggColumns[r] = new JSONArray(); + + for (final long ts : timestamps) { + for (int r = 0; r < requests.size(); r++) + aggColumns[r].put(aggResult.getValue(ts, r)); + } + + final JSONArray valuesArray = new JSONArray(); + valuesArray.put(timeValues); + for (final JSONArray col : aggColumns) + valuesArray.put(col); + + return buildFrame(schemaFields, valuesArray); + } + + private static JSONObject buildFrame(final JSONArray schemaFields, final JSONArray values) { + final JSONObject schema = new JSONObject(); + schema.put("fields", schemaFields); + + final JSONObject data = new JSONObject(); + data.put("values", values); + + final JSONObject frame = new JSONObject(); + frame.put("schema", schema); + frame.put("data", data); + + final JSONArray frames = new JSONArray(); + frames.put(frame); + + final JSONObject result = new JSONObject(); + result.put("frames", frames); + return result; + } + + private static JSONObject buildErrorFrame(final String message) { + final JSONObject result = new JSONObject(); + result.put("error", message); + result.put("frames", new JSONArray()); + return result; + } + + private static String grafanaFieldType(final ColumnDefinition col) { + if (col.getRole() == ColumnDefinition.ColumnRole.TIMESTAMP) + return "time"; + + final Type dt = col.getDataType(); + return switch (dt) { + case DOUBLE, FLOAT, INTEGER, SHORT, LONG, BYTE, DECIMAL -> "number"; + case BOOLEAN -> "boolean"; + case STRING -> "string"; + case DATETIME, DATE -> "time"; + default -> "string"; + }; + } +} diff --git a/server/src/main/java/com/arcadedb/server/http/handler/PostPrometheusReadHandler.java 
b/server/src/main/java/com/arcadedb/server/http/handler/PostPrometheusReadHandler.java new file mode 100644 index 0000000000..afd7123cad --- /dev/null +++ b/server/src/main/java/com/arcadedb/server/http/handler/PostPrometheusReadHandler.java @@ -0,0 +1,238 @@ +/* + * Copyright © 2021-present Arcade Data Ltd (info@arcadedata.com) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * SPDX-FileCopyrightText: 2021-present Arcade Data Ltd (info@arcadedata.com) + * SPDX-License-Identifier: Apache-2.0 + */ +package com.arcadedb.server.http.handler; + +import com.arcadedb.database.DatabaseInternal; +import com.arcadedb.engine.timeseries.ColumnDefinition; +import com.arcadedb.engine.timeseries.TagFilter; +import com.arcadedb.engine.timeseries.TimeSeriesEngine; +import com.arcadedb.schema.DocumentType; +import com.arcadedb.schema.LocalTimeSeriesType; +import com.arcadedb.serializer.json.JSONObject; +import com.arcadedb.server.http.HttpServer; +import com.arcadedb.server.http.handler.prometheus.PrometheusTypes; +import com.arcadedb.server.http.handler.prometheus.PrometheusTypes.Label; +import com.arcadedb.server.http.handler.prometheus.PrometheusTypes.LabelMatcher; +import com.arcadedb.server.http.handler.prometheus.PrometheusTypes.MatchType; +import com.arcadedb.server.http.handler.prometheus.PrometheusTypes.Query; +import com.arcadedb.server.http.handler.prometheus.PrometheusTypes.QueryResult; +import 
com.arcadedb.server.http.handler.prometheus.PrometheusTypes.ReadRequest;
+import com.arcadedb.server.http.handler.prometheus.PrometheusTypes.ReadResponse;
+import com.arcadedb.server.http.handler.prometheus.PrometheusTypes.Sample;
+import com.arcadedb.server.http.handler.prometheus.PrometheusTypes.TimeSeries;
+import com.arcadedb.server.security.ServerSecurityUser;
+import io.undertow.server.HttpServerExchange;
+import io.undertow.util.HttpString;
+import org.xerial.snappy.Snappy;
+
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.Deque;
+import java.util.LinkedHashMap;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * HTTP handler for Prometheus remote_read protocol.
+ * Endpoint: POST /api/v1/ts/{database}/prom/read
+ *
+ * Receives Snappy-compressed protobuf ReadRequest messages,
+ * queries the TimeSeries engine, and returns Snappy-compressed protobuf ReadResponse.
+ *
+ * @author Luca Garulli (l.garulli@arcadedata.com)
+ */
+public class PostPrometheusReadHandler extends AbstractBinaryHttpHandler {
+
+  public PostPrometheusReadHandler(final HttpServer httpServer) {
+    super(httpServer);
+  }
+
+  @Override
+  protected ExecutionResponse execute(final HttpServerExchange exchange, final ServerSecurityUser user,
+      final JSONObject payload) throws Exception {
+
+    final Deque<String> databaseParam = exchange.getQueryParameters().get("database");
+    if (databaseParam == null || databaseParam.isEmpty())
+      return new ExecutionResponse(400, "{ \"error\" : \"Database parameter is required\"}");
+
+    if (rawBytes == null || rawBytes.length == 0)
+      return new ExecutionResponse(400, "{ \"error\" : \"Request body is empty\"}");
+
+    // Snappy decompress
+    final byte[] decompressed;
+    try {
+      decompressed = Snappy.uncompress(rawBytes);
+    } catch (final Exception e) {
+      return new ExecutionResponse(400, "{ \"error\" : \"Invalid Snappy-compressed data\"}");
+    }
+
+    final ReadRequest readRequest = ReadRequest.decode(decompressed);
+    final DatabaseInternal database = httpServer.getServer().getDatabase(databaseParam.getFirst(), false, false);
+
+    final List<QueryResult> queryResults = new ArrayList<>();
+
+    for (final Query query : readRequest.getQueries()) {
+      // Find __name__ matcher to determine which type to query
+      String metricName = null;
+      final List<LabelMatcher> tagMatchers = new ArrayList<>();
+
+      for (final LabelMatcher matcher : query.getMatchers()) {
+        if ("__name__".equals(matcher.name()) && matcher.type() == MatchType.EQ)
+          metricName = matcher.value();
+        else
+          tagMatchers.add(matcher);
+      }
+
+      if (metricName == null) {
+        queryResults.add(new QueryResult(List.of()));
+        continue;
+      }
+
+      final String typeName = PostPrometheusWriteHandler.sanitizeTypeName(metricName);
+
+      if (!database.getSchema().existsType(typeName)) {
+        queryResults.add(new QueryResult(List.of()));
+        continue;
+      }
+
+      final DocumentType docType = database.getSchema().getType(typeName);
+      if (!(docType instanceof LocalTimeSeriesType tsType) || tsType.getEngine() == null) {
+        queryResults.add(new QueryResult(List.of()));
+        continue;
+      }
+
+      final TimeSeriesEngine engine = tsType.getEngine();
+      final List<ColumnDefinition> columns = tsType.getTsColumns();
+
+      // Build TagFilter from label matchers (EQ only for now)
+      // TagFilter.matches() accesses row[columnIndex + 1], so the index must be
+      // the zero-based position among non-timestamp columns.
+      TagFilter tagFilter = null;
+      for (final LabelMatcher matcher : tagMatchers) {
+        if (matcher.type() != MatchType.EQ)
+          continue;
+
+        final String colName = PostPrometheusWriteHandler.sanitizeColumnName(matcher.name());
+        final int nonTsIndex = findNonTimestampColumnIndex(columns, colName);
+        if (nonTsIndex < 0)
+          continue;
+
+        if (tagFilter == null)
+          tagFilter = TagFilter.eq(nonTsIndex, matcher.value());
+        else
+          tagFilter = tagFilter.and(nonTsIndex, matcher.value());
+      }
+
+      // Query the engine
+      final List<Object[]> rows = engine.query(query.getStartTimestampMs(), query.getEndTimestampMs(), null, tagFilter);
+
+      // Group by label combination → TimeSeries
+      final Map<String, List<Object[]>> grouped = new LinkedHashMap<>();
+      for (final Object[] row : rows) {
+        final String key = buildLabelKey(columns, row);
+        grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
+      }
+
+      // Convert to Prometheus TimeSeries
+      final List<TimeSeries> seriesList = new ArrayList<>();
+      for (final Map.Entry<String, List<Object[]>> entry : grouped.entrySet()) {
+        final List<Object[]> groupRows = entry.getValue();
+        final Object[] firstRow = groupRows.getFirst();
+
+        // Build labels — row[i] corresponds to columns.get(i)
+        final List

+ * Receives Snappy-compressed protobuf WriteRequest messages,
+ * auto-creates TimeSeries types as needed, and inserts samples.
+ *
+ * @author Luca Garulli (l.garulli@arcadedata.com)
+ */
+public class PostPrometheusWriteHandler extends AbstractBinaryHttpHandler {
+
+  public PostPrometheusWriteHandler(final HttpServer httpServer) {
+    super(httpServer);
+  }
+
+  @Override
+  protected ExecutionResponse execute(final HttpServerExchange exchange, final ServerSecurityUser user,
+      final JSONObject payload) throws Exception {
+
+    exchange.getResponseHeaders().put(new HttpString("X-Prometheus-Remote-Write-Version"), "0.1.0");
+
+    final Deque<String> databaseParam = exchange.getQueryParameters().get("database");
+    if (databaseParam == null || databaseParam.isEmpty())
+      return new ExecutionResponse(400, "{ \"error\" : \"Database parameter is required\"}");
+
+    if (rawBytes == null || rawBytes.length == 0)
+      return new ExecutionResponse(400, "{ \"error\" : \"Request body is empty\"}");
+
+    // Snappy decompress
+    final byte[] decompressed;
+    try {
+      decompressed = Snappy.uncompress(rawBytes);
+    } catch (final Exception e) {
+      return new ExecutionResponse(400, "{ \"error\" : \"Invalid Snappy-compressed data\"}");
+    }
+
+    // Decode protobuf WriteRequest
+    final WriteRequest writeRequest = WriteRequest.decode(decompressed);
+    if (writeRequest.getTimeSeries().isEmpty())
+      return new ExecutionResponse(204, "");
+
+    final DatabaseInternal database = httpServer.getServer().getDatabase(databaseParam.getFirst(), false, false);
+
+    database.begin();
+    try {
+      for (final TimeSeries ts : writeRequest.getTimeSeries()) {
+        final String metricName = ts.getMetricName();
+        if (metricName == null || metricName.isEmpty())
+          continue;
+
+        // Sanitize metric name: dots/hyphens → underscores
+        final String typeName = sanitizeTypeName(metricName);
+
+        // Auto-create type if needed
+        final LocalTimeSeriesType tsType = getOrCreateType(database, typeName, ts.getLabels());
+        final TimeSeriesEngine engine = tsType.getEngine();
+        final List<ColumnDefinition> columns = tsType.getTsColumns();
+
+        // Insert each sample
+        for (final Sample sample : ts.getSamples()) {
+          final long[] timestamps = new long[] { sample.timestampMs() };
+          final Object[][] columnValues = new Object[columns.size() - 1][1];
+
+          int colIdx = 0;
+          for (int i = 0; i < columns.size(); i++) {
+            final ColumnDefinition col = columns.get(i);
+            if (col.getRole() == ColumnDefinition.ColumnRole.TIMESTAMP)
+              continue;
+
+            Object value;
+            if (col.getRole() == ColumnDefinition.ColumnRole.TAG)
+              value = findLabelValue(ts.getLabels(), col.getName());
+            else
+              value = sample.value(); // the "value" field
+            columnValues[colIdx][0] = value;
+            colIdx++;
+          }
+
+          engine.appendSamples(timestamps, columnValues);
+        }
+      }
+      database.commit();
+    } catch (final Exception e) {
+      database.rollback();
+      throw e;
+    }
+
+    return new ExecutionResponse(204, "");
+  }
+
+  private LocalTimeSeriesType getOrCreateType(final DatabaseInternal database, final String typeName,
+      final List

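The handler above hands `engine.appendSamples()` a columnar batch: one `long[]` of timestamps plus an `Object[columns - 1][samples]` matrix where each outer slot is a non-timestamp column, tag columns take the matching label value, and the metric column takes the sample value. A hedged sketch of that layout for a single sample (the `Map`-based label lookup is a simplification of the handler's `findLabelValue`, not the real API):

```java
import java.util.List;
import java.util.Map;

public class SampleBatchDemo {
  // Builds the Object[nonTsColumns][1] matrix for one sample: each non-timestamp
  // column gets either its label value (tag columns) or the numeric sample value.
  static Object[][] buildBatch(final List<String> nonTsColumns, final Map<String, String> labels,
      final double sampleValue) {
    final Object[][] columnValues = new Object[nonTsColumns.size()][1];
    for (int i = 0; i < nonTsColumns.size(); i++) {
      final String col = nonTsColumns.get(i);
      // tag columns are filled from the Prometheus labels; the remaining
      // column carries the sample's "value" field
      columnValues[i][0] = labels.containsKey(col) ? labels.get(col) : sampleValue;
    }
    return columnValues;
  }

  public static void main(final String[] args) {
    final Object[][] batch = buildBatch(List.of("instance", "value"), Map.of("instance", "node-1"), 42.0);
    if (!"node-1".equals(batch[0][0])) throw new AssertionError();
    if (!Double.valueOf(42.0).equals(batch[1][0])) throw new AssertionError();
    System.out.println("ok");
  }
}
```

The column-major shape is what makes the single-sample case look awkward (`[columns - 1][1]`) but lets batch ingestion append many samples per column in one call.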
+                        <tr>
+                            <td>POST</td>
+                            <td>/api/v1/ts/{db}/write</td>
+                            <td>Ingest via Line Protocol</td>
+                        </tr>
+                        <tr>
+                            <td>POST</td>
+                            <td>/api/v1/ts/{db}/query</td>
+                            <td>Query timeseries data</td>
+                        </tr>
+                        <tr>
+                            <td>GET</td>
+                            <td>/api/v1/ts/{db}/latest</td>
+                            <td>Get latest value</td>
+                        </tr>
+                        <tr>
+                            <td>GET</td>
+                            <td>/api/v1/ts/{db}/grafana/health</td>
+                            <td>Grafana health check</td>
+                        </tr>
+                        <tr>
+                            <td>GET</td>
+                            <td>/api/v1/ts/{db}/grafana/metadata</td>
+                            <td>Grafana metadata</td>
+                        </tr>
+                        <tr>
+                            <td>POST</td>
+                            <td>/api/v1/ts/{db}/grafana/query</td>
+                            <td>Grafana DataFrame query</td>
+                        </tr>
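The write endpoint listed above accepts InfluxDB Line Protocol, where each line is `measurement,tag=val field=val[,field2=val2] timestamp`, integer fields carry an `i` suffix, and the default timestamp precision is nanoseconds. A minimal sketch of assembling one such line client-side (the helper and its parameters are illustrative, not part of ArcadeDB):

```java
public class LineProtocolDemo {
  // Builds a single Line Protocol line with one tag and one integer field.
  // Integer fields need the "i" suffix; the timestamp is in nanoseconds
  // unless a ?precision= query parameter says otherwise.
  static String line(final String measurement, final String tagKey, final String tagValue,
      final String field, final long intValue, final long tsNanos) {
    return measurement + "," + tagKey + "=" + tagValue + " "
        + field + "=" + intValue + "i " + tsNanos;
  }

  public static void main(final String[] args) {
    final String l = line("stocks", "symbol", "TSLA", "volume", 125000, 1700000000000000000L);
    if (!l.equals("stocks,symbol=TSLA volume=125000i 1700000000000000000"))
      throw new AssertionError();
    System.out.println(l);
  }
}
```

The resulting text would be POSTed as `Content-Type: text/plain` to `/api/v1/ts/{db}/write`, with multiple newline-separated lines per request for batch ingestion.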
@@ -307,6 +342,95 @@ HTTP API Reference
     responseCode: "200 OK", responseBody: '{\n "result": true\n}', notes: null
+  },
+  tsWrite: {
+    method: "POST", path: "/api/v1/ts/{database}/write", auth: true, tryIt: true,
+    contentType: "text/plain",
+    docsAnchor: "http-timeseries",
+    title: "Ingest TimeSeries Data (Line Protocol)",
+    description: "Ingests time-series data using InfluxDB Line Protocol format. Each line represents one data point: measurement,tag1=val1 field1=value1,field2=value2 timestamp. This is the recommended method for bulk ingestion.",
+    params: [
+      { name: "database", desc: "Database name" }
+    ],
+    queryParams: [
+      { name: "precision", desc: "Timestamp precision: ns (nanoseconds, default), us (microseconds), ms (milliseconds), s (seconds)" }
+    ],
+    requestHeaders: [
+      { name: "Content-Type", value: "text/plain", desc: "Line Protocol is plain text, not JSON" }
+    ],
+    requestBody: "stocks,symbol=TSLA open=250.64,close=252.10,high=253.50,low=249.80,volume=125000i 1700000000000000000\nstocks,symbol=AAPL open=195.20,close=196.50,high=197.00,low=194.80,volume=89000i 1700000000000000000",
+    responseCode: "204 No Content",
+    responseBody: null,
+    notes: "Returns 204 No Content on success (InfluxDB convention). Unknown measurement names (no matching TimeSeries type) are silently skipped. Integer fields require an i suffix (e.g. volume=125000i). Multiple lines can be sent in a single request for batch ingestion."
+  },
+  tsQuery: {
+    method: "POST", path: "/api/v1/ts/{database}/query", auth: true, tryIt: true,
+    docsAnchor: "http-timeseries",
+    title: "Query TimeSeries Data",
+    description: "Queries time-series data with optional time range filtering, field projection, tag filtering, and aggregation with configurable bucket intervals. Returns either raw rows or aggregated buckets depending on whether an aggregation block is provided.",
+    params: [
+      { name: "database", desc: "Database name" }
+    ],
+    requestBody: '{\n "type": "stocks",\n "from": 1700000000000,\n "to": 1700100000000,\n "fields": ["open", "close", "volume"],\n "tags": { "symbol": "TSLA" },\n "aggregation": {\n "bucketInterval": 3600000,\n "requests": [\n { "field": "close", "type": "AVG", "alias": "avg_close" },\n { "field": "volume", "type": "SUM", "alias": "total_vol" }\n ]\n },\n "limit": 10000\n}',
+    responseCode: "200 OK",
+    responseBody: '// Raw query (no aggregation):\n{\n "type": "stocks",\n "columns": ["ts", "symbol", "open", "close", "volume"],\n "rows": [\n [1700000000000, "TSLA", 250.64, 252.10, 125000],\n [1700000060000, "TSLA", 251.30, 253.00, 130000]\n ],\n "count": 2\n}\n\n// Aggregated query:\n{\n "type": "stocks",\n "aggregations": ["avg_close", "total_vol"],\n "buckets": [\n { "timestamp": 1700000000000, "values": [252.55, 255000] },\n { "timestamp": 1700003600000, "values": [254.10, 310000] }\n ],\n "count": 2\n}',
+    notes: "Request fields: type (required) — TimeSeries type name. from/to (optional) — epoch ms range. fields (optional) — subset of columns. tags (optional) — filter by tag values. aggregation (optional) — aggregate with bucket interval. Supported aggregation types: AVG, SUM, MIN, MAX, COUNT. limit (optional, default 20000) — max rows for raw queries."
+  },
+  tsLatest: {
+    method: "GET", path: "/api/v1/ts/{database}/latest", auth: true, tryIt: true,
+    docsAnchor: "http-timeseries",
+    title: "Get Latest TimeSeries Value",
+    description: "Returns the most recent data point for a TimeSeries type. Optionally filter by a single tag value to get the latest for a specific dimension (e.g. a specific sensor or stock symbol).",
+    params: [
+      { name: "database", desc: "Database name" }
+    ],
+    queryParams: [
+      { name: "type", desc: "TimeSeries type name (required)" },
+      { name: "tag", desc: "Optional tag filter in format tagName:value (e.g. symbol:TSLA)" }
+    ],
+    requestBody: null,
+    responseCode: "200 OK",
+    responseBody: '{\n "type": "stocks",\n "columns": ["ts", "symbol", "open", "close", "high", "low", "volume"],\n "latest": [1700100000000, "TSLA", 255.30, 256.10, 257.00, 254.80, 142000]\n}\n\n// When no data exists:\n{\n "type": "stocks",\n "columns": ["ts", "symbol", "open", "close", "high", "low", "volume"],\n "latest": null\n}',
+    notes: "Returns \"latest\": null if no data exists. The tag filter format is key:value (e.g. ?tag=symbol:TSLA)."
+  },
+  tsGrafanaHealth: {
+    method: "GET", path: "/api/v1/ts/{database}/grafana/health", auth: true, tryIt: true,
+    docsAnchor: "http-timeseries",
+    title: "Grafana Health Check",
+    description: "Verifies the database exists and the Grafana endpoints are reachable. Use this as the health check URL when configuring a Grafana Infinity datasource.",
+    params: [
+      { name: "database", desc: "Database name" }
+    ],
+    requestBody: null,
+    responseCode: "200 OK",
+    responseBody: '{\n "status": "ok",\n "database": "mydb"\n}',
+    notes: "Returns 200 with status ok if the database exists."
+  },
+  tsGrafanaMetadata: {
+    method: "GET", path: "/api/v1/ts/{database}/grafana/metadata", auth: true, tryIt: true,
+    docsAnchor: "http-timeseries",
+    title: "Grafana Metadata",
+    description: "Discovers all TimeSeries types in the database with their fields, tags, and available aggregation types. Use this to configure Grafana panel queries.",
+    params: [
+      { name: "database", desc: "Database name" }
+    ],
+    requestBody: null,
+    responseCode: "200 OK",
+    responseBody: '{\n "types": [\n {\n "name": "weather",\n "fields": [{ "name": "temperature", "dataType": "DOUBLE" }],\n "tags": [{ "name": "location", "dataType": "STRING" }]\n }\n ],\n "aggregationTypes": ["SUM", "AVG", "MIN", "MAX", "COUNT"]\n}',
+    notes: "Lists only types that are TimeSeries types with an active engine. Non-TimeSeries types are excluded."
+  },
+  tsGrafanaQuery: {
+    method: "POST", path: "/api/v1/ts/{database}/grafana/query", auth: true, tryIt: true,
+    docsAnchor: "http-timeseries",
+    title: "Grafana DataFrame Query",
+    description: "Queries TimeSeries data and returns results in Grafana DataFrame wire format (columnar arrays with schema metadata). Supports multiple targets (one per Grafana panel query), raw and aggregated queries, tag filtering, and automatic bucket interval calculation.",
+    params: [
+      { name: "database", desc: "Database name" }
+    ],
+    requestBody: '{\n "from": 1700000000000,\n "to": 1700086400000,\n "maxDataPoints": 1000,\n "targets": [\n {\n "refId": "A",\n "type": "weather",\n "fields": ["temperature"],\n "tags": { "location": "us-east" },\n "aggregation": {\n "bucketInterval": 60000,\n "requests": [\n { "field": "temperature", "type": "AVG", "alias": "avg_temp" }\n ]\n }\n }\n ]\n}',
+    responseCode: "200 OK",
+    responseBody: '{\n "results": {\n "A": {\n "frames": [{\n "schema": {\n "fields": [\n { "name": "time", "type": "time" },\n { "name": "avg_temp", "type": "number" }\n ]\n },\n "data": {\n "values": [\n [1700000000000, 1700000060000],\n [23.5, 24.1]\n ]\n }\n }]\n }\n }\n}',
+    notes: "Request fields: from/to (optional) — epoch ms shared across all targets. maxDataPoints (optional) — auto-calculates bucketInterval when aggregation is requested but interval is omitted. targets[].refId — Grafana panel query ID. targets[].type (required) — TimeSeries type name. targets[].fields (optional) — subset of columns. targets[].tags (optional) — tag filter. targets[].aggregation (optional) — aggregation with bucket interval and requests. Supported aggregation types: AVG, SUM, MIN, MAX, COUNT."
+  }
 };
@@ -359,6 +483,17 @@ HTTP API Reference
html += "
"; } + // Query Parameters + if (ep.queryParams && ep.queryParams.length > 0) { + html += "
"; + html += "
Query Parameters
"; + html += ""; + for (var i = 0; i < ep.queryParams.length; i++) + html += ""; + html += "
NameDescription
" + ep.queryParams[i].name + "" + ep.queryParams[i].desc + "
"; + html += "
"; + } + // Request Headers if (ep.requestHeaders && ep.requestHeaders.length > 0) { html += "
"; @@ -488,10 +623,24 @@
HTTP API Reference
html += "
"; } + // Query parameters + if (ep.queryParams && ep.queryParams.length > 0) { + html += "
"; + html += ""; + for (var i = 0; i < ep.queryParams.length; i++) { + var qp = ep.queryParams[i]; + html += "
"; + html += "" + escapeHtml(qp.name) + ""; + html += ""; + html += "
"; + } + html += "
"; + } + // Request body if (ep.requestBody) { html += "
"; - html += ""; + html += ""; html += ""; html += "
"; } @@ -542,6 +691,19 @@
HTTP API Reference
     }
   }
+
+  // Append query parameters
+  if (ep.queryParams) {
+    var qpParts = [];
+    for (var i = 0; i < ep.queryParams.length; i++) {
+      var qp = ep.queryParams[i];
+      var qpVal = (document.getElementById('apiQP_' + key + '_' + qp.name) || {}).value;
+      if (qpVal && qpVal.trim())
+        qpParts.push(encodeURIComponent(qp.name) + '=' + encodeURIComponent(qpVal.trim()));
+    }
+    if (qpParts.length > 0)
+      url += '?' + qpParts.join('&');
+  }
+
   // Build auth header
   var authType = document.getElementById('apiAuth_' + key).value;
   var headers = {};
@@ -577,11 +739,14 @@ HTTP API Reference
   var startTime = Date.now();
 
+  var ct = body ? (ep.contentType || 'application/json') : undefined;
+
   $.ajax({
     type: ep.method,
     url: url,
     data: body,
-    contentType: body ? 'application/json' : undefined,
+    contentType: ct,
+    processData: false,
     headers: headers,
     timeout: 30000
   })
diff --git a/studio/src/main/resources/static/css/studio.css b/studio/src/main/resources/static/css/studio.css
index 159af8ee3e..c4c571b164 100644
--- a/studio/src/main/resources/static/css/studio.css
+++ b/studio/src/main/resources/static/css/studio.css
@@ -193,6 +193,8 @@ table,
 #tab-query {
   overflow-y: hidden !important;
+  display: flex;
+  flex-direction: column;
 }
 
 #tab-database {
@@ -1502,6 +1504,25 @@ div.dt-search {
   white-space: pre;
 }
 
+.ts-ingest-code {
+  background-color: var(--bg-code);
+  color: var(--text-code);
+  padding: 12px 14px;
+  border-radius: 6px;
+  font-family: 'SFMono-Regular', Consolas, 'Liberation Mono', Menlo, monospace;
+  font-size: 0.78rem;
+  line-height: 1.5;
+  overflow-x: auto;
+  white-space: pre;
+  margin-bottom: 10px;
+}
+
+.ts-ingest-code code {
+  color: inherit;
+  background: none;
+  padding: 0;
+}
+
 .api-detail-response-code {
   display: inline-block;
   font-size: 0.75rem;
diff --git a/studio/src/main/resources/static/index.html b/studio/src/main/resources/static/index.html
index 3696428d9c..27bc22ce57 100644
--- a/studio/src/main/resources/static/index.html
+++ b/studio/src/main/resources/static/index.html
@@ -149,6 +149,7 @@
+
+
@@ -175,6 +176,14 @@
+
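The Grafana endpoint described earlier returns DataFrames in columnar wire format: `data.values` holds one array per field, e.g. `[[t1, t2], [v1, v2]]`. A sketch (not ArcadeDB's actual implementation) of pivoting row-oriented aggregation buckets into that shape:

```java
import java.util.ArrayList;
import java.util.List;

public class DataFrameDemo {
  // Pivots rows of [field0, field1, ...] cells into Grafana's columnar
  // layout: one list per field, each collecting that field's cell from
  // every row, in row order.
  static List<List<Object>> toColumnar(final List<Object[]> rows, final int fieldCount) {
    final List<List<Object>> values = new ArrayList<>();
    for (int i = 0; i < fieldCount; i++)
      values.add(new ArrayList<>());
    for (final Object[] row : rows)
      for (int i = 0; i < fieldCount; i++)
        values.get(i).add(row[i]); // column i collects the i-th cell of every row
    return values;
  }

  public static void main(final String[] args) {
    // Two aggregation buckets of [timestamp, avg_temp], as in the example response
    final List<Object[]> rows = List.of(
        new Object[] { 1700000000000L, 23.5 },
        new Object[] { 1700000060000L, 24.1 });
    final List<List<Object>> values = toColumnar(rows, 2);
    if (!values.get(0).equals(List.of(1700000000000L, 1700000060000L))) throw new AssertionError();
    if (!values.get(1).equals(List.of(23.5, 24.1))) throw new AssertionError();
    System.out.println("ok");
  }
}
```

The `schema.fields` metadata (one `time` field plus one `number` field per aggregation alias) then describes each of these columns in order.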