Guide Data Pipelines
ARO provides a map/reduce-style data pipeline for filtering, transforming, and aggregating collections. Every operation is type-safe, and results are typed via OpenAPI schemas.
ARO supports four core data operations:
| Operation | Purpose | Example |
|---|---|---|
| Retrieve | Retrieve and filter data | `<Retrieve> the <users> from the <repository>...` |
| Filter | Filter an existing collection | `<Filter> the <active> from the <users>...` |
| Map | Transform to a different type | `<Map> the <summaries> from the <users>.` |
| Reduce | Aggregate to a single value | `<Reduce> the <total> from the <orders> with sum(<amount>).` |
Type annotations are optional because ARO infers result types. When one is needed, write `as Type` after the result:
(* Without type annotation - inferred *)
<Filter> the <active-users> from the <users> where <active> is true.
(* With explicit type annotation *)
<Filter> the <active-users> as List<User> from the <users> where <active> is true.
(* Alternative colon syntax also works *)
<Filter> the <active-users: List<User>> from the <users> where <active> is true.
See the Type System Specification for details.
`<Retrieve>` fetches data from repositories, with optional filtering, sorting, and pagination.
(* Basic retrieve *)
<Retrieve> the <users: List<User>> from the <user-repository>.
(* With filter *)
<Retrieve> the <active-users: List<User>> from the <users>
where <status> is "active".
(* With sorting *)
<Retrieve> the <recent-users: List<User>> from the <users>
order by <created-at> desc.
(* With pagination *)
<Retrieve> the <page: List<User>> from the <users>
order by <name> asc
limit 20
offset 40.
(* Combined *)
<Retrieve> the <top-customers: List<User>> from the <users>
where <tier> is "premium"
order by <total-purchases> desc
limit 10.
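To make the combined form concrete, the following Python sketch models one plausible evaluation order: filter first, then sort, then apply offset and limit. It is a model of the behaviour described above, not ARO's implementation. The `retrieve` helper and its parameters are hypothetical names, and the evaluation order is an assumption the guide does not state explicitly.

```python
def retrieve(rows, where=None, order_by=None, descending=False, limit=None, offset=0):
    """Model of a combined retrieve: where, then order by, then offset/limit."""
    # Step 1: keep only rows that satisfy the predicate (if one is given).
    result = [row for row in rows if where is None or where(row)]
    # Step 2: sort the surviving rows by a single field.
    if order_by is not None:
        result = sorted(result, key=lambda row: row[order_by], reverse=descending)
    # Step 3: cut the requested window out of the sorted rows.
    end = None if limit is None else offset + limit
    return result[offset:end]


users = [
    {"name": "Ann", "tier": "premium", "total-purchases": 300},
    {"name": "Bob", "tier": "basic", "total-purchases": 900},
    {"name": "Cy", "tier": "premium", "total-purchases": 700},
    {"name": "Di", "tier": "premium", "total-purchases": 500},
]

top_customers = retrieve(
    users,
    where=lambda u: u["tier"] == "premium",
    order_by="total-purchases",
    descending=True,
    limit=2,
)
top_names = [u["name"] for u in top_customers]
```

Applying the limit before the filter would give a different result here, which is why the order of the three steps matters.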
`<Filter>` narrows an existing collection using a predicate.
(* Filter by equality *)
<Filter> the <admins: List<User>> from the <users>
where <role> is "admin".
(* Filter by comparison *)
<Filter> the <high-value: List<Order>> from the <orders>
where <amount> > 1000.
(* Filter with multiple conditions *)
<Filter> the <active-premium: List<User>> from the <users>
where <status> is "active" and <tier> is "premium".
| Operator | Description | Example |
|---|---|---|
| `is`, `=` | Equality | `<status> is "active"` |
| `is not`, `!=` | Inequality | `<role> is not "guest"` |
| `>`, `>=`, `<`, `<=` | Comparison | `<age> >= 18` |
| `in` | Set membership | `<status> in ["a", "b"]` |
| `not in` | Set exclusion | `<status> not in <excluded>` |
| `between` | Range | `<price> between 10 and 100` |
| `contains` | Substring | `<name> contains "test"` |
| `starts with` | Prefix match | `<email> starts with "admin"` |
| `ends with` | Suffix match | `<file> ends with ".pdf"` |
| `matches` | Regex pattern | `<email> matches /^admin@/i` |
The `in` and `not in` operators accept either a CSV string or an array variable:
(* Using CSV string *)
<Filter> the <pending> from the <orders>
where <status> in "pending,processing".
(* Using array variable *)
<Create> the <exclude-statuses> with ["cancelled", "refunded"].
<Filter> the <active> from the <orders>
where <status> not in <exclude-statuses>.
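The two accepted forms can be modelled in Python as a normalization step followed by an ordinary membership test. This is only a sketch of the described behaviour: `to_set` and `is_in` are hypothetical helper names, and trimming whitespace around CSV entries is an assumption the guide does not confirm.

```python
def to_set(candidates):
    """Normalize a CSV string or an array of values into a set."""
    if isinstance(candidates, str):
        # Assumption: entries are split on commas and surrounding spaces are ignored.
        return {part.strip() for part in candidates.split(",")}
    return set(candidates)


def is_in(value, candidates):
    """Membership test that works for both accepted forms."""
    return value in to_set(candidates)


orders = [
    {"id": 1, "status": "pending"},
    {"id": 2, "status": "cancelled"},
    {"id": 3, "status": "processing"},
]

# Equivalent of: where <status> in "pending,processing"
pending = [o for o in orders if is_in(o["status"], "pending,processing")]

# Equivalent of: where <status> not in <exclude-statuses>
exclude_statuses = ["cancelled", "refunded"]
active = [o for o in orders if not is_in(o["status"], exclude_statuses)]
```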
The `matches` operator supports regex literals with flags:
(* Filter users with admin emails *)
<Filter> the <admins: List<User>> from the <users>
where <email> matches /^admin@|@admin\./i.
(* Filter valid email addresses *)
<Filter> the <valid-emails: List<User>> from the <users>
where <email> matches /^[\w.+-]+@[\w.-]+\.[a-zA-Z]{2,}$/i.
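For readers who want to try these patterns outside ARO, here is how the two regex literals could be written with Python's `re` module. This assumes that the `i` flag means case-insensitive matching and that an unanchored pattern may match anywhere in the value; ARO's regex dialect may differ from Python's in other details. The `matches` helper is a hypothetical name.

```python
import re

# /^admin@|@admin\./i : the trailing i flag becomes re.IGNORECASE.
ADMIN_EMAIL = re.compile(r"^admin@|@admin\.", re.IGNORECASE)

# /^[\w.+-]+@[\w.-]+\.[a-zA-Z]{2,}$/i : a loose email shape check.
VALID_EMAIL = re.compile(r"^[\w.+-]+@[\w.-]+\.[a-zA-Z]{2,}$", re.IGNORECASE)


def matches(value, pattern):
    # Assumption: the pattern may match anywhere unless anchored with ^ or $.
    return pattern.search(value) is not None


emails = [
    "Admin@example.com",
    "bob@admin.example.com",
    "administrator@example.com",
    "carol@example.com",
    "not-an-email",
]

admins = [e for e in emails if matches(e, ADMIN_EMAIL)]
valid_emails = [e for e in emails if matches(e, VALID_EMAIL)]
```

Note that `administrator@example.com` is not selected by the first pattern, because `^admin@` requires the local part to be exactly `admin`.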
`<Map>` transforms a collection into a different OpenAPI-defined type. The runtime automatically copies fields with matching names.
(* Map User to UserSummary *)
<Map> the <summaries: List<UserSummary>> from the <users>.
Requirements:
- Target type must be defined in `openapi.yaml` under `components/schemas`
- Fields with matching names are automatically copied
- Missing optional fields are omitted
- Missing required fields cause an error
# openapi.yaml
components:
  schemas:
    User:
      type: object
      properties:
        id: { type: string }
        name: { type: string }
        email: { type: string }
        password-hash: { type: string }
        created-at: { type: string }
    UserSummary:
      type: object
      properties:
        id: { type: string }
        name: { type: string }
        email: { type: string }

When mapping List<User> to List<UserSummary>, only id, name, and email are copied. Sensitive fields like password-hash are excluded.
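The field-copying rules listed above can be sketched in a few lines of Python. This is a model of the stated rules, not the ARO runtime: `map_record` is a hypothetical helper, and the target schema is passed in as plain lists of property names rather than being read from `openapi.yaml`.

```python
def map_record(source, properties, required=()):
    """Copy matching fields from source into a new record of the target type."""
    result = {}
    for name in properties:
        if name in source:
            # Fields with matching names are copied as-is.
            result[name] = source[name]
        elif name in required:
            # A required target field with no matching source field is an error.
            raise ValueError(f"missing required field: {name}")
        # A missing optional field is simply omitted.
    return result


user = {
    "id": "u1",
    "name": "Ann",
    "email": "ann@example.com",
    "password-hash": "not-a-real-hash",
    "created-at": "2024-01-01",
}

# Property names of the UserSummary schema shown above.
summary = map_record(user, ["id", "name", "email"])
```

Because only the target schema's properties are visited, fields such as `password-hash` never reach the result.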
`<Reduce>` aggregates a collection into a single value using an aggregation function.
(* Count items *)
<Reduce> the <user-count: Integer> from the <users>
with count().
(* Sum numeric field *)
<Reduce> the <total-revenue: Float> from the <orders>
with sum(<amount>).
(* Average *)
<Reduce> the <avg-price: Float> from the <products>
with avg(<price>).
(* Min/Max *)
<Reduce> the <highest-score: Float> from the <scores>
with max(<value>).
(* With filter *)
<Reduce> the <pending-count: Integer> from the <orders>
where <status> is "pending"
with count().
| Function | Description | Example |
|---|---|---|
| `count()` | Number of items | `with count()` |
| `sum(field)` | Sum of numeric field | `with sum(<amount>)` |
| `avg(field)` | Average of numeric field | `with avg(<price>)` |
| `min(field)` | Minimum value | `with min(<date>)` |
| `max(field)` | Maximum value | `with max(<score>)` |
| `first()` | First element | `with first()` |
| `last()` | Last element | `with last()` |
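As a rough mental model, each aggregation function corresponds to a familiar operation over a list of records. The Python sketch below illustrates that correspondence, including the optional filter applied before aggregating. `AGGREGATES` and `reduce_rows` are hypothetical names, and the behaviour on empty collections is not specified by this guide, so the sketch does not handle it.

```python
from statistics import fmean

# One entry per aggregation function; field is ignored where it does not apply.
AGGREGATES = {
    "count": lambda rows, field=None: len(rows),
    "sum": lambda rows, field: sum(row[field] for row in rows),
    "avg": lambda rows, field: fmean(row[field] for row in rows),
    "min": lambda rows, field: min(row[field] for row in rows),
    "max": lambda rows, field: max(row[field] for row in rows),
    "first": lambda rows, field=None: rows[0],
    "last": lambda rows, field=None: rows[-1],
}


def reduce_rows(rows, func, field=None, where=None):
    """Apply an optional filter, then collapse the rows to a single value."""
    if where is not None:
        rows = [row for row in rows if where(row)]
    return AGGREGATES[func](rows, field)


orders = [
    {"id": "o1", "amount": 100, "status": "pending"},
    {"id": "o2", "amount": 200, "status": "shipped"},
    {"id": "o3", "amount": 300, "status": "pending"},
]

total_revenue = reduce_rows(orders, "sum", "amount")
avg_order = reduce_rows(orders, "avg", "amount")
pending_count = reduce_rows(orders, "count", where=lambda o: o["status"] == "pending")
```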
Chain operations to build complex data transformations:
(Generate Report: Analytics) {
    (* Step 1: Retrieve recent orders *)
    <Retrieve> the <recent-orders: List<Order>> from the <orders>
        where <created-at> > now().minus(30.days)
        order by <created-at> desc.

    (* Step 2: Filter high-value orders *)
    <Filter> the <high-value: List<Order>> from the <recent-orders>
        where <amount> > 1000.

    (* Step 3: Map to summaries *)
    <Map> the <summaries: List<OrderSummary>> from the <high-value>.

    (* Step 4: Calculate total *)
    <Reduce> the <total: Float> from the <high-value>
        with sum(<amount>).

    <Return> an <OK: status> with {
        orders: <summaries>,
        total: <total>,
        count: <high-value>.count()
    }.
}
Sort results by one or more fields:
(* Single field, ascending *)
<Retrieve> the <users: List<User>> from the <repository>
order by <name> asc.
(* Single field, descending *)
<Retrieve> the <recent: List<Order>> from the <orders>
order by <created-at> desc.
(* Multiple fields *)
<Retrieve> the <products: List<Product>> from the <catalog>
order by <category> asc, <price> desc.
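Multi-field ordering with mixed directions can be modelled in Python by sorting repeatedly with a stable sort, starting from the lowest-priority key. This sketch only illustrates the expected result; `order_by` is a hypothetical helper, and nothing here is meant to describe how ARO sorts internally.

```python
def order_by(rows, *keys):
    """Sort rows by (field, direction) pairs given in priority order."""
    result = list(rows)
    # Apply the lowest-priority key first. Python's sort is stable, so each
    # later pass keeps the order established by earlier passes among ties.
    for field, direction in reversed(keys):
        result.sort(key=lambda row: row[field], reverse=(direction == "desc"))
    return result


catalog = [
    {"name": "pen", "category": "office", "price": 2},
    {"name": "mug", "category": "kitchen", "price": 8},
    {"name": "pan", "category": "kitchen", "price": 30},
    {"name": "desk", "category": "office", "price": 150},
]

# Equivalent of: order by <category> asc, <price> desc
products = order_by(catalog, ("category", "asc"), ("price", "desc"))
product_names = [p["name"] for p in products]
```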
Limit results with offset for pagination:
(* First page: items 1-20 *)
<Retrieve> the <page1: List<User>> from the <users>
order by <name> asc
limit 20.
(* Second page: items 21-40 *)
<Retrieve> the <page2: List<User>> from the <users>
order by <name> asc
limit 20
offset 20.
(* Third page: items 41-60 *)
<Retrieve> the <page3: List<User>> from the <users>
order by <name> asc
limit 20
offset 40.
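The three pages above follow a simple rule: for a 1-based page number, the offset is `(page - 1) * limit`. A short Python sketch of that arithmetic follows; `page_window` and `paginate` are hypothetical helpers used only for illustration.

```python
def page_window(page, limit=20):
    """Return the (limit, offset) pair for a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return limit, (page - 1) * limit


def paginate(rows, limit, offset):
    """Model of limit/offset: skip offset rows, then take up to limit rows."""
    return rows[offset:offset + limit]


items = list(range(1, 101))  # stand-in for 100 sorted records, numbered 1..100

page1 = paginate(items, *page_window(1))
page2 = paginate(items, *page_window(2))
page3 = paginate(items, *page_window(3))
```

A page past the end of the data simply yields fewer rows, or none, in this model.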
openapi: 3.0.3
info:
  title: Order Analytics
  version: 1.0.0
components:
  schemas:
    Order:
      type: object
      properties:
        id: { type: string }
        customer-id: { type: string }
        customer-name: { type: string }
        amount: { type: number }
        status: { type: string }
        region: { type: string }
        created-at: { type: string, format: date-time }
      required: [id, customer-id, amount, status]
    OrderSummary:
      type: object
      properties:
        id: { type: string }
        customer-name: { type: string }
        amount: { type: number }
      required: [id, customer-name, amount]

(* Application entry point *)
(Application-Start: Order Analytics) {
    <Log> "Order Analytics ready" to the <console>.
    <Return> an <OK: status> for the <startup>.
}

(* Analytics report generation *)
(Generate Report: Order Analytics) {
    (* Retrieve recent orders *)
    <Retrieve> the <recent: List<Order>> from the <orders>
        where <created-at> > now().minus(30.days)
        order by <created-at> desc.

    (* Calculate metrics *)
    <Reduce> the <total-revenue: Float> from the <recent>
        with sum(<amount>).
    <Reduce> the <order-count: Integer> from the <recent>
        with count().
    <Reduce> the <avg-order: Float> from the <recent>
        with avg(<amount>).

    (* Filter pending orders *)
    <Filter> the <pending: List<Order>> from the <recent>
        where <status> is "pending".
    <Reduce> the <pending-count: Integer> from the <pending>
        with count().

    (* Map to summaries for response *)
    <Map> the <summaries: List<OrderSummary>> from the <recent>.

    <Return> an <OK: status> with {
        orders: <summaries>,
        metrics: {
            total-revenue: <total-revenue>,
            order-count: <order-count>,
            avg-order-value: <avg-order>,
            pending-count: <pending-count>
        }
    }.
}
ARO's data pipelines follow these principles:
- Type-First: All results are typed via OpenAPI schemas
- No SQL Complexity: No JOINs, subqueries, or CTEs
- Pipeline Style: Chain simple operations for complex transformations
- Predictable Performance: Simple operations with clear cost
For complex data needs, use multiple feature sets and compose results in your business logic.