@D4ndellion D4ndellion commented Oct 25, 2024

This isn't complete yet, but:

[image]

using this (thrown together from other tests):

#First part:
import duckdb
import polars as pl
import time

class MyDuckDB():
    def __init__(self):
        con = duckdb.connect()
        con.execute("SET TIME ZONE 'UTC';")
        con.execute("""CREATE TABLE ts1 ("timestamp" TIMESTAMPTZ, "value" INTEGER)""")
        ts_1 = pl.read_csv("ts1.csv", try_parse_dates=True).with_columns(pl.col("timestamp").dt.replace_time_zone("UTC"))
        con.append("ts1", df=ts_1.to_pandas())
        con.execute("""CREATE TABLE ts2 ("timestamp" TIMESTAMPTZ, "value" INTEGER)""")
        ts_2 = pl.read_csv("ts2.csv", try_parse_dates=True).with_columns(pl.col("timestamp").dt.replace_time_zone("UTC"))
        con.append("ts2", df=ts_2.to_pandas())
        self.con = con

    def query(self, sql: str) -> pl.DataFrame:
        # We execute the query and return it as a Polars DataFrame.
        # Chrontext expects this method to exist in the provided class.
        df = self.con.execute(sql).pl()
        return df

my_db = MyDuckDB()

#Second part:
from sqlalchemy import MetaData, Table, Column, bindparam
metadata = MetaData()
ts1_table = Table(
    "ts1",
    metadata,
    Column("timestamp"),
    Column("value")
)
ts2_table = Table(
    "ts2",
    metadata,
    Column("timestamp"),
    Column("value")
)
ts1 = ts1_table.select().add_columns(
    bindparam("id1", "ts1").label("id"),
)
ts2 = ts2_table.select().add_columns(
    bindparam("id2", "ts2").label("id"),
)
sql = ts1.union(ts2)

#Third part
from chrontext import VirtualizedPythonDatabase

vdb = VirtualizedPythonDatabase(
    database=my_db,
    resource_sql_map={"my_resource": sql},
    sql_dialect="postgres"
)

#Fourth part
from chrontext import Prefix, Variable, Template, Parameter, RDFType, Triple, XSD
ct = Prefix("ct", "https://github.com/DataTreehouse/chrontext#")
xsd = XSD()
id = Variable("id")
timestamp = Variable("timestamp")
value = Variable("value")
dp = Variable("dp")
resources = {
    "my_resource": Template(
        iri=ct.suf("my_resource"),
        parameters=[
            Parameter(id, rdf_type=RDFType.Literal(xsd.string)),
            Parameter(timestamp, rdf_type=RDFType.Literal(xsd.dateTime)),
            Parameter(value, rdf_type=RDFType.Literal(xsd.double)),
        ],
        instances=[
            Triple(id, ct.suf("hasDataPoint"), dp),
            Triple(dp, ct.suf("hasValue"), value),
            Triple(dp, ct.suf("hasTimestamp"), timestamp)
        ]
    )}

#Fifth part
from chrontext import Engine, SparqlEmbeddedOxigraph
oxigraph_store = SparqlEmbeddedOxigraph(rdf_file="my_graph.ttl", path="oxigraph_db_tutorial")
engine = Engine(
    resources,
    virtualized_python_database=vdb,
    sparql_embedded_oxigraph=oxigraph_store)
engine.init()

#Sixth part
q = """
PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
PREFIX chrontext:<https://github.com/DataTreehouse/chrontext#>
PREFIX types:<http://example.org/types#>
SELECT ?w (SUM(?v) as ?sum_v) WHERE {
    ?w types:hasSensor ?s .
    ?s a types:ThingCounter .
    ?s chrontext:hasTimeseries ?ts .
    ?ts chrontext:hasDataPoint ?dp .
    ?dp chrontext:hasTimestamp ?t .
    ?dp chrontext:hasValue ?v .
    FILTER(?t > "2022-06-01T08:46:53Z"^^xsd:dateTime) .
} GROUP BY ?w
"""
df = engine.query(q)
assert df.shape == (2,2)
print(df)

engine.serve_web("0.0.0.0:3000")
input("Press Enter to exit")

TODO:

  • Move into separate crate?
  • Gate behind feature?
  • Vendor or build yasgui from source
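
To try the served endpoint from a client, a request could be built along the lines of the SPARQL 1.1 Protocol, which carries the query text in a `query` URL parameter. This is only a sketch: the `/sparql` path and the `build_query_url` helper are assumptions for illustration, not something this PR defines.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the port matches serve_web above, the path is assumed.
ENDPOINT = "http://localhost:3000/sparql"


def build_query_url(endpoint: str, sparql: str) -> str:
    """Return a GET URL that carries the query in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": sparql})


url = build_query_url(ENDPOINT, "SELECT ?s WHERE { ?s ?p ?o } LIMIT 1")
print(url)
```

The URL can then be fetched with any HTTP client once the actual route is known.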

@D4ndellion D4ndellion left a comment

Some specific questions below. Other feedback is also welcome, though.


#[derive(Clone)]
struct AppState {
    sparql_engine: Arc<(dyn SparqlQueryable)>,
Member Author

Is there a reason I might be missing for implementing this over Engine instead of a SparqlQueryable?

I think the current version only lets you run queries on the underlying SPARQL database, and I'm worried about how that interacts with the virtualization.

The Engine::query return type (DataFrame, HashMap<String, RDFNodeType>, Vec<Context>) is pretty complicated, so I need to take a deeper dive into how it works.

Member

Should be implemented over engine, yes.
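
As a rough illustration of what "over engine" could mean for the web layer, the handler state can hold an engine-level query function instead of the raw SPARQL store, so that requests go through the same path as virtualized queries. The sketch below is Python rather than the Rust in this PR, and every name in it (`WebState`, `handle_query`, `fake_engine_query`, the string type tags) is hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical result shape: column data plus one RDF type tag per column.
Columns = Dict[str, List[object]]
Types = Dict[str, str]
QueryFn = Callable[[str], Tuple[Columns, Types]]


class WebState:
    """State shared by request handlers; holds the engine-level query function."""

    def __init__(self, query_fn: QueryFn) -> None:
        self.query_fn = query_fn


def handle_query(state: WebState, sparql: str) -> Tuple[int, object]:
    """Run the query through the engine and return (HTTP status, body)."""
    if not sparql.strip():
        return 400, "missing query"
    columns, types = state.query_fn(sparql)
    return 200, {"columns": columns, "types": types}


def fake_engine_query(sparql: str) -> Tuple[Columns, Types]:
    """Stand-in for an engine; a real one would also resolve virtualized data."""
    return {"w": ["http://example.org/w1"]}, {"w": "IRI"}


status, body = handle_query(WebState(fake_engine_query), "SELECT ?w WHERE { ?w ?p ?o }")
print(status, body)
```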

The (DataFrame, HashMap<String, RDFNodeType>) pair is a column-based encoding of a result.
There is one column per variable, and the map holds the RDF type of each column. If a variable has multiple types, its column is a Struct column with multiple sub-columns covering those types.

There is https://github.com/DataTreehouse/maplib/blob/main/lib/representation/src/polars_to_rdf.rs, which maps the df and types to a row-based result of the kind we need here. It might need a bit of cleanup, but it should be fairly well tested.
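
To make the column-to-row step concrete, here is a minimal Python sketch that turns a column-based result plus a type map into the SPARQL 1.1 Query Results JSON layout. It is not the maplib implementation: plain dicts stand in for the DataFrame, the type map is simplified to either the string `"IRI"` or a datatype IRI, and multi-type Struct columns are not handled. The `columns_to_bindings` name is made up for this example.

```python
from typing import Dict, List, Optional

XSD = "http://www.w3.org/2001/XMLSchema#"


def columns_to_bindings(
    columns: Dict[str, List[Optional[object]]],
    types: Dict[str, str],
) -> dict:
    """Convert column-based results into SPARQL JSON results (row-based)."""
    variables = list(columns)
    n_rows = len(next(iter(columns.values()))) if columns else 0
    bindings = []
    for i in range(n_rows):
        row = {}
        for var in variables:
            value = columns[var][i]
            if value is None:
                continue  # unbound variables are left out of the row
            if types[var] == "IRI":
                row[var] = {"type": "uri", "value": str(value)}
            else:
                # Simplification: any other tag is treated as a datatype IRI.
                row[var] = {"type": "literal", "value": str(value), "datatype": types[var]}
        bindings.append(row)
    return {"head": {"vars": variables}, "results": {"bindings": bindings}}


# Illustrative data only, not output from the script in this PR.
columns = {
    "w": ["http://example.org/w1", "http://example.org/w2"],
    "sum_v": [3.5, None],
}
types = {"w": "IRI", "sum_v": XSD + "double"}
result = columns_to_bindings(columns, types)
print(result)
```

Leaving unbound values out of a row, rather than emitting a null, follows the SPARQL JSON results format.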

Allows querying the database via http