Open General Information Network (hereafter referred to as OpenGIN) is an open-source platform designed to build a time-aware digital twin of an ecosystem by defining its entities, relationships, and data according to a specification. OpenGIN core supports a wide variety of data formats and provides efficient querying to simulate the digital twin. Underneath, OpenGIN uses a polyglot database design that can represent the changes of an ecosystem over time through time-travelling.
- Temporal Data Model: The use of `TimeBasedValue` enables temporal data management, allowing the system to track both when data is valid (business time) and when it was recorded (system time)
- Flexible Schema: The `metadata` field uses `google.protobuf.Any` to support arbitrary key-value pairs without schema constraints
- Immutable Core Fields: Fields like `id`, `kind`, and `created` are read-only, ensuring data integrity
- Graph-Ready Structure: The `relationships` field enables graph-based data modeling and traversal
- Storage Awareness: The `attributes` of each entity can be stored in the most suitable storage format via a polyglot database.
An Entity is defined by a set of core parameters: Metadata, Relationships, and Attributes. The core parameters are defined such that they are common to any data stored through the system.
- The Metadata contains any unstructured data associated with an Entity.
- The Relationships refer to how an Entity is connected with other entities.
- The Attributes can be data in any format such as tabular, unstructured, graph, or blob. These can also be interpreted as the datasets owned by an entity.
The format of the Entity is as follows:
- Id: Unique read-only identifier for the entity
- Kind: Read-only entity type classification
- Created Time: Read-only timestamp indicating entity creation time
- Terminated Time: Nullable timestamp indicating when the entity was terminated
- Name (TimeBasedValue): Time-based value representing the entity's name
- Metadata (map<string, Any>): Flexible key-value map to store arbitrary metadata
- Attributes (map<string, List<TimeBasedValue>>): Time-based attributes stored as lists
- Relationships (map<string, Relationship>): Relationships to other entities
The core attributes of an Entity are Id, Kind, Created Time, Terminated Time, and Name.
Metadata is very useful when we have to store unstructured values that can be subject to change from one entity to the other. Attributes are defined in such a way that they have the generic capability to store data of any storage type (we will look into modeling various types of storage in the Attribute section).
Relationships are defined as a map in which the key is a unique identity and the value is a relationship. Note that the relationship contains the type (referred to as name) where one entity can have many relationships of the same type. This resembles the connections an entity has with other entities. The general idea of this specification is to provide a generic model to represent a workflow, an event, or static content in the real world.
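The Entity format described above can be pictured as a small set of Python dataclasses. This is only an illustrative sketch: the class and field names follow the field list in this document, but the Python types, the snake_case naming, and the default values are assumptions and not the actual OpenGIN implementation (which defines these structures in Protobuf).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Kind:
    major: str  # base category of the type
    minor: str  # sub-category of the type


@dataclass
class TimeBasedValue:
    start_time: datetime
    end_time: Optional[datetime]  # None stands in for a null end time
    value: Any                    # stands in for google.protobuf.Any


@dataclass
class Relationship:
    id: str
    related_entity_id: str
    name: str                     # the relationship type
    start_time: datetime
    end_time: Optional[datetime] = None
    direction: str = "OUTGOING"   # assumed spelling of the direction value


@dataclass
class Entity:
    id: str
    kind: Kind
    created: datetime
    name: TimeBasedValue
    terminated: Optional[datetime] = None
    metadata: Dict[str, Any] = field(default_factory=dict)
    attributes: Dict[str, List[TimeBasedValue]] = field(default_factory=dict)
    relationships: Dict[str, Relationship] = field(default_factory=dict)


# Build an entity with no attributes or relationships yet.
created = datetime(2024, 3, 17, 10, 0, tzinfo=timezone.utc)
entity = Entity(
    id="12345",
    kind=Kind(major="example", minor="test"),
    created=created,
    name=TimeBasedValue(created, None, "entity-name"),
    metadata={"owner": "test-user", "version": "1.0"},
)
```

Note that this sketch does not enforce the read-only nature of `id`, `kind`, and `created` on the Entity itself; in OpenGIN that guarantee is provided by the system.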
OpenGIN introduces a type system to interpret and represent entities in a generalized manner. The type system follows the definition of MIME (Multipurpose Internet Mail Extensions) types and is named Kind.
Kind refers to a classification of various entities based on the nature of existence. It is defined by following the MIME type definition, where a major and a minor component together define a Kind.
- Major: Base category of the type
- Minor: Sub-category of the type
For instance, when we define an entity like a "Department of Education," the major of the Kind could be Organization, and the minor of the Kind could be Department. This information needs to be determined before creating a dataset for insertion. Once an entity has been inserted into the system, its major and minor cannot be changed.
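As a rough illustration of an immutable Kind, the sketch below uses a frozen Python dataclass. The class itself is hypothetical; OpenGIN enforces immutability inside the system, and this only mimics that behaviour locally.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Kind:
    major: str  # base category, e.g. "Organization"
    minor: str  # sub-category, e.g. "Department"


kind = Kind(major="Organization", minor="Department")

try:
    kind.minor = "Ministry"  # frozen dataclasses reject attribute assignment
    changed = True
except FrozenInstanceError:
    changed = False
```

After this runs, `changed` is `False` and `kind` still holds the original major and minor.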
Any value except for immutable types is defined as a TimeBasedValue. This value has a start and an end time. One of the major purposes of OpenGIN is to record data with time sensitivity. The value of a record is defined as Any (protobuf) in order to support all data types, including custom data types decided by the user.
It enables temporal versioning of values with the following fields:
- Start Time: Timestamp when the value becomes active
- End Time: Timestamp when the value becomes inactive (nullable)
- Value (Any): Value to be stored, of any type
Example: A `TimeBasedValue` would be: Start Time=2025-01-10, End Time=N/A, Value=Facebook Handle of a user.
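To make the temporal versioning concrete, here is a minimal sketch of looking up the value that is active at a given time. The `value_at` helper and the sample handles are hypothetical, and treating the end time as exclusive is an assumption; the document does not state whether OpenGIN treats it as inclusive or exclusive.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, List, Optional


@dataclass
class TimeBasedValue:
    start_time: datetime
    end_time: Optional[datetime]  # None: the value is still active
    value: Any


def value_at(history: List[TimeBasedValue], when: datetime) -> Optional[Any]:
    """Return the value active at `when`, or None if no value covers it."""
    for item in history:
        started = item.start_time <= when
        # Assumption: a value stops being active exactly at its end time.
        not_ended = item.end_time is None or when < item.end_time
        if started and not_ended:
            return item.value
    return None


# Hypothetical history of a user's handle.
handles = [
    TimeBasedValue(datetime(2024, 1, 1), datetime(2025, 1, 10), "old.handle"),
    TimeBasedValue(datetime(2025, 1, 10), None, "new.handle"),
]
```

With this history, a query for mid-2024 returns the old handle, while any time from 2025-01-10 onwards returns the new one.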
Metadata in OpenGIN provides a flexible mechanism to store unstructured, key-value data associated with entities. Unlike Attributes which are time-based and stored in PostgreSQL, metadata is schema-less and stored in MongoDB, making it ideal for storing arbitrary information that doesn't require temporal tracking or complex querying.
Metadata is defined as a map<string, Any> where:
- Key: A string identifier for the metadata field
- Value: Any protobuf `Any` type, allowing for maximum flexibility
A Relationship defines the connection between two entities. Any parameter that changes with time is easy to process with OpenGIN. Likewise, a Relationship also contains temporal values in its definition, along with a direction.
A Relationship can be defined from one entity to another in only one direction, but it can be queried as an incoming or an outgoing relationship. "Incoming" refers to a relationship originating from another entity towards the referred entity, and "outgoing" refers to the opposite.
- Id: Unique identifier for the relationship
- Related Entity Id: ID of the related entity
- Name: Name or type of the relationship
- Start Time: Timestamp when the relationship begins
- End Time: Timestamp when the relationship ends (nullable)
- Direction: Direction of the relationship (Incoming or Outgoing)
Example: A Relationship definition could include a scenario where we need to define a relationship between an organization and its employees. The entities here are Employee and Organization. The Relationship is HIRED_AS at a given time. The start time refers to the moment this employee gets into the organization. When that employee no longer continues to work, this relationship comes to an end, and the end time is updated.
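The following sketch shows how a relationship stored in one direction could still be queried as incoming or outgoing at a point in time. Everything here is illustrative: the `Edge` class, the `related` helper, the entity ids, and the choice to store HIRED_AS from the employee to the organization are assumptions, not the OpenGIN API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Edge:
    id: str
    source_id: str   # entity the relationship starts from
    target_id: str   # entity the relationship points to
    name: str        # relationship type, e.g. "HIRED_AS"
    start_time: datetime
    end_time: Optional[datetime] = None  # None: relationship is ongoing


def related(edges: List[Edge], entity_id: str, direction: str,
            when: datetime) -> List[str]:
    """Ids of entities connected to `entity_id` at `when` in `direction`."""
    result = []
    for e in edges:
        active = e.start_time <= when and (e.end_time is None or when < e.end_time)
        if not active:
            continue
        if direction == "OUTGOING" and e.source_id == entity_id:
            result.append(e.target_id)
        elif direction == "INCOMING" and e.target_id == entity_id:
            result.append(e.source_id)
    return result


# Two employees hired by the same organization; the first one has left.
edges = [
    Edge("rel-1", "emp-1", "org-1", "HIRED_AS",
         datetime(2020, 1, 1), datetime(2023, 1, 1)),
    Edge("rel-2", "emp-2", "org-1", "HIRED_AS", datetime(2022, 6, 1)),
]
```

Querying `org-1` for incoming relationships in mid-2022 yields both employees, while the same query in 2024 yields only `emp-2`, because the first relationship has an end time.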
OpenGIN considers that an Entity owns the data that originated through it or that is part of its core definition. To represent this, OpenGIN supports several storage types, since data can take many formats. Thus, one of the main objectives of OpenGIN is to provide a variety of storage formats.
This is motivated by two main reasons:
- Storage Representation: Representing entities and their connections in a traditional primary-key and foreign-key approach through a tabular data storage format may not be practical when those connections get denser (Vicknair et al., 2010). At scale, this implies the usage of a high-performance graph database.
- Efficient Data Ownership: There is a necessity to efficiently store various datasets owned by each entity. These attributes can be in various forms, such as structured, unstructured, graph, or blob data.
From the aforementioned cases, the necessity of a polyglot database is justified.
OpenGIN automatically detects and classifies the core storage types when attributes are entered into the system. The system uses a hierarchical detection approach with the following precedence order for the four core storage types:
- Graph
- Tabular
- Document
- Blob [1]
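A hierarchical detection with this precedence could look roughly like the sketch below. The detection rules themselves are not described in this document, so the checks used here (a `nodes`/`edges` pair for graph, a `columns`/`rows` pair for tabular, any other dict or list for document, everything else for blob) are purely assumed heuristics, and `detect_storage_type` is a hypothetical function name.

```python
from typing import Any


def detect_storage_type(value: Any) -> str:
    """Classify a value by checking storage types in precedence order."""
    if isinstance(value, dict):
        # 1. Graph is checked first (assumed shape: nodes and edges).
        if "nodes" in value and "edges" in value:
            return "graph"
        # 2. Tabular is checked next (assumed shape: columns and rows).
        if "columns" in value and "rows" in value:
            return "tabular"
        # 3. Any other mapping falls through to document.
        return "document"
    if isinstance(value, list):
        return "document"
    # 4. Blob is the last resort in this sketch.
    return "blob"
```

Because graph is checked before tabular, a value that happens to satisfy both assumed shapes is classified as graph, which is the point of a precedence order.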
- Temporal Data: Native support for time-based values (startTime, endTime) for attributes and relationships.
- Graph Capabilities: Powerful relationship traversal and querying.
- Scalability: Microservices architecture allows independent scaling of components.
- Strict Contracts: Uses Protobuf for internal communication and OpenAPI for external REST APIs.
1. CORE Service
Core API is the heart of OpenGIN. The OpenGIN specification, data model, query executors, database handlers, and data types are defined in this layer. This layer can be used directly to develop applications, but we encourage users to work with the Read and Ingestion APIs for application development.
2. Read API
Read API is a read-only API that can be used to query data from OpenGIN. It is recommended for data applications that are designed only to browse data.
3. Ingestion API
Ingestion API is a write-only API that mainly handles data ingress. Entities, relationships, and datasets can be represented through this API.
Create
curl -X POST http://localhost:8080/entities \
  -H "Content-Type: application/json" \
  -d '{
    "id": "12345",
    "kind": {
      "major": "example",
      "minor": "test"
    },
    "created": "2024-03-17T10:00:00Z",
    "terminated": "",
    "name": {
      "startTime": "2024-03-17T10:00:00Z",
      "endTime": "",
      "value": {
        "typeUrl": "type.googleapis.com/google.protobuf.StringValue",
        "value": "entity-name"
      }
    },
    "metadata": [
      {"key": "owner", "value": "test-user"},
      {"key": "version", "value": "1.0"},
      {"key": "developer", "value": "V8A"}
    ],
    "attributes": [],
    "relationships": []
  }'
Read
curl -X GET http://localhost:8080/entities/12345
Update
TODO: The update creates a new record and that's a bug, please fix it.
curl -X PUT http://localhost:8080/entities/12345 \
  -H "Content-Type: application/json" \
  -d '{
    "id": "12345",
    "kind": {
      "major": "example",
      "minor": "test"
    },
    "created": "2024-03-18T00:00:00Z",
    "name": {
      "startTime": "2024-03-18T00:00:00Z",
      "value": "entity-name"
    },
    "metadata": [
      {"key": "version", "value": "5.0"}
    ]
  }'
Delete
curl -X DELETE http://localhost:8080/entities/12345
Retrieve Metadata
curl -X GET "http://localhost:8081/v1/entities/12345/metadata"
Footnotes
[1] Blob storage format has not yet been released.