From af6e5d43c2b54a1021b601c0eb34aefe46f6c77a Mon Sep 17 00:00:00 2001 From: Tom Cowland Date: Tue, 3 Jan 2023 12:46:05 +0000 Subject: [PATCH 1/5] Add "OpenFileIO" proposal for discussion --- OpenFileIO.md | 105 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 105 insertions(+) create mode 100644 OpenFileIO.md diff --git a/OpenFileIO.md b/OpenFileIO.md new file mode 100644 index 0000000..1ebe9e5 --- /dev/null +++ b/OpenFileIO.md @@ -0,0 +1,105 @@ +# "OpenFileIO" Proposal + +## Abstract + +This proposal calls for the creation of a minimal, open source (C/C++) +library based on the prior art of the OpenImageIO `IOProxy`[^1] and +Houdini File System[^2] APIs. Providing a common cross-platform, +cross-application extensible abstraction of a random access byte stream. +Its purpose is to aid the migration away from `fopen` et. al. as the +de-facto low-level data access mechanism within digital content creation +tooling and infrastructure. This is required to allow secure data access +via other means without the need for complex user-space filesystem +emulations or elaborate data orchestration techniques. + +> N.B. "OpenFileIO" is used colloquially as a place-holder name. + +## Problem Statement + +It is currently difficult to provide data access to processing +operations in media production pipelines though anything other than the +standard filesystem. + +This hampers industry adoption of modern storage, security and compute +architectures where data may be held in other forms - such +access-controlled object stores exposed through HTTP endpoints. + +## Background + +Within the Media and Entertainment sector Digital Content Creation (DCC) +tools require data to be read, processed and written as part of their +operation. Traditionally data is accessed via the standard filesystem +system calls, often targeting a network mount of a shared storage +resource. With the advent of modern distributed systems architectures, +there is a strong desire to use other mechanism for the secure +transport and storage of necessary data. + +The source code of the DCC tools involved is often proprietary and so +can not directly be modified to use alternate transport mechanisms. In +addition, filesystem semantics are often baked into application logic. +Consequently, there have been numerous attempts to wrap these alternate +technologies so they can be accessed through the filesystem itself (e.g. +FUSE[^3] layers, or indirectly via out-of-band data orchestration). All +of these require significant engineering effort and long-term +maintenance and often result in a fragile and incomplete solution. + +Conversation with the community surfaced that there are two notable +cases of prior-art[^1][^2] in this space. They form established and +validated solutions to this problem by providing light-weight shims that +present file-like interfaces bridging to custom data access +implementations. This makes their adoption into large, legacy code bases +a feasible proposition. + +All of these solutions are however, part of large (and in some cases +proprietary) projects. Consultation with their respective owners has +shown a strong willingness to collaborate on a distilled, +community-owned evolution of the core premise of these existing ideas. + +## Proposal + +Create a new, minimal, self-contained Apache 2.0 licensed project within +the ASWF community[^4] based on established solutions[^1][^2], to be +used as a base abstraction for byte level I/O within DCC tools and +supporting libraries where appropriate. + +Based on the nature of the existing solutions, this would most likely +look like a minimal API surface plus a C plugin system that uses URI +schemes to dispatch calls at run time to a suitable user-provided +handler. Default implementation for `file` would probably be provided. + +The library will be designed to facilitate low-friction integration into +existing filesystem-oriented codebases with minimal boilerplate/business +logic changes. + +### Goals + +- "Drop-in" replacement for existing filesystem based **byte stream** + access, that allows data access via arbitrary tertiary mechanisms. +- Easy migration path for existing solutions[^1][^2]. +- Run-time extensible to allow new data sources to be used with existing + binaries. +- Minimal API surface area +- Maximum language compatibility. + +### Out of Scope + +In order to ensure initial integration is feasible in real-world +projects, scope is deliberately limited to direct byte stream access. +The following topics could be investigated at a later date when and if +scenarios demand: + +- Full filesystem modelling (e.g. directories, resource iteration) +- Additional authorization/authentication flows. + +## Next steps + +- [ ] Ratify proposal by community members (by Feb 2023). +- [ ] Establish TSC. +- [ ] Formally define initial scope. + +## References + +[^1]: https://github.com/OpenImageIO/oiio/blob/3b8807df854253911934caaa9587013c47209f35/src/include/OpenImageIO/filesystem.h#L403 +[^2]: https://www.sidefx.com/docs/hdk/_h_d_k__f_s.html +[^3]: https://www.kernel.org/doc/html/latest/filesystems/fuse.html +[^4]: https://www.aswf.io From 1fd07f4b3f2e9a0d73a77fbf90ab471bb40899ed Mon Sep 17 00:00:00 2001 From: Tom Cowland Date: Fri, 6 Jan 2023 12:18:54 +0000 Subject: [PATCH 2/5] fixup! Add "OpenFileIO" proposal for discussion Co-authored-by: Sam Hudson --- OpenFileIO.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/OpenFileIO.md b/OpenFileIO.md index 1ebe9e5..748155f 100644 --- a/OpenFileIO.md +++ b/OpenFileIO.md @@ -21,7 +21,7 @@ operations in media production pipelines though anything other than the standard filesystem. This hampers industry adoption of modern storage, security and compute -architectures where data may be held in other forms - such +architectures where data may be held in other forms - such as access-controlled object stores exposed through HTTP endpoints. ## Background From 5a3cf4d3f71168da4623277eabd51616a5b600ec Mon Sep 17 00:00:00 2001 From: Tom Cowland Date: Fri, 6 Jan 2023 12:20:20 +0000 Subject: [PATCH 3/5] fixup! Add "OpenFileIO" proposal for discussion --- OpenFileIO.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/OpenFileIO.md b/OpenFileIO.md index 748155f..2ca3fa4 100644 --- a/OpenFileIO.md +++ b/OpenFileIO.md @@ -80,6 +80,8 @@ logic changes. binaries. - Minimal API surface area - Maximum language compatibility. +- Don't re-invent wheels, this is more about ease of extension than a + new low-level interface. ### Out of Scope From 17d326b100621f90591f3d53439a5653c9f70d70 Mon Sep 17 00:00:00 2001 From: Tom Cowland Date: Fri, 6 Jan 2023 18:08:36 +0000 Subject: [PATCH 4/5] fixup! Add "OpenFileIO" proposal for discussion --- OpenFileIO.md | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/OpenFileIO.md b/OpenFileIO.md index 2ca3fa4..af10bb1 100644 --- a/OpenFileIO.md +++ b/OpenFileIO.md @@ -86,9 +86,15 @@ logic changes. ### Out of Scope In order to ensure initial integration is feasible in real-world -projects, scope is deliberately limited to direct byte stream access. -The following topics could be investigated at a later date when and if -scenarios demand: +projects, the initial scope is deliberately limited to direct byte +stream access. + +It seems an important premise to separate out the fundamentals of data +access once some URL has been derived, and how that URL or its peers are +determined in the first place. + +The following topics could be investigated at a later stage and +added (potentially as additional "layers"): - Full filesystem modelling (e.g. directories, resource iteration) - Additional authorization/authentication flows. From b49bd940c188d0b58d81657a3b5f9f8fd360a5f6 Mon Sep 17 00:00:00 2001 From: Tom Cowland Date: Wed, 11 Jan 2023 14:29:37 +0000 Subject: [PATCH 5/5] fixup! Add "OpenFileIO" proposal for discussion --- OpenFileIO.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/OpenFileIO.md b/OpenFileIO.md index af10bb1..b313514 100644 --- a/OpenFileIO.md +++ b/OpenFileIO.md @@ -71,6 +71,9 @@ The library will be designed to facilitate low-friction integration into existing filesystem-oriented codebases with minimal boilerplate/business logic changes. +We hope this consideration will go some way to making the migration of +existing codebases a viable proposition. + ### Goals - "Drop-in" replacement for existing filesystem based **byte stream**