FeatureCloud provides a platform that is accessible at FeatureCloud.ai.
In an object-oriented fashion, developers can use the FeatureCloud Template to implement one-shot or iterative applications just by extending two classes. The template consists of three main classes that interact with the FC Controller and execute the app-level tasks. Generally, two types of clients are used in the FeatureCloud Template:
- Client: Every participant in the FeatureCloud platform is considered a client that performs local tasks and communicates intermediate results to the coordinator. No raw data are supposed to be exchanged between the clients and the coordinator.
- Coordinator: One of the clients, which can receive the results of the other clients, aggregate them, and broadcast the aggregated results.
Using the AppLogic class, users can define different states and create a flow that moves from one state to another.
Each state should be added to the states attribute. There is no predefined order for executing states;
the flow direction is handled in the CustomLogic class. Through current_state, developers know where the flow is and
determine which state to move to next.
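The idea of a states dictionary driven by current_state can be sketched as follows. This is a minimal, self-contained illustration, not the template's code: the classes MiniAppLogic and MiniCustomLogic, the run method, the visited list, the max_iterations parameter, and the state names are all hypothetical stand-ins chosen for the example.

```python
class MiniAppLogic:
    """Hypothetical stand-in for AppLogic; not the FeatureCloud class."""

    def __init__(self):
        self.states = {}             # state name -> method
        self.current_state = None    # name of the state to run next
        self.status_finished = False
        self.iteration = 0
        self.visited = []            # illustration only: order of executed states

    def run(self):
        # Keep executing whatever current_state points to until a state
        # signals the end of the app.
        while not self.status_finished:
            self.visited.append(self.current_state)
            self.states[self.current_state]()


class MiniCustomLogic(MiniAppLogic):
    """Hypothetical flow definition: registers states and moves between them."""

    def __init__(self, max_iterations=2):
        super().__init__()
        self.max_iterations = max_iterations
        self.states = {
            "initializing": self.init_state,
            "local_computation": self.local_computation,
            "finishing": self.final_step,
        }
        self.current_state = "initializing"  # the first state

    def init_state(self):
        self.current_state = "local_computation"

    def local_computation(self):
        # Iterative apps stay in this state until a stopping rule is met.
        self.iteration += 1
        if self.iteration >= self.max_iterations:
            self.current_state = "finishing"

    def final_step(self):
        self.status_finished = True


app = MiniCustomLogic(max_iterations=2)
app.run()
print(app.visited)
```

Because the order is decided only by what each state writes to current_state, the same dictionary can describe a one-shot app (max_iterations=1 here) or an iterative one.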
We categorize attributes in the AppLogic class as follows:
- Controlling the flow:
  - states: Python dictionary that keeps names of states, as keys, and methods, as values.
  - current_state: Name of the current state, or the next state that a developer wants.
  - status_available: Boolean attribute to signal the availability of data to the FeatureCloud Controller to share it.
  - status_finished: Boolean variable to signal the end of the app's execution to the FeatureCloud Controller.
  - thread:
  - iteration: Number of executed iterations.
  - progress: Short descriptor of the internal progress of the app instance for the FeatureCloud Controller.
- General:
  - id: ID of each participant, regardless of being client or coordinator.
  - coordinator: Boolean flag indicating whether the running container is a coordinator or not.
  - clients: Contains IDs of all participating clients.
- Data management:
  - For communicating data:
    - data_incoming: list of data that was received.
    - data_outgoing: list of data that should be shared.
  - For I/O from the docker container:
    - INPUT_DIR: path to the directory inside the docker container for reading the input files.
    - OUTPUT_DIR: path to the directory inside the docker container for writing the results.
    - mode: Primarily used for indicating whether input files are stored in one folder or multiple folders.
    - dir: The folder containing the input files.
    - splits: A dictionary of possible splits (folder names containing the input data that are used for training).
AppLogic includes methods that we categorize into two groups: one responsible for communicating data among
FeatureCloud clients, and another for configuring the app. The latter group contains the following:
- lazy_initializing: Developers can initialize some attributes at an arbitrary time. Currently, two attributes should be initialized using the lazy_initializing method because their values are read from the config.yml file:
  - mode: Either directory or file, to determine how data is stored in the container.
  - dir: The path to the directory containing input files.
- finalize_config: Depending on the mode of the input files, there will be some split keys to keep track of input data splits, and of the corresponding intermediate or final results, in different states of the app. Accordingly, finalize_config configures the split keys and creates the output directory in the container. It is recommended to call it right after lazy_initializing.
- app_flow: Responsible for running the developed state machine for clients and the coordinator, reporting the list of states, and controlling the flow.
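The two configuration steps can be sketched with plain functions. This is only an illustration under stated assumptions, not the template's implementation: the function signatures, the already-parsed config dictionary (standing in for config.yml), the folder names, and the rule that every sub-folder is one split in directory mode are all hypothetical.

```python
import os
import tempfile


def lazy_initializing(config):
    """Read mode and dir from an already-parsed config mapping (assumed layout)."""
    mode = config["mode"]
    if mode not in ("directory", "file"):
        raise ValueError(f"unsupported mode: {mode!r}")
    return mode, config["dir"]


def finalize_config(input_dir, output_dir, mode, dir_name):
    """Build the split keys and create one output folder per split."""
    base = os.path.join(input_dir, dir_name)
    if mode == "directory":
        # Assumption: every sub-folder of dir holds the data of one split.
        names = sorted(
            n for n in os.listdir(base) if os.path.isdir(os.path.join(base, n))
        )
        splits = {os.path.join(base, n): None for n in names}
    else:
        # Assumption: in file mode the folder itself is the only split.
        splits = {base: None}
    for split in splits:
        # Mirror the input layout under the output directory.
        out = os.path.join(output_dir, os.path.relpath(split, input_dir))
        os.makedirs(out, exist_ok=True)
    return splits


# Small demonstration in a temporary folder instead of a real container.
with tempfile.TemporaryDirectory() as root:
    input_dir = os.path.join(root, "input")
    output_dir = os.path.join(root, "output")
    for name in ("split_1", "split_2"):
        os.makedirs(os.path.join(input_dir, "data", name))

    mode, dir_name = lazy_initializing({"mode": "directory", "dir": "data"})
    splits = finalize_config(input_dir, output_dir, mode, dir_name)
    created = sorted(os.listdir(os.path.join(output_dir, "data")))
    print(created)
```

The values of the splits dictionary are left as None here; in an app they would hold whatever intermediate or final result belongs to each split.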
The following four methods in the AppLogic class facilitate communicating data between the coordinator and the clients:
- send_to_server: Should be called only by clients to send their data to the coordinator.
- get_clients_data: Should be called only by the coordinator to wait for the clients until their data is received. For each split, the corresponding clients' data is yielded together with the split's name. Developers can treat the output as a generator, which provides the next split's data on demand.
- wait_for_server: Should be called only by clients to wait for the coordinator until the broadcast data is received.
- broadcast: Should be called only by the coordinator to broadcast the same data to all clients.
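The generator behaviour described for get_clients_data can be illustrated without any networking. The sketch below is an in-memory stand-in, not the template's method: the function yield_clients_data, the shape of data_incoming (one dictionary per client, keyed by split name), and the averaging step are assumptions made for the example.

```python
def yield_clients_data(data_incoming, splits):
    """Hypothetical stand-in: yield (clients_data, split) one split at a time.

    data_incoming holds one dict per client, mapping split name -> local result.
    """
    for split in splits:
        yield [client[split] for client in data_incoming], split


# Hypothetical local results of three clients for two splits.
data_incoming = [
    {"split_1": 1.0, "split_2": 10.0},
    {"split_1": 2.0, "split_2": 20.0},
    {"split_1": 3.0, "split_2": 30.0},
]
splits = ["split_1", "split_2"]

# Coordinator side: aggregate each split as it is yielded and collect the
# values that would then be broadcast to all clients.
data_outgoing = {}
for clients_data, split in yield_clients_data(data_incoming, splits):
    data_outgoing[split] = sum(clients_data) / len(clients_data)

print(data_outgoing)
```

Consuming the splits lazily means the coordinator only holds one split's client data in the loop body at a time, which is the point of treating the output as a generator.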
There are two recommended ways to implement apps using this template. One is to extend AppLogic directly to define the states, the transitions between them, the communication of results to other clients, and the app computations. The other is to separate app flow and data communication from the computations, which is elaborated next in terms of defining CustomLogic and CustomApp.
CustomLogic is an extension of AppLogic that defines all the states, determines the first state, and, more importantly,
implements the flow between states. Besides controlling the flow, we generally categorize states' tasks as
operational and/or communicational. For communicational states, which are responsible for sharing or receiving data,
the method is fully implemented and assigned to the state in the CustomLogic class. For the others, only the
flow-related part is implemented here, and the operation happens in the CustomApp class. All the data-related
attributes that are shared among clients should be introduced in CustomLogic:
- parameters: A dictionary that can contain any data that should be shared.
- workflows_states: A dictionary that can signal any messages to the coordinator or vice versa.
Methods are highly diverse depending on the target application; however, almost every application should include an initializing and a finalizing state with their methods:
- init_state
- read_input
- final_step
CustomApp is an extension of CustomLogic that introduces all the attributes and methods required to execute the
app's task. Each state's method calls its corresponding superclass method in CustomLogic, which was previously
implemented, to move the flow to the next state.
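The split between flow and computation can be sketched with two small classes. This is a hedged illustration of the pattern only: SketchLogic and SketchApp are hypothetical stand-ins for CustomLogic and CustomApp, and the run method, the state names, the values attribute, and the local_sum key are assumptions; only the method names init_state, read_input, and final_step and the parameters dictionary come from the text above.

```python
class SketchLogic:
    """Hypothetical flow layer: registers states and decides what comes next."""

    def __init__(self):
        # Bound methods are looked up on self, so overrides in a subclass
        # are the ones that end up in the dictionary.
        self.states = {
            "initializing": self.init_state,
            "read_input": self.read_input,
            "finishing": self.final_step,
        }
        self.current_state = "initializing"
        self.status_finished = False
        self.parameters = {}  # data that should be shared

    def init_state(self):
        self.current_state = "read_input"

    def read_input(self):
        # Flow-related part only; the operation lives in the subclass.
        self.current_state = "finishing"

    def final_step(self):
        self.status_finished = True

    def run(self):
        while not self.status_finished:
            self.states[self.current_state]()


class SketchApp(SketchLogic):
    """Hypothetical computation layer: does the work, then defers to the flow."""

    def __init__(self, values):
        super().__init__()
        self.values = values

    def read_input(self):
        # Operational part of the state.
        self.parameters["local_sum"] = sum(self.values)
        # Hand control back to the flow layer to pick the next state.
        super().read_input()


app = SketchApp([1, 2, 3])
app.run()
print(app.parameters)
```

Keeping the transition in the superclass means the computation can change without touching the state machine, and the flow can be reordered without touching the computation.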