Active Data

Active Data features

Active Data is a model and software that help manage the life cycles of distributed data.
For any data item, its life cycle represents the operational steps it will go through from its creation to its deletion from a system: creation, writes, reads, replication, transfers, derivations, deletion, etc.

Life cycle model

Active Data offers a meta-model to formally represent life cycles. An Active Data model looks a lot like a Petri net: data states are represented by circles, and each possible change from one state to another is represented by a rectangle. Circles are called places and rectangles are called transitions. Tokens on places represent the state of a data item, and replication is represented by several tokens.
Multiple tokens on different places represent the distributed state of data in several systems at the same time.
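
To make the places, transitions and tokens concrete, here is a minimal sketch of a Petri-net-like life cycle in plain Java. It is not the Active Data API: the class LifeCycleSketch and its methods are hypothetical names chosen for illustration, and firing a transition is simplified to moving a single token from one place to another.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, NOT the Active Data API.
class LifeCycleSketch {
    // Number of tokens currently on each place, keyed by place name.
    private final Map<String, Integer> tokens = new HashMap<>();
    // Each transition name maps to {input place, output place}.
    private final Map<String, String[]> transitions = new HashMap<>();

    void addTransition(String name, String from, String to) {
        transitions.put(name, new String[] {from, to});
    }

    // One token per replica: calling this twice models two replicas.
    void putToken(String place) {
        tokens.merge(place, 1, Integer::sum);
    }

    int tokensOn(String place) {
        return tokens.getOrDefault(place, 0);
    }

    // Fire a transition: consume one token from its input place
    // and produce one token on its output place.
    void fire(String name) {
        String[] arc = transitions.get(name);
        if (arc == null) {
            throw new IllegalArgumentException("Unknown transition: " + name);
        }
        if (tokensOn(arc[0]) == 0) {
            throw new IllegalStateException("No token on place: " + arc[0]);
        }
        tokens.merge(arc[0], -1, Integer::sum);
        tokens.merge(arc[1], 1, Integer::sum);
    }
}
```

With two tokens on a "Created" place and a "write" transition from "Created" to "Written", firing "write" once leaves one token on each place, which reads as two replicas in different states.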

Life cycle models can be composed together to model more complex life cycles.

For example, say that you have several replicas of the same data object on a storage service and you want to transfer it. The transfer service creates a life cycle just for the transfer (e.g. start, transferring, pause, restart, success, failure…); the storage service creates a different life cycle (e.g. create, write, read, move, replicate, split, delete…).
When both are composed, the whole life cycle is exposed in a single view, and Active Data automatically makes sense of the different identifiers the data has in the different systems.

Programming model

Active Data allows users to react to life cycle progression. A reaction can be receiving a notification or executing arbitrary code when a transition in the model occurs.

Code is executed by clients independently of other clients in an event-driven fashion. Data management tasks implemented with Active Data require very little code and are implicitly parallel.
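
The event-driven style described above can be sketched as a small publish/subscribe registry. This is an illustration under stated assumptions, not the Active Data API: TransitionHandlersSketch, subscribe and publish are hypothetical names, handlers receive only a data identifier, and everything runs in a single process rather than through a remote service.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch, NOT the Active Data API.
class TransitionHandlersSketch {
    // Handlers registered for each transition name.
    private final Map<String, List<Consumer<String>>> handlers = new HashMap<>();

    // Register code to run whenever the named transition is published.
    void subscribe(String transition, Consumer<String> handler) {
        handlers.computeIfAbsent(transition, k -> new ArrayList<>()).add(handler);
    }

    // Publish a transition for a data item: every handler subscribed
    // to that transition is called with the data identifier.
    void publish(String transition, String dataId) {
        List<Consumer<String>> subscribed =
                handlers.getOrDefault(transition, Collections.emptyList());
        for (Consumer<String> handler : subscribed) {
            handler.accept(dataId);
        }
    }
}
```

A client interested only in finished transfers would subscribe a handler to a hypothetical "transfer.success" transition and would never be called for other transitions, which keeps each data management task down to a few lines of handler code.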

Runtime library

Active Data is implemented as a stand-alone Java library that provides:

  • an API to implement life cycle models;
  • an API to provide handler code to be automatically executed when transitions are published;
  • a centralized service, the Active Data Service, that receives transition publications from clients, forwards them to other clients and orchestrates everything;
  • an API to publish life cycle transitions to the Active Data Service.

The library is very lightweight, which makes it easy to integrate into your applications and into existing data management software.

Error checking

The Active Data Service takes life cycle models as input and is able to dynamically check that changes of data states are valid with respect to the model.
The service maintains the global state of all data life cycles in the system; when a system performs an action on data, the service checks the action against the current global data state and decides whether it is legal. If the current data state is incompatible with the action reported by the system, an exception is raised.

This feature lets you quickly detect abnormal operations and prevents your data from ending up in a state that is consistent with a particular system but not with your own workflow.
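
The check described in this section can be sketched as follows. This is a simplified illustration, not the service's actual implementation: LifeCycleCheckerSketch and its methods are hypothetical, each data item has a single state (no replicas), and the use of IllegalStateException is an assumption, since the text only says that an exception is raised.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, NOT the Active Data Service.
class LifeCycleCheckerSketch {
    // The model: transition name -> {required current state, resulting state}.
    private final Map<String, String[]> model = new HashMap<>();
    // The global state: current state of every known data item.
    private final Map<String, String> currentState = new HashMap<>();

    void allow(String transition, String from, String to) {
        model.put(transition, new String[] {from, to});
    }

    void create(String dataId, String initialState) {
        currentState.put(dataId, initialState);
    }

    String stateOf(String dataId) {
        return currentState.get(dataId);
    }

    // Check a reported action against the model and the current state.
    // The state only changes when the action is legal.
    void report(String dataId, String transition) {
        String[] arc = model.get(transition);
        String state = currentState.get(dataId);
        if (arc == null || !arc[0].equals(state)) {
            throw new IllegalStateException(
                    "Illegal transition '" + transition + "' for " + dataId
                    + " in state " + state);
        }
        currentState.put(dataId, arc[1]);
    }
}
```

In this sketch, reporting a "read" on an item that is still in a "Created" state is rejected when the model only allows "read" from "Written", and the recorded state of the item is left untouched.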

Life cycle catalog

Active Data provides a life cycle catalog that maintains the global state of your data. Say that your favourite data item is stored on one infrastructure while a computation using it is running on another; the state of both replicas can be queried from the Active Data Service using the client API. Decisions can then be made locally based on the global state of your data. Because the service knows every life cycle transition that occurs and performs error checking, the life cycles it maintains remain consistent over time.
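
As an illustration of making a local decision from the global state, the sketch below keeps one state per replica and per system. It is not the Active Data client API: LifeCycleCatalogSketch, record, query and anyReplicaIn are hypothetical names, and the system and state names in the usage note are invented for the example.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, NOT the Active Data client API.
class LifeCycleCatalogSketch {
    // data item -> (system name -> state of the replica held by that system)
    private final Map<String, Map<String, String>> states = new TreeMap<>();

    // Record the latest known state of a replica on a given system.
    void record(String dataId, String system, String state) {
        states.computeIfAbsent(dataId, k -> new TreeMap<>()).put(system, state);
    }

    // Return a read-only view of the state of every replica of a data item.
    Map<String, String> query(String dataId) {
        Map<String, String> replicas = states.get(dataId);
        return replicas == null
                ? Collections.<String, String>emptyMap()
                : Collections.unmodifiableMap(replicas);
    }

    // True if at least one replica of the data item is in the given state.
    boolean anyReplicaIn(String dataId, String state) {
        return query(dataId).containsValue(state);
    }
}
```

For instance, after recording a replica as "Stored" on a system named "storage" and another as "In use" on a system named "compute", a client could postpone deleting the stored copy for as long as anyReplicaIn reports a replica "In use".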