Software

The Interactive Execution Environment (IEE) is a one-stop entry point which enables access to the PROCESS infrastructure and execution of computational tasks on the underlying HPC resources.

IEE works by providing access to configurable computational pipelines which represent individual use cases. Each pipeline may be launched an arbitrary number of times, with varying parameters. IEE takes care of marshalling the required resources, scheduling data transfers and executing the computational steps which make up the use cases. IEE can schedule computations on traditional HPC resources (computing clusters) and on cloud resources (via the Cloudify extension). It can also communicate with bespoke REST interfaces offered by providers of external computational services – as with the AgroCopernicus use case.

IEE manages the required delegation of security credentials and abstracts away all details related to interaction with various types of computational resources.
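The pipeline concept described above – one pipeline definition, launched many times with varying parameters – can be sketched as follows. This is purely illustrative; the class and field names are hypothetical and not the actual IEE data model.

```python
# Illustrative sketch (not the actual IEE data model) of the pipeline
# concept: one pipeline definition, many runs with varying parameters.
from dataclasses import dataclass

@dataclass
class Pipeline:
    name: str
    steps: list          # ordered computational steps of the use case

@dataclass
class PipelineRun:
    pipeline: Pipeline
    parameters: dict     # per-launch parameters
    status: str = "queued"

# the same pipeline launched twice with different parameters
uc1 = Pipeline("medical-imaging", ["stage-in", "train", "stage-out"])
runs = [PipelineRun(uc1, {"epochs": 10}),
        PipelineRun(uc1, {"epochs": 50})]
```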

Rimrock enables management of scientific computations through modern interfaces based on REST (Representational State Transfer). Thanks to REST, access to services, applications and advanced scripts deployed on the infrastructure becomes straightforward. The main advantages of the service include the ability to use any client technology and an integrated solution for secure access to computing infrastructures.

The Rimrock service allows its functionality to be used independently of the programming language chosen to build applications on top of the computing infrastructure. It is therefore possible to create web and desktop applications as well as to prepare advanced computation scripts. An interesting approach also supported by the service is the development of web applications that run solely in the user's web browser, minimizing the role of server-side software.

Before PROCESS, Rimrock was successfully used during the development of a web application in the energy sector, harnessing the computing power of the PLGrid infrastructure to analyse how different scenarios for building a national power grid would affect the environment and human health. It is currently the basis for multiple other use cases within the scope of the PROCESS project, providing smooth integration with the required resources.
All data exchanged with the Rimrock service is secured with an encrypted connection, and a temporary user certificate is used for authorization, ensuring that no long-lived credentials can leak and give an attacker permanent access.
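The job-submission flow can be sketched as below. The endpoint URL, payload fields and proxy-certificate header are assumptions modelled on the publicly documented Rimrock API and should be checked against the current documentation; no request is actually sent in this sketch.

```python
# Sketch of preparing a job submission for a Rimrock-style REST API.
# URL, payload fields and the PROXY header are assumptions based on the
# public Rimrock documentation; verify against the live service.
import base64
import json

RIMROCK_URL = "https://rimrock.plgrid.pl/api/jobs"  # assumed endpoint

def build_job_request(login_host, script, proxy_pem):
    """Build the HTTP pieces for a job submission (nothing is sent)."""
    headers = {
        "Content-Type": "application/json",
        # short-lived proxy certificate, base64-encoded, as authorization
        "PROXY": base64.b64encode(proxy_pem).decode("ascii"),
    }
    body = json.dumps({"host": login_host, "script": script})
    return RIMROCK_URL, headers, body
```

Because only a temporary proxy certificate travels with each request, a compromised request never exposes the user's long-lived credentials.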

Moving data in the exascale era is very time consuming because transfers are bound by underlying physical limitations: one exabyte of data takes around three years to transfer over the fastest state-of-the-art interconnects. At the same time, projects like the Square Kilometer Array are expected to generate roughly an exabyte of data every day. This puts a lot of strain on how to handle and process data which is globally distributed and difficult to move.
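A back-of-the-envelope check of the multi-year figure, assuming a sustained 100 Gbit/s link (a typical state-of-the-art wide-area rate; the original text does not state which rate it assumed):

```python
# Transfer time for one exabyte over an assumed sustained 100 Gbit/s link.
EXABYTE_BITS = 8 * 10**18          # 1 EB = 10**18 bytes = 8e18 bits
LINK_BPS = 100 * 10**9             # 100 Gbit/s sustained throughput

seconds = EXABYTE_BITS / LINK_BPS  # 8.0e7 seconds
years = seconds / (365 * 24 * 3600)  # on the order of 2-3 years
```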

Our approach in PROCESS is that data services cannot be viewed as mere software components; a more holistic approach is needed which takes into consideration the underlying infrastructure, the flow of data, the processing and the services. This leads to a smart infrastructure in which we can, for example, move the compute to the data instead of moving the data to the compute, or split the processing of data into several parts, each performed on a different part of the network.

LOBCDER introduces the concept of a micro-infrastructure, which uses container technologies to combine data services with infrastructure to create a smart data infrastructure. This means that the development of services now consists of two parts: the description of the underlying container infrastructure (the micro-infrastructure) and the development of the different software services that fit into it.
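As a purely illustrative sketch of the two-part idea, a micro-infrastructure description could resemble a Docker Compose file combining a data-access service with a data-movement service; LOBCDER's actual descriptor format may differ, and all names below are invented.

```yaml
# Illustrative micro-infrastructure description in Docker Compose syntax;
# service and image names are hypothetical, not LOBCDER's real descriptor.
version: "3"
services:
  webdav-frontend:          # data-access service exposed to users
    image: example/webdav:latest
    ports: ["8080:8080"]
  staging-worker:           # service that moves data close to the compute
    image: example/stager:latest
    depends_on: [webdav-frontend]
```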

DataNet

DataNet enables lightweight metadata management backed by a flexible database which allows convenient access to the stored objects. One of the main goals of DataNet is to make it usable from the largest possible set of languages and platforms. That is why we used the HTTP protocol as the basis for transferring data between backing servers and the service, and, to make it even more convenient, we applied the REST methodology to structure the messages sent to and from the repositories, which makes the integration process straightforward.
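The REST convention described above can be sketched as a simple mapping from HTTP verbs to metadata operations. The base URL and collection names below are invented for illustration and are not the actual DataNet endpoints.

```python
# Hypothetical sketch of REST conventions for a DataNet-style metadata
# repository; the base URL and collection names are illustrative only.
BASE = "https://datanet.example.org/repos/my-repo"

def entity_url(collection, entity_id=None):
    """POST  <url>       -> create a metadata record in <collection>
       GET   <url>       -> list/query records
       GET   <url>/<id>  -> fetch a single record
       PUT   <url>/<id>  -> update it;  DELETE <url>/<id> -> remove it"""
    url = f"{BASE}/{collection}"
    return f"{url}/{entity_id}" if entity_id is not None else url
```

Any HTTP client in any language can follow this scheme, which is what makes the integration language-independent.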

DataNet was originally developed and tested within the scope of the PLGrid project. At that point it utilized a state-of-the-art Platform-as-a-Service component which ensured scaling and database-service provisioning for structured data. Within the PROCESS project the solution has been reengineered to reflect the rapid progress in the development of such platforms, and is now based on Docker container orchestration, which allows DataNet to run on a wide range of infrastructures. The platform has also been extended to support unstructured metadata.

DataNet also features in-transit encryption to keep the metadata secure while it moves between components. A pluggable security mechanism ensures that access to the platform is restricted to the appropriate group of people, preventing leakage of the stored data. The solution's API enables both integration with other PROCESS components, such as the IEE Portal, and direct access from external components.

Cloudify is an open source cloud orchestration platform (*), designed to automate the deployment, configuration and remediation of application and network services across hybrid cloud and stack environments. It uses OASIS TOSCA templates written in YAML (called blueprints in Cloudify) for defining applications, services and the dependencies among them. These blueprint files also describe the execution plans for the lifecycle of the application: installing, starting, terminating, orchestrating and monitoring the application stack. Cloudify takes the blueprint as input describing the deployment plan and is responsible for executing it on the cloud environment.

Basic terms/features:

  • Cloudify orchestrates services (such as the Jupyter portal, the DISPEL gateway, cloud storages and virtual HPC/Spark/Kubernetes clusters), not jobs
  • Blueprints (packages containing a TOSCA template plus the scripts/data of a service) are uploaded to the Cloudify manager server before services are deployed
  • Deployments (instances of blueprints) are not instantiated immediately after creation; only abstractions/handles of the deployments are created
  • Workflow executions (actions on deployments) are defined by default in every TOSCA template
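A minimal blueprint illustrating the TOSCA/YAML structure described above might look as follows. The node names, script path and properties are placeholders, not a deployable example; the DSL version and imported types follow the Cloudify 4.5 documentation.

```yaml
# Minimal illustrative Cloudify blueprint sketch; names are placeholders.
tosca_definitions_version: cloudify_dsl_1_3

imports:
  - http://www.getcloudify.org/spec/cloudify/4.5.0/types.yaml

node_templates:
  jupyter_vm:
    type: cloudify.nodes.Compute          # host that will run the service
  jupyter_portal:
    type: cloudify.nodes.SoftwareComponent
    relationships:
      - type: cloudify.relationships.contained_in
        target: jupyter_vm
    interfaces:
      cloudify.interfaces.lifecycle:      # lifecycle hooks (install/start/...)
        start:
          implementation: scripts/start_jupyter.sh
```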

REST API:
The REST API covers all operations related to service orchestration. It is used by command-line clients (CLI) and scientific gateways (GUI) for the deployment and management of services. After describing services in TOSCA templates, users can deploy or undeploy instances of the services described in the blueprints via the deployment API, or execute a specific workflow.
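The two central REST operations, creating a deployment from an uploaded blueprint and running one of its workflows, can be sketched as below. The paths follow the Cloudify 4.5 REST API documentation, but the manager host is hypothetical and nothing is actually sent; verify the paths against the version you run.

```python
# Sketch of Cloudify REST calls: create a deployment from a blueprint,
# then run its 'install' workflow. Manager host is hypothetical; paths
# follow the Cloudify 4.5 REST API docs. No request is sent here.
import json

MANAGER = "https://cloudify-manager.example.org"  # hypothetical host

def create_deployment_request(deployment_id, blueprint_id, inputs=None):
    """PUT /api/v3.1/deployments/<id> binds a blueprint to a deployment."""
    return ("PUT",
            f"{MANAGER}/api/v3.1/deployments/{deployment_id}",
            json.dumps({"blueprint_id": blueprint_id,
                        "inputs": inputs or {}}))

def start_execution_request(deployment_id, workflow_id="install"):
    """POST /api/v3.1/executions starts a workflow on a deployment."""
    return ("POST",
            f"{MANAGER}/api/v3.1/executions",
            json.dumps({"deployment_id": deployment_id,
                        "workflow_id": workflow_id}))
```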

More info:
About Cloudify: https://docs.cloudify.co/4.5.0/about/
Details of REST API: https://docs.cloudify.co/4.5.0/developer/apis/rest-service/

(*): A process for the automated configuration, deployment and other management activities of services and applications in the cloud. It can automate the execution of different service workflows, including deployment, initialization, start/stop, scaling and healing of services, based on standardized descriptions of composed services, the relations between components, and their requirements.

Brane is a programmatic approach to constructing research infrastructures based on the separation-of-concerns principle: different tooling and abstractions are provided for each level of the technical stack and its associated roles. Top-level applications can therefore be written in a DSL by domain scientists, while the underlying (optimized) routines are implemented by the relevant experts.
The result is a simpler, easier-to-use solution that still captures and controls the entire distributed technical stack.

Brane features:

  • A simple DSL that can be used with little to no programming experience.
  • Interactive computing, with a detach/attach mechanism and visual monitoring.
  • A programmatic approach to constructing research infrastructures.
  • A high-performance event-driven runtime based on microservices.


TO KNOW MORE ABOUT THE PORTFOLIO OF PROCESS SOFTWARE, VISIT: