The Elyra open source project for JupyterLab aims to simplify common data science tasks. Its most popular feature is the Visual Pipeline Editor, which is used to create pipelines without the need for coding. You can run these pipelines in JupyterLab, on Kubeflow Pipelines, or on Apache Airflow.
Elyra 3.0 extends the pipeline capabilities by adding experimental support for custom components. Before I dive into specifics and outline why support is still experimental in the initial releases, let’s recap a few concepts.
A pipeline comprises nodes that are connected with each other to form a graph. The graph defines dependencies between the nodes, governing the order in which the nodes are run. The example pipeline shown below executes a Python script and several Jupyter Notebooks.
Nodes are implemented using components. To create the pipeline shown above, you need components that can run Python scripts and Jupyter Notebooks. Most components are configurable to make them reusable. For file-based components, such a configuration might include the file name and the container image in which the file is executed.
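To make the idea of a graph governing execution order concrete, here is a minimal Python sketch. It is not Elyra's pipeline file format or API: the `Node` class, the node names, the file names, and the `python:3.9` image are all hypothetical and only illustrate how dependencies between configured nodes determine a valid run order.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Node:
    """Hypothetical node: one file that runs in one container image."""
    filename: str
    runtime_image: str
    depends_on: list[str] = field(default_factory=list)


def execution_order(nodes: dict[str, Node]) -> list[str]:
    """Return node names so that each node comes after all of its dependencies."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {name: set(node.depends_on) for name, node in nodes.items()}
    return list(TopologicalSorter(graph).static_order())


# Illustrative pipeline: one Python script followed by several notebooks.
pipeline = {
    "load_data": Node("load_data.py", "python:3.9"),
    "part_1": Node("part_1.ipynb", "python:3.9", ["load_data"]),
    "part_2": Node("part_2.ipynb", "python:3.9", ["load_data"]),
    "report": Node("report.ipynb", "python:3.9", ["part_1", "part_2"]),
}

order = execution_order(pipeline)
print(order)
```

The script always comes first and the report last. The two middle notebooks do not depend on each other, so they can appear in either order (or, on a real runtime, run in parallel).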
In Elyra, processing of Jupyter Notebooks, Python scripts, and R scripts is implemented by using a single component. This component is referred to as a generic component because it is supported in all runtime environments.
The pipeline editor exposes this component under different names in the palette, which is located on the left side of the pipeline editor.
Pipelines that only include generic components are referred to as generic pipelines because you can run them in any runtime environment that Elyra supports.
Take a look at the tutorials if you are new to Elyra and would like to learn more about how to use the Visual Pipeline Editor to create a pipeline. If you’ve used Elyra before, we recommend reviewing the recently published best practices topic in the User Guide. We’ve only now gotten around to documenting some of the things that make your life easier!
Experimental support for custom components
Custom components are supported for Kubeflow Pipelines and Apache Airflow. They should implement a single task only, such as loading data, training a model, or sending an email. Information about custom components is stored in a local registry, which is exposed in the pipeline editor palette.
The following image depicts the Visual Pipeline Editor for Apache Airflow pipelines. Note that the palette contains generic components and custom components for Apache Airflow.
To get you going quickly, the component registry includes a few example custom components. You can also create and add your own components or add third-party components.
Pipelines that are associated with a single runtime are called runtime-specific pipelines.
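The distinction between generic and runtime-specific pipelines can be sketched in a few lines of Python. This is only an illustration of the rule described above, not how Elyra implements it: the `pipeline_type` function and the runtime labels (`"generic"`, `"airflow"`, `"kfp"`) are assumptions made for this example.

```python
from __future__ import annotations

GENERIC = "generic"  # assumed label for components that run on every runtime


def pipeline_type(component_runtimes: list[str]) -> str:
    """Classify a pipeline by the runtimes its components are tied to.

    Each entry is the runtime a component requires, or GENERIC if the
    component runs anywhere.
    """
    specific = {runtime for runtime in component_runtimes if runtime != GENERIC}
    if not specific:
        # Only generic components: the pipeline can run on any runtime.
        return GENERIC
    if len(specific) == 1:
        # Generic components plus custom components for exactly one runtime.
        return specific.pop()
    raise ValueError(f"components target more than one runtime: {sorted(specific)}")


print(pipeline_type(["generic", "generic"]))
print(pipeline_type(["generic", "airflow"]))
```

A pipeline that mixes generic components with custom Apache Airflow components is classified as an Apache Airflow pipeline, while one that combines custom components for two different runtimes is rejected.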
Get started with pipelines
After you’ve installed Elyra, it is easy to get started. Under the Elyra category, the JupyterLab launcher now includes a tile for each pipeline type: generic pipelines, Kubeflow Pipelines pipelines, and Apache Airflow pipelines.
Select the pipeline editor that you need, and you are ready to assemble a basic pipeline. Note that it is not possible to convert a pipeline from one type to another.
Opportunities for growth
In the initial 3.0 release, Elyra’s support for custom components is rather limited. Many features are still under development, planned for a future release, or in the backlog without a specific target release. Some of the high-priority features for the next releases are:
- Data exchange between custom components: Components commonly produce outputs that other components require as input. Currently, custom components are isolated from each other and cannot exchange data. (Data exchange between generic components is already supported.)
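- Data exchange between generic components and custom components: As with data exchange between custom components, generic components and custom components cannot currently exchange data with each other.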
- Manage component registry: Provide a UI and/or CLI that allows for the addition, editing, or deletion of components. Currently, components can only be managed manually.
For an up-to-date feature status, refer to this forum thread.
Use Watson Studio services in pipelines
Pipelines can also take advantage of external services using custom components. If you are looking for a managed solution for Watson Studio services, check out this IBM Watson Pipeline article. It illustrates how to run notebooks, refine data, run AutoAI experiments, and deploy a model.
Your opportunity to help us improve Elyra
Elyra is a fairly new open source project that is currently maintained by a small community of JupyterLab enthusiasts. We welcome contributions of any kind, such as feedback, bug reports, bug fixes, features, or documentation. To learn more about how you can make a difference, refer to the Getting help topic in the documentation.
On behalf of the community: Thank You!