Jupyter Enterprise Gateway
A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across an Apache Spark cluster.
Jupyter Enterprise Gateway is a lightweight, multi-tenant, scalable and secure gateway. With Jupyter Enterprise Gateway, you can enable Jupyter Notebooks to share resources across an Apache Spark cluster and extend Jupyter Kernel Gateway with enterprise-level capabilities, such as optimized cluster resource utilization and multi-user support.
Jupyter Enterprise Gateway is built around the following primary themes:
Optimized Resource Allocation
- Provides the ability to run Jupyter Notebook kernels in Spark YARN cluster mode to better utilize cluster resources.
- Implements a pluggable architecture for additional Spark resource managers.
Multi-user support with user impersonation within Kerberos-enabled clusters
- Provides enhanced security and user isolation for all kernel activity utilizing Spark’s --proxy-user functionality.
- Enables the same user ID for notebook and batch jobs.
- Builds on Kerberos-enabled clusters to provide rich support for user impersonation.
- Encrypts all kernel communication.
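To make the two themes above concrete, here is a minimal sketch of how a command line that combines YARN cluster mode with user impersonation might be assembled. It is an illustration under stated assumptions, not the gateway's actual implementation: the helper name build_spark_submit_args and the script name launch_kernel.py are hypothetical. Only the spark-submit options --master, --deploy-mode, and --proxy-user are standard Spark flags.

```python
from typing import List, Optional


def build_spark_submit_args(
    app_path: str,
    proxy_user: Optional[str] = None,
    deploy_mode: str = "cluster",
) -> List[str]:
    """Assemble a spark-submit command line that targets YARN.

    app_path is a hypothetical script that would start the kernel.
    """
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unsupported deploy mode: {deploy_mode}")

    # In cluster mode the driver runs inside the cluster rather than
    # on the machine that submits the application.
    args = ["spark-submit", "--master", "yarn", "--deploy-mode", deploy_mode]

    if proxy_user:
        # Ask Spark to run the application on behalf of this user
        # instead of the account that submits it.
        args += ["--proxy-user", proxy_user]

    args.append(app_path)
    return args


if __name__ == "__main__":
    # "alice" and "launch_kernel.py" are placeholder values.
    print(" ".join(build_spark_submit_args("launch_kernel.py", proxy_user="alice")))
```

Returning the arguments as a list rather than a single string keeps each value separate, so a user name or path containing spaces cannot be misread as extra options when the command is eventually executed.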
Currently, all notebook-based offerings launch their kernels local to the server providing the service. In large Apache Spark installations, this means many resource-intensive applications run on the same server (YARN client mode), which creates a bottleneck for teams of data scientists.
Jupyter Enterprise Gateway introduces the ability to launch kernels as managed resources within Spark clusters (that is, in YARN cluster mode), which was previously not possible for Jupyter kernels. This lets the number of kernels scale linearly with the available cluster resources, as demonstrated in the graph below:
To accomplish these distributed capabilities, we wrap the target kernel’s invocation with what we call “kernel launchers.” This enables us to implement additional capabilities without any modification to the underlying kernel implementations (such as auto-creation of Spark contexts for kernels that don’t provide that functionality). In addition, the way to launch a given kernel is conveyed within the kernelspec file, which we’ve also extended within Jupyter Enterprise Gateway. As a result, we include kernel launchers and kernelspec files for the following kernels (all of which include automatic and delayed Spark context initialization):
- Python/Spark 2.x with IPython kernel
- Scala 2.11/Spark 2.x with Apache Toree kernel
- R/Spark 2.x with IRkernel
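As a rough illustration of how a kernelspec can point at a launcher instead of the kernel itself, the sketch below builds a kernel.json-style dictionary. The fields display_name, language, env, and argv, along with the {connection_file} placeholder, are standard parts of a Jupyter kernelspec. Everything else is an assumption made for the example: the helper name build_kernelspec, the launcher path, and the SPARK_HOME value are hypothetical and do not reproduce the files shipped with Jupyter Enterprise Gateway.

```python
import json
from typing import Dict


def build_kernelspec(
    display_name: str,
    language: str,
    launcher: str,
    env: Dict[str, str],
) -> Dict[str, object]:
    """Return a kernelspec whose argv invokes a launcher script.

    The launcher (hypothetical here) would be responsible for starting
    the real kernel and performing any extra setup around it.
    """
    return {
        "display_name": display_name,
        "language": language,
        "env": dict(env),
        # Jupyter replaces {connection_file} with the path to the
        # connection file when it starts the kernel.
        "argv": [launcher, "{connection_file}"],
    }


spec = build_kernelspec(
    display_name="Spark - Python (YARN cluster mode)",
    language="python",
    # Hypothetical launcher location, for illustration only.
    launcher="/usr/local/share/jupyter/kernels/spark_python_yarn_cluster/bin/run.sh",
    env={"SPARK_HOME": "/opt/spark"},  # assumed install location
)

if __name__ == "__main__":
    print(json.dumps(spec, indent=2))
```

Because the launcher is the only thing the kernelspec invokes, extra behavior such as creating a Spark context can live in the launcher without touching the kernel implementation.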
We plan to address the following areas in future releases:
Kernel configuration profile
- Enable a client to request different resource configurations for kernels (for example, small, medium, large).
- Update profiles so that they are defined by administrators and enabled for users and/or groups.
- Provide a dashboard with running kernels.
- Update lifecycle management, including time running, stop/kill, profile management, and other functions.
- Add support for other resource managers such as Kubernetes.
- User environments
- High availability
- Batch REST APIs
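The kernel configuration profiles mentioned in this list are future work, so their design is not fixed. Purely as a thought experiment, the sketch below shows one way named profiles could map to Spark resource options and be restricted by group. All names and sizes here (KernelProfile, PROFILES, ALLOWED_PROFILES, resolve_profile, the group names, and the memory and core values) are hypothetical. The only real identifiers are the spark-submit options --driver-memory, --executor-memory, --executor-cores, and --num-executors.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class KernelProfile:
    """A named bundle of Spark resource settings (illustrative values)."""

    driver_memory: str
    executor_memory: str
    executor_cores: int
    num_executors: int

    def to_spark_args(self) -> List[str]:
        # Translate the profile into standard spark-submit options.
        return [
            "--driver-memory", self.driver_memory,
            "--executor-memory", self.executor_memory,
            "--executor-cores", str(self.executor_cores),
            "--num-executors", str(self.num_executors),
        ]


# Hypothetical profiles an administrator might define.
PROFILES: Dict[str, KernelProfile] = {
    "small": KernelProfile("1g", "1g", 1, 2),
    "medium": KernelProfile("2g", "4g", 2, 4),
    "large": KernelProfile("4g", "8g", 4, 8),
}

# Hypothetical mapping of groups to the profiles they may request.
ALLOWED_PROFILES: Dict[str, Set[str]] = {
    "data-science": {"small", "medium", "large"},
    "interns": {"small"},
}


def resolve_profile(name: str, group: str) -> KernelProfile:
    """Return the requested profile if the group is allowed to use it."""
    if name not in PROFILES:
        raise KeyError(f"unknown profile: {name}")
    if name not in ALLOWED_PROFILES.get(group, set()):
        raise PermissionError(f"group {group!r} may not use profile {name!r}")
    return PROFILES[name]


if __name__ == "__main__":
    print(resolve_profile("medium", "data-science").to_spark_args())
```

Keeping the profile definitions separate from the permission mapping means an administrator could change who may use a profile without redefining its resources.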
Why should I contribute to Jupyter Enterprise Gateway?
We’re pleased with the progress we’ve made with Jupyter Enterprise Gateway, but we’re not satisfied. There’s a lot more to accomplish, and we need your help. If you’re interested in Jupyter Enterprise Gateway and what it has to offer, we’d love for you to join our community and make the project even better. By sharing your insight and experience, you’ll help solidify Jupyter Enterprise Gateway’s presence within the larger data science ecosystem, and that’s something we can all benefit from and be proud of.
What technology problem will I help solve?
Jupyter Enterprise Gateway has identified a gap in data analytics tooling: how to fully leverage cluster resources within an enterprise while providing data scientists autonomy over their notebooks. Through Jupyter Enterprise Gateway, corporate enterprises and cloud providers alike can maximize the amount of resource-intensive work they accomplish, increasing their productivity and improving user experiences.
This work isn’t easy, and many of the problems to overcome haven’t been encountered before. That’s the challenge: how can we best optimize resource utilization given the basic requirements and constraints of the Jupyter ecosystem?
Visit the Jupyter Enterprise Gateway website! You can download the latest build, find out what Jupyter Enterprise Gateway is all about, and see where we’re headed.