We are pleased to announce our new open source project, Jupyter Enterprise Gateway!
Enterprise Gateway is a lightweight, multi-tenant, scalable, and secure gateway that enables Jupyter Notebooks to share resources across an Apache Spark cluster. Built on the Jupyter Kernel Gateway (JKG), the new Enterprise Gateway extends JKG’s headless web server functionality for interacting with notebook kernels and introduces new enterprise-level capabilities.
Founded in academia, the Jupyter projects provide a rich and popular set of applications for interacting with and iterating on large and complex applications. Although these projects have been truly ground-breaking, academia and corporate enterprises often have different needs, so we set out to meet enterprise requirements by building on existing offerings.
When we first attempted to integrate Jupyter Kernel Gateway into an enterprise application, we immediately faced challenges that couldn’t easily be addressed. Data scientists tend to run frequent and large workloads against large Apache Spark clusters. As a result, we quickly found that the JKG server became a bottleneck because the co-located Spark driver application for these kinds of workloads (in this case, the kernel process running on behalf of notebook cells) was extremely resource-intensive. Add in a team or organization of data scientists and you quickly saturate the compute resources of the Kernel Gateway server. We needed a better solution.
Jupyter Enterprise Gateway makes it possible to run kernels, that is, Spark driver applications, as managed resources (currently using YARN). As a result, the kernel is no longer tied to the gateway server and is instead scheduled across a cluster of servers, with placement decided by the associated resource manager. This approach moves resource-intensive work off the single gateway server and distributes the running kernels across the enterprise, dramatically increasing the number of kernels that can run simultaneously.
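To make the idea of a kernel as a managed resource more concrete, here is a minimal sketch of what a kernel specification that delegates launching to YARN might look like, built as a Python dictionary and printed as JSON. The `process_proxy` class path reflects our understanding of the project layout and should be checked against the version you install; the display name, launcher path, and the `yarn_kernelspec` helper are hypothetical and exist only for illustration.

```python
import json

# Assumed class path of the YARN process proxy; verify against your installed version.
YARN_PROXY = "enterprise_gateway.services.processproxies.yarn.YarnClusterProcessProxy"


def yarn_kernelspec(display_name, launcher):
    """Build a kernelspec dict whose kernel launch is delegated to YARN."""
    return {
        "language": "python",
        "display_name": display_name,
        # The process proxy tells the gateway how to launch and manage the kernel.
        "metadata": {"process_proxy": {"class_name": YARN_PROXY}},
        # Hypothetical launcher script; a real kernelspec passes more arguments.
        "argv": [launcher, "{connection_file}"],
    }


if __name__ == "__main__":
    spec = yarn_kernelspec(
        "Spark - Python (YARN Cluster Mode)",
        "/path/to/kernels/spark_python_yarn_cluster/bin/run.sh",  # hypothetical path
    )
    print(json.dumps(spec, indent=2))
```

The point of the sketch is that the decision of where the kernel runs lives in the kernel specification, not in the notebook or the gateway host, so the resource manager is free to place each kernel wherever the cluster has capacity.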
We soon encountered another issue: the kernel process ran under the same user ID as the gateway process. This caused problems across the notebook user domain because each notebook user tended to use specific libraries and packages within their notebooks, leading to consistency conflicts within the team.
By running as a managed resource, Jupyter Enterprise Gateway leverages the capabilities of the resource manager and isolates notebook users to their own space or sandbox. In some cases, depending on the capabilities of the underlying resource manager, you can configure actual impersonation, enhancing the overall user experience and security.
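As a rough illustration of per-user isolation, the sketch below builds the HTTP request a client might send to start a kernel on behalf of a particular notebook user. It assumes the gateway exposes the Kernel Gateway style `POST /api/kernels` endpoint and accepts the user name through a `KERNEL_USERNAME` entry in the request's `env` field; the gateway URL, kernel name, and the `kernel_start_request` helper are hypothetical.

```python
import json
import urllib.request


def kernel_start_request(gateway_url, kernel_name, username):
    """Build (but do not send) a request asking the gateway to start a kernel."""
    body = {
        "name": kernel_name,
        # Assumption: the gateway reads the target user from this variable.
        "env": {"KERNEL_USERNAME": username},
    }
    return urllib.request.Request(
        url=gateway_url.rstrip("/") + "/api/kernels",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = kernel_start_request(
        "http://gateway.example.com:8888",  # hypothetical gateway address
        "spark_python_yarn_cluster",        # hypothetical kernel name
        "alice",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req)
```

Whether the kernel then truly runs as that user depends on how impersonation is configured in the underlying resource manager, as noted above.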
Jupyter Enterprise Gateway also strengthens security. Although kernels run within the isolated network of managed services, cloud providers still require network security. As a result, traffic sent over the internal ZeroMQ sockets is always encrypted.
More to come
We’re very excited to participate in the evolution of the Jupyter Enterprise Gateway. Planned updates include:
- Kernel configuration profiles
- Administration UI
- Support for additional resource managers
- User environments
Check out the project overview today. We welcome your contributions as we embark on this journey together!