Taxonomy Icon

Deep Learning

About this video

Training deep neural network models requires a highly tuned system with the right combination of software, drivers, compute, memory, network, and storage resources. Deep learning frameworks such as TensorFlow, PyTorch, Caffe, Torch, Theano, and MXNet have contributed to the popularity of deep learning by reducing the effort and skills needed to design, train, and use deep learning models. Fabric for Deep Learning (FfDL, pronounced &fiddle&) provides a consistent way to run these deep learning frameworks as a service on Kubernetes.

In this talk we will dive into the open source project FfDL. It uses a microservices architecture to reduce coupling between components, keep each component simple and as stateless as possible, isolate component failures, and allow each component to be developed, tested, deployed, scaled, and upgraded independently.