Serverless computing is an interesting evolution of Platform-as-a-Service where the unit of computation is a function rather than a whole application. It is a good match for cloud workloads that react to events, e.g., IoT applications, and since no servers are provisioned, the resulting application should consume only the resources it needs. This potentially means reduced costs and reduced operational effort. However, it is not clear whether these reductions in resource usage and operational effort are real or even verifiable, nor whether they are consistent across all serverless computing platforms.
As cloud computing transforms from owning the infrastructure with Infrastructure-as-a-Service (IaaS), to leasing the platform with Platform-as-a-Service (PaaS), to utilizing applications with Software-as-a-Service (SaaS), one step in the hierarchy between the last two is to control modular functions as services. Serverless computing is in line with the trends of cloud computing and the move toward modularizing applications into small microservices. The advantage of serverless environments is in delegating the job of running service functions to the cloud provider, thereby allowing it to decide how to manage execution. Amazon's Lambda service, Iron.io workers, Google Cloud Functions, Microsoft's Azure Functions, and IBM's OpenWhisk are all incarnations of this idea in various stages of development and adoption.
One important class of applications that can potentially benefit from serverless functions is the Web class of workloads, designed for short-lived requests received intermittently from multiple clients. Given that these clients can arrive at different frequencies, with arbitrary payload sizes and arbitrary computational requirements, the question is: how can we identify the right platform for the type of function we need to deploy to the cloud? In other words, what factors need to be examined in order to choose the right serverless computing platform?
To facilitate making the right serverless computing platform choice, we created the beginnings of a methodology: experiments and tests that can provide guiding measures on when and how such cloud platform features are beneficial for your workloads.
What to Look for in a Serverless Platform?
We identified and designed experiments that allowed us to collect data on the performance of existing platforms by putting stress on their offered resources. Evidently, various tests are possible, and we do not claim to have identified an exhaustive list of dimensions for functions to run on serverless platforms; instead, the goal is to provide an understanding of what needs to be considered when choosing a serverless platform.
We looked into four categories of applications, exploring each in three dimensions. The dimensions allowed us to explore how the behavior of a deployed function changed when we altered these dimensions.
Obviously, different applications may utilize each of the categories or dimensions at different levels of intensity. However, by looking into a diverse range of values for the dimensions, we can better understand how to calibrate what to look for in a serverless platform.
In Fall 2016, we ran experiments for all categories of applications and across all dimensions mentioned above to understand the behavior of existing serverless platforms. We will be presenting the details of our findings during CloudExpo 2017 in New York City and Cloud Foundry Summit 2017 in Santa Clara. For the sake of brevity, here is a summary list of observations we made while running our experiments that you may want to consider when deploying applications to a serverless platform.
Be aware of timeouts
Across all the experiments, we observed that if your functions take too long to respond, you will most likely hit a timeout. For Lambda functions, the default timeout is 30 seconds when the endpoints are accessed from outside the virtual private cloud (VPC). Azure Functions would time out after 10 seconds. If you have applications that may run for longer than 30 seconds before a response is generated, make sure your cloud functions are used to submit a background job that can be picked up by a worker process capable of running the job for longer.
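The submit-and-acknowledge pattern can be sketched as follows. This is a minimal illustration, not any platform's actual API: the in-process `queue.Queue` stands in for a managed queue service (such as SQS or a message broker), and the `handler`/`worker` names are hypothetical.

```python
import queue
import threading
import time

# Stand-in for a managed queue service (e.g., SQS); illustrative only.
job_queue = queue.Queue()

def handler(event):
    """Function entry point: enqueue the job and return immediately,
    instead of doing long-running work inside the request timeout window."""
    job_queue.put(event)
    return {"status": "accepted", "job": event.get("id")}

def worker():
    """Background worker, not bound by the platform's request timeout."""
    while True:
        event = job_queue.get()
        if event is None:  # sentinel to stop the worker
            break
        time.sleep(0.01)   # placeholder for long-running work
        print(f"processed job {event.get('id')}")
        job_queue.task_done()

threading.Thread(target=worker, daemon=True).start()
response = handler({"id": 42})  # returns well before the job finishes
job_queue.join()                # wait for the background job (demo only)
job_queue.put(None)             # stop the worker
print(response)
```

The key property is that the function's response time is decoupled from the job's run time, so the platform timeout only bounds the enqueue step.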
Know how your serverless platform scales
We noticed that cloud providers take different approaches to scaling serverless functions. One strategy is to load the function code into a container and then freeze the container's state using the cgroup freezer, so that it consumes no resources other than some disk space.
Another strategy is to create containers on demand as requests exceed the capacity of the existing deployment. The first approach is faster at getting containers up and running, at the expense of dedicating some disk space to containers while they sit unused. The second approach is more economical in disk usage. In summary, it is important to know how your serverless platform responds to spikes in load.
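The trade-off between the two strategies can be illustrated with a toy latency model. All numbers here are illustrative assumptions, not measurements from any platform: unfreezing a warm container is modeled as free, while creating a container on demand pays a cold-start penalty.

```python
def serve(requests, warm_pool, cold_start_ms, run_ms):
    """Toy model: each request needs a container. A pre-created (frozen)
    container pays only the run time; a cold start also pays creation cost."""
    latencies = []
    available = warm_pool
    for _ in range(requests):
        if available > 0:
            available -= 1
            latencies.append(run_ms)                 # unfreeze is ~free
        else:
            latencies.append(cold_start_ms + run_ms)  # create on demand
    return latencies

# Illustrative numbers (ms): a spike of 10 requests.
frozen = serve(10, warm_pool=4, cold_start_ms=500, run_ms=20)
on_demand = serve(10, warm_pool=0, cold_start_ms=500, run_ms=20)
print(sum(frozen) / len(frozen), sum(on_demand) / len(on_demand))
```

Under these assumed numbers, the warm-pool strategy cuts average latency for the spike, which is exactly the behavior you want to probe on a real platform.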
Plan for unforeseen load
A higher number of requests to a deployed function generally implies more concurrent requests, increased load on the containers running the function, and longer request latencies, hence a greater chance of hitting the platform's timeouts. Know the behavior of your functions and the platform they are deployed to. It is important to know when requests time out on the platform, how quickly the platform can scale the functions, and how many requests may tip the deployment over.
Does more money mean better performance?
At the end of the day, a serverless platform is preferred over a more traditional PaaS mostly for monetary reasons. So we asked whether spending more money implies better performance on a serverless platform. As serverless platforms become more mature and stable, spending more money helps allocate more resources to the deployed functions, hence providing better performance.
However, issues like timeouts and container management are inherent to the underlying platform, and depending on the type of workload, a chosen platform may not improve the performance of serverless functions even if more money is spent on the underlying resources. The bottom line is that the performance of serverless platforms and their deployed functions is mostly determined by how computing resources are managed and allocated in these platforms, rather than by how much of these resources is allocated to running the target functions.
As you can see, no single experiment can answer the question of which serverless platform is best for your workloads. Understanding the platform's limitations and your workloads' specifications are the first steps toward selecting the right serverless platform, or, at the extreme, determining whether this new technology is suited to your needs at all.
While our approach is generic enough to be used for many kinds of serverless workloads and to be applied uniformly to different platforms, it's important to note that there are threats to the validity of our findings. The most obvious one is that the results reflect observations made during our execution period (Fall 2016), and the various platforms have since changed and perhaps improved. This means that our results must be continually updated as observations are adjusted over time.
It's worth noting that we do not anticipate the observations being invalidated; rather, their relative importance might change, and new observations could surface. To address this, our solution is to create a benchmark for serverless platforms that provides a constant stream of data, kept up to date with the evolution of the various platforms. And as new serverless platforms are introduced, running our experiments on them will help gauge and improve our methodology and tests so that they can be truly generic and serve as the benchmark we hope them to be.
In conclusion, we invite you again to CloudExpo 2017 in New York and to the Cloud Foundry Summit 2017 in Santa Clara, CA to see a detailed presentation of our findings. We would also love to hear about how you are evaluating serverless platforms and what roadblocks, if any, you have reached in your journey to serverless. In the meantime, if you have a question or would like to engage outside of these events, consider reaching out to us on Twitter @nimak and @maximilien, or on Cloud Foundry Slack under the usernames @nimak and @dr.max.