Skill Level: Any Skill Level


  1. What Is Serverless Computing?

    Serverless computing is a software design and delivery method in which computing resources are provided as a cloud service, and users are completely unaware of the underlying infrastructure. 

    In traditional application deployments, the server’s computing resources are fixed, and costs are incurred regardless of how much computing work the server performs. In serverless computing, billing only occurs when the customer’s code is actually executed, and there is no charge for idle time. 

    Serverless computing does not eliminate servers. Its purpose is to remove considerations related to computing resources from the software design and development process. When hardware and infrastructure are fully managed by the cloud provider, developers can create back-end applications and handle events without managing servers, virtual machines (VMs), or computing resources.

  2. Serverless Computing Examples

    AWS Lambda

    AWS Lambda is a computing service that runs code based on events without the need to provision or manage servers. You only pay for the time spent running the code. Code is delivered in the form of serverless functions, and is run based on event triggers (for example, a user request, streaming data events, or a request from another application). 
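    As a minimal sketch of what such a serverless function can look like, the Python handler below follows the `handler(event, context)` convention that Lambda uses for Python functions. The `name` field in the event and the shape of the response are assumptions made for illustration; real event payloads depend on the trigger that invokes the function.

```python
import json


def handler(event, context):
    """Entry point that the platform calls once per triggering event."""
    # `event` carries the trigger payload; the "name" key is a hypothetical field.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


# Local call with a made-up event; no context object is needed for this sketch.
response = handler({"name": "serverless"}, None)
print(response["statusCode"])
```

    Calling the handler locally like this is a convenient way to test the function logic before wiring it to a real event source.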

    Lambda provides basic monitoring capabilities via Amazon CloudWatch, which provides logs of all serverless function activity.

    Azure Functions

    Microsoft Azure offers another serverless platform. It has three main components: 

    • Azure Functions: similar to Lambda, lets you provide serverless functions and run them based on triggers, without managing the underlying servers
    • Logic Apps: lets you visualize your workflow and coordinate activities in Azure Functions
    • Event Grid: a routing service you can use to deliver messages between your serverless functions and other Azure services or resources

    Azure provides monitoring capabilities through the Application Insights service. To use it, you include a package in your serverless function that sends telemetry data to the service.

    Google Cloud Functions

    Google Cloud’s serverless platform is quite similar to Lambda and Azure Functions. It is limited to 1,000 serverless functions per project, and it currently supports only Python and JavaScript (Node.js). The service currently supports events from Google Cloud Storage (object change notifications) and Google Cloud Pub/Sub, Google’s message bus service.

    On the monitoring side, Google provides the Cloud Trace and Cloud Debugger services, which both support Cloud Functions and add observability to your serverless functions.

  3. Key Considerations for Serverless Monitoring

    Serverless monitoring is a challenge, because most traditional monitoring tools focus on servers and endpoints, and cannot see serverless infrastructure. Also, the operational metrics of serverless applications are very different from those of traditional or container-based applications. Here are several key considerations when setting up monitoring for a serverless environment.

    Issue Management and Team Collaboration

    All serverless applications experience bugs and issues, especially when they are in active development. Development and operations teams need an effective way to address these issues. With effective monitoring, teams can easily investigate and control issues in production systems, communicate the problem to others, and collaborate to resolve it. 

    Event-Driven Debugging

    Developers don’t have time to directly monitor application logs. A system is needed to generate alerts when something breaks. Serverless applications require monitoring systems that can not only generate alerts, but also prioritize alerts to help developers and operations teams focus on the most urgent and significant issues.
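    One way to picture alert prioritization is a simple ordering rule: most severe first, and within the same severity, most frequent first. The sketch below is illustrative only; the severity names, the `Alert` fields, and the function names are assumptions rather than the format of any particular monitoring product.

```python
from dataclasses import dataclass

# Hypothetical severity levels; a lower rank means more urgent.
SEVERITY_RANK = {"critical": 0, "error": 1, "warning": 2}


@dataclass
class Alert:
    function_name: str
    severity: str
    occurrences: int


def prioritize(alerts):
    """Order alerts by severity, then by how often they occurred."""
    unknown_rank = len(SEVERITY_RANK)  # unrecognized severities sort last
    return sorted(
        alerts,
        key=lambda a: (SEVERITY_RANK.get(a.severity, unknown_rank), -a.occurrences),
    )


alerts = [
    Alert("send-email", "warning", 50),
    Alert("checkout", "critical", 1),
    Alert("resize-image", "error", 3),
    Alert("search", "error", 9),
]
for alert in prioritize(alerts):
    print(alert.severity, alert.function_name, alert.occurrences)
```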

    Rapid Remediation

    When an issue occurs, developers need very specific information in order to diagnose and solve it. A monitoring infrastructure should not only pick up a problem, but also provide detailed information such as stack traces and event logs to help developers identify the root cause and resolve the problem.
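    To make this concrete, the sketch below wraps a function body so that any failure is logged as a single JSON record containing the error type, the stack trace, and the event that triggered it. The field names, the `process` helper, and the `quantity` key are hypothetical; printing to standard output stands in for whatever log collection your platform provides.

```python
import json
import traceback


def build_error_report(event, exc):
    """Collect what a developer needs to find the root cause."""
    return {
        "error_type": type(exc).__name__,
        "message": str(exc),
        "stack_trace": traceback.format_exception(type(exc), exc, exc.__traceback__),
        "event": event,  # the input that triggered the failure
    }


def process(event):
    # Hypothetical business logic; fails when quantity is 0.
    return 100 / event["quantity"]


def handler(event, context):
    try:
        return {"ok": True, "result": process(event)}
    except Exception as exc:
        report = build_error_report(event, exc)
        print(json.dumps(report))  # one structured log line per failure
        return {"ok": False, "error": report["error_type"]}
```

    Logging the triggering event next to the stack trace means the failure can be replayed locally with the same input.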


  4. What Should You Be Monitoring in Serverless Applications?

    Key Concepts

    Here are some of the basic concepts you need to understand to monitor serverless applications. Each of these is also a metric you should watch in your production systems:

    • Invocations: the number of calls to execute the function code, including successful and failed executions.
    • Errors: the number of calls that resulted in a function error. These include exceptions thrown by your code and exceptions thrown by the serverless platform at run time, including timeouts. You can calculate the error rate by dividing the number of errors by the number of invocations in a given time period.
    • Throttles: serverless platforms sometimes have to throttle requests when all instances of your serverless functions are fully occupied and you did not allow additional scalability. This means the serverless platform will deny user requests, not because there is a problem with your function but because the requests exceeded the maximum capacity you defined.
    • Duration: the time your serverless function spent processing an event. The first time a function instance handles an event, this includes initialization time. Duration is very significant because long durations can result in latency or other user-facing issues.
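    The metrics above can be combined into a small summary, sketched here under some assumptions: the counts and durations are made-up sample data, the `FunctionStats` type is hypothetical, and the percentile uses the simple nearest-rank method. Error rate is errors divided by invocations for the same time window.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionStats:
    invocations: int
    errors: int
    throttles: int
    durations_ms: list = field(default_factory=list)


def error_rate(stats):
    """Fraction of invocations that ended in an error."""
    if stats.invocations == 0:
        return 0.0
    return stats.errors / stats.invocations


def percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    if not values:
        raise ValueError("no values recorded")
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # ceiling of pct * n / 100
    return ordered[rank - 1]


# Made-up numbers for one function over one time window.
stats = FunctionStats(invocations=200, errors=3, throttles=1,
                      durations_ms=[120, 80, 95, 400, 110])
print(f"error rate: {error_rate(stats):.1%}")
print(f"p95 duration: {percentile(stats.durations_ms, 95)} ms")
```

    Tracking a high percentile of duration rather than the average helps surface slow first invocations, which an average tends to hide.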


    “Push” Monitoring

    This type of monitoring takes the form of a server-installed agent that periodically collects data and sends it to a central server. This is known as “push data” monitoring, which is reactive. You need to get real application traffic in order to monitor how the application behaves, and there is still a delay between the actual event and when it is reported to the system. 

    Here are the main metrics you can gather with a “push” approach:

    • Function execution time
    • Database response time
    • Network latency
    • Memory usage
    • User experience metrics
    • Traffic sources
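    As a rough sketch of push-style reporting for the first metric in the list, the decorator below times each invocation and hands the measurement to a sender. The `send_metric` function and the in-memory `SENT_METRICS` list are stand-ins for a real collector, and the metric and function names are invented for the example.

```python
import functools
import time

SENT_METRICS = []  # stand-in for a remote metrics collector


def send_metric(name, value, tags):
    """Hypothetical sender; a real one would transmit over the network."""
    SENT_METRICS.append({"name": name, "value": value, "tags": tags})


def timed(function_name):
    """Decorator that pushes the execution time of every invocation."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(event, context):
            start = time.perf_counter()
            try:
                return func(event, context)
            finally:
                # Runs on success and on failure, so errors are timed too.
                elapsed_ms = (time.perf_counter() - start) * 1000
                send_metric("execution_time_ms", elapsed_ms,
                            {"function": function_name})
        return wrapper
    return decorator


@timed("resize-image")
def handler(event, context):
    return {"processed": len(event.get("records", []))}
```

    The measurement only exists once the function has handled real traffic, which is the reactive quality of push monitoring described above.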


    “Pull” Monitoring

    Pull monitoring was typically not necessary in traditional applications, but it is important in serverless systems. Serverless functions do not run continuously, so each function should be able to report its own metrics and make that data available to a monitoring system.

    It is important to standardize how serverless functions expose this data. You should set up an endpoint on each function that monitoring tools can access at the same path, for example /logs or /status. In addition, you should plan a data schema for reporting, and always use the same structure across all serverless functions. Using JSON format is a good idea because it is supported by most monitoring tools.
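    A standardized status endpoint could look roughly like the following sketch. The schema fields, the `path` key in the event, and the sample numbers are all assumptions chosen for illustration; the point is that every function returns the same JSON structure at the same path.

```python
import json
import time


def build_status(function_name, invocations, errors, avg_duration_ms):
    """Return metrics in one shared schema used by every function."""
    return {
        "function": function_name,
        "timestamp": int(time.time()),
        "invocations": invocations,
        "errors": errors,
        "avg_duration_ms": avg_duration_ms,
    }


def status_handler(event, context):
    """Answer monitoring requests on /status; reject any other path."""
    # The "path" key is a hypothetical field of the incoming request event.
    if event.get("path") != "/status":
        return {"statusCode": 404, "body": json.dumps({"error": "not found"})}
    status = build_status("resize-image", invocations=200, errors=3,
                          avg_duration_ms=161.0)
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(status),
    }


print(status_handler({"path": "/status"}, None)["body"])
```

    Because the structure is identical across functions, a monitoring tool can parse every response with the same code.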

  5. Conclusion

    In this article I discussed the serverless computing landscape and key considerations for monitoring serverless systems, and I provided a few key metrics and methods you should use to instrument and monitor serverless functions. Serverless monitoring is a tough problem to solve, but if you get it right, you can reap the benefits of serverless infrastructure, learn about production issues quickly, and get the data developers need to resolve them.
