eG Monitoring
 

Measures reported by AWSAmazonLambdaTest

AWS Lambda is a compute service that lets you run code without provisioning or managing servers. In other words, AWS Lambda runs your code on a high-availability compute infrastructure and performs all of the administration of the compute resources, including server and operating system maintenance, capacity provisioning and automatic scaling, code monitoring and logging. All you need to do is author your code in a language that AWS Lambda supports (currently Node.js, Java, C#, Go and Python), and upload your application code to AWS Lambda in the form of one or more Lambda functions. Using AWS Lambda, you can even maintain multiple versions of your in-production function code, and can also create aliases for each of your function versions for easy reference.

Typically, AWS Lambda is used to run code in response to events, such as changes to data in an Amazon S3 bucket or an Amazon DynamoDB table; to run code in response to HTTP requests through Amazon API Gateway; or to invoke code through API calls made with the AWS SDKs.
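
As an illustration of the event-driven model described above, the following is a minimal sketch of a Python Lambda function that reacts to an S3 notification. The handler name lambda_handler and the sample event are assumptions made for this example; the sketch only reads the bucket name and object key from each record and does not call any AWS service.

```python
def lambda_handler(event, context):
    """Entry point that Lambda calls once per invocation."""
    # S3 notifications carry a list of records; other event sources use other shapes.
    records = event.get("Records", [])
    objects = [
        (record["s3"]["bucket"]["name"], record["s3"]["object"]["key"])
        for record in records
        if "s3" in record
    ]
    return {"processed": len(objects), "objects": objects}


if __name__ == "__main__":
    # Hypothetical, trimmed-down S3 event used only for a local dry run.
    sample_event = {
        "Records": [
            {"s3": {"bucket": {"name": "example-bucket"},
                    "object": {"key": "reports/2019/01.csv"}}}
        ]
    }
    print(lambda_handler(sample_event, None))
```

Each call to a handler like this one corresponds to one invocation of the function.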

In such scenarios, if the Lambda function code fails or takes too long to execute, it can stall or even completely stop data/request processing by critical AWS services (e.g., Amazon S3, Amazon DynamoDB, Amazon API Gateway, etc.) that rely on that code for their operations. To pre-empt the failure/delay of critical AWS services, administrators need to monitor each Lambda function that these services use and promptly capture problems in the function code. This is exactly what the AWS Lambda test does!

This test automatically discovers the Lambda functions, monitors the invocations of each function, and in the process, reports latencies and errors/failures in function execution. This enables administrators to quickly and accurately identify slow and/or buggy functions, so that they can take up those functions and their code for closer review and fine-tuning.

Optionally, you can configure this test to report metrics for each version of a function or for every alias of a function version. This enables administrators to quickly compare the performance of different versions or aliases of a function, and then decide which version/alias to use in the production environment.

Outputs of the test: One set of results for each Lambda function / version / alias in every region.

First-level descriptor: AWS Region

Second-level descriptor: Function / Version / Alias, depending upon the option chosen from the LAMBDA FILTER NAME parameter of this test

The measures made by this test are as follows:

Measurement Description Measurement Unit Interpretation
Invocations By default, this measure represents the number of times this function was invoked in response to an event or invocation API call.

If the LAMBDA FILTER NAME is set to Version, then this measure represents the number of times this version of a function was invoked in response to an event or invocation API call.
Number Compare the value of this measure across functions to know which function is used the most. The failure of such a function will naturally have a more adverse impact on performance and productivity than the failure of other functions.
Invoctn_Error By default, this measure represents the number of invocations of this function that failed due to errors (response code 4xx).

If the LAMBDA FILTER NAME is set to Version, then this measure represents the number of invocations of this version of a function that failed due to errors.

If the LAMBDA FILTER NAME is set to Alias, then this measure represents the number of invocations of this alias that failed due to errors.
Number Ideally, the value of this measure should be 0. A non-zero value indicates one or more errors in the function code.

To know what errors occurred during invocation, check the logs. Typically, each time the code is executed in response to an event, it writes a log entry into the log group associated with a Lambda function, which is /aws/lambda/<function name>.

Following are some examples of errors that might show up in the logs:

  • If you see a stack trace in your log, there is probably an error in your code. Review your code and debug the error that the stack trace refers to.

  • If you see a permissions denied error in the log, the IAM role you have provided as an execution role may not have the necessary permissions. Check the IAM role and verify that it has all of the necessary permissions to access any AWS resources that your code references.

  • If you see a timeout exceeded error in the log, the run time of your function code exceeds your timeout setting. This may be because the timeout is too low, or the code is taking too long to execute.

  • If you see a memory exceeded error in the log, your memory setting is too low. Set it to a higher value. Typically, when creating a function, you need to mention the amount of memory that should be given to that function. Lambda uses this memory size to infer the amount of CPU and memory allocated to your function. Your function use-case determines your CPU and memory requirements. For example, a database operation might need less memory compared to an image processing function. The default value is 128 MB. The value must be a multiple of 64 MB.
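
The four error types listed above can be told apart mechanically when scanning a function's log group. The following is a rough sketch, not part of the test itself: the patterns in ERROR_PATTERNS are assumptions about how such messages are commonly worded, and the exact text varies by runtime, so adjust them to match your own logs.

```python
import re

# Assumed message fragments for each error type; real wording varies by runtime.
ERROR_PATTERNS = [
    ("timeout", re.compile(r"Task timed out after", re.IGNORECASE)),
    ("permissions", re.compile(r"AccessDenied|not authorized to perform", re.IGNORECASE)),
    ("memory", re.compile(r"out of memory|MemoryError", re.IGNORECASE)),
    ("code_error", re.compile(r"Traceback \(most recent call last\)")),
]


def classify_log_line(line):
    """Return the first matching error category, or None if the line looks benign."""
    for category, pattern in ERROR_PATTERNS:
        if pattern.search(line):
            return category
    return None


def summarize(lines):
    """Count how many log lines fall into each error category."""
    counts = {}
    for line in lines:
        category = classify_log_line(line)
        if category is not None:
            counts[category] = counts.get(category, 0) + 1
    return counts
```

Running summarize over the lines exported from a log group gives a quick view of which kind of failure dominates, before you read individual entries.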

Letter_Error By default, this measure represents the number of times Lambda could not write the failure of this function to the configured dead letter queues.

If the LAMBDA FILTER NAME is set to Version, then this measure represents the number of times Lambda could not write the failure of this version of a function to the configured dead letter queues.

If the LAMBDA FILTER NAME is set to Alias, then this measure represents the number of times Lambda could not write the failure of this alias to the configured dead letter queues.
Number By default, a failed Lambda function invoked asynchronously is retried twice, and then the event is discarded. Using Dead Letter Queues (DLQ), you can indicate to Lambda that unprocessed events should be sent to an Amazon SQS queue or Amazon SNS topic instead, where you can take further action.

If the value of this measure keeps increasing, it implies that the event payload is consistently failing to reach the dead letter queue. Probable causes for this are as follows:

  • Permissions errors

  • Throttles from downstream services

  • Misconfigured resources

  • Timeouts
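
To make the relationship between retries, the dead letter queue, and this measure concrete, here is a toy model in Python. It is a simplified sketch of the behavior described above (one attempt plus two retries, then a hand-off to the DLQ), not Lambda's actual implementation; handler and send_to_dlq are hypothetical callables supplied by the caller.

```python
def deliver_async_event(event, handler, send_to_dlq, max_retries=2):
    """Model one asynchronous event: returns 'ok', 'dead_lettered', or 'dead_letter_error'."""
    # One initial attempt plus max_retries retries.
    for _attempt in range(1 + max_retries):
        try:
            handler(event)
            return "ok"
        except Exception:
            continue  # the function failed; try again if attempts remain

    # All attempts failed, so hand the event to the dead letter queue.
    try:
        send_to_dlq(event)
        return "dead_lettered"
    except Exception:
        # This outcome is what the dead letter error measure counts.
        return "dead_letter_error"
```

In this model, only the last outcome would raise the value of this measure; an event that fails three times but reaches the queue is counted as an invocation error, not a dead letter error.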

Invoctn_Duraton By default, this measure indicates the average elapsed wall clock time from when this function's code starts executing because of an invocation to when it stops executing.

If the LAMBDA FILTER NAME is set to Version, then this measure represents the average elapsed wall clock time from when this version of a function's code starts executing because of an invocation to when it stops executing.

If the LAMBDA FILTER NAME is set to Alias, then this measure represents the average elapsed wall clock time from when this alias starts executing because of an invocation to when it stops executing.
Number Ideally, the value of this measure should be low. A high value indicates that a function/version/alias is taking too long to execute.

To determine why there is increased latency in the execution of a Lambda function, do the following:

  • Test your code with different memory settings: If your code is taking too long to execute, it could be that it does not have enough compute resources to execute its logic. Try increasing the memory allocated to your function and testing the code again, using the Lambda console's test invoke functionality. You can see the memory used, code execution time, and memory allocated in the function log entries. Changing the memory setting can change how you are charged for duration.

  • Investigate the source of the execution bottleneck using logs: You can test your code locally, as you would with any other Node.js function, or you can test it within Lambda using the test invoke capability on the Lambda console, or using the asyncInvoke command in the AWS CLI. Each time the code is executed in response to an event, it writes a log entry into the log group associated with a Lambda function, which is named /aws/lambda/<function name>. Add logging statements around various parts of your code, such as callouts to other services, to see how much time it takes to execute different parts of your code.
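
The advice above about adding logging statements around parts of your code could look like the following sketch for a Python function. The timed helper, the section labels, and the sleep that stands in for a callout to another service are all assumptions made for illustration.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)


@contextmanager
def timed(label, sink=None):
    """Log how long the wrapped section takes, in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        logger.info("%s took %.1f ms", label, elapsed_ms)
        if sink is not None:
            sink[label] = elapsed_ms


def lambda_handler(event, context):
    timings = {}
    with timed("fetch", timings):
        time.sleep(0.01)  # stand-in for a callout to another service
    with timed("transform", timings):
        result = sorted(event.get("items", []))
    return {"result": result, "timings_ms": timings}
```

Comparing the logged section times across invocations shows whether the latency comes from your own logic or from a downstream call.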

Invoctn_Throtle By default, this measure indicates the number of invocation attempts for this Lambda function that were throttled due to invocation rates exceeding the customer's concurrent limits (error code 429).

If the LAMBDA FILTER NAME is set to Version, then this measure represents the number of invocation attempts that were throttled for this version of the Lambda function due to invocation rates exceeding the customer’s concurrent limits (error code 429).

If the LAMBDA FILTER NAME is set to Alias, then this measure represents the number of invocation attempts that were throttled for the version of the function that maps to this alias, due to invocation rates exceeding the customer’s concurrent limits (error code 429).
Number The unit of scale for AWS Lambda is a concurrent execution (see Understanding Scaling Behavior for more details). However, scaling indefinitely is not desirable in all scenarios. For example, you may want to control your concurrency for cost reasons, or to regulate how long it takes you to process a batch of events, or to simply match it with a downstream resource. To assist with this, Lambda provides a concurrent execution limit control at both the account level and the function level.

On reaching the concurrency limit associated with a function, any further invocation requests to that function are throttled, i.e. the invocation doesn't execute your function. Each throttled invocation increases the value of this measure for the corresponding function.
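
The effect of a concurrency limit on this measure can be illustrated with a small simulation. This is a deliberately simplified model, assuming each request is a (start time, duration) pair and that a throttled request is dropped rather than retried; it is not how Lambda schedules executions internally.

```python
def count_throttles(requests, concurrency_limit):
    """Return (executed, throttled) for a list of (start_time, duration) requests."""
    running_end_times = []
    executed = 0
    throttled = 0
    for start, duration in sorted(requests):
        # Forget executions that have already finished when this request arrives.
        running_end_times = [end for end in running_end_times if end > start]
        if len(running_end_times) >= concurrency_limit:
            throttled += 1  # limit reached: the function is not executed
        else:
            running_end_times.append(start + duration)
            executed += 1
    return executed, throttled


if __name__ == "__main__":
    # Three overlapping requests against a limit of 2, then one after they finish.
    print(count_throttles([(0, 5), (1, 5), (2, 5), (6, 1)], concurrency_limit=2))
```

With the sample input, the third request arrives while two executions are still running and is throttled, so the call returns (3, 1).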

AWS Lambda handles throttled invocation requests differently, depending on their source:

  • Event sources that aren't stream-based: Some of these event sources invoke a Lambda function synchronously, and others invoke it asynchronously. Handling is different for each:

    • Synchronous invocation: If the function is invoked synchronously and is throttled, Lambda returns a 429 error and the invoking service is responsible for retries. The ThrottledReason error code explains whether you ran into a function level throttle (if specified) or an account level throttle. Each service may have its own retry policy. For example, CloudWatch Logs retries the failed batch up to five times with delays between retries. For a list of event sources and their invocation type, see Supported Event Sources.

    • Asynchronous invocation: If your Lambda function is invoked asynchronously and is throttled, AWS Lambda automatically retries the throttled event for up to six hours, with delays between retries. Remember, asynchronous events are queued before they are used to invoke the Lambda function.

  • Stream-based event sources: For stream-based event sources (Kinesis and DynamoDB streams), AWS Lambda polls your stream and invokes your Lambda function. When your Lambda function is throttled, Lambda attempts to process the throttled batch of records until the time the data expires. This time period can be up to seven days for Kinesis. The throttled request is treated as blocking per shard, and Lambda doesn't read any new records from the shard until the throttled batch of records either expires or succeeds. If there is more than one shard in the stream, Lambda continues invoking on the non-throttled shards until one gets through.
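
For synchronous invocations, where the caller is responsible for retries, a caller-side retry loop might look like the sketch below. The invoke callable is hypothetical and is assumed to return an HTTP-style status code; the attempt count, base delay, and jitter are illustrative values rather than recommendations from AWS.

```python
import random
import time


def invoke_with_retry(invoke, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call invoke() until it stops returning 429; returns (status, attempts_used)."""
    for attempt in range(max_attempts):
        status = invoke()
        if status != 429:
            return status, attempt + 1
        if attempt < max_attempts - 1:
            # Exponential backoff with jitter so that retries do not arrive in lockstep.
            delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.0)
            sleep(delay)
    return 429, max_attempts
```

Passing sleep as a parameter keeps the function easy to exercise without real delays, for example by supplying a function that does nothing.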