Ensure Cloudwatch logs are not retained forever


By default, AWS CloudWatch log events are retained forever because the default retention setting for CloudWatch log groups is “Never Expire”. Without an explicit retention setting, CloudWatch storage costs keep increasing as more and more log events accumulate. This post describes how to manage retention settings for log groups automatically so that CloudWatch costs stay under control.

CloudWatch log storage is priced at $0.03 per GB per month (at time of writing), so it is inexpensive and typically not seen as a problem, at least to begin with. Over time, however, storage costs can become significant if your AWS workloads generate a large volume of logs that are stored in CloudWatch and retained forever.
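
To make the growth concrete, the following minimal sketch compares the monthly storage cost with and without a retention period. The ingestion volume of 100 GB per month is a hypothetical figure, the price is the $0.03 per GB quoted above, and the model assumes that stored size equals ingested size and ignores ingestion charges, so treat the numbers as illustrative only.

```python
# Illustrative model only: assumes stored size equals ingested size and
# ignores ingestion charges. The price is the figure quoted in this post.
PRICE_PER_GB_MONTH = 0.03


def monthly_cost_never_expire(gb_ingested_per_month, months_elapsed):
    """Storage cost for one month once `months_elapsed` months of logs have piled up."""
    stored_gb = gb_ingested_per_month * months_elapsed
    return stored_gb * PRICE_PER_GB_MONTH


def monthly_cost_with_retention(gb_ingested_per_month, retention_days, days_per_month=30):
    """Steady-state monthly storage cost when only `retention_days` of logs are kept."""
    stored_gb = gb_ingested_per_month * retention_days / days_per_month
    return stored_gb * PRICE_PER_GB_MONTH


if __name__ == "__main__":
    gb_per_month = 100  # hypothetical workload
    for months in (1, 12, 36):
        cost = monthly_cost_never_expire(gb_per_month, months)
        print("Never expire, month {}: ${:.2f} per month".format(months, cost))
    with_retention = monthly_cost_with_retention(gb_per_month, 7)
    print("7 day retention: ${:.2f} per month".format(with_retention))
```

Under these assumptions the never-expire cost grows with every month of accumulated logs, while the cost with a retention period stays flat.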

Many log groups are created implicitly by other AWS services. For example, AWS Lambda typically creates a log group for each new function automatically. In an enterprise setting it’s therefore not uncommon to quickly end up with hundreds of log groups whose log events are retained forever and start costing serious money.

Therefore, to prevent costs from increasing month on month, every CloudWatch log group needs a retention setting other than the default “Never Expire”.

CloudWatch log groups can be configured to delete logs automatically based on the retention setting of each log group. Log events older than the retention period are deleted by CloudWatch and no longer incur storage costs. The retention setting accepts a fixed set of durations ranging from 1 day to 10 years.
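
Because only specific durations are accepted, it can help to map a desired number of days onto a supported value before calling the API. The sketch below does this with a hypothetical helper, nearest_valid_retention. The list of values reflects the CloudWatch Logs API documentation as I understand it and should be checked against the current API reference.

```python
import bisect

# Retention values (in days) accepted by PutRetentionPolicy, as I understand
# the AWS documentation. Assumption: verify against the current API reference.
VALID_RETENTION_DAYS = [
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731,
    1096, 1827, 2192, 2557, 2922, 3288, 3653,
]


def nearest_valid_retention(days):
    """Return the smallest supported retention that is >= days, capped at the maximum."""
    if days < 1:
        raise ValueError("retention must be at least 1 day")
    index = bisect.bisect_left(VALID_RETENTION_DAYS, days)
    # Requests beyond the largest supported value fall back to that value.
    return VALID_RETENTION_DAYS[min(index, len(VALID_RETENTION_DAYS) - 1)]


if __name__ == "__main__":
    for requested in (7, 10, 366, 5000):
        print("{} -> {}".format(requested, nearest_valid_retention(requested)))
```

Rounding up rather than down is a design choice here: it never deletes logs earlier than requested.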

The following sections show:

  • how to identify log groups that retain log events forever
  • how to set the retention setting for log groups
  • how to ensure log groups always have a retention setting for zero-effort ongoing management

Identifying log groups that retain log events forever

The “Never Expire” retention setting is represented by the absence of the retentionInDays property in the output of the describe log groups API and CLI command. The following AWS CLI command lists all log groups without a retention setting in the default region of a single AWS account.

aws logs describe-log-groups --query 'logGroups[?!not_null(retentionInDays)]'

Note the double negative in the query expression: ! (not) is applied to not_null(retentionInDays) to select log groups with a missing retentionInDays setting.
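
The same selection can be sketched in Python, with the addition of sorting by the storedBytes field that describe log groups returns, so the largest never-expiring log groups appear first. The function names here are hypothetical, and the client is passed in as a parameter so the filtering logic can be tried on plain dictionaries.

```python
def load_log_groups(logs_client):
    """Collect all log groups from a boto3 'logs' client using its paginator."""
    log_groups = []
    for page in logs_client.get_paginator("describe_log_groups").paginate():
        log_groups.extend(page["logGroups"])
    return log_groups


def never_expiring_by_size(log_groups):
    """Return log groups without retentionInDays, largest storedBytes first."""
    candidates = [group for group in log_groups if "retentionInDays" not in group]
    return sorted(candidates, key=lambda group: group.get("storedBytes", 0), reverse=True)


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   groups = load_log_groups(boto3.client("logs"))
#   for group in never_expiring_by_size(groups):
#       print(group.get("storedBytes", 0), group["logGroupName"])
```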

Specifying log group retention

The log group retention setting can be set after a log group has been created in the AWS Console. See the AWS documentation.

Note that the log group retention setting cannot be specified when a log group is first created in the AWS console; it can only be updated after creation.

Specifying log group retention using Cloudformation or Terraform

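Both CloudFormation and Terraform support creating CloudWatch log groups with a retention period: CloudFormation through the RetentionInDays property of the AWS::Logs::LogGroup resource, and Terraform through the retention_in_days argument of the aws_cloudwatch_log_group resource.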
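
As a rough illustration of the CloudFormation case, the sketch below builds a minimal JSON template in Python with a single log group that expires events after 30 days. The logical ID ExampleLogGroup, the log group name and the 30 day value are all hypothetical placeholders.

```python
import json


def log_group_template(log_group_name, retention_in_days):
    """Build a minimal CloudFormation template containing one log group."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            # "ExampleLogGroup" is a hypothetical logical ID.
            "ExampleLogGroup": {
                "Type": "AWS::Logs::LogGroup",
                "Properties": {
                    "LogGroupName": log_group_name,
                    "RetentionInDays": retention_in_days,
                },
            }
        },
    }


if __name__ == "__main__":
    # Print the template as JSON, which CloudFormation accepts alongside YAML.
    print(json.dumps(log_group_template("/example/app", 30), indent=2))
```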

Bulk update existing log groups set with “Never Expire” retention

If a large number of log groups need to be updated, the following Python 3 script iterates through all CloudWatch log groups in all enabled regions of a single AWS account and sets the retention period to 7 days wherever no retention setting exists.

# bulk-update-log-group-retention-cli.py

import boto3


def main():
    for region_name in all_regions():
        update_retention_period_for_never_expiring_log_groups(region_name, retention_in_days=7)


def update_retention_period_for_never_expiring_log_groups(region_name, retention_in_days):
    print("Processing log groups in region '{}' ...".format(region_name))

    logs_client = boto3.client("logs", region_name=region_name)

    for log_group in all_log_groups(logs_client):
        if "retentionInDays" not in log_group:
            update_log_group_retention_setting(logs_client, log_group["logGroupName"], retention_in_days)

    print("Processed all log groups in region '{}'.".format(region_name))


def update_log_group_retention_setting(logs_client, log_group_name, retention_in_days):
    logs_client.put_retention_policy(logGroupName=log_group_name, retentionInDays=retention_in_days)
    print(" - Updated retention setting for log group '{}' to {} days.".format(log_group_name, retention_in_days))


def all_regions():
    response = boto3.client("ec2").describe_regions()
    return [region["RegionName"] for region in response["Regions"]]


def all_log_groups(logs_client):
    # describe_log_groups is paginated, so collect the results from every page.
    log_groups = []
    paginator = logs_client.get_paginator("describe_log_groups")

    for page in paginator.paginate():
        log_groups.extend(page["logGroups"])

    return log_groups


if __name__ == "__main__":
    main()
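
Before running a script like this against a real account, the update logic can be exercised without AWS credentials by passing in a stand-in client. The sketch below assumes a hypothetical StubLogsClient class that mimics only the two client calls used above, and a hypothetical apply_retention function that repeats the same "only where no retention is set" rule.

```python
class StubLogsClient:
    """Hypothetical stand-in for a boto3 'logs' client; records calls instead of calling AWS."""

    def __init__(self, log_groups):
        self._log_groups = log_groups
        self.calls = []

    def get_paginator(self, operation_name):
        assert operation_name == "describe_log_groups"
        return self

    def paginate(self):
        # A single page is enough to exercise the update logic.
        yield {"logGroups": self._log_groups}

    def put_retention_policy(self, logGroupName, retentionInDays):
        self.calls.append((logGroupName, retentionInDays))


def apply_retention(logs_client, retention_in_days):
    """Set retention on log groups that have none; return the names that were updated."""
    updated = []
    for page in logs_client.get_paginator("describe_log_groups").paginate():
        for log_group in page["logGroups"]:
            if "retentionInDays" not in log_group:
                name = log_group["logGroupName"]
                logs_client.put_retention_policy(
                    logGroupName=name, retentionInDays=retention_in_days
                )
                updated.append(name)
    return updated


if __name__ == "__main__":
    stub = StubLogsClient([
        {"logGroupName": "/aws/lambda/never-expires"},
        {"logGroupName": "/aws/lambda/already-set", "retentionInDays": 30},
    ])
    print(apply_retention(stub, 7))
```

Log groups that already have a retention setting are left untouched, which matches the behaviour of the script above.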

Automate retention setting using AWS Lambda for zero effort management

The command line script above can be adapted into a Lambda function that is executed periodically by an EventBridge rule to ensure log groups automatically receive a retention setting.

[Figure: AWS component diagram]

The lambda_handler function in the example below replaces the main function from the CLI example and an environment variable RETENTION_IN_DAYS is used to specify the desired retention setting for never expiring log groups.

# bulk-update-log-group-retention-sls.py

import boto3
import os


def lambda_handler(event, context):
    retention_in_days = int(os.environ.get("RETENTION_IN_DAYS", "7"))

    for region_name in all_regions():
        update_retention_period_for_never_expiring_log_groups(region_name, retention_in_days)

    return {
        "message": "Function completed successfully."
    }


def update_retention_period_for_never_expiring_log_groups(region_name, retention_in_days):
    print("Processing log groups in region '{}' ...".format(region_name))

    logs_client = boto3.client("logs", region_name=region_name)

    for log_group in all_log_groups(logs_client):
        if "retentionInDays" not in log_group:
            update_log_group_retention_setting(logs_client, log_group["logGroupName"], retention_in_days)

    print("Processed all log groups in region '{}'.".format(region_name))


def update_log_group_retention_setting(logs_client, log_group_name, retention_in_days):
    logs_client.put_retention_policy(logGroupName=log_group_name, retentionInDays=retention_in_days)
    print(" - Updated retention setting for log group '{}' to {} days.".format(log_group_name, retention_in_days))


def all_regions():
    response = boto3.client("ec2").describe_regions()
    return [region["RegionName"] for region in response["Regions"]]


def all_log_groups(logs_client):
    # describe_log_groups is paginated, so collect the results from every page.
    log_groups = []
    paginator = logs_client.get_paginator("describe_log_groups")

    for page in paginator.paginate():
        log_groups.extend(page["logGroups"])

    return log_groups

The Lambda function can then be triggered on a schedule using an EventBridge rule, so that any log groups created without a retention setting are reconfigured and their log events are not retained forever.

Using the Serverless Framework, it is simple to set up such an EventBridge rule by specifying a schedule-based event that periodically triggers the function, as follows.

functions:
  BulkUpdateLogGroupRetention:
    ...
    events:
      - schedule: rate(7 days)

The complete serverless.yml config file including the lambda function definition, event schedule and required IAM permissions is shown below:

# serverless.yml

service: cw-log-group-retention

provider:
  name: aws
  runtime: python3.8
  region: eu-west-1
  iamRoleStatements:
    - Effect: "Allow"
      Action:
        - "ec2:DescribeRegions"
        - "logs:DescribeLogGroups"
        - "logs:PutRetentionPolicy"
      Resource: "*"

functions:
  BulkUpdateLogGroupRetention:
    name: ${self:service}-${self:provider.stage, 'dev'}-bulk-update-log-group-retention-sls
    description: Update the retention setting of all cloudwatch log groups where they are set to never expire
    handler: bulk-update-log-group-retention-sls.lambda_handler
    timeout: 300
    memorySize: 128
    events:
      - schedule: rate(7 days)
    environment:
      RETENTION_IN_DAYS: 7

The full serverless app code is available on GitHub.

Conclusion

By default, AWS CloudWatch log groups retain log events forever. To stop CloudWatch costs from increasing continuously over time, make sure every log group specifies a suitable retention setting so that old log events expire. An AWS Lambda function triggered periodically by an EventBridge rule can apply a retention setting to any log group that lacks one, so that old log events are deleted and the associated storage costs stay under control.