Ensure CloudWatch logs are not retained forever
By default, AWS CloudWatch log events are retained forever, because the default retention setting for CloudWatch log groups is “Never Expire”. If no retention setting is specified, CloudWatch storage costs keep increasing as more and more log events accumulate. This post describes how to automatically manage retention settings for log groups so that CloudWatch costs stay under control.
CloudWatch log storage is priced at $0.03 per GB stored per month (at the time of writing), so it is inexpensive and typically not seen as a problem, at least to begin with. Over time, however, the cost of log storage can become significant if your AWS workloads generate a large volume of logs that are stored in CloudWatch and retained forever.
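To make the growth concrete, here is a rough back-of-the-envelope sketch. It assumes a constant ingestion volume per month, the $0.03 per GB-month storage price quoted above, and retention expressed in whole months for simplicity. It ignores ingestion charges and any compression applied to stored data. The function names and the 500 GB figure are purely illustrative.

```python
PRICE_PER_GB_MONTH = 0.03  # storage price quoted in this post; check current AWS pricing


def storage_bill_for_month(gb_ingested_per_month, month, retention_months=None):
    """Storage bill in USD for a given month (1-based).

    retention_months=None models the default "Never Expire" setting.
    """
    if retention_months is None:
        months_of_logs_kept = month
    else:
        months_of_logs_kept = min(month, retention_months)
    return gb_ingested_per_month * months_of_logs_kept * PRICE_PER_GB_MONTH


def cumulative_storage_bill(gb_ingested_per_month, months, retention_months=None):
    """Total storage spend in USD over the first `months` months."""
    return sum(
        storage_bill_for_month(gb_ingested_per_month, m, retention_months)
        for m in range(1, months + 1)
    )


if __name__ == "__main__":
    # Hypothetical workload: 500 GB of logs per month over three years.
    for label, retention in (("never expire", None), ("1 month retention", 1)):
        last = storage_bill_for_month(500, 36, retention)
        total = cumulative_storage_bill(500, 36, retention)
        print(f"{label}: month 36 bill ${last:,.2f}, 3-year total ${total:,.2f}")
```

Under these assumptions the never-expiring case reaches a bill of about $540 in month 36 and about $9,990 in total, whereas one month of retention stays flat at about $15 per month, or $540 over the same three years.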
It should be noted that many log groups are created implicitly by other AWS services. For example, AWS Lambda will typically create a log group for each new function automatically. In an enterprise setting it’s therefore not uncommon to quickly end up with hundreds of log groups whose log events are retained forever and start costing serious money.
Therefore, to prevent costs from increasing month on month, we need to ensure that every CloudWatch log group specifies a retention setting other than the default “Never Expire”.
CloudWatch log groups can be configured to delete logs automatically, based on the retention setting of each log group. Log events older than the retention period are deleted by CloudWatch and no longer incur storage costs. The retention setting can be set to one of a number of fixed durations, from 1 day to 10 years.
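The retention period cannot be an arbitrary number of days: the API accepts only a fixed set of values. The sketch below rounds a desired period up to the nearest accepted value. The list reflects the values documented for PutRetentionPolicy at the time of writing and may change, so treat it as an assumption to verify. The helper name is hypothetical.

```python
# Values accepted by PutRetentionPolicy at the time of writing (assumption:
# verify against the current AWS documentation before relying on this list).
ALLOWED_RETENTION_DAYS = (
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731,
    1096, 1827, 2192, 2557, 2922, 3288, 3653,
)


def nearest_allowed_retention(desired_days):
    """Return the smallest accepted retention value >= desired_days."""
    if desired_days < 1:
        raise ValueError("retention must be at least 1 day")
    for allowed in ALLOWED_RETENTION_DAYS:
        if allowed >= desired_days:
            return allowed
    raise ValueError(
        "retention cannot exceed {} days".format(ALLOWED_RETENTION_DAYS[-1])
    )
```

Rounding up rather than down is a deliberate choice here, so that logs are never deleted earlier than requested.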
The following sections show:
- how to identify log groups that retain log events forever
- how to set the retention setting for log groups
- how to ensure log groups always have a retention setting, for zero-effort ongoing management
Identifying log groups that retain log events forever
The “Never Expire” retention setting is represented by the absence of the retentionInDays property in the output of the DescribeLogGroups API and the describe-log-groups CLI command. The following AWS CLI command lists all log groups without a retention setting in the default region of a single AWS account.
aws logs describe-log-groups --query 'logGroups[?!not_null(retentionInDays)]'
Note the double negative in the query expression: ! (not) is applied to not_null(retentionInDays) to select log groups that have no retentionInDays property.
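The same check can be made from Python. This is a minimal sketch that assumes a boto3 CloudWatch Logs client is passed in (for example boto3.client("logs")); the function name is illustrative. It relies on the describe_log_groups paginator, as the bulk update script later in this post does.

```python
def never_expiring_log_group_names(logs_client):
    """Return the names of log groups that have no retentionInDays property."""
    names = []
    paginator = logs_client.get_paginator("describe_log_groups")
    for page in paginator.paginate():
        for log_group in page["logGroups"]:
            # A missing retentionInDays key means "Never Expire".
            if "retentionInDays" not in log_group:
                names.append(log_group["logGroupName"])
    return names
```

Passing the client in, rather than creating it inside the function, keeps the region choice with the caller and lets the logic be exercised with a stub client.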
Specifying log group retention
The log group retention setting can be set after a log group has been created in the AWS Console. See the AWS documentation.
Note that, at the time of writing, the retention setting cannot be specified when a log group is first created in the AWS console; it can only be updated after creation.
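Specifying log group retention using CloudFormation or Terraform
Both CloudFormation and Terraform support creating CloudWatch log groups with a retention period in days: the RetentionInDays property of AWS::Logs::LogGroup in CloudFormation, and the retention_in_days argument of aws_cloudwatch_log_group in Terraform.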
- https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-logs-loggroup.html
- https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_log_group
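As an illustration of the CloudFormation case, the sketch below builds a minimal template as a Python dictionary and prints it as JSON. RetentionInDays is the property on AWS::Logs::LogGroup; the logical ID, log group name and 14-day period are placeholder values chosen for the example.

```python
import json


def log_group_template(logical_id, log_group_name, retention_in_days):
    """Build a minimal CloudFormation template for a single log group."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            logical_id: {
                "Type": "AWS::Logs::LogGroup",
                "Properties": {
                    "LogGroupName": log_group_name,
                    # Without this property the log group never expires.
                    "RetentionInDays": retention_in_days,
                },
            }
        },
    }


if __name__ == "__main__":
    # Placeholder names, for illustration only.
    template = log_group_template("AppLogGroup", "/example/app", 14)
    print(json.dumps(template, indent=2))
```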
Bulk updating existing log groups with “Never Expire” retention
If a large number of log groups need to be updated, the following Python 3 script iterates through all CloudWatch log groups in all regions of a single AWS account and sets the retention period to 7 days wherever no retention setting is already present.
# bulk-update-log-group-retention-cli.py
import boto3


def main():
    for region_name in all_regions():
        update_retention_period_for_never_expiring_log_groups(region_name, retention_in_days=7)


def update_retention_period_for_never_expiring_log_groups(region_name, retention_in_days):
    print("Processing log groups in region '{}' ...".format(region_name))
    logs_client = boto3.client("logs", region_name=region_name)
    for log_group in all_log_groups(logs_client):
        # A missing retentionInDays property means "Never Expire".
        if "retentionInDays" not in log_group:
            update_log_group_retention_setting(logs_client, log_group["logGroupName"], retention_in_days)
    print("Processed all log groups in region '{}'.".format(region_name))


def update_log_group_retention_setting(logs_client, log_group_name, retention_in_days):
    logs_client.put_retention_policy(logGroupName=log_group_name, retentionInDays=retention_in_days)
    print(" - Updated retention setting for log group '{}' to {} days.".format(log_group_name, retention_in_days))


def all_regions():
    # Returns the regions enabled for this account.
    response = boto3.client("ec2").describe_regions()
    return [region["RegionName"] for region in response["Regions"]]


def all_log_groups(logs_client):
    # describe_log_groups is paginated, so collect every page.
    log_groups = []
    paginator = logs_client.get_paginator("describe_log_groups")
    for page in paginator.paginate():
        log_groups.extend(page["logGroups"])
    return log_groups


if __name__ == "__main__":
    main()
Automate retention setting using AWS Lambda for zero effort management
The above command line script can be adapted into a Lambda function that is executed periodically by an EventBridge rule, automatically ensuring that log groups have a retention setting.
The lambda_handler function in the example below replaces the main function from the CLI example, and the environment variable RETENTION_IN_DAYS specifies the desired retention setting for never-expiring log groups (defaulting to 7 days if it is not set).
# bulk-update-log-group-retention-sls.py
import os

import boto3


def lambda_handler(event, context):
    # Desired retention comes from the environment, defaulting to 7 days.
    retention_in_days = int(os.environ.get("RETENTION_IN_DAYS", "7"))
    for region_name in all_regions():
        update_retention_period_for_never_expiring_log_groups(region_name, retention_in_days)
    return {
        "message": "Function completed successfully."
    }


def update_retention_period_for_never_expiring_log_groups(region_name, retention_in_days):
    print("Processing log groups in region '{}' ...".format(region_name))
    logs_client = boto3.client("logs", region_name=region_name)
    for log_group in all_log_groups(logs_client):
        # A missing retentionInDays property means "Never Expire".
        if "retentionInDays" not in log_group:
            update_log_group_retention_setting(logs_client, log_group["logGroupName"], retention_in_days)
    print("Processed all log groups in region '{}'.".format(region_name))


def update_log_group_retention_setting(logs_client, log_group_name, retention_in_days):
    logs_client.put_retention_policy(logGroupName=log_group_name, retentionInDays=retention_in_days)
    print(" - Updated retention setting for log group '{}' to {} days.".format(log_group_name, retention_in_days))


def all_regions():
    # Returns the regions enabled for this account.
    response = boto3.client("ec2").describe_regions()
    return [region["RegionName"] for region in response["Regions"]]


def all_log_groups(logs_client):
    # describe_log_groups is paginated, so collect every page.
    log_groups = []
    paginator = logs_client.get_paginator("describe_log_groups")
    for page in paginator.paginate():
        log_groups.extend(page["logGroups"])
    return log_groups
Triggering the Lambda function on a schedule ensures that any log group created without a retention setting is given one at the next run, so its log events are not retained forever.
With the Serverless Framework, setting up such an EventBridge rule is simple: specify a schedule-based event that periodically triggers the function, as follows.
functions:
  BulkUpdateLogGroupRetention:
    ...
    events:
      - schedule: rate(7 days)
The complete serverless.yml config file including the lambda function definition, event schedule and required IAM permissions is shown below:
# serverless.yml
service: cw-log-group-retention

provider:
  name: aws
  runtime: python3.8
  region: eu-west-1
  iamRoleStatements:
    - Effect: "Allow"
      Action:
        - "ec2:DescribeRegions"
        - "logs:DescribeLogGroups"
        - "logs:PutRetentionPolicy"
      Resource: "*"

functions:
  BulkUpdateLogGroupRetention:
    name: ${self:service}-${self:provider.stage, 'dev'}-bulk-update-log-group-retention-sls
    description: Update the retention setting of all cloudwatch log groups where they are set to never expire
    handler: bulk-update-log-group-retention-sls.lambda_handler
    timeout: 300
    memorySize: 128
    events:
      - schedule: rate(7 days)
    environment:
      RETENTION_IN_DAYS: 7
The full Serverless app code is available on GitHub.
Conclusion
By default, AWS CloudWatch log groups retain log events forever. To stop CloudWatch costs from increasing continuously over time, make sure every log group specifies a suitable retention setting so that old log events expire. An AWS Lambda function triggered periodically by an EventBridge rule can ensure that all CloudWatch log groups have a suitable retention setting, so that old log events are deleted and the associated storage costs are managed effectively.
Useful links:
- Installing the AWS CLI
- See the aws logs describe-log-groups command docs for usage details of the AWS command to list CloudWatch log groups
- See the query docs for specifying query expressions with JMESPath to control AWS CLI output
- See the filter docs for matching and filtering resources based on property values, although note negation or testing for the absence of a property is not supported
- Installing the Serverless framework
- Serverless configuration reference
- Serverless docs for schedule events