We're big users of PostgreSQL for our databases, all managed by RDS. Things generally hum along smoothly and RDS provides a lot of database-level metrics for troubleshooting.
Often, though, problems can't be fully diagnosed without looking at individual queries, and this is where the logs come in. Depending on the configuration, they contain information such as system-level messages, long-running queries and lock waits, which makes it possible to spot inefficient or unnecessary queries. A log analysis tool like pgBadger can help further by aggregating the entries and building a query-level picture of what's happening.
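If you want the logs to contain that kind of detail, the relevant Postgres settings live in the instance's parameter group. As a rough boto3 illustration (the parameter group name is a placeholder and the 500 ms threshold is just an example, not a recommendation):

```python
import boto3

rds = boto3.client("rds")

# Hypothetical parameter group name; use the one attached to your instance.
rds.modify_db_parameter_group(
    DBParameterGroupName="my-postgres-params",
    Parameters=[
        # Log any statement that runs for longer than 500 ms.
        {"ParameterName": "log_min_duration_statement",
         "ParameterValue": "500",
         "ApplyMethod": "immediate"},
        # Log sessions that wait longer than deadlock_timeout for a lock.
        {"ParameterName": "log_lock_waits",
         "ParameterValue": "1",
         "ApplyMethod": "immediate"},
    ],
)
```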
The default retention period of the log files is 3 days, configurable up to 7 days using the rds.log_retention_period parameter. This often isn't long enough, particularly for intermittent issues that just aren't annoying enough to investigate immediately. Of course, you can download the logs manually, but manual steps tend to be forgotten or skipped.
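For reference, here's roughly what that looks like with boto3: bumping retention to the 7-day maximum and pulling the files down by hand. The parameter group name and instance identifier are placeholders, and this only grabs the most recent lines of each file; real use would page through with Marker until the whole file has been read.

```python
import boto3

rds = boto3.client("rds")

# rds.log_retention_period is expressed in minutes; 10080 = 7 days.
rds.modify_db_parameter_group(
    DBParameterGroupName="my-postgres-params",
    Parameters=[{"ParameterName": "rds.log_retention_period",
                 "ParameterValue": "10080",
                 "ApplyMethod": "immediate"}],
)

# The manual alternative: list the instance's log files and download each one.
for log in rds.describe_db_log_files(DBInstanceIdentifier="my-database")["DescribeDBLogFiles"]:
    # With only NumberOfLines set, the most recent lines of the file are returned.
    portion = rds.download_db_log_file_portion(
        DBInstanceIdentifier="my-database",
        LogFileName=log["LogFileName"],
        NumberOfLines=10000,
    )
    with open(log["LogFileName"].replace("/", "_"), "w") as f:
        f.write(portion.get("LogFileData") or "")
```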
Is there a better way?
Well, if you're using MySQL, MariaDB or Oracle, yes - just publish your log files to Cloudwatch Logs and let Cloudwatch manage the access and retention.
UPDATE (Jan 2019): As of December 2018, AWS has added the ability to stream RDS logs to Cloudwatch Logs, in the same way as for the databases mentioned above. See the documentation for more details.
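For completeness, enabling that export is a one-off change to the instance, along these lines with boto3 (the instance identifier is a placeholder):

```python
import boto3

rds = boto3.client("rds")

# Start publishing the PostgreSQL and upgrade logs to Cloudwatch Logs.
rds.modify_db_instance(
    DBInstanceIdentifier="my-database",
    CloudwatchLogsExportConfiguration={"EnableLogTypes": ["postgresql", "upgrade"]},
    ApplyImmediately=True,
)
```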
If you prefer your logs to be stored straight to S3 however, keep reading.
Introducing the RDS Logs Archiver
The RDS Logs Archiver is a Lambda function that (see the sketch after this list):
- retrieves the available RDS log files
- streams each one to an S3 bucket
- stores the last-written time in the S3 bucket, so that only logs written after that point are uploaded on the next run
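A heavily simplified sketch of that flow is below. It is not the actual function code (see the repository for that); the bucket name, instance identifier and marker key are placeholders.

```python
import boto3

rds = boto3.client("rds")
s3 = boto3.client("s3")

BUCKET = "my-rds-logs-bucket"             # placeholder bucket name
INSTANCE = "my-database"                  # placeholder instance identifier
MARKER_KEY = INSTANCE + "/last_written_marker"

def load_last_written():
    # Timestamp (POSIX ms) of the most recently archived log, kept in the bucket.
    try:
        obj = s3.get_object(Bucket=BUCKET, Key=MARKER_KEY)
        return int(obj["Body"].read())
    except s3.exceptions.NoSuchKey:
        return 0

def archive_new_logs():
    last_written = load_last_written()
    # Only consider files written since the last archived timestamp.
    logs = rds.describe_db_log_files(
        DBInstanceIdentifier=INSTANCE, FileLastWritten=last_written
    )["DescribeDBLogFiles"]

    for log in sorted(logs, key=lambda l: l["LastWritten"]):
        # Page through the file contents until nothing is left.
        chunks, marker = [], "0"
        while True:
            portion = rds.download_db_log_file_portion(
                DBInstanceIdentifier=INSTANCE,
                LogFileName=log["LogFileName"],
                Marker=marker,
            )
            chunks.append(portion.get("LogFileData") or "")
            if not portion["AdditionalDataPending"]:
                break
            marker = portion["Marker"]

        # Store the file, then record how far we've got.
        s3.put_object(Bucket=BUCKET, Key=INSTANCE + "/" + log["LogFileName"],
                      Body="".join(chunks).encode())
        s3.put_object(Bucket=BUCKET, Key=MARKER_KEY,
                      Body=str(log["LastWritten"]).encode())
```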
If there are more logs than can be streamed in a single Lambda invocation, the function invokes itself again and continues from where it left off. In practice this usually only kicks in on the first run, when all the existing logs are being stored into the bucket.
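The continuation itself can be as simple as an asynchronous self-invocation carrying a "continue from here" payload. A sketch, not necessarily the repository's exact mechanism; the payload field name is illustrative:

```python
import json
import boto3

lambda_client = boto3.client("lambda")

def continue_in_new_invocation(context, next_log_file):
    # Fire-and-forget re-invocation of this same function, passing along the
    # point to resume from so the next run picks up where this one left off.
    lambda_client.invoke(
        FunctionName=context.function_name,
        InvocationType="Event",
        Payload=json.dumps({"continue_from": next_log_file}),
    )
```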
If a log file is incomplete, i.e. it is still being written to, it will be re-uploaded to S3 on the next invocation.
The function only archives log files when invoked, so to automate this we added a Cloudwatch Event rule that triggers the Lambda function every 10 minutes. Depending on what the log_rotation_age parameter is set to on the database, this may be overkill, as there may not be any newly completed log files to archive, but Lambda invocations are cheap enough that tuning this isn't really necessary.
Lastly, we added a Cloudwatch Alarm that emails us when the function fails.
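For anyone wiring this up outside of CloudFormation, the equivalent boto3 calls look roughly like the following. All names and ARNs are placeholders, and the permission allowing Cloudwatch Events to invoke the function (which the template takes care of) is omitted.

```python
import boto3

events = boto3.client("events")
cloudwatch = boto3.client("cloudwatch")

# Schedule: trigger the archiver every 10 minutes.
events.put_rule(Name="rds-logs-archiver-schedule",
                ScheduleExpression="rate(10 minutes)")
events.put_targets(
    Rule="rds-logs-archiver-schedule",
    Targets=[{"Id": "archiver",
              "Arn": "arn:aws:lambda:eu-west-1:123456789012:function:rds-logs-archiver"}],
)

# Alarm: notify an SNS topic (subscribed to by email) when any invocation errors.
cloudwatch.put_metric_alarm(
    AlarmName="rds-logs-archiver-errors",
    Namespace="AWS/Lambda",
    MetricName="Errors",
    Dimensions=[{"Name": "FunctionName", "Value": "rds-logs-archiver"}],
    Statistic="Sum",
    Period=600,
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:eu-west-1:123456789012:rds-logs-archiver-alerts"],
)
```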
All of this has been wrapped up in a CloudFormation template that's ready to deploy. Just upload the function code to S3, update the parameters at the top of the template, then create the stack.
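If you'd rather script those steps than click through the console, a minimal boto3 sketch might look like this; the bucket, file names and stack name are placeholders, and the repository's instructions remain the authoritative steps.

```python
import boto3

s3 = boto3.client("s3")
cloudformation = boto3.client("cloudformation")

# Upload the packaged function code, then create the stack from the template.
s3.upload_file("rds-logs-archiver.zip", "my-deployment-bucket", "rds-logs-archiver.zip")

cloudformation.create_stack(
    StackName="rds-logs-archiver",
    TemplateBody=open("template.yaml").read(),
    # Needed because the template creates an IAM role for the function.
    Capabilities=["CAPABILITY_IAM"],
)
```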
The Lambda function code, CloudFormation template and more detailed deployment instructions can be found in the citymapper/rds-logs-archiver repository.
Conclusion
We've been running this on all our mission-critical RDS databases for the past few weeks, and all the logs have been reliably archived.
~~We may improve the function to stream into Cloudwatch Logs in the future~~, but for now, the function relieves us of having to manually download logs for later investigation. Querying the data using Athena is another option.
Issues and pull requests welcome!