Monitoring & Alerting
bucketAV monitors critical parts of the system out of the box using CloudWatch Alarms. Critical parts are:
- The health of the Scan Queue
- Signature updates
bucketAV alerts you when there are operational issues. The recipient is configurable via the InfrastructureAlarmsEmail configuration parameter.
The possible alarms are described below, including runbooks to fix them.
DeadLetterQueueAlarm
The Dead Letter Queue contains messages. Some scan jobs were dropped.
To investigate why scan jobs are not processed:
- Visit the Amazon SQS Console.
- Ensure that you are in the correct region.
- Navigate to Queues.
- Click on the bucketAV Dead Letter Queue (if you followed the docs, the name starts with bucketav-DeadLetterQueue-).
- Click on the Send and receive messages button at the top right.
- Click on the Poll for messages button.
- Click on the first message.
- The message body contains a lot of information in JSON format. Look for the bucket name and object key.
{"Records":[{"eventVersion":"2.1","eventSource":"aws:s3","awsRegion":"eu-west-1","eventTime":"2022-08-17T09:17:47.697Z","eventName":"ObjectCreated:Put","userIdentity":{"principalId":"xxx"},"requestParameters":{"sourceIPAddress":"91.45.138.113"},"responseElements":{"x-amz-request-id":"xxx","x-amz-id-2":"xxx"},"s3":{"s3SchemaVersion":"1.0","configurationId":"bucketav","bucket":{"name":"bucketav-demo","ownerIdentity":{"principalId":"xxx"},"arn":"arn:aws:s3:::bucketav-demo"},"object":{"key":"demo.pdf","size":70437,"eTag":"b27c3f8633c054bf40bd797ebc435047","sequencer":"0062FCB23BA3653FA3"}}}]}
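If you prefer not to read the JSON by eye, the bucket name and object key can be pulled out with a few lines of Python. This is a minimal sketch, not part of bucketAV: the function name extract_s3_objects is hypothetical, and the sample body is a trimmed copy of the message above. It relies on the fact that S3 URL-encodes object keys in event notifications, so the key is decoded before use.

```python
import json
from urllib.parse import unquote_plus


def extract_s3_objects(message_body: str) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification body."""
    event = json.loads(message_body)
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # S3 URL-encodes object keys in event notifications (spaces become "+").
        key = unquote_plus(s3["object"]["key"])
        objects.append((bucket, key))
    return objects


# Trimmed copy of the sample message body (only the fields used here).
sample_body = (
    '{"Records":[{"eventSource":"aws:s3","s3":{'
    '"bucket":{"name":"bucketav-demo"},'
    '"object":{"key":"demo.pdf","size":70437}}}]}'
)

print(extract_s3_objects(sample_body))
```

For the sample body this prints the pair bucketav-demo and demo.pdf, the two values needed for the log search.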
We can now search the logs for that particular file:
- Visit the Amazon CloudWatch Console.
- Ensure that you are in the correct region.
- Navigate to Logs Insights.
- Select the bucketAV logs log group (if you followed the docs, the name starts with bucketav-Logs-).
- Select a date range.
- Enter the following query and replace BUCKET_NAME and OBJECT_KEY with the values extracted from the message body:
fields @timestamp, @logStream, @message
| filter (@logStream like "/var/log/messages" or @logStream like "/journald/bucketav.service") and @message like "s3://BUCKET_NAME/OBJECT_KEY"
| sort @timestamp desc
Here is an example based on the sample message body from above:
fields @timestamp, @logStream, @message
| filter (@logStream like "/var/log/messages" or @logStream like "/journald/bucketav.service") and @message like "s3://bucketav-demo/demo.pdf"
| sort @timestamp desc
- Click Run query.
- Click on the log stream of the first search result.
- You are now looking at the logs from around the time the file was scanned.
- Feel free to send the logs our way at hello@bucketav.com.
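Filling in the query by hand is error-prone, so here is a small sketch that builds it from a bucket name and object key. The function name build_logs_insights_query is hypothetical; the query text itself is copied from the steps above. Keys containing a double quote are rejected, because escaping rules for the query language are not covered here.

```python
def build_logs_insights_query(bucket_name: str, object_key: str) -> str:
    """Fill the bucket name and object key into the query from the runbook."""
    target = f"s3://{bucket_name}/{object_key}"
    if '"' in target:
        # The query wraps the target in double quotes; rather than guess at
        # escaping rules, such names are rejected.
        raise ValueError("bucket name or object key contains a double quote")
    return (
        "fields @timestamp, @logStream, @message\n"
        '| filter (@logStream like "/var/log/messages" '
        'or @logStream like "/journald/bucketav.service") '
        f'and @message like "{target}"\n'
        "| sort @timestamp desc"
    )


print(build_logs_insights_query("bucketav-demo", "demo.pdf"))
```

The printed text can be pasted into Logs Insights, or passed as the query string to the CloudWatch Logs StartQuery API if you want to automate the search.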
ScanQueueOldMessagesAlarm
The Scan Queue contains messages older than 12 hours.
By default, the AutoScalingMinSize configuration parameter and the AutoScalingMaxSize configuration parameter are set to 1. Therefore, you will only have one EC2 instance running to scan files. If you increase the AutoScalingMaxSize configuration parameter, the solution will scale out if the Scan Queue grows and scale in if the Scan Queue is empty. The defaults are low to protect your AWS bill.
If the InstanceType configuration parameter is set to t3.* or t3a.*, you should consider changing to m5.* before you scale out by increasing the AutoScalingMaxSize configuration parameter.
- Visit the AWS CloudFormation Console.
- Ensure that you are in the correct region.
- Navigate to Stacks.
- Click on the bucketAV stack (if you followed the docs, the name is bucketav).
- At the top right, click on Update.
- In the next step, just click Next.
- Increase the AutoScalingMaxSize configuration parameter.
- Click Next.
- In the next step, just click Next.
- At the bottom, check “I acknowledge that AWS CloudFormation might create IAM resources.” and click Update stack.
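The same update can be scripted instead of clicked. The following is a hedged sketch using boto3, not an official bucketAV tool: the helper names are hypothetical, the stack name bucketav assumes you followed the docs, and CAPABILITY_IAM is assumed to correspond to the acknowledgement checkbox in the console. Every parameter other than AutoScalingMaxSize keeps its previous value.

```python
def build_update_parameters(current_keys, overrides):
    """Build the Parameters list for a CloudFormation stack update.

    Parameters named in `overrides` get a new value; all others keep
    their previous value.
    """
    unknown = set(overrides) - set(current_keys)
    if unknown:
        raise KeyError(f"unknown stack parameters: {sorted(unknown)}")
    params = []
    for key in current_keys:
        if key in overrides:
            params.append({"ParameterKey": key, "ParameterValue": str(overrides[key])})
        else:
            params.append({"ParameterKey": key, "UsePreviousValue": True})
    return params


def is_burstable(instance_type: str) -> bool:
    """True for the t3.* and t3a.* instance types called out above."""
    return instance_type.split(".")[0] in ("t3", "t3a")


def update_max_size(stack_name: str, max_size: int) -> None:
    """Apply a new AutoScalingMaxSize through the CloudFormation API."""
    import boto3  # third-party; imported lazily so the helpers above work without it

    cfn = boto3.client("cloudformation")
    stack = cfn.describe_stacks(StackName=stack_name)["Stacks"][0]
    current = {p["ParameterKey"]: p["ParameterValue"] for p in stack.get("Parameters", [])}
    if is_burstable(current.get("InstanceType", "")):
        print("InstanceType is t3.*/t3a.*; consider switching to m5.* first.")
    cfn.update_stack(
        StackName=stack_name,
        UsePreviousTemplate=True,
        Parameters=build_update_parameters(list(current), {"AutoScalingMaxSize": max_size}),
        Capabilities=["CAPABILITY_IAM"],
    )
```

Calling update_max_size("bucketav", 3) would, under these assumptions, raise the maximum to three instances. Run it in the region where the stack lives and with credentials that are allowed to update it.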
SignaturesAgeAlarm
Signatures are older than seven days. Are signature updates working?
Check the dashboard!
ScanQueueEmptyAlarm
Don’t be worried about this alarm. It is used to trigger auto-scaling policies.
To hide the alarm in the CloudWatch Management Console, select Hide Auto Scaling alarms.
ScanQueueFullAlarm
Don’t be worried about this alarm. It is used to trigger auto-scaling policies.
To hide the alarm in the CloudWatch Management Console, select Hide Auto Scaling alarms.
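If you list alarms from a script rather than the console, the two auto-scaling alarms can be filtered out in the same spirit. This is a sketch under assumptions: it presumes the alarm names contain ScanQueueEmptyAlarm or ScanQueueFullAlarm, and the sample names at the bottom are hypothetical.

```python
AUTO_SCALING_ALARMS = ("ScanQueueEmptyAlarm", "ScanQueueFullAlarm")


def actionable_alarms(alarm_names):
    """Drop the alarms that only exist to trigger auto-scaling policies."""
    return [
        name
        for name in alarm_names
        if not any(marker in name for marker in AUTO_SCALING_ALARMS)
    ]


def firing_alarm_names():
    """Names of all CloudWatch alarms currently in the ALARM state."""
    import boto3  # third-party; imported lazily so the filter works without it

    cloudwatch = boto3.client("cloudwatch")
    names = []
    paginator = cloudwatch.get_paginator("describe_alarms")
    for page in paginator.paginate(StateValue="ALARM"):
        names.extend(alarm["AlarmName"] for alarm in page["MetricAlarms"])
    return names


# Hypothetical alarm names, for illustration only.
sample = [
    "bucketav-ScanQueueEmptyAlarm-ABC123",
    "bucketav-DeadLetterQueueAlarm-XYZ789",
    "bucketav-ScanQueueFullAlarm-DEF456",
]
print(actionable_alarms(sample))
```

With real credentials, actionable_alarms(firing_alarm_names()) would leave only the alarms that need attention.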
NatGatewayAErrorPortAllocationAlarm
This feature is only available for the dedicated private VPC delivery method!
NAT gateway could not allocate a source port. Too many concurrent connections are open through the NAT gateway.
Please contact us!
NatGatewayAPacketsDropCountAlarm
This feature is only available for the dedicated private VPC delivery method!
NAT gateway dropped packets. This might indicate an ongoing transient issue with the NAT gateway.
Please contact us!
NatGatewayABandwidthAlarm
This feature is only available for the dedicated private VPC delivery method!
NAT gateway bandwidth utilization is over 80%.
Please contact us!
NatGatewayAPacketsAlarm
This feature is only available for the dedicated private VPC delivery method!
NAT gateway packet utilization is over 80%.
Please contact us!
RefreshBucketCacheStateMachineAlarmExecutionsFailed
The bucket cache is out-of-date.
Please contact us!
RefreshBucketCacheStateMachineAlarmExecutionsTimedOut
The bucket cache is out-of-date.
Please contact us!