Monitoring & Alerting

bucketAV uses CloudWatch alarms to monitor critical parts of the system out of the box, including:

  • The health of the Scan Queue
  • Signatures updates

bucketAV notifies you when there are operational issues. You can configure the recipient with the InfrastructureAlarmsEmail configuration parameter.

The following sections describe each possible alarm, including runbooks to resolve it.

DeadLetterQueueAlarm

The Dead Letter Queue contains messages, which means that some scan jobs were dropped.

To investigate why scan jobs were not processed:

  1. Visit the Amazon SQS Console.
  2. Ensure that you are in the correct region.
  3. Navigate to Queues.
  4. Click on the bucketAV Dead Letter Queue (if you followed the docs, the name starts with bucketav-DeadLetterQueue-).
  5. Click on the Send and receive messages button at the top right.
  6. Click the Poll for messages button.
  7. Click on the first message.
  8. The message body contains a lot of information in JSON format. Look for the bucket name (bucketav-demo in this example) and the object key (demo.pdf in this example):
{"Records":[{"eventVersion":"2.1","eventSource":"aws:s3","awsRegion":"eu-west-1","eventTime":"2022-08-17T09:17:47.697Z","eventName":"ObjectCreated:Put","userIdentity":{"principalId":"xxx"},"requestParameters":{"sourceIPAddress":"91.45.138.113"},"responseElements":{"x-amz-request-id":"xxx","x-amz-id-2":"xxx"},"s3":{"s3SchemaVersion":"1.0","configurationId":"bucketav","bucket":{"name":"bucketav-demo","ownerIdentity":{"principalId":"xxx"},"arn":"arn:aws:s3:::bucketav-demo"},"object":{"key":"demo.pdf","size":70437,"eTag":"b27c3f8633c054bf40bd797ebc435047","sequencer":"0062FCB23BA3653FA3"}}}]}
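When the Dead Letter Queue holds many messages, copying the values by hand gets tedious. The following minimal Python sketch assumes the message body is the raw S3 event notification shown above (not wrapped in another envelope); the function name extract_objects is hypothetical and not part of bucketAV. It returns the bucket name and object key of every record and URL-decodes the key, because S3 URL-encodes object keys in event notifications (for example, a space arrives as +).

```python
import json
from urllib.parse import unquote_plus


def extract_objects(message_body: str) -> list[tuple[str, str]]:
    """Return (bucket name, object key) pairs from an S3 event notification."""
    event = json.loads(message_body)
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # S3 URL-encodes object keys in event notifications, so decode them.
        key = unquote_plus(s3["object"]["key"])
        objects.append((bucket, key))
    return objects


# Trimmed version of the sample message body from step 8.
body = '{"Records":[{"eventSource":"aws:s3","s3":{"bucket":{"name":"bucketav-demo"},"object":{"key":"demo.pdf","size":70437}}}]}'
print(extract_objects(body))  # [('bucketav-demo', 'demo.pdf')]
```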

Next, search the logs for that particular file:

  1. Visit the Amazon CloudWatch Console.
  2. Ensure that you are in the correct region.
  3. Navigate to Logs Insights.
  4. Select the bucketAV log group (if you followed the docs, the name starts with bucketav-Logs-).
  5. Select a date range.
  6. Enter the following query and replace BUCKET_NAME and OBJECT_KEY with the values extracted from the message body:
fields @timestamp, @logStream, @message
| filter @logStream like "/var/log/messages" and @message like "s3://BUCKET_NAME/OBJECT_KEY"
| sort @timestamp desc

Here is an example based on the sample message body from above:

fields @timestamp, @logStream, @message
| filter @logStream like "/var/log/messages" and @message like "s3://bucketav-demo/demo.pdf"
| sort @timestamp desc
  7. Click Run query.
  8. Click on the log stream of the first search result.
  9. You now see the log entries from around the time the file was scanned.
  10. Feel free to send the logs to us at hello@bucketav.com.
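If you investigate several files, a small helper that fills in the query template from step 6 avoids typos. This is a hypothetical Python sketch (build_query is not part of bucketAV) that reproduces the query shown above for any bucket name and object key:

```python
def build_query(bucket: str, key: str) -> str:
    """Build the Logs Insights query that finds the scan logs of one S3 object.

    Note: a key containing a double quote (") would break the string literal;
    this sketch does not escape it.
    """
    return "\n".join([
        "fields @timestamp, @logStream, @message",
        f'| filter @logStream like "/var/log/messages" and @message like "s3://{bucket}/{key}"',
        "| sort @timestamp desc",
    ])


print(build_query("bucketav-demo", "demo.pdf"))
```

If you prefer scripting over the console, the resulting string could be passed as queryString to the CloudWatch Logs StartQuery API (start_query in boto3).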

ScanQueueOldMessagesAlarm

The Scan Queue contains messages older than 12 hours.

By default, both the AutoScalingMinSize and AutoScalingMaxSize configuration parameters are set to 1, so only one EC2 instance scans files. If you increase the AutoScalingMaxSize configuration parameter, the solution scales out when the Scan Queue grows and scales in when the Scan Queue is empty. The defaults are low to keep your AWS bill down.

If the InstanceType configuration parameter is set to t3.* or t3a.*, consider switching to m5.* before you scale out by increasing the AutoScalingMaxSize configuration parameter.

To increase the AutoScalingMaxSize configuration parameter:

  1. Visit the AWS CloudFormation Console.
  2. Ensure that you are in the correct region.
  3. Navigate to Stacks.
  4. Click on the bucketAV stack (if you followed the docs, the name is bucketav).
  5. At the top right, click on Update.
  6. In the next step, just click Next.
  7. Increase the AutoScalingMaxSize configuration parameter.
  8. Click Next.
  9. In the next step, just click Next.
  10. At the bottom, check “I acknowledge that AWS CloudFormation might create IAM resources.” and click Update stack.
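If you prefer scripting the update over clicking through the console, the core of the steps above is the Parameters list of a CloudFormation stack update. The following is a minimal Python sketch, assuming you pass that list to boto3's CloudFormation update_stack call together with the previous template; build_update_parameters and the value 4 are hypothetical. Only the parameter you change gets a new value; all others keep their current value via UsePreviousValue, which mirrors leaving them untouched in the console.

```python
def build_update_parameters(existing_keys, overrides):
    """Build the Parameters list for a CloudFormation update_stack call.

    Parameters named in `overrides` get a new value; every other existing
    parameter keeps its current value through UsePreviousValue.
    """
    unknown = set(overrides) - set(existing_keys)
    if unknown:
        # CloudFormation rejects parameters the template does not declare.
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    params = []
    for key in existing_keys:
        if key in overrides:
            # CloudFormation expects parameter values as strings.
            params.append({"ParameterKey": key, "ParameterValue": str(overrides[key])})
        else:
            params.append({"ParameterKey": key, "UsePreviousValue": True})
    return params


# In practice, the existing keys would come from
# describe_stacks(StackName="bucketav")["Stacks"][0]["Parameters"].
existing = ["AutoScalingMinSize", "AutoScalingMaxSize", "InstanceType"]
params = build_update_parameters(existing, {"AutoScalingMaxSize": 4})
print(params)

# With boto3 installed, the update itself could look like this:
# import boto3
# boto3.client("cloudformation").update_stack(
#     StackName="bucketav",
#     UsePreviousTemplate=True,
#     Parameters=params,
#     Capabilities=["CAPABILITY_IAM"],  # matches the IAM acknowledgment in step 10
# )
```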

SignaturesAgeAlarm

The virus signatures are older than seven days, which suggests that signature updates are not working.

Check the bucketAV dashboard to see whether signature updates are running.
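The alarm condition itself is simple. A minimal Python sketch, assuming you know the timestamp of the last successful signature update (how to obtain it is left open here); signatures_outdated is a hypothetical name, not part of bucketAV:

```python
from datetime import datetime, timedelta, timezone

# Threshold taken from the alarm description: signatures older than seven days.
MAX_SIGNATURE_AGE = timedelta(days=7)


def signatures_outdated(last_update: datetime, now: datetime) -> bool:
    """Return True if the last signature update is more than seven days old."""
    return now - last_update > MAX_SIGNATURE_AGE


now = datetime(2022, 8, 17, tzinfo=timezone.utc)
print(signatures_outdated(now - timedelta(days=8), now))  # True
print(signatures_outdated(now - timedelta(days=2), now))  # False
```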

ScanQueueEmptyAlarm

You don’t need to worry about this alarm. It is used to trigger auto-scaling policies.

To hide the alarm in the CloudWatch Management Console, select Hide Auto Scaling alarms.


ScanQueueFullAlarm

You don’t need to worry about this alarm. It is used to trigger auto-scaling policies.

To hide the alarm in the CloudWatch Management Console, select Hide Auto Scaling alarms.


NatGatewayAErrorPortAllocationAlarm

Applies to the Dedicated private VPC delivery method only!

The NAT gateway could not allocate a source port because too many concurrent connections are open through it. A NAT gateway supports up to 55,000 simultaneous connections to each unique destination.

Please contact us!

NatGatewayAPacketsDropCountAlarm

Applies to the Dedicated private VPC delivery method only!

The NAT gateway dropped packets, which might indicate an ongoing transient issue with the NAT gateway.

Please contact us!

NatGatewayABandwidthAlarm

Applies to the Dedicated private VPC delivery method only!

The NAT gateway's bandwidth utilization is over 80%.

Please contact us!

NatGatewayAPacketsAlarm

Applies to the Dedicated private VPC delivery method only!

The NAT gateway's packet utilization is over 80%.

Please contact us!

RefreshBucketCacheStateMachineAlarmExecutionsFailed

The state machine that refreshes the bucket cache failed, so the bucket cache is out of date.

Please contact us!

RefreshBucketCacheStateMachineAlarmExecutionsTimedOut

The state machine that refreshes the bucket cache timed out, so the bucket cache is out of date.

Please contact us!

Need more help?

Write to us, and we'll get back to you as soon as we can.
