You can turn the following knobs to tune the throughput of bucketAV:
- You can limit the size of the Scan Fleet via the AutoScalingMaxSize configuration parameter.
- You can adjust the EC2 instance type used in the Scan Fleet via the InstanceType configuration parameter.
Size of the Scan Fleet
The key question is: How many files can one EC2 instance scan per day/minute? Unfortunately, the time to scan a file depends heavily on file size, type, and content.
We used the following workload with real-world files to gain performance insights:
- File size: ~5MB
- File types: pdf, png, jpeg, mp3, xlsx, docx, pptx, key, zip, tar.gz
- File content: real-world files, not generated
We measured that one m5.large EC2 instance scans:
- Powered by Sophos®: ~400,000 files per day.
- Powered by ClamAV®: ~20,000 files per day.
Keep in mind that your mileage may vary. Scanning a 5 MB PDF can take 0.1 to 5 seconds, depending on the number of pages and the content of the pages (images versus text). Excel sheets can take 10 seconds, while images take only 0.1 seconds.
We highly recommend measuring scan times for your workload during the 14-day free trial! Learn how to run a performance test.
To calculate your EC2 instance needs, use the following formula:
Number of files per day / Files one EC2 instance can scan per day
Assuming you want to scan 100,000 files per day and your performance test resulted in a throughput of 20,000 files per day per EC2 instance, you will need 100,000 / 20,000 = 5 EC2 instances. This number is helpful for pricing estimation.
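The formula can be sketched in a few lines of Python. This is only an illustration: the function name `instances_needed` is hypothetical and not part of bucketAV, and rounding up is an assumption made here because a fraction of an EC2 instance cannot be launched.

```python
def instances_needed(files_per_day: int, files_per_instance_per_day: int) -> int:
    """Average number of EC2 instances required, rounded up.

    files_per_instance_per_day should come from your own performance test.
    """
    if files_per_instance_per_day <= 0:
        raise ValueError("files_per_instance_per_day must be positive")
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-files_per_day // files_per_instance_per_day)


# The example from the text: 100,000 files per day at 20,000 files
# per instance per day.
print(instances_needed(100_000, 20_000))  # 5
```

Replace the second argument with the throughput you measured for your own workload.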
But that is only half the story. Files rarely arrive at a constant rate throughout the day. You might see peaks during business hours and idle periods during the night. The following graph shows the number of EC2 instances running over the day.
The average number of EC2 instances is five, but we still need more at peak times. The good news here is that the bucketAV costs and the AWS infrastructure costs are mostly the same whether you run 5 EC2 instances for 24 hours or a more dynamic fleet that runs 5 EC2 instances on average. Therefore, we recommend setting the AutoScalingMaxSize configuration parameter to a higher number than what you calculate. Watch the Scan queue length and adjust the size if the queue grows over time.
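To get a feeling for how far the peak can exceed the average, here is a rough sketch that estimates the number of instances needed in each hour. The hourly profile is made up for illustration, the function name `instances_per_hour` is hypothetical, and the model assumes that each hour's files should be scanned within that hour. In practice the scan queue buffers short bursts, so treat the result as a starting point for AutoScalingMaxSize rather than an exact value.

```python
def instances_per_hour(files_per_hour, files_per_instance_per_day):
    """Instances needed in each hour so that the queue does not grow.

    Assumes an instance's daily throughput is spread evenly over 24 hours.
    """
    if files_per_instance_per_day <= 0:
        raise ValueError("files_per_instance_per_day must be positive")
    # An instance scanning N files per day scans N / 24 files per hour,
    # so files / (N / 24) == files * 24 / N. Integer ceiling division
    # rounds up without floating-point error.
    return [
        -(-(files * 24) // files_per_instance_per_day)
        for files in files_per_hour
    ]


# Hypothetical profile: 8 quiet hours, 8 business hours, 8 quiet hours.
hourly_profile = [1_250] * 8 + [10_000] * 8 + [1_250] * 8

needed = instances_per_hour(hourly_profile, 20_000)
print(sum(hourly_profile))  # 100000 files per day in total
print(max(needed))          # 12 instances at peak
```

With this made-up profile the daily total still averages out to five instances, yet the business-hour peak calls for twelve, which is why a maximum size well above the calculated average makes sense.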
EC2 instance type
We recommend running bucketAV on m5.large instance types. Depending on your workload, you might benefit from larger instance types. The only way to figure that out is by running performance tests with your workload.
As a rule of thumb: large files (> 1 GB) benefit from larger instance types.
Remember that bucketAV costs and the AWS infrastructure costs are the same for running two m5.large EC2 instances versus one m5.xlarge EC2 instance.
Due to tight network bandwidth constraints, we don’t recommend the t3a families in production environments.