Why does bucketAV use EC2 to protect S3 from malware?

Prospects often ask why bucketAV runs on EC2 instead of Fargate (ECS/EKS) or Lambda. Even though I am a huge fan of containers and serverless, I argue that good old virtual machines are the compute platform of choice when it comes to downloading data from S3, scanning the data with an antivirus engine (ClamAV or Sophos), and publishing the results.

In the following, I describe the reasons why I think that EC2 is the best choice for bucketAV. The considerations can be transferred to other scenarios.

Stateful

bucketAV uses ClamAV or Sophos to scan S3 objects for malware. Both anti-malware engines rely on a large malware database that needs to be initialized and updated constantly. Therefore, bucketAV is a stateful workload, with the specific characteristic that the state can be restored at any time within a few minutes.

EC2: Virtual machines (EC2) in combination with network-attached storage (EBS) provide the foundation to operate stateful systems on AWS. For example, you could run a database management system on EC2 and EBS. A virtual machine runs forever as long as you pay the bill and the underlying hardware does not cause any interruptions.

Fargate: While it is possible to run stateful applications on Fargate, the main use case for Fargate is to deploy stateless applications. For example, we would not recommend running a database management system on Fargate.

Lambda: Provides a highly distributed execution environment for your code. To achieve that, Lambda starts and terminates execution environments automatically. An execution environment is temporary and may be disposed of by Lambda after processing a request. Therefore, Lambda is not a good fit for stateful applications.

As bucketAV is a kind of stateful application, it is best suited for EC2.

Network Capacity

bucketAV needs to download objects from S3 to be able to scan them with an anti-malware engine. Especially when scanning large files, the network capacity becomes the bottleneck of the system.

EC2: An m5.large instance comes with 0.75 Gbps baseline throughput and 10.0 Gbps burst throughput. The largest instance type of the same family, an m5.24xlarge, provides a constant network throughput of 25 Gbps. And there are even instance types like the m5zn.12xlarge that offer a constant throughput of 100 Gbps.

Fargate: Unfortunately, AWS does not share any information about the network capabilities of Fargate. So, we need to rely on the AWS community to run benchmarks. Sebastian Cohnen ran a Fargate network benchmark in 2021, and Sami Jaktholm ran another one in 2024. The network bandwidth of a container running on Fargate depends on the vCPUs and memory assigned to the task/pod. The smallest possible configuration with 0.25 vCPU and 512 MB provides a baseline performance of 0.063 Gbps and bursts up to 4.214 Gbps for about 6 minutes. A larger configuration with 4 vCPUs and 8192 MB provides 1.241 Gbps baseline and 9.829 Gbps burst throughput.

Lambda: AWS does not publish information about the networking capabilities of a Lambda function. Sami Jaktholm benchmarked the networking capabilities of Lambda recently, and came to the conclusion that a Lambda function provides a baseline performance of 0.600 Gbps and bursts up to 2.750 Gbps for a short period of time. It seems like assigning more memory to a Lambda function does not impact its network capacity.

In summary, EC2 provides configuration options with up to 100 Gbps, which is far beyond what Fargate can offer. Moreover, Lambda comes with pretty limited network capabilities, with a baseline performance of 0.600 Gbps. As it is not possible to distribute the process of downloading and scanning a large file across multiple machines, EC2 is the best option for bucketAV.
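To make the difference tangible, here is a rough back-of-the-envelope sketch that turns the baseline figures quoted above into ideal download times. The 50 GB object size and the function name are assumptions for illustration only, and the calculation ignores bursting, S3-side limits, and protocol overhead.

```python
# Baseline throughput figures (Gbps) quoted in this section.
BASELINE_GBPS = {
    "EC2 m5.large": 0.75,
    "EC2 m5.24xlarge": 25.0,
    "Fargate 0.25 vCPU / 512 MB": 0.063,
    "Fargate 4 vCPU / 8192 MB": 1.241,
    "Lambda": 0.6,
}


def download_seconds(size_gb: float, gbps: float) -> float:
    """Ideal transfer time: convert gigabytes to gigabits, divide by throughput."""
    return size_gb * 8 / gbps


# Assumption: a hypothetical 50 GB object, chosen only for illustration.
OBJECT_SIZE_GB = 50

for platform, gbps in BASELINE_GBPS.items():
    minutes = download_seconds(OBJECT_SIZE_GB, gbps) / 60
    print(f"{platform}: {minutes:.1f} min")
```

Because a single file cannot be split across machines, the per-machine throughput is what determines how long one scan waits for its download.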

Costs

Let’s compare the hourly costs for the compute capacity of 2 vCPUs (Intel/AMD) and 8 GiB memory.

| Service | $/hour | Difference between EC2 and … |
| --- | --- | --- |
| EC2 | $0.0960 | 0% |
| Fargate | $0.1165 | 21% |
| Lambda | $0.4800 | 400% |

The decision is simple. If you want to buy compute capacity as cheaply as possible, it’s worth opting for EC2. That’s one of the main reasons bucketAV runs on EC2.
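The percentages in the comparison follow directly from the hourly prices. A minimal sketch of the arithmetic, using only the prices listed above (the function name is hypothetical):

```python
# Hourly prices for 2 vCPUs and 8 GiB memory, taken from the comparison above.
HOURLY_USD = {"EC2": 0.0960, "Fargate": 0.1165, "Lambda": 0.4800}


def premium_over_ec2(service: str) -> int:
    """Extra cost relative to EC2, as a rounded percentage."""
    return round((HOURLY_USD[service] / HOURLY_USD["EC2"] - 1) * 100)


for service in HOURLY_USD:
    print(f"{service}: +{premium_over_ec2(service)}%")
```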

Scalability

bucketAV is an asynchronous system. When a user uploads a new object, S3 sends an event notification to an SQS queue. A fleet of EC2 instances polls the queue and processes scan tasks. This architecture allows bucketAV to adapt the size of the fleet based on the queue length without the risk of losing any scan tasks.
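As an illustration of sizing a fleet from the queue length, consider the following sketch. It is not bucketAV's actual scaling policy: the function name, the tasks-per-instance ratio, and the minimum and maximum fleet sizes are all hypothetical values picked for the example.

```python
import math


def desired_instances(queue_length: int,
                      tasks_per_instance: int = 100,
                      min_instances: int = 1,
                      max_instances: int = 10) -> int:
    """Pick a fleet size proportional to the backlog, clamped to a range.

    All defaults are hypothetical and not taken from bucketAV.
    """
    needed = math.ceil(queue_length / tasks_per_instance)
    return max(min_instances, min(max_instances, needed))


# Example backlogs: empty queue, moderate backlog, large spike.
for backlog in (0, 250, 5000):
    print(backlog, "->", desired_instances(backlog))
```

Since the scan tasks wait in the queue until an instance picks them up, a slow reaction to a spike delays scans but does not drop them.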

Unfortunately, there are no hard numbers to compare the scalability of EC2, Fargate, and Lambda. Still, we give the comparison a try.

EC2: It takes about a minute to launch and boot an EC2 instance. Furthermore, auto-scaling based on CloudWatch metrics adds a delay of about three minutes. While we have optimized the auto-scaling process, we observe that it takes up to 5 minutes until bucketAV launches a new EC2 instance to react to a spike in the workload.

Fargate: In 2022, Vlad ran an experiment on scaling containers on AWS. What we can learn from Vlad’s experiment is that Fargate scales significantly faster than EC2.

Lambda: Lambda is built to scale. The documentation says: “In each AWS Region, and for each function, your concurrency scaling rate is 1,000 execution environment instances every 10 seconds. In other words, every 10 seconds, Lambda can allocate at most 1,000 additional execution environment instances to each of your functions.” That’s quite impressive. When it comes to scalability, Lambda is definitely the winner.
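To put the quoted scaling rate next to the EC2 numbers, here is a simplified sketch. It only counts complete 10-second intervals and ignores account-level concurrency quotas, so treat the result as an upper bound under those assumptions; the function name is hypothetical.

```python
SCALING_STEP = 1_000    # additional execution environments per interval (quoted above)
INTERVAL_SECONDS = 10   # length of one scaling interval (quoted above)


def max_additional_environments(seconds: int) -> int:
    """Upper bound on environments added in the complete intervals of a time span."""
    return (seconds // INTERVAL_SECONDS) * SCALING_STEP


# Up to 5 minutes is what the EC2 paragraph reports for launching a new instance.
print(max_additional_environments(5 * 60))
```

Under this simplified model, a single function could add up to 30,000 execution environments in the time bucketAV needs to launch one new EC2 instance.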

When scalability is crucial to minimize overprovisioning for peak hours, Lambda is certainly the best choice. We are making a trade-off at this point. As scalability is not that important in an asynchronous system, we accept that EC2 scales relatively slowly.

Summary

In conclusion, while Fargate and Lambda offer certain advantages, EC2 remains the optimal choice for bucketAV. Its ability to efficiently handle stateful workloads, superior network capacity, and cost-effectiveness make it well-suited for processing large S3 objects with anti-malware engines. Although EC2 may scale more slowly than serverless options, this trade-off is acceptable given bucketAV’s asynchronous nature. Ultimately, EC2’s strengths align closely with bucketAV’s specific requirements, making it the most effective compute platform for this malware scanning service.


Published on July 30, 2024 | Written by Andreas
