Data lake and analytics

A data lake is a centralized repository that stores large amounts of structured and unstructured data in its raw form. As organizations store more sensitive information in their data lakes, it becomes increasingly important to ensure the security of this data. Malware scanning is a crucial aspect of data lake security, as it helps detect and prevent the spread of malicious software that could compromise the data’s integrity. By regularly scanning data in the data lake, organizations can proactively detect and neutralize threats, safeguard their critical information, and maintain the confidentiality and availability of their data.

It is crucial to delete malware from a data lake for several reasons:

  • Data security: Malware can compromise the confidentiality, integrity, and availability of sensitive data stored in the data lake, leading to data breaches and potential loss of sensitive information.
  • System stability: Malware can infect and spread through the systems and infrastructure used to manage the data lake, causing instability and potential system failures.
  • Compliance: Depending on the nature of the data and the regulations governing it, malware in a data lake may violate compliance requirements, leading to legal or financial consequences.
  • Reputation: The presence of malware in a data lake can damage an organization’s reputation and erode the trust of customers, stakeholders, and partners.

Therefore, it is important to scan data in a data lake for malware regularly and promptly delete any malicious software found to maintain the security and stability of the data and systems.

Our customers use the following options to defend their data lakes in real time.

Quarantine infected files

The quarantine approach is a security technique that isolates potentially harmful data so that it cannot spread or harm users. Protecting a data lake this way involves the following steps:

  • Data scanning: All incoming data is scanned for potential threats, such as malware, viruses, or suspicious code.
  • Isolation: Infected data is moved to a separate, isolated location, known as a quarantine bucket, where it can be further analyzed and dealt with.
  • Analysis: The isolated data is thoroughly analyzed to determine if it is malicious and how it can be neutralized or removed.

By using a quarantine approach, organizations can reduce the risk of malware infections in their data lake and maintain the security and integrity of their data.
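The isolation step can be pictured with a minimal sketch. It is not how the Quarantine infected files Add-On is implemented; it only illustrates moving one flagged object into a quarantine bucket. The function name `quarantine_object`, the bucket names, and the key layout are assumptions made for this example. The `s3` argument is assumed to behave like a boto3 S3 client, of which only `copy_object` and `delete_object` are used.

```python
from typing import Any


def quarantine_object(s3: Any, source_bucket: str, key: str, quarantine_bucket: str) -> str:
    """Move one object flagged as infected into a separate quarantine bucket.

    `s3` is expected to behave like a boto3 S3 client; only `copy_object`
    and `delete_object` are called. Returns the key used in the quarantine bucket.
    """
    # Hypothetical layout: prefix with the source bucket name so objects
    # coming from different buckets cannot overwrite each other.
    quarantine_key = f"{source_bucket}/{key}"

    s3.copy_object(
        Bucket=quarantine_bucket,
        Key=quarantine_key,
        CopySource={"Bucket": source_bucket, "Key": key},
    )
    # Remove the original only after the copy call returned without raising.
    s3.delete_object(Bucket=source_bucket, Key=key)
    return quarantine_key
```

With boto3 installed, the function could be called as `quarantine_object(boto3.client("s3"), "lake-raw", "incoming/file.bin", "lake-quarantine")`. Copying before deleting means a failed copy leaves the original in place rather than losing it.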

Setup

  1. Follow the Getting started guide (the reporting part is optional).
  2. Install the Quarantine infected files Add-On.

Summary

Pros:

  • Thorough analysis and evaluation of the malware, leading to complete removal of the threat
  • Ability to maintain a record of the malware for future reference or analysis
  • Reduced risk of data loss or false positives

Cons:

  • Takes more time to complete
  • Requires staff trained to deal with infected files

Delete infected files

Deleting infected files as soon as they are detected is a low-effort approach.

Setup

Follow the Getting started guide (the reporting part is optional).

Summary

Pros:

  • Quick removal of the threat, reducing the risk of further spread and damage
  • Minimal analysis and remediation time
  • Eases the process of ensuring compliance with security policies and regulations

Cons:

  • Possibility of false positives, leading to deletion of benign data
  • Lack of a thorough analysis of the malware
  • Risk of data loss if the malware is deeply integrated into the data

Reporting only

Instead of removing the threat, you can minimize the impact on the data lake by leaving the data in place and only observing it through the reporting capabilities of bucketAV.
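To illustrate what observing can look like in practice, the sketch below counts infected findings per day from a list of scan results. The record shape is hypothetical: the field names `status` and `scanned_at` and the value `"infected"` are assumptions made for this example and are not taken from the bucketAV reporting format.

```python
from collections import Counter
from typing import Dict, Iterable


def infected_per_day(findings: Iterable[Dict[str, str]]) -> Counter:
    """Count infected findings per calendar day (YYYY-MM-DD).

    Each finding is a dict with the hypothetical keys `status` and
    `scanned_at`, where `scanned_at` is an ISO 8601 timestamp.
    """
    counts: Counter = Counter()
    for finding in findings:
        if finding["status"] == "infected":
            # An ISO 8601 timestamp starts with the 10-character date.
            counts[finding["scanned_at"][:10]] += 1
    return counts
```

A daily count like this is enough to spot a sudden rise in detections, even though the infected files themselves stay where they are.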

Setup

  1. Follow the Getting started guide.
  2. Set the DeleteInfectedFiles configuration parameter to false.
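How the parameter is changed depends on how bucketAV was deployed. As a sketch only, and assuming the deployment is an AWS CloudFormation stack (an assumption, not something this page states), the helper below builds a parameter list that sets `DeleteInfectedFiles` to `false` and keeps every other parameter at its previous value. The helper name `override_parameter` is made up for this example.

```python
from typing import Dict, List


def override_parameter(
    current: List[Dict[str, str]], name: str, value: str
) -> List[Dict[str, object]]:
    """Build a CloudFormation-style parameter list that changes one parameter.

    `current` is a list of {"ParameterKey": ..., "ParameterValue": ...} dicts,
    the shape CloudFormation uses when describing a stack. All parameters
    other than `name` are marked to keep their previous value.
    """
    if name not in {p["ParameterKey"] for p in current}:
        raise KeyError(f"stack has no parameter named {name!r}")

    updated: List[Dict[str, object]] = []
    for p in current:
        if p["ParameterKey"] == name:
            updated.append({"ParameterKey": name, "ParameterValue": value})
        else:
            updated.append({"ParameterKey": p["ParameterKey"], "UsePreviousValue": True})
    return updated
```

With boto3, the current list could be read from `describe_stacks(StackName=...)["Stacks"][0]["Parameters"]`, and the result of `override_parameter(current, "DeleteInfectedFiles", "false")` passed as `Parameters` to `update_stack` together with `UsePreviousTemplate=True`. Marking the other parameters with `UsePreviousValue` avoids resetting them by accident.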

Summary

Pros:

  • Minimal impact on the data lake, reducing the risk of data loss or false positives
  • Ability to track and analyze malware trends and patterns
  • Low resource requirements for maintaining the approach

Cons:

  • Does not remove the malware, leaving the data lake at risk of further spread and damage
  • Does not provide a thorough analysis of the malware, which can limit the understanding of the threat
  • May not comply with security policies and regulations requiring the removal of malware from the data lake
