Serverless Virus Scanning with AWS Lambda

To scan every uploaded file without running a server, wire an S3 ObjectCreated event to a Lambda function that runs ClamAV, tags the object with the verdict, and then promotes a clean file to a serving prefix or moves an infected one to a separate infected/ prefix. The scanner scales to zero between uploads and, during a spike, up to your account's Lambda concurrency limit.

This guide is part of Automated Virus Scanning Integration under Backend Validation & Cloud Storage Architecture. It is the serverless counterpart to implementing ClamAV for uploaded file scanning.

When to use this approach

  • Files arrive via direct-to-cloud uploads, so there is no server in the request path to scan them.
  • Upload volume is spiky and you do not want an always-on scanning fleet.
  • You need an audit trail of which objects were scanned and the verdict.

Prerequisites

  1. An S3 bucket receiving uploads (ideally into a quarantine/ prefix).
  2. ClamAV packaged as a Lambda layer, or its database mounted from EFS.
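  3. A Lambda execution role with s3:GetObject, s3:GetObjectTagging, s3:PutObject, s3:PutObjectTagging, and s3:DeleteObject on the bucket. There is no s3:CopyObject IAM action; a copy is authorized as a read of the source plus a write of the destination.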
  4. Node 20 Lambda runtime and the AWS SDK v3.

Architecture

The upload lands in a quarantine prefix where nothing can serve it. S3 emits an event, Lambda fetches the object, scans it with ClamAV (whose signature database lives on an EFS mount or in the layer), and either promotes a clean file to a serving prefix or moves an infected one to an infected/ prefix. Either way the object carries a tag recording the verdict.

[Figure: Serverless virus scanning pipeline. S3 quarantine prefix -> new object event -> Lambda ClamAV scan (signatures from the EFS database) -> ready/ prefix, tagged scan=clean, or infected/ prefix, tagged and alerted.]
An upload event triggers a Lambda scan against an EFS-mounted ClamAV database, routing clean and infected files to separate prefixes.

Implementation

The handler downloads the object, runs clamscan against it, applies object tags, and routes the file. ClamAV’s binary and database come from a layer or EFS; the function streams the object to /tmp for scanning.

import {
  S3Client,
  GetObjectCommand,
  PutObjectTaggingCommand,
  CopyObjectCommand,
  DeleteObjectCommand,
} from "@aws-sdk/client-s3";
import { execFile } from "node:child_process";
import { promisify } from "node:util";
import { createWriteStream } from "node:fs";
import { rm } from "node:fs/promises";
import { pipeline } from "node:stream/promises";
import type { Readable } from "node:stream";
import type { S3Event } from "aws-lambda";

const exec = promisify(execFile);
const s3 = new S3Client({});

export async function handler(event: S3Event): Promise<void> {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));
    const localPath = `/tmp/${key.split("/").pop()}`;

    // 1. Download the uploaded object to the function's scratch space.
    const obj = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
    await pipeline(obj.Body as Readable, createWriteStream(localPath));

    // 2. Scan with ClamAV; exit code 1 means a virus was found.
    let infected = false;
    try {
      await exec("/opt/bin/clamscan", ["--database=/mnt/clamav", localPath]);
    } catch (err) {
      const code = (err as { code?: number }).code;
      if (code === 1) infected = true;
      else throw err; // code 2 = scan error, not a clean result
    } finally {
      // Warm invocations reuse /tmp, so remove the file to keep it from filling up.
      await rm(localPath, { force: true });
    }

    // 3. Tag the object with the verdict for auditing.
    await s3.send(
      new PutObjectTaggingCommand({
        Bucket: bucket,
        Key: key,
        Tagging: {
          TagSet: [{ Key: "scan", Value: infected ? "infected" : "clean" }],
        },
      }),
    );

    // 4. Route the object based on the verdict.
    const target = key.replace(
      /^quarantine\//,
      infected ? "infected/" : "ready/",
    );
    await s3.send(
      new CopyObjectCommand({
        Bucket: bucket,
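        // CopySource must be URL-encoded, so encode each path segment of the key.
        CopySource: `${bucket}/${key.split("/").map(encodeURIComponent).join("/")}`,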
        Key: target,
        TaggingDirective: "COPY",
      }),
    );
    await s3.send(new DeleteObjectCommand({ Bucket: bucket, Key: key }));

    console.log(`${key} -> ${target} (${infected ? "infected" : "clean"})`);
  }
}

Line-by-line on the critical pieces

  • decodeURIComponent(... replace(/\+/g, " ")) undoes the URL encoding S3 applies to keys in event records; spaces arrive as + and other characters are percent-encoded.
  • clamscan exit codes are load-bearing: 0 is clean, 1 is a virus found, 2 is a scan error. Treating 2 as clean would let unscanned files through, so it is re-thrown.
  • --database=/mnt/clamav points at the EFS mount holding the signature database, which is refreshed by freshclam on a schedule so signatures stay current without redeploying the function.
  • PutObjectTaggingCommand records the verdict as an object tag, giving you a queryable audit trail and a hook for bucket policies that deny serving anything not tagged clean.
  • TaggingDirective: "COPY" carries the verdict tag onto the promoted copy so the routing decision and the tag stay consistent.
  • The original quarantine object is deleted after copying so nothing lingers in the unscanned prefix.
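
The three decisions described above (decoding the key, interpreting the exit code, and choosing the destination) can be pulled out as pure functions so they are testable without AWS or ClamAV. This is a minimal sketch, not the handler's actual structure; the names decodeS3Key, verdictFromExitCode, and targetKey are hypothetical, and the prefix guard is an added assumption.

```typescript
type Verdict = "clean" | "infected";

/** Undo the URL encoding S3 applies to object keys in event records. */
export function decodeS3Key(rawKey: string): string {
  // Spaces arrive as "+"; everything else is percent-encoded.
  return decodeURIComponent(rawKey.replace(/\+/g, " "));
}

/** Map a clamscan exit code to a verdict; anything but 0 or 1 is an error. */
export function verdictFromExitCode(code: number): Verdict {
  if (code === 0) return "clean";
  if (code === 1) return "infected";
  // Exit code 2 (or anything unexpected) must never be read as clean.
  throw new Error(`clamscan failed with exit code ${code}`);
}

/** Compute the destination key, refusing keys outside quarantine/ (assumed guard). */
export function targetKey(key: string, verdict: Verdict): string {
  const source = "quarantine/";
  if (!key.startsWith(source)) {
    throw new Error(`refusing to route key outside ${source}: ${key}`);
  }
  const prefix = verdict === "infected" ? "infected/" : "ready/";
  return prefix + key.slice(source.length);
}
```

Throwing on an unexpected prefix means a misrouted event fails loudly instead of copying an object onto itself and then deleting it.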

EFS vs layer for the ClamAV database

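The ClamAV signature database is large and grows over time. Lambda caps a function's unzipped deployment package, including all of its layers, at 250 MB, and a layer version is immutable, so a database shipped in a layer goes stale until you publish a new version and redeploy. Mounting EFS lets a scheduled freshclam task keep signatures current and shares one database across all concurrent invocations. The trade-off is that an EFS mount requires attaching the function to a VPC, so the function then needs a route to S3, such as a gateway VPC endpoint. Use the layer only for the ClamAV binary; keep the database on EFS.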

Configuration gotchas

Scan timeout on large media

ClamAV scanning is CPU-bound, and large video files can exceed a short Lambda timeout. Raise the function timeout (the hard ceiling is 15 minutes) and memory (more memory grants more CPU), and reject files above a size your scan budget can handle before they reach the scanner.
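
One way to enforce that size ceiling is to check the size reported in the S3 event record before downloading anything. The sketch below assumes a hypothetical MAX_SCAN_BYTES limit and a hypothetical helper named withinScanBudget; what to do with an oversized file (reject, tag, or hand off to another scanner) is left to you.

```typescript
// Hypothetical ceiling; tune it to your timeout, memory, and /tmp size.
const MAX_SCAN_BYTES = 400 * 1024 * 1024;

/** Minimal shape of the part of an S3 event record this check reads. */
interface SizedRecord {
  s3: { object: { key: string; size?: number } };
}

/** Return true only when the object is small enough to scan in this function. */
export function withinScanBudget(
  record: SizedRecord,
  maxBytes: number = MAX_SCAN_BYTES,
): boolean {
  const size = record.s3.object.size;
  // A missing size is treated as over budget so nothing is waved through blind.
  return typeof size === "number" && size >= 0 && size <= maxBytes;
}
```

In the handler this would run before GetObjectCommand, so an oversized upload costs one short invocation instead of a timed-out scan.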

Recursive trigger loop

If the event fires on the whole bucket and your handler also writes into the bucket, the copy can re-trigger the function. Scope the S3 event notification to the quarantine/ prefix only, never the destination prefixes.
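
As a sketch of that scoping, the function below builds a notification configuration whose prefix filter limits the trigger to quarantine/. The object follows the shape S3 expects for a bucket notification configuration (for example, as the NotificationConfiguration input of PutBucketNotificationConfigurationCommand in the SDK); the function name quarantineNotification and the ARN in the usage note are placeholders.

```typescript
/** Build an S3 notification configuration that fires only for quarantine/ uploads. */
export function quarantineNotification(lambdaArn: string) {
  return {
    LambdaFunctionConfigurations: [
      {
        LambdaFunctionArn: lambdaArn,
        Events: ["s3:ObjectCreated:*"],
        // The prefix filter is what prevents copies into ready/ or infected/
        // from re-triggering the scanner.
        Filter: {
          Key: { FilterRules: [{ Name: "prefix", Value: "quarantine/" }] },
        },
      },
    ],
  };
}
```

Calling quarantineNotification("arn:aws:lambda:us-east-1:123456789012:function:scan") yields a configuration you can apply with the SDK, the CLI, or your infrastructure tooling.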

/tmp space exhaustion

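Lambda’s /tmp defaults to 512 MB and persists across invocations in a warm execution environment, so a file larger than the remaining space fails to download. Delete scanned files when you are done, configure ephemeral storage up to 10,240 MB (10 GB), or stream-scan via clamdscan against a long-lived daemon if your files are very large.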
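
A small helper can make the scratch space self-cleaning. The sketch below, with the hypothetical name withScratchDir, creates a unique directory under the system temp directory (which is /tmp on Lambda), runs a callback, and removes the directory whether the callback succeeds or throws. The unique directory also keeps two objects with the same file name from overwriting each other.

```typescript
import { mkdtemp, rm } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

/** Run fn with a fresh scratch directory and always remove it afterwards. */
export async function withScratchDir<T>(
  fn: (dir: string) => Promise<T>,
): Promise<T> {
  // mkdtemp appends random characters, so each call gets its own directory.
  const dir = await mkdtemp(join(tmpdir(), "scan-"));
  try {
    return await fn(dir);
  } finally {
    // Runs on success and on failure, so warm invocations start clean.
    await rm(dir, { recursive: true, force: true });
  }
}
```

The download and clamscan call would go inside the callback, writing to a path built from the directory it receives.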

Verification

Drop the EICAR test file into the quarantine prefix and confirm it is tagged infected and moved:

# EICAR is a harmless standard test string every scanner detects.
printf 'X5O!P%%@AP[4\\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*' > eicar.txt
aws s3 cp eicar.txt s3://my-bucket/quarantine/eicar.txt
# Expected after the Lambda runs:
aws s3api get-object-tagging --bucket my-bucket --key infected/eicar.txt
# TagSet contains { "Key": "scan", "Value": "infected" }

FAQ

Why scan in Lambda instead of in my API server?

Direct-to-cloud uploads never pass through your API, so there is no request-path server to scan them. An S3-event-driven Lambda scans them after they land, which also scales to zero between uploads. The path trade-off is discussed in presigned URL vs server proxy tradeoffs.

How do I keep ClamAV signatures up to date?

Run freshclam on a schedule (an EventBridge-triggered task) that writes to the EFS-mounted database. Because the scanner reads the database at runtime, updates apply without redeploying the function.

What stops a file from being served before it is scanned?

Upload into a quarantine prefix that no public path or signed-read policy can reach, and only promote files to the serving prefix after they are tagged clean. Nothing serves directly from quarantine.
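
As a second layer, a bucket policy can deny reads of anything not tagged clean, using the s3:ExistingObjectTag condition key. The sketch below builds such a policy document. The function name denyUnlessCleanPolicy is hypothetical, and the exemption for the scanner role is an assumption: without it, the Deny would also block the Lambda from reading untagged uploads. Treat it as a starting point and test it against your own principals before applying it.

```typescript
/** Build a bucket policy that denies GetObject unless the object is tagged scan=clean. */
export function denyUnlessCleanPolicy(bucket: string, scannerRoleArn: string) {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Sid: "DenyReadsOfUnscannedObjects",
        Effect: "Deny",
        Principal: "*",
        Action: "s3:GetObject",
        Resource: `arn:aws:s3:::${bucket}/*`,
        // Both conditions must hold for the Deny to apply: the object is not
        // tagged clean AND the caller is not the scanner's execution role.
        Condition: {
          StringNotEquals: { "s3:ExistingObjectTag/scan": "clean" },
          ArnNotEquals: { "aws:PrincipalArn": scannerRoleArn },
        },
      },
    ],
  };
}
```

JSON.stringify on the result gives a document you can attach as the bucket policy; an object with no scan tag at all also fails the StringNotEquals check, so untagged uploads stay unreadable to everyone except the scanner.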