Bounce Processing in Project Lancelot

Overview

A serious mailing list manager (MLM) requires some method of automatically dealing with messages that bounce, i.e., cannot be delivered to the target address and are returned to the list with an error message.

As of now, bouncing messages are forwarded to the list owner, which of course is unacceptable in the long run.

The accepted way of dealing with bounces is to try to figure out the address to which the bouncing message was originally delivered (which does not have to have anything to do with the place the bounce originated from), and to remove addresses from the list if they generate bounces over an extended period of time. This document discusses a design for Project Lancelot to enable this.

A related problem is what to do with bounces for confirmations and other administrative-type messages. Current thinking for Project Lancelot is simply to ignore these; (prospective) subscribers should make sure that they send requests from addresses that do not bounce.

Figuring Out the Bouncing Address

This is not easy in the general case -- experience shows that people will use one address to subscribe to a mailing list, then move elsewhere and have all their mail forwarded to the new address. The MLM continues to distribute messages to the old address, but bounces can also originate from the MTA handling the new address. It may be all but impossible to find out, from a bounce sent by the new installation, which address the message was originally sent to.

Project Lancelot's answer to this problem is to use variable envelope return paths (VERP). This is a method, pioneered by Dan J. Bernstein, which encodes the distribution address in a message's SMTP sender address. Bounces are supposed to be sent to the SMTP sender address, so this address is the only definitive channel through which Project Lancelot can communicate with itself by way of the subscriber's MTA setup. In brief, the usual way of sending messages to the subscribers of a list called list@… would use an SMTP sender address of

  list+bounce@example.com

With VERP, a message to foobar@… would instead be sent with an SMTP sender address of

  list+bounce-NNNNN-foobar=example.net@example.com

(where NNNNN is the sequential number of the message on this list). I.e., the distribution address is encoded within the local part of the SMTP sender address. If we forward all bounces to a program that will handle them automatically, we can easily distinguish which message and subscriber the bounce refers to.
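
The encoding above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names verp_encode and verp_decode are hypothetical, the message number is written without zero padding, and the recipient is split at the last "=" because a domain name cannot contain that character.

```python
import re

def verp_encode(list_local, list_domain, msgno, recipient):
    """Build the VERP SMTP sender address for one recipient of one message."""
    rcpt_local, rcpt_domain = recipient.rsplit("@", 1)
    return f"{list_local}+bounce-{msgno}-{rcpt_local}={rcpt_domain}@{list_domain}"

# list+bounce-NNNNN-local=domain@listdomain
_VERP_RE = re.compile(
    r"^(?P<list>[^+@]+)\+bounce-(?P<msgno>\d+)-(?P<rcpt>.+)@(?P<domain>[^@]+)$"
)

def verp_decode(sender):
    """Return (list, message number, subscriber address), or None if not VERP."""
    m = _VERP_RE.match(sender)
    if m is None:
        return None
    # Split at the LAST "=": the local part may contain "=", the domain cannot.
    rcpt_local, sep, rcpt_domain = m.group("rcpt").rpartition("=")
    if not sep or not rcpt_local or not rcpt_domain:
        return None
    return m.group("list"), int(m.group("msgno")), f"{rcpt_local}@{rcpt_domain}"
```

A plain list+bounce@example.com address does not match the pattern, so verp_decode returns None for it and the bounce would have to be handled some other way.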

Of course this method means that we cannot distribute messages to a group of subscribers within the same domain by sending one copy to the domain's MTA with multiple SMTP receiver addresses, or even distribute messages in general by sending single copies of the message to our local MTA with lots of SMTP receiver addresses in various domains (SMTP servers are required to take at least 100 SMTP receiver addresses in one go). This problem can be circumvented if the local MTA used for message delivery supports the proposed XVERP extension.

A short review of Project Lancelot's list configuration parameters concerning VERP is in order:

  • smtp.verp (boolean): Determines whether VERP is used at all. All other VERP parameters are only considered if this parameter is set to 1.
  • smtp.verpstyle: Which style of VERP Project Lancelot will use. A value of smtp will use the XVERP extension if it is being advertised by the local (distribution) MTA. A value of lancelot means that Project Lancelot will do its own VERP encoding. The recommended value is smtp; if the MTA does not support XVERP, Project Lancelot will fall back to lancelot automatically.
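
The interplay of the two parameters, including the automatic fallback, might look roughly like this. The function name effective_verp_style and its arguments are hypothetical; in particular, how Project Lancelot learns which extensions the MTA advertises is not specified here, so the list of EHLO keywords is simply passed in.

```python
def effective_verp_style(verp_enabled, verpstyle, ehlo_extensions):
    """Decide which VERP style to use for one delivery run.

    verp_enabled    -- value of smtp.verp
    verpstyle       -- value of smtp.verpstyle ("smtp" or "lancelot")
    ehlo_extensions -- extension keywords advertised by the local MTA
    Returns "smtp", "lancelot", or None if VERP is switched off.
    """
    if not verp_enabled:
        return None  # all other VERP parameters are ignored
    advertised = {name.upper() for name in ehlo_extensions}
    if verpstyle == "smtp" and "XVERP" in advertised:
        return "smtp"
    # Either "lancelot" was requested or XVERP is unavailable: fall back.
    return "lancelot"
```

If the standard library's smtplib were used for delivery, the keys of the esmtp_features dictionary after ehlo() could serve as ehlo_extensions.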

Bounces are routed to the mail.workflow.bounce workflow, which will look at the incoming address and figure out the bouncing message and destination address from there.

Handling Bounces

The current idea is to implement bounce handling similar to what Mailman does:

  • If a bounce is received, we log that message N to address A bounced at date/time T.
  • For each address A we calculate a »bounce score« B depending on any bounces received for A during the preceding timespan D. For example, B can be increased by 1 for each day within D on which a bounce was received. Any bounces received more than D before the current time are discarded. It is important to cap the bounce score; if ten messages out of ten bounce during one day, address A is not ten times more broken than it would be if just one single message sent to the list on that day bounced.
  • If the bounce score for address A exceeds a »bounce threshold« B0, A's status is set to BOUNCING. Addresses which are BOUNCING are no longer sent copies of the distributed messages. We log the fact that A was set to BOUNCING and send a notification to that effect to A. The notification contains instructions on how to set the status of the subscription back to SUBSCRIBED.
  • For all addresses that are BOUNCING, repeat the notification up to M times, at intervals of D'. Instead of sending the (M+1)th notification, unsubscribe the address in question.

According to Mailman, suitable values for the parameters in this algorithm might be: D = 30 days, B0 = 2, M = 2 (or 0, where 0 results in immediate unsubscription if the first notification is not reacted upon), D' = 7 days.
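
As a rough illustration of the scoring step, the following sketch counts the distinct days within D on which at least one bounce was logged, which caps the contribution of any single day at 1. The function names are hypothetical, the defaults are the values quoted above, and the comparison follows the "exceeds" wording of the algorithm description.

```python
from datetime import datetime, timedelta

def bounce_score(bounce_times, now, interval_days=30):
    """Number of distinct days within the last interval_days with a bounce."""
    cutoff = now - timedelta(days=interval_days)
    # A set of dates: ten bounces on one day still count only once.
    days = {t.date() for t in bounce_times if cutoff <= t <= now}
    return len(days)

def is_bouncing(bounce_times, now, interval_days=30, threshold=2):
    """True if the score exceeds the bounce threshold B0."""
    return bounce_score(bounce_times, now, interval_days) > threshold
```

With these defaults, bounces on three different days within 30 days set an address to BOUNCING, while any number of bounces on a single day does not.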

Mailman differentiates between »hard bounces« (no such user, and similar bad things) and »soft bounces« (mailbox full, etc.), assigning a lower bounce score to known soft bounces. This sounds like a sensible strategy.
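
One possible way to tell the two kinds apart, where a bounce carries an enhanced mail system status code (RFC 3463), is sketched below. The weights of 1.0 and 0.5, the name bounce_weight, and the decision to treat unparseable codes as hard bounces are all assumptions made for illustration; many real bounces carry no usable status code at all.

```python
import re

HARD_WEIGHT = 1.0  # assumed weight for hard bounces
SOFT_WEIGHT = 0.5  # assumed weight for soft bounces

_STATUS_RE = re.compile(r"^([245])\.(\d{1,3})\.(\d{1,3})$")

def bounce_weight(status):
    """Map an enhanced status code such as "5.1.1" to a bounce weight."""
    m = _STATUS_RE.match(status.strip())
    if m is None:
        return HARD_WEIGHT  # assumption: unknown codes count as hard
    cls, subject, detail = m.groups()
    # Class 4 is a transient failure; X.2.2 means "mailbox full".
    if cls == "4" or (subject, detail) == ("2", "2"):
        return SOFT_WEIGHT
    if cls == "5":
        return HARD_WEIGHT  # permanent failure, e.g. 5.1.1 (no such mailbox)
    return 0.0  # class 2 reports success, not a bounce
```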

Proposed Architecture

The design above may be implemented as follows:

  • We add a table bounces to the list's database giving details of the bouncing address, message number, and timestamp when the bounce was received.
  • We implement a Project Lancelot module process_bounce that will form the mainstay of a »bounce« workflow. This module looks at incoming bounces, tries to determine the bouncing address, and logs the bounce in the database. It also computes the bounce score for that address, sets the address to BOUNCING if required, and sends out the first notification.
  • Bounces will be analysed using various sub-modules depending on the actual bounces that we see in the wild.
  • We implement a program, ll-janitor, which will be invoked periodically to scan the list database and send out additional notifications or perform unsubscriptions. Ideally, each Project Lancelot user will need to arrange for ll-janitor to be run just once a day to cover all their lists.
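
To make the division of labour more concrete, here is a minimal sketch of the bounces table and of the decision ll-janitor would take for one BOUNCING address. It assumes an SQLite list database; the column names, the function names, and the bookkeeping of how many additional notifications were sent are hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta

SCHEMA = """
CREATE TABLE IF NOT EXISTS bounces (
    address  TEXT    NOT NULL,  -- bouncing subscriber address
    msgno    INTEGER NOT NULL,  -- sequential message number on the list
    received TEXT    NOT NULL   -- ISO 8601 timestamp of the bounce
);
"""

def log_bounce(db, address, msgno, received):
    """Record one bounce, as process_bounce would after decoding the address."""
    db.execute(
        "INSERT INTO bounces (address, msgno, received) VALUES (?, ?, ?)",
        (address, msgno, received.isoformat()),
    )

def janitor_action(extra_sent, last_notified, now, max_extra=2, notify_days=7):
    """Decide what to do with one BOUNCING address.

    extra_sent    -- additional notifications sent so far (first one excluded)
    last_notified -- time of the most recent notification
    Returns "wait", "notify" or "unsubscribe".
    """
    if now - last_notified < timedelta(days=notify_days):
        return "wait"
    if extra_sent < max_extra:
        return "notify"
    return "unsubscribe"  # takes the place of the (M+1)th notification
```

With max_extra set to 0, an address is unsubscribed one notification interval after the first notification if its status has not been reset by then.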

Proposed Parameters

We propose the following set of parameters to control Project Lancelot's bounce processing:

  • mail.workflow.bounce: This workflow governs bounce processing. It should probably refer to a sequence of modules where the process_bounce module first tries to classify and process the bounce as mentioned above, and a second module forwards any unclaimed bounces to the list owner.
  • bounce.process: This parameter controls whether bounces are to be processed automatically, i.e., whether process_bounce tries to do anything at all. Possible values include auto for automatic bounce processing, and manual, if bounces are to be passed through to the list owner. (Default: auto)
  • bounce.dirs: A space-separated list of directories containing bounce-processing modules. This parameter is mostly supposed to help with developing these.
  • bounce.bounceinterval: The time interval, counting back from the current point in time, during which a bounce must have been logged to count towards the bounce score.
  • bounce.threshold: The bounce score that an address must reach in order to be considered »bouncing«.
  • bounce.notifications: The number of notifications (in addition to the first one) that a bouncing address will be sent before it is unsubscribed.
  • bounce.notifyinterval: The interval at which ll-janitor will send out additional bounce notifications.