2016-11-20

Extracting an email attachment to a pipe in procmail

At work, some of my email comes through a Microsoft mail server, which I don't use. Through its web interface, though, I can set up redirection to the address I actually do use. But there's a problem: the redirected email is identical to the original, except that the Message-Id header field is overwritten with one the server chooses locally. What the hell could this be useful for?! It completely messes up threading, and achieves nothing in return! Worse, the new id ends in .local! I thought these things were supposed to be globally unique…?

My own email runs through a procmail script that I control, and I already use it to fix things such as restoring the subject line after a spam filter has mangled it. This recipe does that job:

:0 fhw
* ^X-Spam-Prev-Subject:
| formail -R X-Spam-Prev-Subject Subject -U Subject
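For readers who don't speak formail, here is a rough Python sketch of the effect I want from that recipe: if the spam filter saved the old subject in X-Spam-Prev-Subject, put it back as the only Subject. It illustrates the intent, not formail's exact semantics, and the function name restore_subject and the sample message are made up for the example.

```python
import email

def restore_subject(raw):
    """Return the message text with Subject restored from X-Spam-Prev-Subject."""
    msg = email.message_from_string(raw)
    previous = msg['X-Spam-Prev-Subject']
    if previous is None:
        return raw                      # nothing was saved, leave the message alone
    del msg['Subject']                  # drop the mangled subject
    del msg['X-Spam-Prev-Subject']      # the saved copy is no longer needed
    msg['Subject'] = previous           # reinstate the original subject
    return msg.as_string()

# Hypothetical message as a spam filter might have left it.
sample = (
    "From: someone@example.org\n"
    "Subject: [SPAM] Lunch on Friday?\n"
    "X-Spam-Prev-Subject: Lunch on Friday?\n"
    "\n"
    "Body text.\n"
)
restored = email.message_from_string(restore_subject(sample))
print(restored['Subject'])
```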

Can I do something similar for Message-Id? No, not with what gets redirected, because the original id is no longer anywhere in the message. However, there is another option: forward the email as an attachment. I checked, and the message id within the attachment is preserved!

I hunted around for a tool to unpack the attachment and print it to standard output (so it could be used as a procmail filter), but tools like munpack and ripmime only unpack to a directory. With nothing ready-made, I decided to pick a language with a hopefully mature RFC 822 library:

#!/usr/bin/env python

"""Unpack a MIME message onto STDOUT."""

# Distilled from https://docs.python.org/2/library/email-examples.html

import sys
import email

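def main():
    # Parse the encapsulating message from standard input.
    msg = email.message_from_file(sys.stdin)

    if not msg.is_multipart():
        # With procmail's 'w' flag, a non-zero exit status means the
        # original, unfiltered message is kept.
        sys.stderr.write('Not multipart\n')
        sys.exit(1)
    # The forwarded message is assumed to be the second part (index 1),
    # after the covering body.
    attachment = msg.get_payload(1)
    # A message/rfc822 part holds the original message as its only sub-part.
    # sys.stdout.write works in both Python 2 and 3, unlike the print statement.
    sys.stdout.write(attachment.get_payload(0).as_string(False))
    sys.exit(0)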

if __name__ == '__main__':
    main()
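The script above trusts that the forwarded message is always the second part. A slightly more defensive variant, sketched here for Python 3, walks the parts and takes the first one of type message/rfc822 instead. The helper name extract_forwarded and the sample messages are invented for illustration; I have not checked this against what the real server produces.

```python
import email
from email.mime.message import MIMEMessage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def extract_forwarded(msg):
    """Return the message inside the first message/rfc822 part, or None."""
    for part in msg.walk():
        if part.get_content_type() == 'message/rfc822':
            payload = part.get_payload()
            # A parsed message/rfc822 part carries a one-element list.
            if isinstance(payload, list) and payload:
                return payload[0]
    return None

# Build a hypothetical "forwarded as attachment" message to exercise it.
original = MIMEText('Hello\n')
original['Subject'] = 'Original subject'
original['Message-Id'] = '<original-id@example.org>'

wrapper = MIMEMultipart()
wrapper['Message-Id'] = '<rewritten-id@server.local>'
wrapper.attach(MIMEText('See attached.\n'))
wrapper.attach(MIMEMessage(original))

# Round-trip through text, as the filter would see it on standard input.
parsed = email.message_from_string(wrapper.as_string())
inner = extract_forwarded(parsed)
print(inner['Message-Id'])
```

In a filter, a None result would be the cue to exit non-zero so that procmail keeps the original message.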

The procmail configuration just has to be certain that the message has come through the mindlessly mangling server. I used three conditions that should prevent mishaps:

:0 fw
* ^From:.*<my@email\.address\.example\.com>
* ^X-MS-Has-Attach: yes
* ^X-MS-Exchange-Inbox-Rules-Loop: my@email\.address\.example\.com
| /path/to/the/python/script

The encapsulating message should always come from me, even if the original is from someone else.
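As a belt-and-braces measure, the same three conditions could be repeated inside the script before it unpacks anything. The following is only a sketch: came_through_redirect is a hypothetical helper, the address is the same placeholder as in the recipe, and the comparisons are lower-cased on the assumption that the recipe's matching is case-insensitive.

```python
import email

MY_ADDRESS = 'my@email.address.example.com'   # placeholder, as in the recipe

def came_through_redirect(msg, address=MY_ADDRESS):
    """Mirror the recipe's three header conditions on a parsed message."""
    address = address.lower()
    sender = str(msg['From'] or '').lower()
    has_attach = str(msg['X-MS-Has-Attach'] or '').strip().lower()
    loop = str(msg['X-MS-Exchange-Inbox-Rules-Loop'] or '').strip().lower()
    return ('<%s>' % address in sender       # From: ... <my address>
            and has_attach.startswith('yes')  # X-MS-Has-Attach: yes
            and loop.startswith(address))     # rules-loop header names me

# Two made-up messages: one that matches, one that does not.
matching = email.message_from_string(
    "From: Me <my@email.address.example.com>\n"
    "X-MS-Has-Attach: yes\n"
    "X-MS-Exchange-Inbox-Rules-Loop: my@email.address.example.com\n"
    "\n"
    "body\n"
)
other = email.message_from_string(
    "From: Someone <someone@example.org>\n"
    "X-MS-Has-Attach: yes\n"
    "\n"
    "body\n"
)
print(came_through_redirect(matching), came_through_redirect(other))
```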

It seems to do the job! I just tried it with an image attachment (inside the attached message), and it came through fine. I wonder how well it will cope with very big emails. Will the Python library automatically save to disc when the size is above a threshold?