|
TUTORIAL FOR EMIL VERSION 2.1Written by Martin Wendel, ITS, Uppsala university. Martin.Wendel@its.uu.seHOW EMIL V2 DOES MESSAGE CONVERSION.To be able to understand message conversion it is important to understand the components of a message and the types of these components. Basically a message consists of a header and a body. The format of the header is thoroughly defined in rfc822 and rfc1522. Basically it consists of lines of text divided into two parts; the field and the value, separated by a colon ':'. Apart from a few details the header is very straight forward. The body, on the other hand, is somewhat more tricky to understand. The body of a message consists of text or/and encoded data, where data may refer to text as well. Emil recognizes the encodings BinHex, Base64, Quoted-Printable and UUencode. Text is the parts of the message that is not recognized as being an encoded enclosure. Text is a raw format whereas encoded data follows the syntax of the encoding. This means that incorrect or incomplete encodings are not recognized and are thus treated as raw text (as a safety measure). Emil works in a two pass manner. After loading the message, it first interprets the header and body of the message before applying any conversions. First the format is recognized. This is done by interprating the header of the message. Then the data of the message is structured into a hierarchical message structure. UUencoded and BinHexed enclosures are recognized by their preamble and a syntax check is made on them to make pretty sure they are valid. When emil receives a MIME or Mailtool message, encoded enclosures are defined in the header. Emil recognizes these definitions and trusts them to be valid, no further processing is done initially. The rest of the body parts and the erroneous BinHex and UUencode parts are treated as text. The text is checked for 7bit or 8bit encoding. Emil now has a hierarchical representation of the message and each part is type marked by it's encoding. It may be that the trusted encodings of a MIME or Mailtool message are erroneous as well. However, the definitions in the header is a strong evidence that they must be correct. In the case of error, in spite of all that, the message part is left untouched, but not treated as text. Conversion is applied as specified by the target format. Emil has a clear view of the incoming message and can work pretty straight forward parsing the hierarchical structure. Each object in the structure contains headers, data, and pointers to other objects. When applying conversion, first the data of each object is converted to what's specified in the target format, then the headers of each object is taken care of based on the resulting type and encoding of the data of the objects. An example: The target format specifies Sun Mailtool, UUencoded attachments and ISO-646-SE text. A MIME message containing a Quoted-Printable encoded text/plain using ISO-8859-1 and a Base64 encoded Image/GIF will be converted as follows (this is a fairly long and detailed description):
|