
27 VM Internals

This chapter gives a brief overview of the VM internals for developers.



27.1 Folder Internals

VM stores mail folders in the Unix ‘mbox’ format (in all its variants). Internal to Emacs, the mbox is loaded into a text buffer (the Folder buffer) and individual messages are identified by remembering markers into the text buffer. See Message Internals.

The Unix mbox format is described in RFC 4155 of the Internet Engineering Task Force. The mail folder is a text file consisting of a sequence of messages, with each message consisting of a series of headers followed by a message body. The beginning of each message is marked by a separator line starting with the string “From ” and the end of the message by a blank line. The leading separator line in a VM folder is of the form “From VM ...” where the “...” records the time at which VM first saw the message. The format of the individual messages follows the RFC 2822 specification, except that in the “Unix” format lines may end with a bare Line-Feed character instead of the CR-LF pair.
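As a rough illustration of how messages can be located in such a file, the following sketch scans a string for separator lines. The name my-mbox-message-starts is hypothetical and not part of VM. VM itself remembers markers rather than plain positions, and its real parsing is more involved than this.

```elisp
;; Hypothetical helper, not part of VM.
(defun my-mbox-message-starts (text)
  "Return the positions of the \"From \" separator lines in TEXT.
TEXT is the contents of an mbox file, given as a string."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((starts nil))
      ;; A separator is a line that begins with the string "From ".
      (while (re-search-forward "^From " nil t)
        (push (match-beginning 0) starts))
      (nreverse starts))))
```

Calling it on a string holding two messages returns a list of two positions, one per separator line.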

Three variants of the mbox format are recognized by VM, called From_, BellFrom_ and From_with-Content-Length. In a From_ type mbox, every message has a leading and trailing separator line, as indicated above. In a BellFrom_ type mbox, the trailing separator line can be missing. (This is so that mboxes in the old System V format can be handled.) In a From_with-Content-Length type mbox, the From separator line stores the length of the message, so no trailing separator line is required.

In addition to these mbox formats, VM also handles the MMDF format and the Babyl format of Emacs Rmail. The variable vm-folder-type stores the type of the folder being used.

To every message, VM adds a header with the field name “X-VM-v5-Data:” and stores in it the information about the message it wishes to remember between sessions.

The first message of the VM folder file contains additional headers used by VM for remembering information between sessions.

Folder variables

Internal to Emacs, VM stores the folder as simply a text buffer. However, it remembers a variety of data about the message contents in the buffer through internal variables.

The running state of the folder buffer is represented in a number of buffer-local variables:

vm-folder-access-data

The variable vm-folder-access-data is a vector storing data about the state of the mail server (for POP and IMAP servers). It contains the following items:



27.2 Message Internals

The message data structure is a vector containing various pieces of data about the message, some of which is permanent and some that is calculated during a VM session. The data is organized into four sub-vectors:

The attributes vector and cached data vector are stored in the folder on disk as the X-VM-v5-Data header of the message.

Location data

This vector holds the data about the location of the various parts of the message in the folder buffer. Every folder buffer or folder-like buffer (such as a Presentation buffer) has variables that contain message data structures. The location data is normally expected to refer to locations in that very buffer. However, this condition is not actually required. (See below.)

Unfortunately, in the current versions of VM, the folder buffer to which the location data point is not itself part of this vector. This information is inferred from the context (which makes the code brittle). The Folder buffer of the message can be obtained from the soft data vector but the location data could also point to a Presentation buffer.

Soft data

This vector contains other calculated data about the message that is specific to a VM session.

Attributes

All the hard-wired message attributes are stored in this vector. They also get saved as part of the X-VM-v5-Data header field when the folder is saved to disk.

Cached Data

This vector holds the data that is cached for the message and stored on disk as part of the X-VM-v5-Data header field. Even though this vector is only supposed to have data that can be calculated from the message itself, the fields pop-uidl, imap-uid and imap-uid-validity form an exception. They are really hard data that cannot be calculated from anything else.

Some of the data deals with information from message headers. The header fields can have MIME-encoded words in them. The strings stored in the cached-data vector, however, are MIME-decoded versions of the header fields, but they also have text properties that store the names of the original character sets used in the header fields. This allows the strings to be quickly re-encoded for storage on disk.

Mirror Data

Extra data shared by virtual messages if vm-virtual-mirror is non-nil.

MIME layout

The MIME layout of a message, stored in the soft data of the message, is in turn a vector containing various pieces of data. Such a vector is used not only for the overall message, but for all its MIME parts and subparts as well.

Cross-buffer sharing of data

Every Folder buffer has a vm-message-list and a vm-message-pointer list containing message data vectors.

Every Presentation buffer also uses a vm-message-pointer list with a single message (the one being presented). The message data vector in the Presentation buffer has its own location data, but shares all other components with the message in the Folder buffer. This allows the Presentation buffer to, for example, change the attributes of the message without having to switch context to the Folder buffer.
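The sharing arrangement can be pictured with a much reduced model. In the sketch below a message is a two-slot vector holding a location vector and an attributes vector. All the names and the slot layout are hypothetical and far simpler than VM's real message structure.

```elisp
;; Hypothetical two-slot message: [LOCATION-DATA ATTRIBUTES].
(defun my-make-message (location attributes)
  "Return a message vector holding LOCATION and ATTRIBUTES."
  (vector location attributes))

(defun my-make-presentation-copy (message new-location)
  "Return a copy of MESSAGE with NEW-LOCATION but the same attributes vector."
  ;; Slot 1 is reused, not copied, so both messages see the same attributes.
  (vector new-location (aref message 1)))

(defun my-set-deleted-flag (message flag)
  "Store FLAG in slot 0 of the attributes vector of MESSAGE."
  (aset (aref message 1) 0 flag))
```

Because the attributes vector is the same Lisp object in both messages, setting the flag through the presentation copy is immediately visible through the folder message, with no buffer switch involved.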

Virtual folders, which contain only references to messages in other folders, store just a single message body in the Folder buffer. However, they have message descriptors for all the messages in vm-message-list. All the message descriptors use the same location data vector, because only one message body can be stored in the Folder buffer, but have separate Soft data vectors. (This allows, for instance, virtual folders to have their own threads, which could in general be different from the threads in the underlying folders.) The other sub-vectors are shared with the underlying real folders. (In particular, the tokenized summary line is the same in the virtual folders and their underlying folders.)



27.3 Summary Internals

Generating a summary is quite a time-consuming operation. VM uses a variety of tricks to speed up the generation of summaries.

The format of the summary lines is specified in the variable vm-summary-line-format. The information that needs to go into the summary lines is divided into two classes:

A tokenized summary line is a list whose elements can be strings, representing fixed information in a message, and tokens, representing variable information. VM calculates a tokenized summary line for each message and caches it in the cached-data vector. The following forms of tokens are used in tokenized summary lines:

The function vm-tokenized-summary-insert converts a tokenized summary line into a string and inserts it in the summary buffer. The minibuffer message “Generating summary...” is used to show the progress of generating summary lines from tokenized summaries.
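The idea of expanding a tokenized line can be sketched as follows. The token representation used here (bare symbols looked up in an alist) and the function name are assumptions made for illustration. They are not the token forms that VM actually uses.

```elisp
;; Hypothetical expansion of a tokenized summary line, not VM's own code.
(defun my-tokenized-summary-string (line values)
  "Expand tokenized summary LINE into a string.
LINE is a list of strings and symbols.  Strings are copied as they are;
each symbol is replaced by its entry in the alist VALUES."
  (mapconcat (lambda (element)
               (if (stringp element)
                   element
                 (format "%s" (cdr (assq element values)))))
             line
             ""))
```

The fixed strings can be computed once and cached, while the values behind the symbols (for example the message number) are supplied each time the line is rendered.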

Buffer local variables in each Folder buffer responsible for maintaining summary information:

The beginning and the ending positions of each message summary line are stored in the message’s soft data vector. See Message Internals. The positions within the summary line have text-properties set, which give the data about the message:



27.4 Threading Internals

Message threads required for threaded summaries are calculated using message IDs, which are unique at the time a message is originally composed. However, VM may need to deal with multiple copies of the same message received via possibly different routes. So, message IDs are not unique for messages inside VM.

Messages composed as replies generally have an “In-Reply-To” header. The message mentioned in this header is referred to as the parent of the message. In addition, messages also arrive with a “References” header which lists all the ancestors of the message, with the oldest message being listed first. The last message listed in the “References” header is the direct parent of the message. It is important to keep in mind that not all the messages listed in the “References” header need be present in the VM folder.
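A minimal sketch of reading a “References” header is given below. The function names are hypothetical, and the regular expression is a simplification that treats anything between angle brackets as a message ID.

```elisp
;; Hypothetical helpers, not part of VM.
(defun my-references-ids (header)
  "Return the message IDs found in HEADER, oldest first."
  (let ((pos 0)
        (ids nil))
    (while (string-match "<[^<> \t\n]+>" header pos)
      (push (match-string 0 header) ids)
      (setq pos (match-end 0)))
    (nreverse ids)))

(defun my-references-parent (header)
  "Return the last message ID in HEADER, that is, the direct parent."
  (car (last (my-references-ids header))))
```

An empty or malformed header simply yields nil, which fits the requirement that bad header data should not cause errors.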

Thread trees are constructed using the “In-Reply-To” headers and “References” headers. Jamie Zawinski has done a good analysis of the information contained in these headers which can be found on the web. VM’s threading algorithm is currently based on these ideas. These trees are called reference-based threads.

In addition, VM also allows threads to be built using the subject headers via the option vm-thread-using-subject. Subject-based threading is used in addition to reference-based threading. So, in a subject-based thread, the root message would be the oldest message with that subject and, below it, would be reference-based threads all of which share the same subject. The roots of these reference-based threads are referred to as the “members” of the subject thread. Subject threading is only one level deep, whereas reference threading can be arbitrarily deep.

Threads are built using two hash tables vm-thread-obarray and vm-thread-subject-obarray. The former keeps track of the thread obtained by following parent and reference chains. The latter keeps track of messages with the “same subject”. To prevent messages from jumping from one thread to another within the same VM session, the subject used is not the message’s own subject, but rather the subject of the oldest message in the thread. This subject is retained even if the oldest message is expunged.

The message ID’s are interned in vm-thread-obarray and the following information is stored for each message ID:

The vm-thread-subject-obarray interns each subject string found in the folder and maps it to a vector containing the following elements:

Building threads involves calculating all the data stored with the vm-thread-obarray and vm-thread-subject-obarray. These two collections of data are calculated in sequence, because the subject threads are based on the reference threads.

After the threads are built, the thread-list, thread-indentation and the thread-subtree fields of the Soft data vector are calculated on demand and cached. (See Soft data vector.) These fields cannot be calculated without building threads first.

When new messages are assimilated, they are added to the threads that might have been already built, and the thread-related fields in the Soft data vector are erased so that they will be recalculated. The thread-subtree field is erased for all the ancestors of the assimilated message. The thread-list and thread-indentation fields are erased for all the descendants of the assimilated message.

Before messages in the folder are expunged, they are unthreaded. This involves removing them from their respective thread trees. It also involves the erasure of the thread-subtree field of all their ancestors and the thread-list and thread-indentation fields of the descendants.

Error handling

The code for threading has to be robust in the presence of erroneous information in the message headers. We have no control over the mail clients that produce those messages and faulty information should not lead to VM hanging or producing errors. It should just do the best job it can in the presence of imperfect information.

It is possible that the information in the headers gives rise to cycles in the thread trees. Kyle Jones’s original implementation allowed these cycles to exist, but all functions that traversed the thread trees were protected to detect cycles. However, since thread trees are updated when new messages are received or existing messages are expunged, this led to unstable results.

Following Jamie Zawinski’s recommendation, VM now avoids cycles in thread trees. Loop detection is still carried out during traversal as a double safeguard.

VM gives priority to the parent information contained in the “In-Reply-To” headers in preference to the information in the “References” headers. However, if an “In-Reply-To” header gives rise to a cycle, it is ignored, and then “References” headers might be used to fill in the missing information.
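One way to keep a parent relation free of cycles is to refuse any link whose proposed parent already has the child among its ancestors. The sketch below does this with message IDs interned in an obarray and the parent stored as a symbol property. The names, and the use of a parent property, are assumptions for illustration and do not reproduce the data that VM keeps in vm-thread-obarray.

```elisp
(require 'obarray)

;; Hypothetical table, standing in for the role of a thread obarray.
(defvar my-thread-obarray (obarray-make)
  "Table in which message IDs are interned as symbols.")

(defun my-thread-ancestor-p (candidate id-sym)
  "Return non-nil if CANDIDATE is ID-SYM or one of its recorded ancestors."
  (let ((node id-sym)
        (found nil))
    ;; The walk terminates because links that would form a cycle are
    ;; never recorded by `my-thread-set-parent'.
    (while (and node (not found))
      (if (eq node candidate)
          (setq found t)
        (setq node (get node 'parent))))
    found))

(defun my-thread-set-parent (child-id parent-id)
  "Record PARENT-ID as the parent of CHILD-ID unless that creates a cycle.
Return non-nil if the link was recorded."
  (let ((child (intern child-id my-thread-obarray))
        (parent (intern parent-id my-thread-obarray)))
    (unless (my-thread-ancestor-p child parent)
      (put child 'parent parent)
      t)))
```

A caller that gets nil back for a parent taken from one header is then free to try a candidate taken from another header.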



27.5 Sorting Internals

Sorting of messages in VM is carried out using the Emacs built-in sorting function, which is generic in the comparison operation to be used for sorting. The required comparison operation is expressed as a sequence of basic comparison operations such as comparison by date, by author, by subject etc. The dynamic variable vm-key-functions is bound to a list of comparison functions before calling the Emacs sort function.

The function vm-sort-compare-xxxxxx uses the functions listed in vm-key-functions to do the overall comparison. It compares the given messages using the key functions in sequence. If the first key function decides that one of the messages precedes the other, then the comparison is over. If the messages are found to be equivalent according to the first key function then the second key function is tried and, if they are still equivalent, then the next key function is tried and so on. This is called the lexicographic combination of the given key functions.
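The lexicographic combination can be sketched as follows. Messages are modelled as property lists, and each key function returns one of the symbols <, > or =. These conventions and all the my- names are assumptions made for the example and are not VM's actual interfaces.

```elisp
;; Hypothetical sketch of lexicographic sorting, not VM's own code.
(defvar my-key-functions nil
  "List of key functions used by `my-sort-compare'.
Each takes two messages and returns one of the symbols <, > or =.")

(defun my-compare-values (x y less-p)
  "Compare X and Y with LESS-P and return the symbol <, > or =."
  (cond ((funcall less-p x y) '<)
        ((funcall less-p y x) '>)
        (t '=)))

(defun my-key-author (a b)
  "Compare messages A and B by their :author property."
  (my-compare-values (plist-get a :author) (plist-get b :author) #'string<))

(defun my-key-date (a b)
  "Compare messages A and B by their numeric :date property."
  (my-compare-values (plist-get a :date) (plist-get b :date) #'<))

(defun my-sort-compare (a b)
  "Return non-nil if message A should precede message B."
  (let ((fns my-key-functions)
        (result '=))
    ;; Try each key function until one of them separates A and B.
    (while (and fns (eq result '=))
      (setq result (funcall (car fns) a b)
            fns (cdr fns)))
    (eq result '<)))

(defun my-sort-messages (messages key-functions)
  "Return a sorted copy of MESSAGES using KEY-FUNCTIONS in sequence."
  ;; The key functions are bound dynamically around the call to `sort'.
  (let ((my-key-functions key-functions))
    (sort (copy-sequence messages) #'my-sort-compare)))
```

Sorting with (list #'my-key-author #'my-key-date) orders the messages by author and uses the date only to order messages by the same author.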

Sorting by threads is special. When messages are to be sorted by threads, all the messages belonging to a thread should appear together. The required effect is achieved by using vm-sort-compare-thread as the first key function in the sequence. This function checks to see if the two messages belong to the same thread. If they do then the farthest ancestors of the two messages that share the same parent are returned so that the remaining comparison operations can be applied to these ancestors. The rationale is that these ancestors are the roots of the thread subtrees that the two messages belong to. So, the relative ordering of the messages should be the same as the relative ordering of these ancestors. If the two messages belong to different threads then the thread roots of the two messages are returned, again with the same rationale.

Threaded summaries can be sorted by any key, e.g., by author (full-name). It is most common to sort them by “activity,” i.e., the order of the most recent message in the thread or subthread. Sorting them by “date” means using the date of the root message of the thread or subthread.



27.6 User Interaction

For each mail folder, VM creates three kinds of buffers in Emacs: the Folder buffer, the Presentation buffer and the Summary buffer. All three types of buffers have the same user interface as far as possible: the same key bindings, menu bars, tool bars and also the same commands. The functions implementing the commands must therefore work irrespective of which of the three buffers they are invoked in. This makes VM quite different from most Emacs modes.

VM stores the identity of the Folder buffer in a buffer-local variable vm-mail-buffer in each of the other types of buffers. Conversely, each Folder buffer uses buffer-local variables vm-summary-buffer and vm-presentation-buffer to store the identity of the other buffers.

Whenever a VM command is invoked by the user, VM calls a function called vm-select-folder-buffer-and-validate, which sets the current-buffer to the Folder buffer. It also stores the identity of the buffer with the user’s focus in a global variable called vm-user-interaction-buffer. Thus, at every point during the command execution, VM has knowledge of all the buffers involved as well as the buffer in which the command execution was initiated.
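The pattern can be sketched with hypothetical names as shown below. The sketch only mirrors the roles described above for vm-mail-buffer and vm-user-interaction-buffer. It performs none of the validation that the real function carries out.

```elisp
;; Hypothetical sketch of the buffer-selection pattern, not VM's own code.
(defvar-local my-mail-buffer nil
  "In a summary or presentation buffer, the associated folder buffer.")

(defvar my-user-interaction-buffer nil
  "The buffer in which the current command was invoked.")

(defun my-select-folder-buffer ()
  "Make the folder buffer current and return it.
The buffer that was current on entry is remembered in
`my-user-interaction-buffer'."
  (setq my-user-interaction-buffer (current-buffer))
  ;; In the folder buffer itself `my-mail-buffer' is nil, so we stay put.
  (when (buffer-live-p my-mail-buffer)
    (set-buffer my-mail-buffer))
  (current-buffer))
```

A command that starts with such a call can be bound to the same key in all three kinds of buffers and will still operate on the folder data.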

[More to be filled in on vm-display etc.]

The default menu bar of VM contains VM-specific menus, replacing the standard Emacs menus. This is achieved by setting the buffer-specific menu bar to one in which the Emacs menus are undefined (at least in GNU Emacs).

VM computes its standard menu bar and stores it internally:

The menu bar also has a menu, or a menu item, to switch back to the standard Emacs menu bar. The computed menu bar is then installed depending on the setting of vm-use-menus. If the user selects the action to revert to the standard Emacs menu bar, the installation is easily reverted.

When the user picks a menu item to revert to the Emacs menu bar, the function vm-menu-toggle-menubar is invoked, which installs a fresh menu bar retaining the standard Emacs menus. The same function is used to reinstall the dedicated VM menu bar when needed.



27.7 Coding Systems

A coding system is a way of encoding characters as bit patterns. See Coding System Basics in the Emacs Lisp manual. US-ASCII is a coding system for English. Other coding systems are used to encode the various languages of the world, e.g., iso-latin-1 for Western European languages, and hebrew-iso-8bit for Hebrew. Emacs also uses its own internal coding system for characters, which can encode all character sets currently in existence. But the internal coding system can vary between different versions of Emacs.

Emacs defines a property called mime-charset for each implemented coding system, which is the official preferred name of the MIME character set that it corresponds to. For example, iso-latin-1 corresponds to the MIME charset iso-8859-1, and hebrew-iso-8bit corresponds to the MIME charset iso-8859-8. The Emacs function coding-system-get can be used to extract the mime-charset property of a coding system. VM stores all the known coding systems and the corresponding MIME charsets in its internal variables vm-mime-mule-coding-to-charset-alist and vm-mime-mule-charset-to-coding-alist.
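For example, the lookup can be wrapped in a small helper. The wrapper name is hypothetical. In current versions of GNU Emacs the property is written as the keyword :mime-charset, and coding-system-get also accepts the older spelling mime-charset.

```elisp
;; Hypothetical wrapper around the standard function `coding-system-get'.
(defun my-mime-charset-of (coding-system)
  "Return the MIME charset recorded for CODING-SYSTEM, or nil if none."
  (coding-system-get coding-system :mime-charset))
```

Evaluating (my-mime-charset-of 'iso-latin-1) yields the symbol iso-8859-1, in line with the correspondence described above.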

MIME messages specify the character set that their content is in, in the Content-Type header. VM uses this information to decode the content to the Emacs internal coding system. This is done using the function decode-coding-region. Conversely, VM encodes the outgoing messages into the default or chosen MIME character set using the function encode-coding-region.

The headers of email messages can only be in US-ASCII. So header fields in other character sets are encoded using either base-64 or quoted-printable encoding (which give ASCII strings) and annotated with the name of the original character set. Such annotations look like =?charset?B?. They can apply to individual words or sequences of words appearing in the headers. Note that the annotation ?B? signifies base-64 encoding of the byte stream. Similarly the annotation ?Q? might be used to denote the quoted-printable encoding. VM decodes such strings using the function decode-coding-string. Conversely, the headers of outgoing messages are encoded using encode-coding-string.
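A minimal sketch of decoding a single base-64 encoded word is shown below. The function name is hypothetical. The sketch assumes that the MIME charset name is also the name of an Emacs coding system, which is not true in general, and it returns words using the ?Q? encoding unchanged.

```elisp
;; Hypothetical decoder for one "=?charset?B?...?=" word, not VM's own code.
(defun my-decode-b-encoded-word (word)
  "Decode WORD if it is a base-64 encoded word; otherwise return WORD."
  (if (string-match "\\`=\\?\\([^?]+\\)\\?[Bb]\\?\\([^?]*\\)\\?=\\'" word)
      (let ((coding (intern (downcase (match-string 1 word))))
            ;; The base-64 payload decodes to the raw bytes of the text.
            (bytes (base64-decode-string (match-string 2 word))))
        (if (coding-system-p coding)
            (decode-coding-string bytes coding)
          word))
    word))
```

For instance, the word =?iso-8859-1?B?Q2Fm6Q==?= decodes to a four-character string whose last character is e with an acute accent.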



27.8 Virtual Folder Internals

A virtual folder is characterized by its definition, which is stored in the buffer-local variable virtual-folder-definition. The form of the definition is as given in vm-virtual-folder-alist. See vm-virtual-folder-alist. It is a collection of clauses, with each clause listing a collection of folders and a collection of virtual selectors.

Each virtual selector X has a corresponding Lisp function ‘vm-vs-X’, whose purpose is to check whether a given message matches the selector. The arguments for ‘vm-vs-X’ are a message data structure m and all the arguments for the virtual selector X.

For example, the virtual selector author has a string argument, representing the author name. The corresponding Lisp function is defined as:

(defun vm-vs-author (m author-name)
  (or (string-match author-name (vm-su-full-name m))
      (string-match author-name (vm-su-from m))))

The definition checks to see if the given author-name pattern occurs in the full name of the author (vm-su-full-name) or the email address of the author (vm-su-from).
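The naming convention makes it possible to find the function for a selector from the selector's name. The sketch below illustrates this with a hypothetical my-vs- prefix and with messages modelled as property lists. It does not reproduce how VM itself dispatches on selectors.

```elisp
;; Hypothetical selector function, following the naming convention.
(defun my-vs-subject (m pattern)
  "Return non-nil if PATTERN matches the :subject of message plist M."
  (string-match-p pattern (plist-get m :subject)))

(defun my-apply-selector (selector m)
  "Apply SELECTOR, a list of the form (NAME ARG...), to message M."
  ;; The function name is derived from the selector name.
  (let ((fn (intern (format "my-vs-%s" (car selector)))))
    (apply fn m (cdr selector))))
```

With these definitions, the selector (subject "hello") matches any message whose subject contains “hello”.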

The author selector is then registered in four places:

Evidently, the last two registrations are only needed for interactive selectors that can be used with the V C command.



27.9 MIME Display

The MIME layout of a message is stored in the mime-layout field of the Soft data vector of the message. (See MIME layout.) The MIME layout is in general a tree structure of “MIME parts”. The function vm-decode-mime-layout is responsible for traversing the tree structure at each MIME part and displaying it appropriately.

The function vm-decode-mime-layout goes through the following sequence of decisions:

  1. If the MIME part is a multipart type, then the subparts are displayed as needed. If it is a single part, it proceeds as follows.
  2. If the MIME part should not be displayed automatically, it is displayed as a button. (An automatically displayed MIME type is one listed in vm-mime-auto-displayed-content-types but not listed in the corresponding exceptions.)
  3. If the MIME part should be displayed internally and VM is able to do so, then it is displayed internally. (An internally displayed MIME type is one listed in vm-mime-internal-content-types but not listed in the corresponding exceptions.)
  4. Otherwise, the MIME part is displayed externally. An external viewer is found from vm-mime-external-content-types-alist and it is invoked to display the MIME part.
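The sequence of decisions above can be summarized in a small function. This is a simplified sketch with a hypothetical name. It takes the lists of types as plain arguments, ignores the exception lists, and does not check whether VM is actually able to display a type internally.

```elisp
;; Hypothetical summary of the decision sequence, not VM's own code.
(defun my-mime-display-action (type auto-types internal-types)
  "Return a symbol saying how a MIME part of content TYPE would be shown.
AUTO-TYPES and INTERNAL-TYPES are lists of content type strings."
  (cond ((string-prefix-p "multipart/" type) 'subparts)
        ((not (member type auto-types)) 'button)
        ((member type internal-types) 'internal)
        (t 'external)))
```

The order of the clauses matters: a type that is not auto-displayed becomes a button even if it could have been displayed internally.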

MIME parts of type ‘message/external-body’ need special treatment. If they are not asked to be auto-displayed, then they are displayed as buttons, but the button caption may use information from the child part (the actual object that is in the external-body) such as its type and description. If a message/external-body part is asked to be auto-displayed, then the child part is fetched from the external source and stored in an internal buffer. It may be auto-displayed if it is appropriate to do so, or shown in turn as a button.

MIME buttons are displayed as regions of text displaying button labels. In addition, they have an overlay/extent placed on them, which has a number of properties associated with it:



27.10 MIME Composition

A MIME message is composed just like a normal message. When objects are attached using commands like vm-attach-file, attachment buttons are created in the message composition buffer. An attachment button is a region of text that looks like:

[Attachment mary.jpeg, image/jpeg]

Various text properties are associated with an attachment button, allowing it to be turned into an actual attachment when the message is sent.

The representation of the attachment buttons differs in GNU Emacs and XEmacs. In GNU Emacs, the region of text is given text properties that represent the metadata about the object. In XEmacs, the region of text is given an extent, which is then given properties representing the metadata. The reason for the different representations is that in GNU Emacs, only text properties are preserved under killing and yanking.
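The GNU Emacs representation can be sketched as follows. The function name and the property names my-attachment-file and my-attachment-type are hypothetical and are not the properties that VM defines.

```elisp
;; Hypothetical attachment button built from text properties.
(defun my-insert-attachment-button (file type)
  "Insert a button for FILE of MIME TYPE at point, with its metadata."
  (let ((start (point)))
    (insert (format "[Attachment %s, %s]" file type))
    ;; The metadata travels with the text because it is stored in text
    ;; properties rather than in an overlay.
    (put-text-property start (point) 'my-attachment-file file)
    (put-text-property start (point) 'my-attachment-type type)))
```

If the button text is copied with buffer-substring and inserted elsewhere, get-text-property still finds the metadata on the copy.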

The following properties are defined for attachment buttons:

When a composed message is sent, the attachment buttons are replaced by actual attachment objects. In FSF Emacs, the attachment buttons are first converted into “fake” overlays before MIME encoding, in a function called vm-mime-fake-attachment-overlays. This allows the next stage to treat both FSF Emacs and XEmacs using the same logic.

The function vm-mime-encode-composition then encodes the composition buffer, by selecting each attachment button and replacing it with the corresponding object. The bodies of ‘external-body’ objects are also retrieved at this stage. Unless the objects were already MIME-encoded, they are MIME-encoded and made into MIME parts by adding suitable headers. The message itself is given MIME headers describing its content and then handed to Emacs message-sending functions.

Yanking or Forwarding MIME Messages

When another message is yanked or “included” in a message composition, the handling of attachments depends on the variable vm-include-mime-attachments. If the variable is nil, then the attachments are displayed as token buttons in plain text that appear similar to:

[DELETED ATTACHMENT mary.jpg, image/jpeg]

The function vm-decode-mime-layout is employed to generate the yanked text along with such token buttons.

If vm-include-mime-attachments is t, then first the vm-decode-mime-layout function is employed to generate proper MIME buttons for all the attachments. In a second step, the MIME buttons are replaced by attachment buttons using a function called vm-mime-convert-to-attachment-buttons. These attachment buttons are then handled as described above.



27.11 Extents and Overlays

XEmacs and GNU Emacs differ in how they represent non-textual properties in buffers. The web page on “XEmacs vs GNU Emacs” describes the situation as follows:

XEmacs uses "extents" to represent all non-textual aspects of buffers; GNU Emacs 19 uses two distinct objects, "text properties" and "overlays", which divide up the functionality between them. Extents are a superset of the union of the functionality of the two GNU Emacs data types. The full GNU Emacs 19 interface to text properties and overlays is supported in XEmacs (with extents being the underlying representation).

Extents can be made to be copied into strings, and then restored, by kill and yank. Thus, one can specify this behavior on either "extents" or "text properties", whereas in GNU Emacs 19 text properties always have this behavior and overlays never do.

While extents and overlays look similar on the surface, they differ fundamentally in that extents are attached to text and, so, can be killed and yanked, whereas overlays are not attached to text. XEmacs has implemented GNU-like text properties on top of extents. So, text properties may work more uniformly in both the Emacsen, but VM was developed in the early days of the forking and does not use these common features.

The file vm-misc.el contains definitions whereby both extents and overlays can be treated as a single type of “VM extents”. Wherever such VM extents can be used, there is some uniformity in the code but, in other places, there is not. (Independently, the XEmacs team has developed the fsf-compat package by which FSF-style overlays are implemented on top of extents. This package is not compatible with the way VM deals with the two types.)

Another major difference between extents and overlays is that the beginning and ending of overlays are markers. This has some advantages. However, if a buffer has many overlays, normal editing operations must update all the overlay markers, which can be time-consuming.

The major applications of extents and overlays in VM are the following:

  1. Summary buffers use extents/overlays for each summary line. These are implemented uniformly but, to avoid the performance problem in GNU Emacs, all the markers are reset to nil before a summary is regenerated and then set to their correct positions afterwards. Not doing this correctly can seriously degrade the performance of summary generation.
  2. Presentation buffers use extents/overlays for MIME buttons. These are implemented uniformly.
  3. The message composition buffers have attachment buttons. These are implemented using text properties in GNU Emacs and extents in XEmacs. The difference is necessary because VM allows the attachment buttons to be killed and yanked. It is not possible to implement this functionality using overlays.


27.12 Timers and Concurrency

VM has been designed as mainly a sequential program. However, there are three timer tasks that get scheduled to occur at regular intervals:

vm-flush-itimer-function

Stores message attributes in the folder so that they will be saved when an auto-save is done. This is controlled by the variable vm-flush-interval.

vm-get-mail-itimer-function

Moves new mail from maildrops into the folder. This is controlled by the variable vm-auto-get-new-mail.

vm-check-mail-itimer-function

Checks the maildrops for any new mail. This is controlled by the variable vm-mail-check-interval.

These timer tasks are scheduled using the itimer package in XEmacs and the timer package in GNU Emacs.
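In GNU Emacs, a periodic task of this kind can be scheduled with run-with-timer. The sketch below uses hypothetical names and is only meant to show the general shape. It is not how VM installs its own timers.

```elisp
;; Hypothetical periodic task management for GNU Emacs.
(defvar my-flush-timer nil
  "Timer object for the periodic task, or nil if none is scheduled.")

(defun my-start-flush-timer (interval function)
  "Run FUNCTION every INTERVAL seconds, replacing any earlier timer."
  (when (timerp my-flush-timer)
    (cancel-timer my-flush-timer))
  (setq my-flush-timer (run-with-timer interval interval function)))

(defun my-stop-flush-timer ()
  "Cancel the periodic task, if one is scheduled."
  (when (timerp my-flush-timer)
    (cancel-timer my-flush-timer)
    (setq my-flush-timer nil)))
```

Since timer functions run between commands, a task started this way has to tolerate whatever buffer happens to be current when it fires.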

