VM User’s Manual: Internals

27 VM Internals

This section gives a brief overview of the VM internals for developers and programmers.

• Folder Internals:		Structure of the folders
• Message Internals:		Structure of the message data structure
• Summary Internals:		Details of summary generation
• Threading Internals:		Details of message threads handling
• Sorting Internals:		Details of how messages are sorted
• User Interaction:		Handling of the user interaction
• Coding Systems:		How VM handles character coding
• MIME Display:		How VM displays MIME messages
• MIME Composition:		How MIME messages are composed
• Virtual Folder Internals:		Details of virtual folders and selectors
• Extents and Overlays:		How VM deals with XEmacs and GNU Emacs differences
• Timers and Concurrency:		How VM runs asynchronous timers

27.1 Folder Internals

VM stores mail folders in the Unix ‘mbox’ format (in all its variants). Internal to Emacs, the mbox is loaded into a text buffer (the Folder buffer) and individual messages are identified by remembering markers into the text buffer. See Message Internals.

The Unix mbox format is described in RFC 4155, published by the Internet Engineering Task Force. A mail folder is a text file consisting of a sequence of messages, each consisting of a series of headers followed by a message body. The beginning of each message is marked by a separator line starting with the string “From ”, and the end of the message by a blank line. In a VM folder, the leading separator line has the form “From VM ...”, where the “...” records the time at which VM first saw the message. The individual messages follow the RFC 2822 format, except that a single Line-Feed character may end each line, as is customary on Unix, instead of the CR-LF pair that RFC 2822 requires.
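The From_ separator rule can be illustrated with a short Python sketch (Python serves only as executable pseudocode here; VM itself is written in Emacs Lisp, and the function name split_from_mbox is hypothetical). A line beginning with “From ” is treated as a separator only at the start of the file or right after a blank line, so a body line that happens to start with “From ” is not mistaken for a new message. The sketch ignores any escaping of body lines.

```python
import re
from typing import List

def split_from_mbox(text: str) -> List[str]:
    """Split From_-style mbox text into messages, each starting at its separator line."""
    starts = []
    for m in re.finditer(r"^From ", text, re.MULTILINE):
        pos = m.start()
        # A "From " line is a separator only at the top of the file or
        # when the previous message ended with a blank line.
        if pos == 0 or text.endswith("\n\n", 0, pos):
            starts.append(pos)
    bounds = starts + [len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:])]

folder = (
    "From VM Mon Jan  1 10:00:00 2024\nSubject: one\n\nbody one\n"
    "From here on, this line is still body text.\n\n"
    "From VM Tue Jan  2 11:00:00 2024\nSubject: two\n\nbody two\n\n"
)
messages = split_from_mbox(folder)   # two messages
```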

VM recognizes three variants of the mbox format, called From_, BellFrom_ and From_-with-Content-Length. In a From_ type mbox, every message has a leading separator line and a trailing blank line, as described above. In a BellFrom_ type mbox, the trailing blank line can be missing, so that mboxes in the old System V format can be handled. In a From_-with-Content-Length type mbox, each message carries a Content-Length header recording the length of its body, so no trailing separator line is required.
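For a Content-Length style mbox, message boundaries come from the recorded length rather than from scanning for separators. The Python sketch below assumes that each message has a Content-Length header giving the size of its body in bytes and that only newlines separate consecutive messages; the function name is hypothetical and error handling is minimal. Because the body is skipped by length, a body line beginning with “From ” is harmless.

```python
import re
from typing import List

CONTENT_LENGTH = re.compile(rb"^Content-Length:[ \t]*(\d+)", re.MULTILINE | re.IGNORECASE)

def split_content_length_mbox(data: bytes) -> List[bytes]:
    """Split an mbox whose messages declare their body size in Content-Length."""
    messages = []
    pos = 0
    while pos < len(data):
        if not data.startswith(b"From ", pos):
            raise ValueError("expected a From separator at offset %d" % pos)
        header_end = data.find(b"\n\n", pos)
        if header_end < 0:
            raise ValueError("unterminated header section")
        match = CONTENT_LENGTH.search(data, pos, header_end)
        if match is None:
            raise ValueError("message without a Content-Length header")
        body_start = header_end + 2
        end = body_start + int(match.group(1))   # skip the body by its length
        messages.append(data[pos:end])
        pos = end
        while data.startswith(b"\n", pos):       # skip blank lines between messages
            pos += 1
    return messages
```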

In addition to these mbox formats, VM also handles the MMDF format and the Emacs Rmail’s Babyl format. The variable vm-folder-type stores the type of the folder being used.

To every message, VM adds a header with the field name “X-VM-v5-Data:” and stores in it the information about the message it wishes to remember between sessions.

The first message of the VM folder file contains additional headers used by VM for remembering information between sessions.

X-VM-Bookmark. This header stores the position of the cursor, as a message number, in effect when VM saved the folder. Upon revisiting the folder, VM attempts to put the cursor back at this position.
X-VM-Last-Modified. The date and time at which the folder was last modified.
X-VM-Message-Order. This header lists the order in which the messages should be listed.
X-VM-Labels. This header lists the message labels that have been used in the folder.
X-VM-VHeader. This header lists the values of vm-visible-headers and vm-invisible-header-regexp that were in effect when the folder was saved. The messages in the folder would have their headers arranged according to these variables.
X-VM-Summary-Format. This header stores the format string for the summary lines.
X-VM-POP-Retrieved. This header lists all the messages that have been retrieved from POP servers together with the identifying information for the POP servers. VM refrains from retrieving these messages again in the future in order to avoid duplication.
X-VM-IMAP-Retrieved. This header lists messages that have been retrieved from IMAP servers together with their identifying information on the IMAP servers (UID and UIDVALIDITY). VM refrains from retrieving these messages again in the future in order to avoid duplication. (For local folders, this lists all the retrieved messages except those known to have been expunged on the server. For IMAP folders, it does not list all the retrieved messages, because those are normally the same as the messages on the server; only the messages expunged locally in the cache folder but not known to be expunged on the server are listed. Normally, this list is empty (nil) for IMAP folders.)
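The bookkeeping behind X-VM-IMAP-Retrieved can be modelled as a set of (UIDVALIDITY, UID) pairs. In IMAP, a UID identifies a message only together with the mailbox's current UIDVALIDITY, so if the server reports a new UIDVALIDITY, every message looks new. This Python sketch is a hypothetical model of that check, not VM's code.

```python
from typing import Iterable, List, Set, Tuple

def uids_to_retrieve(uid_validity: str, server_uids: Iterable[str],
                     retrieved: Set[Tuple[str, str]]) -> List[str]:
    """Return the server UIDs not yet recorded as retrieved."""
    # A UID is only meaningful together with the mailbox's UIDVALIDITY.
    return [uid for uid in server_uids if (uid_validity, uid) not in retrieved]

retrieved = {("1700000000", "101"), ("1700000000", "102")}
print(uids_to_retrieve("1700000000", ["101", "102", "103"], retrieved))  # ['103']
print(uids_to_retrieve("1800000000", ["101", "102", "103"], retrieved))  # ['101', '102', '103']
```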

Folder variables

Internal to Emacs, VM stores the folder simply as a text buffer. However, it records a variety of data about the messages in the buffer in internal variables.

vm-message-list. A list of message data structures for all the messages in the buffer.
vm-folder-type. The type of the current folder, indicating how the messages are stored: one of 'babyl, 'From_, 'BellFrom_, 'From_-with-Content-Length and 'mmdf.
vm-folder-access-method. The method for accessing the server message store: 'pop for POP folders, 'imap for IMAP folders, and nil for all other folders.
vm-folder-access-data. A vector of data for accessing the server message store. The first two elements of the vector are the maildrop specification for the mail server and a reference to the process connecting to the mail server. For the 'pop access method, that is all there is. For the 'imap access method, the vector has further entries detailing various pieces of data about the IMAP server, described under vm-folder-access-data below.
vm-folder-read-only. A boolean flag indicating whether the folder is read-only. If so, no modifications are allowed, including attribute changes. However, messages can be fetched from external storage for viewing.
vm-virtual-folder-definition. If the current folder is virtual, then this variable holds the data constituting its definition.
vm-real-buffers. If the current folder is virtual, then this variable is a list of all the real folder buffers involved in constructing it.
vm-virtual-buffers. A list of all the virtual folder buffers that the current buffer is involved in.
vm-component-buffers. An a-list containing all the folder buffers (real or virtual) that make up the components of the current virtual folder, each with a flag indicating whether that folder was visited as part of visiting the virtual folder. When the virtual folder is closed, all the folders visited for this purpose are also closed.
vm-summary-buffer. The Summary buffer of the folder. (If the Summary buffer gets killed for any reason, the value of this variable becomes <killed buffer>, which is unfortunate. Therefore, most interactive commands of VM check for killed Summary buffer and reset this variable to nil in such a case. So, in the middle of code, this variable can be regarded as a valid buffer pointer.)
vm-presentation-buffer-handle. The message Presentation buffer of the folder. (Same proviso applies as for vm-summary-buffer.)
vm-presentation-buffer. This seems to be a copy of the vm-presentation-buffer-handle. Its purpose is unknown.

The running state of the folder buffer is represented in a number of buffer-local variables:

vm-message-pointer. A sublist of vm-message-list starting from the current message that the cursor is on. So, the first element of vm-message-pointer is the current message.
vm-last-message-pointer. Whenever the cursor is moved, the previous value of vm-message-pointer is remembered in this variable.
vm-summary-pointer. The message struct of the message which has the summary pointer in the Summary buffer.
vm-fetched-messages. List of external messages whose bodies were fetched for viewing or other operations.
vm-fetched-message-count. The number of messages in vm-fetched-messages. An attempt is made to keep this below the vm-fetched-message-limit.
vm-mime-decoded. The MIME decoding state of the current message display: undecoded if the message is shown in undecoded plain text form, decoded if the message is shown decoded, and buttons if the message is shown as a series of buttons for all its MIME components. The D command cycles through these states.
vm-system-state. The state of VM in a Folder buffer or Presentation buffer:
- previewing. if a message is being previewed.
- showing. if a full message is being shown.
- reading. if message reading is in progress.
A message edit buffer is in state editing.

A message composition buffer may be in one of these states:
- forwarding. if a message is being forwarded.
- replying. if a message is being replied to.
- redistributing. if a message is being redistributed.
vm-spooled-mail-waiting. VM periodically checks whether there is new mail in the spool files of the current folder and sets this flag to t if there is.
vm-undo-record-list. A list of undo records describing the actions to be performed if an undo operation is invoked. Each undo record has an action, the message, if any, to which the action applies, and any arguments needed for the action.
vm-undo-record-pointer. A pointer into the vm-undo-record-list indicating the current position of the undoing cycle.
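Because vm-message-pointer is a tail of vm-message-list rather than an index, the current message is its car and moving to the next message is simply taking the cdr. The Python sketch below mimics this with a hypothetical Cons class; the class and method names are illustrative only.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Cons:
    car: Any
    cdr: Optional["Cons"] = None

def lisp_list(items):
    """Build a singly linked list of Cons cells from a Python sequence."""
    head = None
    for item in reversed(items):
        head = Cons(item, head)
    return head

class FolderState:
    """Hypothetical model of vm-message-list, vm-message-pointer and vm-last-message-pointer."""

    def __init__(self, messages):
        self.message_list = lisp_list(messages)
        self.message_pointer = self.message_list   # a tail of message_list, not a copy
        self.last_message_pointer = None

    def current(self):
        return None if self.message_pointer is None else self.message_pointer.car

    def next_message(self):
        """Advance to the next message, remembering the previous pointer."""
        if self.message_pointer is not None and self.message_pointer.cdr is not None:
            self.last_message_pointer = self.message_pointer
            self.message_pointer = self.message_pointer.cdr
        return self.current()
```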

vm-folder-access-data

The variable vm-folder-access-data is a vector storing data about the state of the mail server (for POP and IMAP servers). It contains the following items:

pop-maildrop-spec or imap-maildrop-spec. MAILDROP specification of the server folder.
pop-process or imap-process. The Emacs process being used to communicate with the server for this folder. (Each folder uses a separate process to avoid unwanted interference.)
imap-uid-validity. The UIDVALIDITY value of the IMAP folder.
imap-read-write. A boolean flag indicating whether the folder is writable.
imap-can-delete. A boolean flag indicating whether the folder allows deletions.
imap-body-peek. A boolean flag indicating whether the folder allows the BODY.PEEK fetch item of IMAP (which fetches message data without setting the \Seen flag).
imap-permanent-flags. The list of permanent flags that have been stored in the folder.
imap-mailbox-count. The number of messages in the folder.
imap-recent-count. The number of messages in the folder that are considered “recent” by the server.
imap-retrieved-count. The number of messages present in the folder when messages were last retrieved. This would have been the value of imap-mailbox-count at that time.
imap-uid-list. The list of UID’s and flags of the messages in the folder, using cons cells of the form (msg-num . uid . size . flags list). The cons cells (size . flags list) are shared with imap-flags-obarray below.
imap-uid-obarray. An obarray that binds all the UIDs of messages in the folder to their message sequence numbers.
imap-flags-obarray. An obarray that binds all the UIDs of messages in the folder to cons cells of the form (size . flags list). These cons cells are the same as those occurring in the imap-uid-list field. So, any updates will be shared through both the views. The two obarrays, imap-uid-obarray and imap-flags-obarray, bind exactly the same set of UIDs. Jointly, they are referred to as uid-and-flags-data. The reason for their separation is historical.
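The sharing between imap-uid-list and imap-flags-obarray can be pictured with mutable objects: one (size . flags) cell is reachable from both the list entry and the obarray, so an update through one view is visible through the other. This Python sketch uses dictionaries in place of obarrays and a two-element list in place of the cons cell; the class name is hypothetical.

```python
class ImapUidData:
    """Hypothetical model of uid-and-flags-data with shared (size . flags) cells."""

    def __init__(self, entries):
        # entries: iterable of (msg_num, uid, size, flags)
        self.uid_list = []        # like imap-uid-list
        self.uid_obarray = {}     # like imap-uid-obarray: uid -> msg_num
        self.flags_obarray = {}   # like imap-flags-obarray: uid -> shared cell
        for msg_num, uid, size, flags in entries:
            cell = [size, list(flags)]   # stands in for the shared (size . flags) cons
            self.uid_list.append((msg_num, uid, cell))
            self.uid_obarray[uid] = msg_num
            self.flags_obarray[uid] = cell

data = ImapUidData([(1, "101", 2048, ["\\Seen"]), (2, "102", 512, [])])
data.flags_obarray["102"][1].append("\\Flagged")   # update through the obarray view
print(data.uid_list[1][2])   # [512, ['\\Flagged']]
```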

27.2 Message Internals

The message data structure is a vector containing various pieces of data about the message, some of it permanent and some calculated during a VM session. The data is organized into five sub-vectors:

Location data. This data about the location of the various parts of the message in the Folder buffer is calculated after a folder is loaded and parsed.
Soft data. This vector contains other calculated data about the message that is specific to a VM session.
Attributes. All the hard-wired message attributes are stored in this vector.
Cached Data. Calculated data that is cached for each message.
Mirror Data. Extra data shared by virtual messages if vm-virtual-mirror is non-nil.

The attributes vector and the cached data vector are saved in the folder on disk in the X-VM-v5-Data header of each message.

Location data

This vector holds the data about the location of the various parts of the message in the folder buffer. Every folder buffer or folder-like buffer (such as a Presentation buffer) has variables that contain message data structures. The location data is normally expected to refer to locations in that very buffer. However, this condition is not actually required. (See below.)

start. Marker for the starting position of the message, at which a leading separator line begins.
headers. Marker for the position in the buffer where the headers of the message start.
vheaders. Marker for the position in the buffer where the visible headers of the message start. (The headers are rearranged in such a way that all the visible headers are towards the end of the headers region.)
text. Marker for the position in the buffer where the text of the message starts.
text-end. Marker for the position in the buffer where the text of the message ends.
end. Marker for the position in the buffer where the message ends.
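The order of the location markers can be illustrated by slicing a folder string with integer offsets. This is a Python sketch only: real markers move when the buffer is edited, while these integers do not, and counting the blank line after the headers as part of the header region is an assumption of the sketch. The names Location and message_parts are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Location:
    start: int      # the leading separator line begins here
    headers: int
    vheaders: int   # visible headers are grouped at the end of the header block
    text: int
    text_end: int
    end: int

def message_parts(buf, loc):
    """Slice a folder string into the regions delimited by the location data."""
    return {
        "separator": buf[loc.start:loc.headers],
        "headers": buf[loc.headers:loc.text],
        "visible_headers": buf[loc.vheaders:loc.text],
        "body": buf[loc.text:loc.text_end],
        "trailer": buf[loc.text_end:loc.end],
    }

buf = "From VM Mon\nX-Mailer: demo\nSubject: hi\n\nbody\n\n"
text = buf.index("body")
loc = Location(0, buf.index("X-Mailer"), buf.index("Subject"),
               text, text + len("body\n"), len(buf))
parts = message_parts(buf, loc)
```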

Unfortunately, in the current versions of VM, the buffer into which the location data point is not itself recorded in this vector. It is inferred from the context, which makes the code brittle. The Folder buffer of the message can be obtained from the soft data vector, but the location data could also point into a Presentation buffer.

Soft data

This vector contains other calculated data about the message that is specific to a VM session.

number. The message number as an integer.
padded-number. The message number as a padded string.
mark. Flag that indicates if the message has been marked (via vm-mark-message).
su-start. The position in the Summary buffer where the summary line of the message starts.
su-end. The position in the Summary buffer where the summary line of the message ends.
real-message-sym. If the message is in a virtual folder, then its corresponding “real message” is the underlying message in another folder which is described by a message data structure similar to the current one. The real message data structures are represented by uninterned symbols written as “<<>>”. This field stores the symbol representing the real message of the current message. If the current message is a real message then this field contains its own symbol. The use of symbols for this purpose avoids the possibility of circular data structures.
mirrored-message-sym. This is similar to the real-message-sym, except that it points to the message directly mirrored by the current virtual folder message.
reverse-link-sym. Reference to the previous message in the message list, also represented by an uninterned symbol written as “<--”.
message-type. A symbol indicating the type of the message according to its folder type, one of BellFrom_, From_ and From_-with-Content-Length.
message-id-number. A number that uniquely identifies the message within a VM session.
buffer. The Folder buffer of the message. (Messages in Presentation buffers also have this field set to the corresponding Folder buffer.)
thread-indentation. Indentation level of the message in its message thread.
thread-list. List of symbols from vm-thread-obarray that give this message’s lineage.
thread-subtree. List of messages that form the subtree under this message in a threaded summary display.
babyl-frob-flag.
saved-virtual-attributes. Saved attributes if the message switched from unmirrored to mirrored.
saved-virtual-mirror-data. Saved mirror data, if the message was switched from unmirrored to mirrored.
virtual-summary. Summary for unmirrored virtual message.
mime-layout. MIME layout information: types, ids, positions, etc., of all MIME entities. (See below.)
mime-encoded-header-flag. Flag that indicates if the headers of the message are MIME encoded.
su-summary-mouse-track-overlay. The overlay on the summary of this message used for selection by mouse.
message-access-method. The access-method to be used for the message, inherited from its real folder.
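The symbol indirection used by real-message-sym can be sketched in Python with a small stand-in for an uninterned symbol: a name plus a value cell. A virtual message and its real message both reach the real message through the same symbol, and printing a message shows only the symbol's name, so printing never recurses into the referenced message. The class and function names are hypothetical.

```python
class Symbol:
    """Stand-in for an uninterned Lisp symbol: a name plus a value cell."""

    def __init__(self, name):
        self.name = name
        self.value = None

    def __repr__(self):
        # Printing a symbol shows only its name, never the message it refers to.
        return self.name

def make_real_message(subject):
    msg = {"subject": subject}
    sym = Symbol("<<>>")
    sym.value = msg
    msg["real_message_sym"] = sym      # a real message refers to its own symbol
    return msg

def make_virtual_message(real):
    # A virtual message reaches its real message through the same symbol.
    return {"real_message_sym": real["real_message_sym"]}

def real_message_of(msg):
    return msg["real_message_sym"].value
```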

Attributes

All the hard-wired message attributes are stored in this vector. They also get saved as part of the X-VM-v5-Data header field when the folder is saved to disk.

new-flag. Flag to indicate if the message is “new”.
unread-flag. Flag to indicate if the message is unread.
deleted-flag. Flag to indicate if the message has been deleted.
filed-flag. Flag to indicate if the message has been filed.
replied-flag. Flag to indicate if the message has been replied to.
written-flag. Flag to indicate if the message has been saved.
forwarded-flag. Flag to indicate if the message has been forwarded.
edited-flag. Flag to indicate if the message has been edited.
redistributed-flag. Flag to indicate if the message has been redistributed.

Cached Data

This vector holds the data that is cached for the message and stored on disk as part of the X-VM-v5-Data header field. Although this vector is only supposed to hold data that can be calculated from the message itself, the fields pop-uidl, imap-uid and imap-uid-validity are an exception: they are really hard data that cannot be calculated from anything else.

Some of the data deals with information from message headers. The header fields can have MIME-encoded words in them. The strings stored in the cached-data vector, however, are MIME-decoded versions of the header fields, but they also have text properties that store the names of the original character sets used in the header fields. This allows the strings to be quickly re-encoded for storage on disk.
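The idea of keeping decoded header text together with its original character set can be sketched with Python's standard email.header module. Here a header value is held as a list of (text, charset) runs, which plays the role that VM's text properties play; the helper names are hypothetical, and the re-encoded form may use a different RFC 2047 encoding (Q or B) than the original while decoding to the same text.

```python
from email.header import Header, decode_header

def decode_with_charsets(raw):
    """Decode RFC 2047 encoded words into (text, charset) runs."""
    runs = []
    for chunk, charset in decode_header(raw):
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii")
        runs.append((chunk, charset))
    return runs

def reencode(runs):
    """Rebuild a header value from runs, reusing each run's original charset."""
    header = Header()
    for text, charset in runs:
        header.append(text, charset or "us-ascii")
    return header.encode()

runs = decode_with_charsets("Re: =?utf-8?q?Caf=C3=A9?=")
plain = "".join(text for text, _ in runs)   # the decoded text for display
```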

byte-count. The size of the message in bytes.
weekday, monthday, month, year, hour, zone. Data indicating the date of the message.
full-name. The full name of the author of the message. This is a MIME-decoded string with text properties.
from. The email address of the author of the message. This is a MIME-decoded string with text properties.
message-id. The unique id of the message.
line-count. The number of lines in the message.
subject. The subject string of the message. This is a MIME-decoded string with text properties.
vheaders-regexp. A regular expression that can be used to find the start of the visible headers. The headers must have been already ordered so that the visible headers are at the bottom of the headers section.
to. Addresses of the recipients of the message in a comma separated string. This is a MIME-decoded string with text properties.
to-names. The full names of the recipients in a comma separated string. Addresses are used if full names are not available. This is a MIME-decoded string with text properties.
month-number. Numeric month of the sent date.
sortable-datestring. Date string of the sent date for sorting purposes (or delivery date if vm-sort-messages-by-delivery-date is set to t).
sortable-subject. The subject string for sorting purposes. (Prefixes such as “re:” are removed.) This is a MIME-decoded string with text properties.
summary. A tokenized summary for the message, from which the actual summary line can be quickly calculated. This is a list containing tokens, such as number and thread-indent, as well as MIME-decoded strings with text properties.
parent. The message ID of the parent of the message in its thread.
references. Message IDs listed in the References header of the message.
body-to-be-discarded. Flag that indicates whether the body of the message should be discarded before the folder is saved. (This is used in conjunction with body-to-be-retrieved below.)
body-to-be-retrieved. Flag that indicates whether the body of the message has not been retrieved from the mail server.
pop-uidl. The UIDL id of the message on the POP server.
imap-uid. The UID of the message on the IMAP server.
imap-uid-validity. The UIDVALIDITY value of the message on the IMAP server.
spam-score. The spam score of the message.
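A sortable subject can be derived by stripping reply and forward prefixes and normalizing whitespace. The prefix pattern and the case folding in this Python sketch are assumptions for illustration; VM's own normalization may differ. The names PREFIXES and sortable_subject are hypothetical.

```python
import re

# Hypothetical prefix pattern: any run of "Re:", "Fw:" or "Fwd:" prefixes.
PREFIXES = re.compile(r"^\s*(?:(?:re|fwd?)\s*:\s*)+", re.IGNORECASE)

def sortable_subject(subject):
    """Strip reply/forward prefixes, collapse whitespace and fold case."""
    stripped = PREFIXES.sub("", subject)
    return re.sub(r"\s+", " ", stripped).strip().lower()
```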

Mirror Data

Extra data shared by virtual messages if vm-virtual-mirror is non-nil.

edit-buffer. If the message is being edited, this is the buffer being used.
virtual-messages-sym. List of virtual messages mirroring the current real message, represented by an uninterned symbol written as “<v>”.
stuff-flag. Flag to indicate if the attribute changes have been “stuffed” into the folder buffer.
labels. List of labels attached to the message.
label-string. The string of labels attached to the message.
attribute-modflag. Flag to indicate if the attributes of the message have been modified since the last save.

MIME layout

The MIME layout of a message, stored in the soft data of the message, is in turn a vector containing various pieces of data. Such a vector is used not only for the overall message, but for all its MIME parts and subparts as well.

type. A list of strings consisting of the MIME type of the part along with its attributes. This comes from “Content-Type” header. The type could be of the form ‘type/subtype’. Quotation marks are stripped from attribute values. An example is ("multipart/mixed" "boundary=----_=_NextPart_001_01AFE588.63E23840").
qtype. Like type, but the quotation marks are not stripped.
encoding. The MIME encoding used for the part. It comes from the “Content-Transfer-Encoding” header.
id. The id obtained from the “Content-ID” header of the part.
description. A description string obtained from the “Content-Description” header of the part.
disposition. A list of strings obtained from the “Content-Disposition” header of the part. Quotation marks are stripped from attribute values. An example is ("attachment" "filename=mydocument.doc").
qdisposition. Like disposition, but the quotation marks are not stripped.
header-start, header-end, body-start and body-end. Markers into the content buffer delineating the headers/body of the MIME part.
parts. A list of MIME layouts for the individual subparts of this part.
cache. A symbol that is unique to this MIME part. Other data is stored as properties of this symbol:
- vm-mime-display-external-generic. This property stores the id of the process used to externally display the MIME part as well as the name of the temporary file used.
- vm-mime-display-internal-image-xxxx. This property stores the name of the temporary file where the image is stored. For an image represented as image strips, it actually stores a list with a number of other data items.
- vm-image-modified. This property stores a boolean flag indicating that the image has been modified.
- vm-mime-display-internal-audio/basic. This property stores the name of the temporary file where the audio clip is stored.
- vm-message-garbage.
message-symbol. A reference to the message that contains the MIME part. Represented as a symbol (that is, an interned key into a hash table). This is a different symbol from the real-message-sym of the message.
display-error. If the display of a MIME part fails, its error string is stored here.
layout-is-converted. Flag indicating that MIME type conversion has been performed on this part. See MIME type conversion.
unconverted-layout. If the MIME type conversion has been performed on this part, then this holds the original unconverted layout.
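The difference between the type and qtype fields can be shown by parsing a Content-Type value into both forms. This Python sketch is a simplification: it splits on “;” and therefore mishandles quoted values that themselves contain “;”, and the function name is hypothetical.

```python
def parse_content_type(value):
    """Split a Content-Type value into (type, qtype) lists in the style of the layout."""
    qtype = [p.strip() for p in value.split(";") if p.strip()]
    type_ = []
    for piece in qtype:
        name, sep, val = piece.partition("=")
        if sep:
            name, val = name.strip(), val.strip()
            if len(val) >= 2 and val[0] == val[-1] == '"':
                val = val[1:-1]          # type drops the quotation marks
            type_.append(name + "=" + val)
        else:
            type_.append(piece)          # the bare type/subtype
    return type_, qtype

type_, qtype = parse_content_type('multipart/mixed; boundary="----_=_NextPart_001"')
```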

Cross-buffer sharing of data

Every Folder buffer has a vm-message-list and a vm-message-pointer list containing message data vectors.

Every Presentation buffer also uses a vm-message-pointer list with a single message (the one being presented). The message data vector in the Presentation buffer has its own location data, but shares all other components with the message in the Folder buffer. This allows the Presentation buffer to, for example, change the attributes of the message without having to switch context to the Folder buffer.

Virtual folders, which contain only references to messages in other folders, store just a single message body in the Folder buffer. However, they have message descriptors for all the messages in vm-message-list. All the message descriptors use the same location data vector, because only one message body can be stored in the Folder buffer, but they have separate Soft data vectors. (This allows, for instance, virtual folders to have their own threads, which could in general differ from the threads in the underlying folders.) The other sub-vectors are shared with the underlying real folders. (In particular, the tokenized summary line is the same in the virtual folders and their underlying folders.)
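The sharing rules above can be sketched in Python by representing a message as a dictionary of sub-vector objects and making shallow copies, so that shared sub-vectors are literally the same object. The function names are hypothetical; the sketch only mirrors which parts are shared and which are private.

```python
def presentation_message(folder_msg, location):
    """Presentation copy: own location data, every other sub-vector shared."""
    msg = dict(folder_msg)    # shallow copy, so sub-vector objects are shared
    msg["location"] = location
    return msg

def virtual_message(real_msg, shared_location):
    """Virtual message: common location, private soft data, other sub-vectors shared."""
    msg = dict(real_msg)
    msg["location"] = shared_location
    msg["soft"] = {}
    return msg
```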

27.3 Summary Internals

Generating a summary is quite a time-consuming operation. VM uses a variety of tricks to speed up the generation of summaries.

The format of the summary lines is specified in the variable vm-summary-line-format. The information that needs to go into the summary lines is divided into two classes:

Information that is fixed for each message. Examples include the subject, author and other header information.
Information that is variable during a VM session. Examples include the message number and thread indentation.

A tokenized summary line is a list whose elements can be strings, representing fixed information in a message, and tokens, representing variable information. VM calculates a tokenized summary line for each message and caches it in the cached-data vector. The following forms of tokens are used in tokenized summary lines:

number. Stands for the message number in the linear order of the summary.
mark. Stands for an indicator of message mark (whether the message is marked at present).
thread-indent. Stands for the indentation to be used for the message’s summary depending on its position in the message thread.
group-begin, group-end. Brackets used to denote groups of items that might have particular formatting constraints.

The function vm-tokenized-summary-insert converts a tokenized summary line into a string and inserts it in the summary buffer. The minibuffer message “Generating summary...” is used to show the progress of generating summary lines from tokenized summaries.
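The benefit of tokenized summaries is that only the variable tokens are recomputed when a line is redrawn. The Python sketch below renders a cached token list; the mark character, the indentation width and all names are assumptions for illustration, and group-begin/group-end tokens are not modelled (unknown tokens are skipped).

```python
from enum import Enum

class Tok(Enum):
    """Variable parts of a tokenized summary line (names are illustrative)."""
    NUMBER = "number"
    MARK = "mark"
    THREAD_INDENT = "thread-indent"

def render_summary(tokens, number, marked, indent, indent_width=2):
    """Turn a cached tokenized summary into the text of one summary line."""
    parts = []
    for tok in tokens:
        if isinstance(tok, str):
            parts.append(tok)                       # fixed, cached text
        elif tok is Tok.NUMBER:
            parts.append(str(number))
        elif tok is Tok.MARK:
            parts.append("*" if marked else " ")
        elif tok is Tok.THREAD_INDENT:
            parts.append(" " * (indent * indent_width))
    return "".join(parts)

cached = [Tok.NUMBER, Tok.MARK, " ", Tok.THREAD_INDENT, "alice  Lunch plans"]
line_now = render_summary(cached, number=3, marked=True, indent=1)
line_later = render_summary(cached, number=2, marked=False, indent=1)  # after an expunge
```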

Buffer local variables in each Folder buffer responsible for maintaining summary information:

vm-summary-pointer. The message selected by the cursor in the Summary window.
vm-summary-redo-start-point. A pointer into the vm-message-list indicating the first message for which the summary line must be redisplayed. All the messages from here on are assumed to require a summary redisplay. The assumption is usually valid because the message numbers of all the succeeding messages might have changed. But, if message numbers are not included in the summary lines, then this results in unnecessary work.
vm-messages-needing-summary-update. The list of messages for which summary lines must be redisplayed. Messages are included in this list by calling the function vm-mark-for-summary-update.
vm-numbering-redo-start-point. A pointer into vm-message-list indicating the first message whose message number needs to be recalculated.
vm-numbering-redo-end-point. A pointer into vm-message-list indicating the last message whose message number needs to be recalculated.
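Renumbering from a redo start point can be sketched as a loop over the affected range that also reports which messages actually changed, since only those summary lines would need redisplay. This Python sketch uses list indices in place of pointers into vm-message-list; the function name is hypothetical.

```python
def renumber(messages, start, end=None):
    """Recompute numbers for messages[start..end]; return indices whose number changed.

    Messages are dicts with a "number" key; end=None means "to the last message".
    """
    last = len(messages) - 1 if end is None else end
    changed = []
    for i in range(start, last + 1):
        if messages[i].get("number") != i + 1:
            messages[i]["number"] = i + 1
            changed.append(i)    # these summary lines would need redisplay
    return changed

# Message 2 has just been expunged, so renumbering starts at index 1.
folder = [{"number": 1}, {"number": 3}, {"number": 4}]
needs_update = renumber(folder, 1)
```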

The beginning and ending positions of each message summary line are stored in the message’s soft data vector (see Message Internals). The characters within the summary line carry text properties that give data about the message:

vm-message. The message struct for which this line is a summary.

27.4 Threading Internals

Message threads required for threaded summaries are calculated using message ID’s, which are meant to be unique when a message is originally composed. However, VM may need to deal with multiple copies of the same message received via different routes. So, message ID’s are not unique for messages inside VM.

Messages composed as replies generally have an “In-Reply-To” header. The message mentioned in this header is referred to as the parent of the message. In addition, messages also arrive with a “References” header, which lists all the ancestors of the message, oldest first. The last message listed in the “References” header is the direct parent of the message. Keep in mind that not all the messages listed in the “References” header need be present in the VM folder.

Thread trees are constructed using the “In-Reply-To” headers and “References” headers. Jamie Zawinski has done a good analysis of the information contained in these headers which can be found on the web. VM’s threading algorithm is currently based on these ideas. These trees are called reference-based threads.

In addition, VM also allows threads to be built using the subject headers via the option vm-thread-using-subject. Subject-based threading is used in addition to reference-based threading. So, in a subject-based thread, the root message would be the oldest message with that subject and, below it, would be reference-based threads all of which share the same subject. The roots of these reference-based threads are referred to as the “members” of the subject thread. Subject threading is only one level deep, whereas reference threading can be arbitrarily deep.

Threads are built using two hash tables vm-thread-obarray and vm-thread-subject-obarray. The former keeps track of the thread obtained by following parent and reference chains. The latter keeps track of messages with the “same subject”. To prevent messages from jumping from one thread to another within the same VM session, the subject used is not the message’s own subject, but rather the subject of the oldest message in the thread. This subject is retained even if the oldest message is expunged.

The message ID’s are interned in vm-thread-obarray and the following information is stored for each message ID:

messages: The list of messages that carry this message ID in the folder. There could be none, if we only know this message from its appearance in other “References” headers.
message: The “canonical” message with this message ID. It is typically the first message encountered by VM with this message ID. If there are no messages with this ID, then the field is nil.
date: The date of the message.
parent: The interned message ID of the parent of this message. (The folder may or may not contain a message with this ID.) If there is no parent, then this is nil.
children: The interned message ID’s of all the children of this message. (The folder may or may not contain messages with these ID’s.)
youngest-date: The date of the youngest message in the thread, among all the messages present in the folder.
oldest-date: The date of the oldest message in the thread, among the messages present in the folder.
oldest-subject: The subject of the oldest message in the thread, among the messages present in the folder.

The vm-thread-subject-obarray interns each subject string found in the folder and maps it to a vector containing the following elements:

id-sym: The interned message ID of what is likely to be the root of the thread, which is, at any rate, the oldest message with this subject.
date: The date of the root message.
members: A list of interned message ID’s for the “members” of the subject thread, which are messages without any reference-based ancestors. The root message represented by id-sym is not included as a member.
messages: The list of all the messages in the folder that have this subject.

Building threads involves calculating all the data stored with the vm-thread-obarray and vm-thread-subject-obarray. These two collections of data are calculated in sequence, because the subject threads are based on the reference threads.
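The two-phase order can be sketched in Python: first link messages to their parents to form reference threads, then group the roots of those threads by subject, taking the oldest root as the subject root and the others as members. This is a simplified model, not VM's algorithm: a message whose parent is absent from the folder simply becomes a root, subjects are assumed to be already normalized, and each root's own subject is used rather than the oldest subject of its thread.

```python
from collections import defaultdict

def build_threads(msgs):
    """msgs maps message ID -> (parent ID or None, date, normalized subject).

    Phase 1 builds reference threads (parent -> children); phase 2 groups the
    roots of those threads by subject, taking the oldest root as the subject root.
    """
    children = defaultdict(list)
    roots = []
    for mid, (parent, _date, _subject) in msgs.items():
        if parent is not None and parent in msgs:
            children[parent].append(mid)
        else:
            roots.append(mid)          # simplification: an absent parent makes a root

    by_subject = defaultdict(list)
    for mid in roots:
        by_subject[msgs[mid][2]].append(mid)

    subject_threads = {}
    for subject, ids in by_subject.items():
        ids.sort(key=lambda m: msgs[m][1])   # oldest first
        subject_threads[subject] = {"id_sym": ids[0], "members": ids[1:]}
    return dict(children), subject_threads
```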

After the threads are built, the thread-list, thread-indentation and thread-subtree fields of the Soft data vector are calculated on demand and cached. (See Soft data vector.) These fields cannot be calculated without building threads first.

When new messages are assimilated, they are added to the threads that might have been already built, and the thread-related fields in the Soft data vector are erased so that they will be recalculated. The thread-subtree field is erased for all the ancestors of the assimilated message. The thread-list and thread-indentation fields are erased for all the descendants of the assimilated message.

Before messages in the folder are expunged, they are unthreaded. This involves removing them from their respective thread trees. It also involves the erasure of the thread-subtree field of all their ancestors and the thread-list and thread-indentation fields of the descendants.
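The cache invalidation described for assimilation and expunging can be sketched in Python with parent and children maps and a per-message cache dictionary. The traversals keep a set of visited IDs, which doubles as the kind of loop detection mentioned under Error handling. The function names are hypothetical, and, following the text literally, the message itself is not invalidated.

```python
def ancestors(mid, parent):
    """Yield the ancestors of mid, stopping if a loop is detected."""
    seen = {mid}
    node = parent.get(mid)
    while node is not None and node not in seen:
        seen.add(node)
        yield node
        node = parent.get(node)

def descendants(mid, children):
    """Yield all descendants of mid depth-first, visiting each at most once."""
    seen = {mid}
    stack = list(children.get(mid, []))
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        yield node
        stack.extend(children.get(node, []))

def invalidate_thread_caches(mid, parent, children, cache):
    """Erase cached thread fields affected by adding or removing mid."""
    for a in ancestors(mid, parent):
        cache.get(a, {}).pop("thread_subtree", None)
    for d in descendants(mid, children):
        entry = cache.get(d, {})
        entry.pop("thread_list", None)
        entry.pop("thread_indentation", None)
```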

Error handling

The code for threading has to be robust in the presence of erroneous information in the message headers. We have no control over the mail clients that produce those messages and faulty information should not lead to VM hanging or producing errors. It should just do the best job it can in the presence of imperfect information.

It is possible for the information in the headers to give rise to cycles in the thread trees. Kyle Jones’s original implementation allowed these cycles to exist, but all functions that traversed the thread trees were guarded against cycles. However, since thread trees are updated when new messages are received or existing messages are expunged, this led to unstable results.

Following Jamie Zawinski’s recommendation, VM now avoids cycles in thread trees. Loop detection is still carried out during traversal as a double safeguard.

VM gives priority to the parent information contained in the “In-Reply-To” headers in preference to the information in the “References” headers. However, if an “In-Reply-To” header gives rise to a cycle, it is ignored, and then “References” headers might be used to fill in the missing information.
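The cycle check and the In-Reply-To/References preference can be sketched as follows. This is a minimal illustration, not VM's implementation. The function names are hypothetical, PARENT-FN is an assumed accessor returning a message's current parent, and choosing the last usable References entry (References lists ancestors oldest first) is an assumption. The SEEN list plays the part of the "double safeguard" loop detection.

```elisp
(require 'cl-lib)

(defun my-would-create-cycle-p (child parent parent-fn)
  "Non-nil if making PARENT the parent of CHILD would create a cycle.
PARENT-FN returns the current parent of a message, or nil for a root."
  (let ((node parent) (seen nil) (found nil))
    ;; Walk up from PARENT; SEEN guards against a pre-existing loop.
    (while (and node (not found) (not (memq node seen)))
      (push node seen)
      (if (eq node child)
          (setq found t)
        (setq node (funcall parent-fn node))))
    found))

(defun my-choose-parent (child in-reply-to references parent-fn)
  "Choose CHILD's parent, preferring IN-REPLY-TO over REFERENCES.
An In-Reply-To parent that would create a cycle is ignored.  REFERENCES
is ordered oldest first, so it is searched from the end."
  (if (and in-reply-to
           (not (my-would-create-cycle-p child in-reply-to parent-fn)))
      in-reply-to
    (cl-find-if-not (lambda (ref) (my-would-create-cycle-p child ref parent-fn))
                    (reverse references))))
```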

27.5 Sorting Internals

Sorting of messages in VM is carried out using the Emacs built-in function sort, which is generic in the comparison operation used for sorting. The required comparison operation is expressed as a sequence of basic comparison operations, such as comparison by date, by author, by subject, etc. The dynamic variable vm-key-functions is bound to a list of comparison functions before the Emacs sort function is called.

The function vm-sort-compare-xxxxxx uses the functions listed in vm-key-functions to do the overall comparison. It compares the given messages using the key functions in sequence. If the first key function determines that one of the messages precedes the other, the comparison is over. If the messages are equivalent according to the first key function, the second key function is tried; if they are still equivalent, the next key function is tried, and so on. This is called the lexicographic combination of the given key functions.
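The lexicographic combination can be sketched as follows. This is a minimal illustration, not VM's code. It assumes, as an unverified convention, that each key function returns t when the first message comes first, nil when the second does, and the symbol = on a tie. my-key-functions, my-sort-compare and the two plist-based key functions are hypothetical.

```elisp
(defvar my-key-functions nil
  "Hypothetical stand-in for `vm-key-functions'.")

(defun my-sort-compare (m1 m2)
  "Return non-nil if M1 should sort before M2.
Each function in `my-key-functions' returns t (M1 first),
nil (M2 first) or the symbol `=' (tie, so try the next key)."
  (let ((funcs my-key-functions)
        (result '=))
    (while (and funcs (eq result '=))
      (setq result (funcall (car funcs) m1 m2)
            funcs (cdr funcs)))
    ;; A tie on every key means M1 does not strictly precede M2.
    (and (not (eq result '=)) result)))

;; Two example key functions on messages represented as plists.
(defun my-key-author (m1 m2)
  (let ((a (plist-get m1 :author)) (b (plist-get m2 :author)))
    (cond ((string< a b) t) ((string= a b) '=) (t nil))))

(defun my-key-date (m1 m2)
  (let ((a (plist-get m1 :date)) (b (plist-get m2 :date)))
    (cond ((< a b) t) ((= a b) '=) (t nil))))
```

To sort by author and break ties by date, bind the key list dynamically around the call: (let ((my-key-functions '(my-key-author my-key-date))) (sort (copy-sequence messages) #'my-sort-compare)).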

Sorting by threads is special. When messages are sorted by threads, all the messages belonging to a thread should appear together. The required effect is achieved by using vm-sort-compare-thread as the first key function in the sequence. This function checks whether the two messages belong to the same thread. If they do, it returns the farthest ancestors of the two messages that share the same parent, so that the remaining comparison operations can be applied to these ancestors. The rationale is that these ancestors are the roots of the thread subtrees that the two messages belong to, so the relative ordering of the messages should be the same as the relative ordering of these ancestors. If the two messages belong to different threads, the thread roots of the two messages are returned, with the same rationale.
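The ancestor selection can be sketched as follows. This is an illustration, not vm-sort-compare-thread itself. The function names are hypothetical, PARENT-FN is an assumed accessor, and the thread trees are assumed to be acyclic, as described under Threading Internals. Returning a symbol when one message is an ancestor of the other is a choice made for this sketch only.

```elisp
(defun my-thread-path (msg parent-fn)
  "Return the chain of messages from MSG's thread root down to MSG.
PARENT-FN returns a message's parent, or nil for a root."
  (let ((path nil))
    (while msg
      (push msg path)
      (setq msg (funcall parent-fn msg)))
    path))

(defun my-thread-compare-keys (m1 m2 parent-fn)
  "Return (A1 . A2), the messages whose ordering decides M1 versus M2.
In different threads, A1 and A2 are the thread roots.  In the same
thread, they are the farthest ancestors of M1 and M2 that share a
parent.  If M1 is an ancestor of (or equal to) M2, return `ancestor';
if M2 is an ancestor of M1, return `descendant'."
  (let ((p1 (my-thread-path m1 parent-fn))
        (p2 (my-thread-path m2 parent-fn)))
    (if (not (eq (car p1) (car p2)))
        (cons (car p1) (car p2))            ; different threads: compare roots
      ;; Skip the common prefix of the two root-to-message paths.
      (while (and (cdr p1) (cdr p2) (eq (cadr p1) (cadr p2)))
        (setq p1 (cdr p1) p2 (cdr p2)))
      (cond ((null (cdr p1)) 'ancestor)
            ((null (cdr p2)) 'descendant)
            (t (cons (cadr p1) (cadr p2)))))))  ; siblings under a shared parent
```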

Threaded summaries can be sorted by any key, e.g., by author (full-name). It is most common to sort them by “activity,” i.e., the order of the most recent message in the thread or subthread. Sorting them by “date” means using the date of the root message of the thread or subthread.
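The "activity" of a thread or subthread can be illustrated as the newest date anywhere in its subtree. This is a minimal sketch under assumptions, not VM's code: my-thread-activity is hypothetical, CHILDREN-FN and DATE-FN are assumed accessors, and dates are plain numbers. Sorting by activity would use this value as the key. Sorting by date would use only the root's own date.

```elisp
(defun my-thread-activity (msg children-fn date-fn)
  "Return the newest date in the thread subtree rooted at MSG.
CHILDREN-FN lists a message's direct replies; DATE-FN returns a number."
  (let ((newest (funcall date-fn msg)))
    (dolist (child (funcall children-fn msg) newest)
      (setq newest (max newest (my-thread-activity child children-fn date-fn))))))
```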

27.6 User Interaction

For each mail folder, VM creates three kinds of buffers in Emacs: the Folder buffer, the Presentation buffer and the Summary buffer. All three types of buffers have the same user interface as far as possible: the same key bindings, menu bars, tool bars and also the same commands. The functions implementing the commands must therefore work irrespective of which of the three buffers they are invoked in. This makes VM quite different from most Emacs modes.

VM stores the identity of the Folder buffer in a buffer-local variable vm-mail-buffer in each of the other types of buffers. Conversely, each Folder buffer uses buffer-local variables vm-summary-buffer and vm-presentation-buffer to store the identity of the other buffers.

Whenever a VM command is invoked by the user, VM calls a function called vm-select-folder-buffer-and-validate, which sets the current-buffer to the Folder buffer. It also stores the identity of the buffer with the user’s focus in a global variable called vm-user-interaction-buffer. Thus, at every point during the command execution, VM has knowledge of all the buffers involved as well as the buffer in which the command execution was initiated.
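The buffer-switching pattern can be sketched as follows. This is a simplified illustration, not vm-select-folder-buffer-and-validate (which also validates the folder). my-mail-buffer, my-user-interaction-buffer and my-select-folder-buffer are hypothetical stand-ins for the VM variables and function named above. A command would call the function first and then operate on the current (Folder) buffer.

```elisp
(defvar-local my-mail-buffer nil
  "In a Summary or Presentation buffer, the associated Folder buffer.
Hypothetical stand-in for `vm-mail-buffer'.")

(defvar my-user-interaction-buffer nil
  "Hypothetical stand-in for `vm-user-interaction-buffer'.")

(defun my-select-folder-buffer ()
  "Remember the buffer the command started in, then make the Folder buffer current.
In the Folder buffer itself `my-mail-buffer' is nil, so nothing changes."
  (setq my-user-interaction-buffer (current-buffer))
  (when (buffer-live-p my-mail-buffer)
    (set-buffer my-mail-buffer))
  (current-buffer))
```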

[More to be filled in on vm-display etc.]

The default menu bar of VM contains VM-specific menus, replacing the standard Emacs menus. This is achieved by setting the buffer-specific menu bar to one in which the Emacs menus are undefined (at least in Gnu Emacs).

VM computes its standard menu bar and stores it internally:

In Gnu Emacs, this is stored in the keymap vm-mode-menu-map.
In XEmacs ...

The menu bar also has a menu, or a menu item, to switch back to the standard Emacs menu bar. The computed menu bar is then installed depending on the setting of vm-use-menus. If the user selects the action to revert to the standard Emacs menu bar, the installation is easily reverted.

In Gnu Emacs, the installation involves inserting a key binding for menu-bar.
In XEmacs, ...

When the user picks a menu item to revert to the Emacs menu bar, the function vm-menu-toggle-menubar is invoked, which installs a fresh menu bar retaining the standard Emacs menus. The same function is used to reinstall the dedicated VM menu bar when needed.
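For GNU Emacs, hiding the standard menus relies on a documented feature: a local keymap can cancel a global menu-bar item by binding the same menu-bar key to undefined. The sketch below illustrates that mechanism only. It is not VM's code, and my-vm-menu, my-standard-menu-keys and both functions are hypothetical. The list of global menu keys covers the usual GNU Emacs items and may need adjusting.

```elisp
(defvar my-vm-menu (make-sparse-keymap "VM")
  "Hypothetical stand-in for VM's own menu.")

(defconst my-standard-menu-keys '(file edit options buffer tools help-menu)
  "Global menu-bar items of GNU Emacs hidden by the dedicated menu bar.")

(defun my-install-dedicated-menubar (map)
  "Hide the standard Emacs menus in MAP and add the VM menu."
  (dolist (item my-standard-menu-keys)
    ;; Binding a global menu-bar key to `undefined' in a local map hides it.
    (define-key map (vector 'menu-bar item) 'undefined))
  (define-key map [menu-bar vm] (cons "VM" my-vm-menu)))

(defun my-restore-emacs-menubar (map)
  "Undo the hiding done by `my-install-dedicated-menubar' in MAP."
  (dolist (item my-standard-menu-keys)
    (define-key map (vector 'menu-bar item) nil)))
```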

27.7 Coding Systems

A Coding System is a way of encoding characters as bit patterns. See Coding System Basics in the Emacs Lisp manual. US-ASCII is a coding system for English. Other coding systems are used to encode the various languages of the world, e.g., iso-latin-1 for Western European languages and hebrew-iso-8bit for Hebrew. Emacs also uses its own internal coding system for characters, which is designed to cover all the character sets currently in use. However, the internal coding system can vary between different versions of Emacs.

Emacs defines a property called mime-charset for each implemented coding system, which is the official preferred name of the MIME character set that it corresponds to. For example, iso-latin-1 corresponds to the MIME charset iso-8859-1, and hebrew-iso-8bit corresponds to the MIME charset iso-8859-8. The Emacs function coding-system-get can be used to extract the mime-charset property of a coding system. VM stores all the known coding systems and the corresponding MIME charsets in its internal variables vm-mime-mule-coding-to-charset-alist and vm-mime-mule-charset-to-coding-alist.
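The correspondence can be inspected with coding-system-get. The sketch below builds two alists of the same shape as VM's pair of variables. It is an illustration of their contents, not how VM computes them, and my-coding-charset-alists is hypothetical.

```elisp
(defun my-coding-charset-alists ()
  "Return (CODING-TO-CHARSET . CHARSET-TO-CODING) for base coding systems.
Coding systems without a `:mime-charset' property are skipped."
  (let (to-charset to-coding)
    (dolist (cs (coding-system-list t))
      (let ((charset (coding-system-get cs :mime-charset)))
        (when charset
          (push (cons cs charset) to-charset)
          (push (cons charset cs) to-coding))))
    (cons (nreverse to-charset) (nreverse to-coding))))

;; For example:
;; (coding-system-get 'iso-latin-1 :mime-charset)     => iso-8859-1
;; (coding-system-get 'hebrew-iso-8bit :mime-charset) => iso-8859-8
```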

MIME messages specify the character set of their content in the charset parameter of the Content-Type header. VM uses this information to decode the content into the Emacs internal coding system, using the function decode-coding-region. Conversely, VM encodes outgoing messages into the default or chosen MIME character set using the function encode-coding-region.
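A minimal sketch of the decoding step follows. It assumes, as a simplification in place of VM's alists, that Emacs defines a coding system or alias named after the MIME charset (as it does for iso-8859-1 and utf-8). Both function names are hypothetical.

```elisp
(defun my-mime-charset-to-coding (charset)
  "Return the Emacs coding system for the MIME CHARSET string, or nil.
Simplification: relies on coding systems named after MIME charsets."
  (let ((sym (intern (downcase charset))))
    (and (coding-system-p sym) sym)))

(defun my-decode-mime-region (beg end charset)
  "Decode the raw bytes between BEG and END, which are in MIME CHARSET.
Text in an unknown charset is left untouched."
  (let ((coding (my-mime-charset-to-coding charset)))
    (when coding
      (decode-coding-region beg end coding))))
```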

The headers of email messages can contain only US-ASCII characters. So header fields in other character sets are encoded using either base-64 or quoted-printable encoding (both of which yield ASCII strings) and annotated with the name of the original character set. Such annotated strings, called encoded words, look like =?charset?B?encoded-text?=. They can apply to individual words or to sequences of words appearing in the headers. The annotation ?B? signifies base-64 encoding of the byte stream; similarly, the annotation ?Q? denotes quoted-printable encoding. VM decodes such strings using the function decode-coding-string. Conversely, the headers of outgoing messages are encoded using encode-coding-string.
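The round trip for a single base-64 encoded word can be sketched as follows. This handles only the ?B? form and assumes the charset name is also an Emacs coding system name. It is not VM's header decoder, and both function names are hypothetical.

```elisp
(defun my-decode-encoded-word (word)
  "Decode one encoded WORD such as \"=?iso-8859-1?B?Y2Fm6Q==?=\".
Only the B (base-64) form is handled; other input is returned as is."
  (if (string-match "\\`=\\?\\([^?]+\\)\\?[Bb]\\?\\([^?]*\\)\\?=\\'" word)
      (let ((coding (intern (downcase (match-string 1 word))))
            (bytes (base64-decode-string (match-string 2 word))))
        ;; The base-64 payload is a byte string in CODING.
        (decode-coding-string bytes coding))
    word))

(defun my-encode-encoded-word (text charset)
  "Encode TEXT as one B-encoded word in the MIME CHARSET string."
  (format "=?%s?B?%s?=" charset
          (base64-encode-string
           (encode-coding-string text (intern (downcase charset))) t)))
```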

27.8 Virtual Folder Internals

A virtual folder is characterized by its definition, which is stored in the buffer-local variable virtual-folder-definition. The form of the definition is as given in vm-virtual-folder-alist. See vm-virtual-folder-alist. It is a collection of clauses, with each clause listing a collection of folders and a collection of virtual selectors.

Each virtual selector X has a corresponding Lisp function ‘vm-vs-X’, whose purpose is to check whether a given message matches the selector. The arguments for ‘vm-vs-X’ are a message data structure m and all the arguments for the virtual selector X.

For example, the virtual selector author has a string argument, representing the author name. The corresponding Lisp function is defined as:

```
(defun vm-vs-author (m author-name)
  ;; AUTHOR-NAME is a regexp, matched against the full name, then the address.
  (or (string-match author-name (vm-su-full-name m))
      (string-match author-name (vm-su-from m))))
```

The definition checks to see if the given author-name pattern occurs in the full name of the author (vm-su-full-name) or the email address of the author (vm-su-from).

The author selector is then registered in four places:

The variable vm-virtual-selector-function-alist, which contains pairs of the form ‘(SELECTOR . FUNCTION)’. For the author selector, the pair is (author . vm-vs-author).
The selector symbol author is given a property vm-virtual-selector-arg-type indicating the type of argument it requires:
```
(put 'author 'vm-virtual-selector-arg-type 'string)
```
The variable vm-supported-interactive-virtual-selectors, which contains lists of strings, each string being the name of a virtual selector. For the author selector, the list is ("author"). Including the selector in this variable allows it to be used in creating interactive virtual folders (search folders).
The selector symbol author is given a property vm-virtual-selector-clause indicating the prompt string for interactive use:
```
(put 'author 'vm-virtual-selector-clause "with author matching")
```

The last two registrations are needed only for selectors that can be used interactively with the V C command.
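Putting the four registrations together for a new selector might look like the sketch below. The selector my-subject-prefix is hypothetical, its function name follows the vm-vs-X convention, and vm-su-subject is assumed to be VM's subject accessor. The defvar forms only matter when VM is not loaded; with VM loaded they leave the existing values alone.

```elisp
;; No-ops when VM is loaded; they let this sketch load on its own.
(defvar vm-virtual-selector-function-alist nil)
(defvar vm-supported-interactive-virtual-selectors nil)

(defun vm-vs-my-subject-prefix (m prefix)
  "Hypothetical selector: match messages whose subject starts with PREFIX."
  (string-prefix-p prefix (vm-su-subject m)))

;; 1. Map the selector symbol to its function.
(add-to-list 'vm-virtual-selector-function-alist
             '(my-subject-prefix . vm-vs-my-subject-prefix))
;; 2. Declare the type of its argument.
(put 'my-subject-prefix 'vm-virtual-selector-arg-type 'string)
;; 3. Offer it for interactive virtual folders.
(add-to-list 'vm-supported-interactive-virtual-selectors '("my-subject-prefix"))
;; 4. Prompt text used when the selector is chosen interactively.
(put 'my-subject-prefix 'vm-virtual-selector-clause "with subject starting with")
```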

27.9 MIME Display

The MIME layout of a message is stored in the mime-layout field of the Soft data vector of the message. (See MIME layout.) In general, the MIME layout is a tree of “MIME parts”. The function vm-decode-mime-layout is responsible for traversing this tree and displaying each MIME part appropriately.

The function vm-decode-mime-layout goes through the following sequence of decisions:

If the MIME part is a multipart type, then the subparts are displayed as needed. If it is a single part, it proceeds as follows.
If the MIME part should not be displayed automatically, it is displayed as a button. (An automatically displayed MIME type is one listed in vm-mime-auto-displayed-content-types but not listed in the corresponding exceptions.)
If the MIME part should be displayed internally and VM is able to do so, then it is displayed internally. (An internally displayed MIME type is one listed in vm-mime-internal-content-types but not listed in the corresponding exceptions.)
Otherwise, the MIME part is displayed externally. An external viewer is found from vm-mime-external-content-types-alist and it is invoked to display the MIME part.
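The decision sequence above can be sketched as a pure function. This is an illustration, not vm-decode-mime-layout. The four my-* variables are hypothetical stand-ins for VM's content-type lists and their exceptions. The rule that an entry without a slash (such as "text") matches every subtype is an assumption about how such lists are interpreted.

```elisp
(require 'cl-lib)

;; Hypothetical stand-ins for VM's auto-displayed and internal type lists.
(defvar my-auto-types '("text" "multipart"))
(defvar my-auto-exceptions nil)
(defvar my-internal-types '("text"))
(defvar my-internal-exceptions nil)

(defun my-mime-type-member (type types)
  "Non-nil if TYPE matches an entry in TYPES.
An entry without a slash, such as \"text\", matches every subtype."
  (cl-some (lambda (entry)
             (or (string= (downcase entry) (downcase type))
                 (and (not (string-match-p "/" entry))
                      (string-prefix-p (concat entry "/") type t))))
           types))

(defun my-mime-display-method (type can-display-internally)
  "Return `multipart', `button', `internal' or `external' for MIME TYPE.
CAN-DISPLAY-INTERNALLY is a predicate called with TYPE."
  (cond ((string-prefix-p "multipart/" type t) 'multipart)
        ((or (not (my-mime-type-member type my-auto-types))
             (my-mime-type-member type my-auto-exceptions))
         'button)
        ((and (my-mime-type-member type my-internal-types)
              (not (my-mime-type-member type my-internal-exceptions))
              (funcall can-display-internally type))
         'internal)
        (t 'external)))
```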

MIME parts of type ‘message/external-body’ need special treatment. If such a part is not to be auto-displayed, it is displayed as a button, but the button caption may use information from the child part (the actual object referenced by the external-body), such as its type and description. If a message/external-body part is to be auto-displayed, the child part is fetched from the external source and stored in an internal buffer. The child part may then be auto-displayed if it is appropriate to do so, or shown in turn as a button.

MIME buttons are displayed as regions of text displaying button labels. In addition, they have an overlay/extent placed on them, which has a number of properties associated with it:

vm-button. Always t.
vm-mime-layout. Gives the layout of the MIME part.
vm-mime-function. The function that carries out the action represented by pressing the button.
vm-mime-disposable. Set to true if the button should be removed when it is replaced by the MIME object.
face. Set to the value of vm-mime-button-face.
local-map (FSF Emacs) or keymap (XEmacs). Set to a keymap that includes vm-mime-reader-map, binding the $ keys.
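In GNU Emacs terms, a MIME button is a piece of text carrying an overlay with the properties listed above. The sketch below is an illustration, not VM's button code. The face button replaces vm-mime-button-face, and my-mime-button-map, my-insert-mime-button and my-press-mime-button are hypothetical.

```elisp
(require 'cl-lib)

(defvar my-mime-button-map
  (let ((map (make-sparse-keymap)))
    (define-key map (kbd "RET") #'my-press-mime-button)
    map)
  "Hypothetical button keymap; VM's includes `vm-mime-reader-map'.")

(defun my-insert-mime-button (label layout action)
  "Insert LABEL at point as a MIME button and return its overlay.
LAYOUT is the part's MIME layout; ACTION is called with the overlay."
  (let ((start (point)))
    (insert label)
    (let ((ov (make-overlay start (point))))
      (overlay-put ov 'vm-button t)
      (overlay-put ov 'vm-mime-layout layout)
      (overlay-put ov 'vm-mime-function action)
      (overlay-put ov 'vm-mime-disposable t)
      (overlay-put ov 'face 'button)          ; VM uses vm-mime-button-face
      (overlay-put ov 'local-map my-mime-button-map)
      ov)))

(defun my-press-mime-button (&optional pos)
  "Run the action of the MIME button at POS (default: point)."
  (interactive)
  (let ((ov (cl-find-if (lambda (o) (overlay-get o 'vm-button))
                        (overlays-at (or pos (point))))))
    (when ov
      (funcall (overlay-get ov 'vm-mime-function) ov))))
```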

27.10 MIME Composition

A MIME message is composed just like a normal message. When objects are attached using commands like vm-attach-file, attachment buttons are created in the message composition buffer. An attachment button is a region of text that looks like:

[Attachment mary.jpeg, image/jpeg]

Various text properties are associated with an attachment button, allowing it to be turned into an actual attachment when the message is sent.

The representation of the attachment buttons differs in GNU Emacs and XEmacs. In GNU Emacs, the region of text is given text properties that represent the metadata about the object. In XEmacs, the region of text is given an extent, which is then given properties representing the metadata. The reason for the different representations is that in GNU Emacs, only text properties are preserved under killing and yanking.

The following properties are defined for attachment buttons:

vm-mime-object. The object denoting the MIME attachment. It is either
- a string denoting a file name,
- a buffer containing the file to be attached,
- a list of the form (buffer, start, end, filename) indicating a region in a buffer, typically the Folder buffer, or
- t indicating that the attachment is another MIME object in a VM folder.
In the last case, the vm-mime-layout property describes the rest of the metadata.
vm-mime-type. A string denoting the MIME type of the object. (Note that it is a single string, unlike the type component of a MIME layout.)
vm-mime-parameters. A list of strings denoting the parameters of the MIME type.
vm-mime-description. A string for the MIME description of the object.
vm-mime-disposition. A list describing the MIME disposition.
vm-mime-encoded. A boolean indicating whether the object has MIME headers.
vm-mime-encoding. The MIME encoding used, if it is already encoded.
vm-mime-forward-local-refs. Whether or not references to local external-body objects should be forwarded as is.
fontified. Standard text property.
duplicable. Set to t in XEmacs allowing the extent to be preserved under killing and yanking.
front-nonsticky and rear-nonsticky. Standard stickiness of text properties in GNU Emacs.
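A GNU Emacs style attachment button can be sketched with propertize, using a subset of the properties above. This is not vm-attach-file. my-insert-attachment-button is hypothetical, and only the file-name form of vm-mime-object is shown. Because the metadata is stored in text properties, a copy made with buffer-substring (which is what killing copies by default) still carries it.

```elisp
(defun my-insert-attachment-button (file type &optional description)
  "Insert an attachment button for FILE, of MIME TYPE, at point.
The metadata lives in text properties, so it survives kill and yank."
  (let ((label (format "[Attachment %s, %s]" (file-name-nondirectory file) type)))
    (insert (propertize label
                        'vm-mime-object file      ; a string: a file name
                        'vm-mime-type type
                        'vm-mime-description description
                        'vm-mime-encoded nil
                        ;; Text typed right after the button is not part of it.
                        'rear-nonsticky t))))
```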

When a composed message is sent, the attachment buttons are replaced by actual attachment objects. In FSF Emacs, the attachment buttons are first converted into “fake” overlays before MIME encoding, in a function called vm-mime-fake-attachment-overlays. This allows the next stage to treat both FSF Emacs and XEmacs using the same logic.

The function vm-mime-encode-composition then encodes the composition buffer, by selecting each attachment button and replacing it with the corresponding object. The bodies of ‘external-body’ objects are also retrieved at this stage. Unless the objects were already MIME-encoded, they are MIME-encoded and made into MIME parts by adding suitable headers. The message itself is given MIME headers describing its content and then handed to Emacs message-sending functions.
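The "select each attachment button and replace it" step can be sketched over text properties as follows. This is an illustration, not vm-mime-encode-composition. Both functions are hypothetical, and MAKE-PART stands in for the real work of producing an encoded MIME part.

```elisp
(defun my-attachment-buttons ()
  "Return a list of (START END TYPE OBJECT), one per attachment button."
  (let ((pos (point-min)) result)
    (while (< pos (point-max))
      (let ((next (next-single-property-change pos 'vm-mime-type nil (point-max))))
        (when (get-text-property pos 'vm-mime-type)
          (push (list pos next
                      (get-text-property pos 'vm-mime-type)
                      (get-text-property pos 'vm-mime-object))
                result))
        (setq pos next)))
    (nreverse result)))

(defun my-replace-attachment-buttons (make-part)
  "Replace each attachment button with the string (MAKE-PART TYPE OBJECT).
Buttons are processed from last to first so earlier positions stay valid."
  (dolist (b (reverse (my-attachment-buttons)))
    (let ((start (nth 0 b)) (end (nth 1 b)))
      (delete-region start end)
      (goto-char start)
      (insert (funcall make-part (nth 2 b) (nth 3 b))))))
```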

Yanking or Forwarding MIME Messages

When another message is yanked or “included” in a message composition, the handling of attachments depends on the variable vm-include-mime-attachments. If the variable is nil, then the attachments are displayed as token buttons in plain text that appear similar to:

[DELETED ATTACHMENT mary.jpg, image/jpeg]

The function vm-decode-mime-layout is employed to generate the yanked text along with such token buttons.

If vm-include-mime-attachments is t, then first the vm-decode-mime-layout function is employed to generate proper MIME buttons for all the attachments. In a second step, the MIME buttons are replaced by attachment buttons using a function called vm-mime-convert-to-attachment-buttons. These attachment buttons are then handled as described above.

27.11 Extents and Overlays

XEmacs and GNU Emacs differ in how they represent non-textual properties in buffers. The web page on “XEmacs vs GNU Emacs” describes the situation as follows:

XEmacs uses "extents" to represent all non-textual aspects of buffers; GNU Emacs 19 uses two distinct objects, "text properties" and "overlays", which divide up the functionality between them. Extents are a superset of the union of the functionality of the two GNU Emacs data types. The full GNU Emacs 19 interface to text properties and overlays is supported in XEmacs (with extents being the underlying representation).

Extents can be made to be copied into strings, and then restored, by kill and yank. Thus, one can specify this behavior on either "extents" or "text properties", whereas in GNU Emacs 19 text properties always have this behavior and overlays never do.

While extents and overlays look similar on the surface, they differ fundamentally in that extents are attached to text and, so, can be killed and yanked, whereas overlays are not attached to text. XEmacs has implemented GNU-like text properties on top of extents. So, text properties may work more uniformly in both the Emacsen, but VM was developed in the early days of the forking and does not use these common features.

The file vm-misc.el contains definitions whereby both extents and overlays can be treated as a single type of “VM extents”. Wherever such VM extents can be used, there is some uniformity in the code but, in other places, there is not. (Independently, the XEmacs team has developed the fsf-compat package by which FSF-style overlays are implemented on top of extents. This package is not compatible with the way VM deals with the two types.)
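The idea behind such "VM extents" can be sketched as thin wrappers that dispatch on whether the XEmacs extent functions exist. The wrapper names below are hypothetical and are not the actual definitions in vm-misc.el. The XEmacs functions are called through funcall so that the code also compiles cleanly in GNU Emacs.

```elisp
(defun my-make-extent (start end)
  "Create an XEmacs extent or a GNU Emacs overlay from START to END."
  (if (fboundp 'make-extent)
      (funcall 'make-extent start end)      ; XEmacs
    (make-overlay start end)))              ; GNU Emacs

(defun my-set-extent-property (extent prop value)
  "Set PROP of EXTENT (an extent or overlay) to VALUE."
  (if (fboundp 'set-extent-property)
      (funcall 'set-extent-property extent prop value)
    (overlay-put extent prop value)))

(defun my-extent-property (extent prop)
  "Return PROP of EXTENT (an extent or overlay)."
  (if (fboundp 'extent-property)
      (funcall 'extent-property extent prop)
    (overlay-get extent prop)))

(defun my-extent-start-position (extent)
  "Return the start position of EXTENT (an extent or overlay)."
  (if (fboundp 'extent-start-position)
      (funcall 'extent-start-position extent)
    (overlay-start extent)))
```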

Another major difference between extents and overlays is that the beginning and end of an overlay are markers. This has some advantages. However, if a buffer has many overlays, normal editing operations must update all the overlay markers, which can be time-consuming.

The major applications of extents and overlays in VM are the following:

Summary buffers use extents/overlays for each summary line. These are implemented uniformly but, to avoid the performance problem in GNU Emacs, all the markers are reset to nil before a summary is regenerated and then set to their correct positions afterwards. Not doing this correctly can seriously degrade the performance of summary generation.
Presentation buffers use extents/overlays for MIME buttons. These are implemented uniformly.
The message composition buffers have attachment buttons. These are implemented using text properties in GNU Emacs and extents in XEmacs. The difference is necessary because VM allows the attachment buttons to be killed and yanked, and this functionality cannot be implemented using overlays.
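The "detach, regenerate, reattach" technique used for summary overlays in GNU Emacs can be sketched as follows. Here delete-overlay detaches an overlay from its buffer, so editing no longer updates its endpoints, and move-overlay re-attaches it. This is an illustration, not VM's summary code. my-regenerate-with-detached-overlays is hypothetical, and REGENERATE is assumed to return the new (START . END) of each line.

```elisp
(require 'cl-lib)

(defun my-regenerate-with-detached-overlays (overlays regenerate)
  "Detach OVERLAYS, call REGENERATE, then reattach them.
REGENERATE rewrites the current buffer and returns a list of
\(START . END) positions, one per overlay, in the same order."
  ;; Detached overlays are not updated while REGENERATE edits the buffer.
  (mapc #'delete-overlay overlays)
  (let ((positions (funcall regenerate)))
    (cl-mapc (lambda (ov pos)
               (move-overlay ov (car pos) (cdr pos) (current-buffer)))
             overlays positions)
    overlays))
```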

27.12 Timers and Concurrency

VM has been designed mainly as a sequential program. However, there are three timer tasks that are scheduled to run at regular intervals:

vm-flush-itimer-function: Stores message attributes in the folder so that they will be saved when an auto-save is done. This is controlled by the variable vm-flush-interval.
vm-get-mail-itimer-function: Moves new mail from maildrops into the folder. This is controlled by the variable vm-auto-get-new-mail.
vm-check-mail-itimer-function: Checks the maildrops for any new mail. This is controlled by the variable vm-mail-check-interval.

These timer tasks are scheduled using the itimer package in XEmacs and the timer package in Gnu Emacs.
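On the GNU Emacs side, a periodic task such as the mail check can be scheduled with run-at-time and rescheduled by cancelling the previous timer. The sketch below shows that pattern only. my-check-mail-timer and my-schedule-mail-check are hypothetical, and VM's own code (with the interval taken from vm-mail-check-interval) may differ. Emacs timer functions run only when Emacs is waiting, for example for input, so they do not preempt Lisp code that is already running.

```elisp
(defvar my-check-mail-timer nil
  "Timer for the periodic mail check, if one is running.")

(defun my-schedule-mail-check (interval function)
  "Run FUNCTION every INTERVAL seconds, replacing any earlier schedule.
A nil INTERVAL just cancels the timer."
  (when (timerp my-check-mail-timer)
    (cancel-timer my-check-mail-timer))
  (setq my-check-mail-timer
        (and interval (run-at-time interval interval function))))
```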