DESIGN

What follows is my best effort at giving the big ASCII picture of what happens when smd-pull is run. smd-push simply swaps smd-server and smd-client. Note that data always flows from smd-server to smd-client, so running them on opposite hosts inverts the sync direction.

Your mail server           Your laptop
----------------           -----------


      --- sync direction ---> smd-pull
                                |
                                |
smd-server ------- ssh ----- smd-client
    |                           |
    |                           |
  mddiff                     (mddiff)

smd-client uses mddiff only to compute sha1 sums, not to compute a diff as smd-server does.

Both endpoints hold a file (the db-file described below) that describes the mailbox status they previously agreed on. The server computes the difference between the current mailbox status and the previous one the client agreed on. This diff is sent to the client, which tries to apply it, possibly requesting some data from the server. If the client succeeds, both the server and the client now agree on the new mailbox status.

END USER tools

smd-pull and smd-push

The idea is quite simple. If === is a double pipe (a pair of pipes, one for stdin and one for stdout), smd-pull simply performs the following:

smd-client $CLIENTNAME $MAILBOX === tee log === \
  ssh $SERVERNAME smd-server $CLIENTNAME $MAILBOX

The tee command is used only for logging, and if $DEBUG is false it is replaced by cat. Conversely, smd-push performs the following:

smd-server $CLIENTNAME $MAILBOX === tee log === \
  ssh $SERVERNAME smd-client $CLIENTNAME $MAILBOX

They are both implemented in bash, since their main activity is to redirect standard file descriptors, call other tools, check their exit status and, if needed, notify the user with an excerpt of their logs.
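
For readers unfamiliar with the double pipe, the wiring can be sketched in Python with two os.pipe() pairs and subprocess. This is only an illustration of the plumbing, not how smd-pull is written (it is bash): the tee logging step is omitted, and the helper name double_pipe is hypothetical.

```python
import os
import subprocess
import sys


def double_pipe(cmd_a, cmd_b):
    """Run cmd_a and cmd_b so that A's stdout feeds B's stdin and
    B's stdout feeds A's stdin (the "===" of the text, minus tee)."""
    a_to_b_r, a_to_b_w = os.pipe()
    b_to_a_r, b_to_a_w = os.pipe()
    # close_fds=True keeps each child from holding the other pipe ends
    # open, so a peer sees EOF when the other side exits.
    proc_a = subprocess.Popen(cmd_a, stdin=b_to_a_r, stdout=a_to_b_w, close_fds=True)
    proc_b = subprocess.Popen(cmd_b, stdin=a_to_b_r, stdout=b_to_a_w, close_fds=True)
    # The parent no longer needs any pipe end.
    for fd in (a_to_b_r, a_to_b_w, b_to_a_r, b_to_a_w):
        os.close(fd)
    return proc_a.wait(), proc_b.wait()


if __name__ == "__main__":
    # Toy endpoints standing in for smd-client and smd-server.
    client = [sys.executable, "-c",
              "import sys; print('ping', flush=True); "
              "sys.exit(0 if sys.stdin.readline().strip() == 'pong' else 1)"]
    server = [sys.executable, "-c",
              "import sys; line = sys.stdin.readline().strip(); "
              "print('pong' if line == 'ping' else 'error', flush=True)"]
    print(double_pipe(client, server))
```

Both exit statuses are returned so that, as in smd-pull, the caller can decide whether to report a failure.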

smd-loop

The idea is to mimic cron, but retry a failed sync attempt if the error is transient. smd-client and smd-server output tags that specify whether an error requires human intervention, and also suggest some actions, like retrying. smd-loop understands these tags, and gives a second chance to a command that fails with an error that does not require human intervention and for which the suggested action is retry.

It is implemented in bash, since it is mostly a while true loop. Arrays (not POSIX shell compliant) are used to record failures, so that every smd-push or smd-pull command gets only one second chance.
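
The second-chance policy can be sketched as follows. This Python sketch only illustrates the policy (smd-loop itself is bash): it treats a failure as transient when a TAGS: error:: line reports human-intervention(avoidable), and retries each command at most once per streak of failures. The class name, its return values, and resetting the counter on success are assumptions; the text also requires the suggested action to be retry, which the sketch does not check because the exact token is not specified here.

```python
import re

AVOIDABLE = re.compile(r"human-intervention\(avoidable\)")


def is_transient(output):
    """True if some TAGS: error:: line reports an error that does not
    need human intervention."""
    for line in output.splitlines():
        if line.startswith("TAGS:") and "error::" in line and AVOIDABLE.search(line):
            return True
    return False


class RetryPolicy:
    """Give every command at most one second chance per failure streak."""

    def __init__(self):
        self.failures = {}  # command -> consecutive failures so far

    def on_result(self, command, exit_code, output):
        if exit_code == 0:
            self.failures.pop(command, None)  # success resets the streak
            return "ok"
        count = self.failures.get(command, 0) + 1
        self.failures[command] = count
        if count == 1 and is_transient(output):
            return "retry"
        return "give-up"
```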

smd-applet

To write a (hopefully) eye-candy applet for GNOME, Vala was an intriguing choice: it is based on smart and sound ideas (such as avoiding the non-standardized C++ calling conventions) to provide a modern object-oriented programming language built around GObject and GLib. Bindings for GTK+, GConf, libnotify, etc. are available and require no compiled glue code, just plain-text .vapi files.

If you are used to languages where writing bindings is not a trivial task, you should have a look at Vala, where bindings are simple by design.

SERVER/CLIENT interaction

A server program (smd-server) and a client program (smd-client) are used, respectively, to transmit the diff generated by mddiff (and, on request, mail headers or bodies) and to apply that diff, requesting any necessary data from the other endpoint.

Since they mostly implement policies, like deciding whether a diff can be applied, they are implemented in a high-level scripting language, Lua. The choice is almost arbitrary: there are no strong reasons for adopting Lua instead of Python or others, but its installation is pretty small and it executes quite fast. Moreover, its syntax is particularly simple, making the code understandable to non-Lua experts too. Finally, I find it elegant.

They send and receive data on their standard input and output, delegating to external tools both the transmission of data across the network and optimizations like compression or encryption. OpenSSH can do both, and is used by smd-pull and smd-push to connect smd-client to smd-server.

A simple protocol defines how smd-client requests data from smd-server and how smd-client notifies smd-server that all changes have been applied correctly.

The protocol

The protocol is line oriented for commands, chunk oriented for data transmission.

  1. Both client and server send the following two messages, and check that they are equal to the ones sent by the other endpoint:

    protocol NUMBER
    dbfile SHA1

    This part of the protocol is called the handshake.

  2. The server sends the output of mddiff (which is line oriented) and then the following message, which concludes the first phase of the protocol; the client is now expected to reply:

    END

  3. From now on, the client can at any time send one of the following (alternative) messages:

    ABORT
    COMMIT

    The former informs the server that the client was unable to apply the diff generated by mddiff, while the latter informs the server that all changes were applied successfully.

  4. In response to a COMMIT message, the server transmits an xdelta patch that the client has to apply to its db-file.

  5. The client replies with DONE to complete the synchronization.

  6. After point 2 and before point 3, the client can send the following commands to the server, which replies by transmitting data or with ABORT. NAME is not URL encoded.

    GET NAME
    GETHEADER NAME
    GETBODY NAME
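
The ordering constraints of points 1 to 6 can be summarized as a small state machine over the messages the client sends. A minimal Python sketch, assuming the client's messages are available as lines without trailing newlines; it checks only the client side (not the server's replies, nor whether the handshake values match the server's), it assumes nothing but DONE may follow COMMIT, and check_client_messages is a hypothetical name.

```python
import re

REQUEST = re.compile(r"(GET|GETHEADER|GETBODY) .+")


def check_client_messages(messages):
    """Check the order of the messages sent by the client and return how
    the session ended ("ABORT" or "DONE"); raise ValueError otherwise."""
    msgs = list(messages)
    # Point 1: the handshake comes first.
    if (len(msgs) < 2 or not msgs[0].startswith("protocol ")
            or not msgs[1].startswith("dbfile ")):
        raise ValueError("missing handshake")
    state = "requests"  # after END: data requests, then ABORT or COMMIT
    for msg in msgs[2:]:
        if state == "requests" and REQUEST.fullmatch(msg):
            continue                  # point 6: GET, GETHEADER, GETBODY
        if state == "requests" and msg == "ABORT":
            state = "ABORT"           # point 3: the diff could not be applied
        elif state == "requests" and msg == "COMMIT":
            state = "committed"       # point 4: the server now sends the xdelta
        elif state == "committed" and msg == "DONE":
            state = "DONE"            # point 5
        else:
            raise ValueError("unexpected %r in state %s" % (msg, state))
    if state not in ("ABORT", "DONE"):
        raise ValueError("session did not complete")
    return state
```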

Transmission

The server can either transmit data or refuse. If it refuses, it just sends ABORT; otherwise it sends:

chunk NUMBER
...DATA...

First, the chunk line declares the number of bytes to be sent; then the data follows.
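
Because the length is declared up front, DATA may contain newlines or arbitrary bytes; the receiver simply reads exactly NUMBER bytes. A minimal Python sketch over binary streams; the function names are hypothetical, and it assumes no extra terminator follows DATA (the text does not mention one).

```python
import io


def send_chunk(out, data):
    """Write one reply: a 'chunk NUMBER' line followed by NUMBER raw bytes."""
    out.write(b"chunk %d\n" % len(data))
    out.write(data)


def recv_reply(inp):
    """Read one reply; return the payload bytes, or None if the server refused."""
    header = inp.readline().rstrip(b"\n")
    if header == b"ABORT":
        return None
    verb, _, size = header.partition(b" ")
    if verb != b"chunk" or not size.isdigit():
        raise ValueError("unexpected reply: %r" % header)
    data = inp.read(int(size))  # the byte count, not a delimiter, ends DATA
    if len(data) != int(size):
        raise EOFError("connection closed inside a chunk")
    return data


if __name__ == "__main__":
    stream = io.BytesIO()
    send_chunk(stream, b"Subject: hi\n\nbody\n")
    stream.seek(0)
    print(recv_reply(stream))
```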

MAILDIR DIFF

Maildir diff (mddiff) computes the delta between an old status of a maildir (previously recorded in the db-file) and the current status, generating a set of commands (a diff) that third-party software can apply to synchronize a (remote) copy of the maildir.

How it works

This software uses sha1 to compute snapshots of a maildir, and computes the set of actions a client should perform to synchronize with the current mailbox status. This software alone is unable to synchronize two maildirs: it must be supported by a higher-level tool that applies the actions and, if the twin maildir is remote, transfers data over the network.

To avoid recomputing the expensive sha1 sums, a cache file is used. On every run the program generates a new status file (with .new appended to its name) that must replace the old one if the generated actions are committed to the other maildir. Cache files are specific to the twin maildir: if you have more than one, you must use a different cache file for each of them.

The db-file (say db.txt) is paired with a timestamp file (db.txt.mtime) that stores the time of the last run; files whose mtime does not exceed this timestamp will not be (re)processed the next time mddiff runs.

The .mtime companion file is updated only on the server side, since the mtime concept is local to the host running mddiff.
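
The role of the timestamp can be sketched as a filter over the maildir. This Python sketch assumes the cache is keyed by file name and that last_run is the integer read from db.txt.mtime; both function names are hypothetical and the real mddiff logic may differ. Note that a file unknown to the old db-file must be hashed even if its mtime is old: renaming a mail (for example from new/ to cur/) does not change its mtime.

```python
import os


def scan_mtimes(root):
    """Map each file under root (relative path) to its mtime in whole seconds."""
    mtimes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            mtimes[os.path.relpath(path, root)] = int(os.stat(path).st_mtime)
    return mtimes


def files_to_rehash(mtimes, last_run, cached):
    """Names whose sha1 sums must be recomputed: files modified after the
    last run, plus files the old db-file does not know about."""
    return [name for name, mtime in mtimes.items()
            if mtime > last_run or name not in cached]
```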

The db-file format

The db-file is composed of two files: the actual database file (extension .txt) and a timestamp (extension .txt.mtime). The latter contains just a number (as printed by date +%s). The former is line oriented; every line has 3 space-separated fields:

  - the sha1 sum of the header
  - the sha1 sum of the body
  - the name of the file, not URL encoded
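
Since the name comes last and is not URL encoded, it may contain spaces, so a reader should split each line on the first two spaces only. A minimal Python sketch under that reading of the format; the helper names are hypothetical, and file names containing a newline are not handled.

```python
def format_db_line(hsha, bsha, name):
    """One db-file line: header sha1, body sha1, then the raw file name."""
    return "%s %s %s\n" % (hsha, bsha, name)


def parse_db(text):
    """Map file name -> (header sha1, body sha1).

    Only the first two spaces separate fields, so names may contain spaces."""
    db = {}
    for line in text.split("\n"):
        if line:
            hsha, bsha, name = line.split(" ", 2)
            db[name] = (hsha, bsha)
    return db


def read_mtime(path):
    """Read the companion timestamp file (seconds since the epoch, as date +%s)."""
    with open(path) as f:
        return int(f.read().strip())
```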

The commands

From now on, name refers to a file name, hsha to the sha1 sum of its header and bsha to the sha1 sum of its body.

Messages should be processed in order, with the exception of ADD, which can be safely postponed. In particular, DELETE messages are always sent last, and COPY or COPYBODY messages preceding them may refer to the same file name. Performing deletions in advance is still sound (since the client can always ask the server for the message) but clearly suboptimal, since a local copy does not involve any network traffic.

File names are URL encoded, escaping only ' ' (%20) and '%' (%25).
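
Escaping only two characters keeps the encoding trivial, but the order matters: '%' must be escaped before ' ', and decoding is safest in a single pass. A minimal Python sketch with hypothetical helper names.

```python
import re

_ESCAPE = re.compile(r"%(20|25)")


def encode_name(name):
    """Escape only '%' and ' '. '%' goes first, so the '%' introduced
    by %20 is not escaped again."""
    return name.replace("%", "%25").replace(" ", "%20")


def decode_name(encoded):
    """Undo encode_name in a single pass, so '%2520' decodes to '%20'."""
    return _ESCAPE.sub(lambda m: " " if m.group(1) == "20" else "%", encoded)
```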

mddiff as an hashing server

mddiff is also used by the client to compute the sha1 sums of the header and body of local mails, for example to check that the source of a copy command holds the intended content. Since this operation may be very frequent, mddiff can operate in server mode: if the argument is a single file name and that file is a fifo, then mddiff reads file names (not URL encoded, separated by \n) from that fifo and outputs the sha1 sums of their header and body.
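
The server mode can be sketched in Python as a loop over the lines of the fifo. The output format (the two sums separated by a space, one line per request) and the way a mail is split into header and body are assumptions made for illustration; the real mddiff is a separate program and its exact output is not described here.

```python
import hashlib
import os
import stat
import sys


def header_body_sha1(raw):
    """sha1 of header and body, splitting at the first empty line
    (assumption: LF line endings; the header keeps its final newline)."""
    sep = raw.find(b"\n\n")
    header, body = (raw, b"") if sep == -1 else (raw[:sep + 1], raw[sep + 2:])
    return hashlib.sha1(header).hexdigest(), hashlib.sha1(body).hexdigest()


def read_file(path):
    with open(path, "rb") as f:
        return f.read()


def serve(names, read=read_file):
    """Yield one 'HSHA BSHA' line for each file name read from names."""
    for line in names:
        name = line.rstrip("\n")
        if name:
            yield "%s %s\n" % header_body_sha1(read(name))


if __name__ == "__main__":
    # Server mode: a single argument that is a fifo.
    if len(sys.argv) == 2 and stat.S_ISFIFO(os.stat(sys.argv[1]).st_mode):
        with open(sys.argv[1], errors="surrogateescape") as fifo:
            for reply in serve(fifo):
                sys.stdout.write(reply)
                sys.stdout.flush()  # the client waits for each reply
```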

mddiff as an mkdir -p; ln server

mddiff is also used by the client to create the indirection layer needed to rename mailbox folders. If the argument is a single file name, that file is a fifo, and the -s flag is passed, then mddiff reads directory names (not URL encoded, separated by \n) from that fifo, 2 at a time: the first one is the source path, the second the target. Then it behaves like mkdir -p $(dirname $target); ln -s $source $target. For example, if the source is ~/Mail/foo/cur and the target is Maildir/.foo/cur, then mddiff will create the directories Maildir and Maildir/.foo and place in the latter a link named cur pointing to ~/Mail/foo/cur.
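
Each pair read from the fifo maps to one mkdir -p plus one symlink. A minimal Python sketch with hypothetical function names; paths are used literally (a ~ is not expanded), and os.symlink raises an error if the target already exists.

```python
import os
import tempfile


def link_into(source, target):
    """mkdir -p $(dirname target); ln -s source target"""
    parent = os.path.dirname(target)
    if parent:
        os.makedirs(parent, exist_ok=True)
    os.symlink(source, target)


def serve_links(lines):
    """Consume lines two at a time: source path, then target path."""
    it = iter(lines)
    for source in it:
        target = next(it, None)
        if target is None:
            raise ValueError("source %r has no target" % source)
        link_into(source.rstrip("\n"), target.rstrip("\n"))


if __name__ == "__main__":
    # Rebuild the example from the text inside a scratch directory.
    with tempfile.TemporaryDirectory() as home:
        source = os.path.join(home, "Mail", "foo", "cur")
        os.makedirs(source)
        target = os.path.join(home, "Maildir", ".foo", "cur")
        serve_links([source + "\n", target + "\n"])
        print(os.readlink(target) == source)
```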

Easy to parse output messages

smd-pull and smd-push prefix all error messages with ERROR:, but what follows is meant to be read by a human being. To let other tools parse and react to error messages, a more formal output is also provided: if requested (-v option), a single line prefixed with TAGS: is output. The prefix is followed by either error:: or stats::, denoting an error message or a statistical one respectively, and then by a list of (somewhat improperly called) tags. Their meaning should be easy to guess.

<M>    ::= "error::" <ET> | "stats::" <ST> | "stats::" <DR>
<ET>   ::= "context(" <STR> ")" 
           "probable-cause(" <STR> ")"
           "human-intervention(" <HI> ")"
           <SA>
<SA>   ::= | "suggested-actions(" <ACTS> ")"
<STR>  ::= `[^)]+`
<HI>   ::= "necessary" | "avoidable"
<ACTS> ::= <A> | <A> <ACTS>
<A>    ::= "run(" <STR> ")" 
        |  "display-mail(" <STR> ")" 
        |  "display-permissions(" <STR> ")"
<ST>   ::= "new-mails(" <NUM> ")" <SPC>
           "del-mails(" <NUM> ")" <SPC>
           "bytes-received(" <NUM> ")" <SPC>
           "xdelta-received(" <NUM> ")"
<DR>   ::= "mail-transferred(" <ML> ")"
<ML>   ::= <STR> | <STR> " , " <ML>
<NUM>  ::= `[0-9]+`
<SPC>  ::= ` *,? *`
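
Following the grammar, a TAGS: line can be parsed with a handful of regular expressions, since every <STR> stops at the first ')'. A minimal Python sketch; parse_tags and the layout of the returned dictionaries are hypothetical, optional whitespace between tags is tolerated (an assumption about the real output), and the mail-transferred list is split on ' , ' as in <ML>.

```python
import re

STR = r"([^)]+)"
ERROR_RE = re.compile(
    r"error::\s*context\(" + STR + r"\)\s*"
    + r"probable-cause\(" + STR + r"\)\s*"
    + r"human-intervention\((necessary|avoidable)\)\s*"
    + r"(?:suggested-actions\("
    + r"((?:\s*(?:run|display-mail|display-permissions)\([^)]+\))+)"
    + r"\s*\))?"
)
ACTION_RE = re.compile(r"(run|display-mail|display-permissions)\(([^)]+)\)")
TRANSFERRED_RE = re.compile(r"stats::\s*mail-transferred\(([^)]+)\)")
COUNTER_RE = re.compile(r"([a-z-]+)\(([0-9]+)\)")


def parse_tags(line):
    """Parse one 'TAGS:' line into a dict; return None for other lines."""
    if not line.startswith("TAGS:"):
        return None
    body = line[len("TAGS:"):].strip()
    if body.startswith("error::"):
        m = ERROR_RE.match(body)
        if m is None:
            raise ValueError("malformed error tags: %r" % body)
        context, cause, human, actions = m.groups()
        return {"kind": "error", "context": context, "probable-cause": cause,
                "human-intervention": human,
                "suggested-actions": ACTION_RE.findall(actions or "")}
    m = TRANSFERRED_RE.match(body)
    if m:
        return {"kind": "mail-transferred", "mails": m.group(1).split(" , ")}
    if body.startswith("stats::"):
        counters = {k: int(v) for k, v in COUNTER_RE.findall(body)}
        return {"kind": "stats", "counters": counters}
    raise ValueError("unknown tags: %r" % body)
```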