What follows is my best effort at giving the big ASCII picture of what
happens when smd-pull is run. smd-push simply swaps smd-server and
smd-client. Note that the sync direction is from smd-server to smd-client,
so running them on the opposite hosts inverts the sync direction.
    Your mail server            Your laptop
    ----------------            -----------
       --- sync direction --->   smd-pull
                                    |
                                    |
    smd-server ------- ssh ----- smd-client
        |                           |
        |                           |
     mddiff                      (mddiff)
smd-client uses mddiff only to compute sha1 sums, not to compute a diff as
smd-server does.
Both endpoints hold a file (the db-file described below) that describes the
status of the mailbox on which they previously agreed. The server computes
the difference between the current mailbox status and the previous one on
which the client agreed. This diff is sent to the client, which tries to
apply it, possibly requesting some data from the server. If the client
succeeds, both the server and the client now agree on the new mailbox status.
The idea is quite simple. If === is a double pipe (a pair of pipes, one for
stdin and one for stdout), smd-pull simply performs the following:
smd-client $CLIENTNAME $MAILBOX === tee log === \
ssh $SERVERNAME smd-server $CLIENTNAME $MAILBOX
The tee command is used only for logging, and if $DEBUG is false it is
replaced by cat. Vice versa, smd-push performs what follows:
smd-server $CLIENTNAME $MAILBOX === tee log === \
ssh $SERVERNAME smd-client $CLIENTNAME $MAILBOX
They are both implemented in bash, since their main activity is to redirect
standard file descriptors, call other tools, check their exit status and,
if needed, notify the user with an extract of their logs.
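
The double pipe described above can be sketched outside of bash as well. The following is a minimal illustration in Python (used here only for illustration, the real tools are bash scripts), assuming a POSIX system. The function name run_double_pipe and the two toy endpoints PING and PONG are hypothetical stand-ins for smd-client and for the ssh command running smd-server; the tee/cat logging stage is left out.

```python
import os
import subprocess
import sys


def run_double_pipe(cmd_a, cmd_b):
    """Run two commands with the stdout of each wired to the stdin of the other."""
    a_to_b_read, a_to_b_write = os.pipe()
    b_to_a_read, b_to_a_write = os.pipe()
    proc_a = subprocess.Popen(cmd_a, stdin=b_to_a_read, stdout=a_to_b_write)
    proc_b = subprocess.Popen(cmd_b, stdin=a_to_b_read, stdout=b_to_a_write)
    # The parent closes its own copies, otherwise no child would ever see EOF.
    for fd in (a_to_b_read, a_to_b_write, b_to_a_read, b_to_a_write):
        os.close(fd)
    return proc_a.wait(), proc_b.wait()


# Hypothetical toy endpoints: one says "ping" and expects "pong" back.
PING = [sys.executable, "-c",
        "import sys; print('ping', flush=True); "
        "sys.exit(0 if sys.stdin.readline().strip() == 'pong' else 1)"]
PONG = [sys.executable, "-c",
        "import sys; line = sys.stdin.readline().strip(); "
        "print('pong' if line == 'ping' else 'bad', flush=True)"]
```

Calling run_double_pipe(PING, PONG) returns the two exit statuses, which is the information smd-pull and smd-push need to decide whether to notify the user.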
The idea is to mimic cron, but retry a failed sync attempt if the given
error is transient. smd-client and smd-server output TAGS that specify
whether the error that occurred needs human intervention or not, and also
suggest some actions, like retry. smd-loop understands these tags, and gives
a second chance to a command that fails with an error that does not require
human intervention and for which the suggested action is retry.
It is implemented in bash, since it is mostly a while true loop. Arrays
(not POSIX shell compliant) are used to record failures, and give only one
second chance to every smd-push or smd-pull command.
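
The second-chance policy can be sketched as follows. This is only an illustration in Python, not the bash code of smd-loop: the function name, the shape of the tags dictionary, the returned strings and the choice of forgetting a failure after a success are all assumptions made for the example.

```python
def run_with_second_chance(command, already_failed):
    """Run `command` and decide what to do with its outcome.

    `command` is a hypothetical callable returning (ok, tags), where tags
    mirrors the TAGS output, for example
    {"human-intervention": "avoidable", "suggested-actions": ("retry",)}.
    `already_failed` is a set of command names that already used up their
    second chance. Returns "ok", "retry-later" or "give-up".
    """
    ok, tags = command()
    name = command.__name__
    if ok:
        # Assumption: a successful run clears the recorded failure.
        already_failed.discard(name)
        return "ok"
    transient = (tags.get("human-intervention") == "avoidable"
                 and "retry" in tags.get("suggested-actions", ()))
    if transient and name not in already_failed:
        already_failed.add(name)
        return "retry-later"
    return "give-up"
```

A transient failure is tolerated once per command; a second failure in a row, or any failure that needs human intervention, is reported.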
To write a hopefully eye-candy applet for GNOME, the language Vala was an
intriguing choice, since it is based on smart and sound ideas (such as
avoiding the non-standardized C++ calling conventions) to provide a modern
object oriented programming language built around gobject and glib. Bindings
for GTK+, GConf, libnotify, etc. are available, and require no compiled
glue code, just bare text .vapi files.
If you are used to languages where writing bindings is not a trivial task, you'd better look at Vala, where bindings are simple by design.
A server software (smd-server) and a client software (smd-client) are
respectively used to transmit the diff generated by mddiff and, when
requested, mail headers or bodies, and to apply a diff, requesting any
necessary data from the other endpoint.
Since they mostly implement policies, like deciding if a diff can be applied or not, they are implemented in a high level scripting language called Lua. The language choice is almost arbitrary; there are no strong reasons for adopting Lua instead of Python or others, but its installation is pretty small and it executes quite fast. Moreover, its syntax is particularly simple, making it understandable to non Lua experts too. Finally, I find it elegant.
They send and receive data on their standard input and output channels,
delegating to external tools the transmission of data across a network, and
optimizations like compressing the data, or encrypting it.
OpenSSH can do both, and is adopted by smd-pull and smd-push to connect
smd-client to smd-server.
A simple protocol defines how smd-client requests data from smd-server and
how smd-client notifies smd-server that all changes have been applied
correctly.
The protocol is line oriented for commands, chunk oriented for data transmission.
1. Both client and server send the following two messages, and check that
   they are equal to the ones sent by the other endpoint:

       protocol NUMBER
       dbfile SHA1

   This part of the protocol is called the handshake.
2. The server sends the output of mddiff (which is line oriented) and then
   the following message to conclude the first phase of the protocol; the
   client is now expected to reply.

       END

3. The client, from now on, can at any time send one of the following
   (alternative) messages:

       ABORT
       COMMIT

   The former informs the server that the client was unable to apply the
   diff generated by mddiff, while the latter informs the server that all
   changes were applied successfully.
4. In response to a COMMIT message, the server will transmit an xdelta
   patch that the client has to apply to its db-file.
5. The client replies with DONE to complete the synchronization.
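
The handshake described above can be sketched as follows. This is an illustration in Python (the real endpoints are written in Lua); the function names are hypothetical, and so are the protocol number and the SHA1 used in the usage note.

```python
def handshake_lines(protocol_version, dbfile_sha1):
    """Build the two handshake lines that an endpoint sends."""
    return [f"protocol {protocol_version}", f"dbfile {dbfile_sha1}"]


def check_handshake(local_lines, remote_lines):
    """Return None if both endpoints agree, else a short reason string."""
    for mine, theirs in zip(local_lines, remote_lines):
        if mine != theirs:
            return f"mismatch: sent {mine!r}, received {theirs!r}"
    if len(local_lines) != len(remote_lines):
        return "mismatch: different number of handshake lines"
    return None
```

For example, two endpoints that both build handshake_lines(1, sha) for the same db-file checksum agree, while a differing checksum is reported as a mismatch on the dbfile line, so the synchronization never starts from diverging db-files.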
After point 2. and before point 3. the client can send the following
commands to the server, which can reply by transmitting data or with ABORT.
NAME is not URL encoded.
GET NAME
GETHEADER NAME
GETBODY NAME
The server can transmit data or refuse. In the latter case it just sends
ABORT. In the former case it sends
chunk NUMBER
...DATA...
First it declares with chunk the number of bytes to be sent, then it sends
the data.
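
The chunk framing described above can be sketched like this. It is an illustration in Python under one assumption not stated in the text: the chunk line is terminated by a single newline and the data follows immediately. The function names are hypothetical.

```python
import io


def write_chunk(stream, data):
    """Declare the number of bytes with a chunk line, then send the data."""
    stream.write(b"chunk %d\n" % len(data))
    stream.write(data)


def read_reply(stream):
    """Read the server reply: the data of a chunk, or None on ABORT."""
    header = stream.readline().rstrip(b"\n")
    if header == b"ABORT":
        return None
    keyword, _, size = header.partition(b" ")
    if keyword != b"chunk" or not size.isdigit():
        raise ValueError("unexpected reply: %r" % header)
    data = stream.read(int(size))
    if len(data) != int(size):
        raise ValueError("connection closed in the middle of a chunk")
    return data


def roundtrip(data):
    """Write a chunk to an in-memory stream and read it back."""
    buffer = io.BytesIO()
    write_chunk(buffer, data)
    buffer.seek(0)
    return read_reply(buffer)
```

Declaring the length first lets the data contain newlines or any other byte, which a purely line oriented reply could not carry.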
Maildir diff (mddiff) computes the delta from an old status of a maildir
(previously recorded in the db-file) and the current status, generating a
set of commands (a diff) that a third party software can apply to
synchronize a (remote) copy of the maildir.
This software uses sha1 to compute snapshots of a maildir, and computes a set of actions a client should perform to sync with the mailbox status. This software alone is unable to synchronize two maildirs. It has to be supported by a higher level tool implementing the application of actions and data transfer over the network if the twin maildir is remote.
To cache the expensive sha1 calculation, a cache file is used. On every run the program generates a new status file (appending .new) that must substitute the old one if the generated actions are committed to the other maildir. Cache files are specific to the twin maildir: if you have more than one, you must use a different cache file for each of them.
The db-file (say db.txt) is paired with a timestamp file (db.txt.mtime) that stores the timestamp of the last run. Files whose mtime does not exceed this timestamp will not be (re)processed the next time mddiff is run.
The .mtime companion file is updated only server side, since the mtime concept is local to the host running mddiff.
The db-file is composed of two files, a real database file (extension .txt) and a timestamp (extension .txt.mtime). The latter contains just a number (date +%s). The former is line oriented; every line has 3 space-separated fields:
- the sha1 sum of the header
- the sha1 sum of the body
- the name of the file, not URL encoded
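
Since the file name is not URL encoded it may contain spaces, so a reader of the database file has to split each line at most twice. A minimal sketch in Python (the function names are hypothetical, and mddiff itself is not written in Python):

```python
def parse_db_line(line):
    """Split a db-file line into (name, (header sha1, body sha1))."""
    # maxsplit=2 keeps any space inside the file name intact.
    hsha, bsha, name = line.rstrip("\n").split(" ", 2)
    return name, (hsha, bsha)


def load_db(lines):
    """Build a mapping from file name to (header sha1, body sha1)."""
    return dict(parse_db_line(line) for line in lines if line.strip())
```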
From now on, name refers to a file name, hsha to the sha1 sum of its header and bsha to the sha1 sum of its body.
- ADD name hsha bsha
  is generated whenever a new mail message is found, and there is no mail
  message with a different name but the same body.
- COPY name hsha bsha TO newname
  is generated if a new message is found, and the mailbox already contains
  a copy of it. In case mail has been moved, this message is followed by a
  DELETE command.
- COPYBODY name bsha TO newname newhsha
  is generated when a new file is created, and that file has the same body
  as an already existing file. In case mail has been moved, this message is
  followed by a DELETE command. This happens when a new message is moved to
  another directory and marked in some way that changes its header (for
  example when a new message is moved to the trash bin).
- DELETE name hsha bsha
  is emitted when a message is no longer present.
- REPLACEHEADER name hsha bsha WITH newhsha
  is emitted whenever a message that was already present has a different
  header but the same body.
- REPLACE name hsha bsha WITH newhsha newbsha
  is emitted whenever the body (and possibly the header) of a mail message
  changes. This never happens in practice, since MUAs should make a copy of
  the edited message, not replace it.
- ERROR message
  is emitted whenever an error is encountered; message is intended to be
  human readable.
Messages should be processed in order, with the exception of ADD, which can
be safely postponed. In particular DELETE messages are always sent last, and
COPY or COPYBODY messages preceding them may refer to the same file name.
Performing deletions in advance is still sound (since the client can always
ask the server for the message) but clearly suboptimal, since a local copy
does not involve any network traffic.
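
To make the rules above concrete, here is a simplified sketch in Python of how such a list of commands could be derived from two snapshots. It is not the algorithm implemented by mddiff: the function name and the snapshot representation (a dictionary from name to a pair of header and body sums) are assumptions, names are not URL encoded, and corner cases such as a copy whose source has itself been replaced are ignored.

```python
def compute_commands(old, new):
    """Compare two snapshots mapping name -> (hsha, bsha).

    Returns the commands as strings, with every DELETE placed last.
    """
    commands = []
    old_by_sums = {sums: name for name, sums in old.items()}
    old_by_body = {sums[1]: name for name, sums in old.items()}
    for name in sorted(new):
        hsha, bsha = new[name]
        if name in old:
            old_hsha, old_bsha = old[name]
            if (old_hsha, old_bsha) == (hsha, bsha):
                continue  # unchanged message
            if old_bsha == bsha:
                commands.append(
                    f"REPLACEHEADER {name} {old_hsha} {old_bsha} WITH {hsha}")
            else:
                commands.append(
                    f"REPLACE {name} {old_hsha} {old_bsha} WITH {hsha} {bsha}")
        elif (hsha, bsha) in old_by_sums:
            source = old_by_sums[(hsha, bsha)]
            commands.append(f"COPY {source} {hsha} {bsha} TO {name}")
        elif bsha in old_by_body:
            source = old_by_body[bsha]
            commands.append(f"COPYBODY {source} {bsha} TO {name} {hsha}")
        else:
            commands.append(f"ADD {name} {hsha} {bsha}")
    # Deletions go last, so that a COPY can still find its source locally.
    for name in sorted(old):
        if name not in new:
            hsha, bsha = old[name]
            commands.append(f"DELETE {name} {hsha} {bsha}")
    return commands
```

In this sketch a message moved from new/ to cur/ shows up as a COPY followed, at the end of the list, by a DELETE of the old name, so the client never has to fetch it over the network.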
File names are URL encoded escaping only ' ' (%20) and '%' (%25).
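
This reduced URL encoding is easy to get wrong with respect to the order of the substitutions. A minimal sketch in Python (the function names are hypothetical):

```python
def encode_name(name):
    """Escape '%' first, so the '%' introduced by '%20' is not escaped again."""
    return name.replace("%", "%25").replace(" ", "%20")


def decode_name(encoded):
    """Undo the escaping in the opposite order."""
    return encoded.replace("%20", " ").replace("%25", "%")
```

With this order a name that literally contains the characters %20 survives a round trip: it is encoded as %2520 and decoded back unchanged.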
mddiff as a hashing server
mddiff is also used by the client to compute the sha1 sums of the header
and body of local mails, for example to check that the source of a copy
command holds the intended content. Since this operation may be really
frequent, mddiff can operate in server mode. If the argument is a single
file name and that file is a fifo, then mddiff reads file names (not URL
encoded, separated by \n) from that fifo and outputs the sha1 sums of
their header and body.
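
The idea of hashing header and body separately can be sketched as follows. This Python illustration makes assumptions that the text does not confirm: the header is taken to end at the first empty line, the separator itself is not hashed, and the output is one line with the two sums separated by a space. The fifo is replaced by any iterable of lines, and read_file is a hypothetical callable returning the content of a mail.

```python
import hashlib


def header_and_body_sha1(raw):
    """Return the sha1 hex digests of the header and of the body of a mail."""
    # Assumption: the first empty line separates header and body.
    header, _, body = raw.partition(b"\n\n")
    return hashlib.sha1(header).hexdigest(), hashlib.sha1(body).hexdigest()


def serve_hashes(name_lines, read_file):
    """For every file name read, yield a line with the two sums."""
    for line in name_lines:
        name = line.rstrip("\n")
        if name:
            hsha, bsha = header_and_body_sha1(read_file(name))
            yield f"{hsha} {bsha}"
```

Two mails that differ only in their header get the same body sum, which is what lets COPYBODY and REPLACEHEADER avoid transferring the body again.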
mddiff as an mkdir -p; ln server
mddiff is also used by the client to create the indirection layer needed to
rename mailbox folders. If the argument is a single file name, that file is
a fifo and the -s flag is passed, then mddiff reads directory names (not
URL encoded, separated by \n) two at a time from that fifo. The first one
is the source path, the second the target. Then it behaves like
mkdir -p $(dirname $target); ln -s $source $target.
For example if the source is ~/Mail/foo/cur and the target is
Maildir/.foo/cur, then mddiff will create the directories Maildir and
Maildir/.foo and place in the latter a link named cur to ~/Mail/foo/cur.
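
The behaviour described above amounts to the following, sketched here in Python rather than in the language mddiff is written in. The function names are hypothetical, the fifo is replaced by a list of lines, and the demo uses a temporary directory instead of ~/Mail and Maildir.

```python
import os
import tempfile


def link_pairs(lines):
    """Consume paths two at a time: first the source, then the target."""
    paths = [line.rstrip("\n") for line in lines]
    for source, target in zip(paths[0::2], paths[1::2]):
        parent = os.path.dirname(target)
        if parent:
            os.makedirs(parent, exist_ok=True)  # like mkdir -p
        os.symlink(source, target)              # like ln -s


def demo():
    """Recreate the Mail/foo/cur example inside a temporary directory."""
    with tempfile.TemporaryDirectory() as root:
        source = os.path.join(root, "Mail", "foo", "cur")
        target = os.path.join(root, "Maildir", ".foo", "cur")
        os.makedirs(source)
        link_pairs([source + "\n", target + "\n"])
        same = os.path.realpath(target) == os.path.realpath(source)
        return os.path.islink(target), same
```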
smd-pull and smd-push prefix all error messages with ERROR:, but what
follows is meant to be read by a human being. To make other tools able to
parse and react to error messages, a more formal output is given.
A single line, prefixed with TAGS:, is output if requested (-v option).
It can be followed by error:: or stats::, which denote an error message or
a statistical one respectively. Then a list of (improperly called) tags is
output. Their meaning should be easy to guess.
<M> ::= "error::" <ET> | "stats::" <ST> | "stats::" <DR>
<ET> ::= "context(" <STR> ")"
"probable-cause(" <STR> ")"
"human-intervention(" <HI> ")"
<SA>
<SA> ::= | "suggested-actions(" <ACTS> ")"
<STR> ::= `[^)]+`
<HI> ::= "necessary" | "avoidable"
<ACTS> ::= <A> | <A> <ACTS>
<A> ::= "run(" <STR> ")"
| "display-mail(" <STR> ")"
| "display-permissions(" <STR> ")"
<ST> ::= "new-mails(" <NUM> ")" <SPC>
"del-mails(" <NUM> ")" <SPC>
"bytes-received(" <NUM> ")" <SPC>
"xdelta-received(" <NUM> ")" <SPC>
"xdelta-received(" <NUM> ")"
<DR> ::= "mail-transferred(" <ML> ")"
<ML> ::= <STR> | <STR> " , " <ML>
<NUM> ::= `[0-9]+`
<SPC> ::= ` *,? *`
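
A tool reacting to these lines could extract the tags with a small parser. The following Python sketch is a simplification of the grammar above: it only collects flat tags of the form name(value) and does not handle the nested suggested-actions(run(...)) form. The function name and the sample lines in the usage note are hypothetical.

```python
import re

# A tag is a lowercase, dash separated name followed by a value in parentheses.
TAG_RE = re.compile(r"([a-z-]+)\(([^()]*)\)")


def parse_tags(line):
    """Return (kind, tags) for a TAGS: line, or None for any other line."""
    if not line.startswith("TAGS:"):
        return None
    rest = line[len("TAGS:"):].strip()
    kind, separator, payload = rest.partition("::")
    if not separator or kind not in ("error", "stats"):
        raise ValueError("malformed TAGS line: %r" % line)
    return kind, dict(TAG_RE.findall(payload))
```

For a line such as "TAGS: stats::new-mails(3), del-mails(1)" the result is the kind "stats" and a dictionary with one entry per tag; values are kept as strings.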