Lua
lua-rdiff: librsync for Lua.

What is? · Concepts · Lua API · Examples


What is lua-rdiff?

It all began with rsync by Andrew Tridgell. He created an innovative algorithm to transfer and update files efficiently, by calculating the needed difference between two files at different locations. The implementation of this algorithm is a great utility, and soon became an important part of any Unix toolkit.

Unfortunately, rsync is great as an standalone utility, but it's hard to integrate within another program. An attempt to make apply the rsync algorithm to a slightly different problem was rproxy, by Martil Pool. One of the first requirements for rproxy was a library to encapsulate the rsync alrogithm, so he first created librsync.

Since it's a library intended to be used in very different environments, librsync deals exclusively with data; it doesn't include any facility to manage files or network, and doesn't specify protocol between sender and receiver. Its only concern is analysing data to create and apply signatures and deltas.

The rproxy project has been inactive since late 2002; but librsync continued to be developed, and is used in some other projects, like unison, rdiff-backup and superlifter.

Concepts:

The rsync algorithm has three phases: signature creation, delta creation, and delta application or patching. Here i'll describe the process of updating "fileA" to a new version called "fileB".

Signature

A signature is a set of block checksums. fileA is split in a series of blocks (by default 2048 bytes each) and for each one, two checksums are calculated. One is called the "rolling checksum", and the other is the "strong checksum". A rolling checksum is one with a property that makes it efficient to scan a big data stream generating the checksum at every offset. The strong checksum, on the other hand has to be calculated from scratch for each data block, but it's much less likely to generate collitions. By default, the strong checksum used is MD4.

Using the default 32bit rolling checksum and the 64bit MD4 for the strong, we have 12bytes of signature for each data block. With 2048 bytes on each block, that means the signature is 12/2048 (less than 0.6%) the size of fileA. In other words, 6KB for each Megabyte of fileA.

An rsync-like file update begins when the destination generates the signature of the file it already has, and transfers it to the sender.

Delta

The Delta is what is needed to convert fileA into fileB. It's calculated using fileB and the signature from fileA. Note that it's not needed to have fileA to calculate the Delta, only its signature. It's a set of 'commands', that tells how to reconstruct fileB using as much of fileA as possible. Any other data on fileB that isn't present anywhere on fileA is included in the delta. This library doesn't perform any additional compression to the data included in the delta.

When the sender has received the signature of fileA, it calculates the delta and transfers it back to the destination.

Patch

The process of applying the delta to fileA is called patching, and the result should be identical to fileB. Since the delta contains references to arbitrary blocks of fileA, it's important to be able to access randomly to fileA's data.

The Lua API:

Since lua-rdiff doesn't do any file or network I/O, it uses the concept of data sources and data sinks. A data source can be simply a Lua string, or a Lua function that would be called repeatedly to get the data piece by piece. A data sink is an optional function that, if provided, is called with the produced data. If the data sink null or unspecified, the API call returns the data as a string.

The lua-rdiff package defines the table rdiff, containing three functions: rdiff.signature(), rdiff.delta() and rdiff.patch().

rdiff.signature (srcA [,sink])

Calculates the signature of the data given by srcA. This function only needs sequential access to the data, so if srcA is a function, it'll be called without parameters, and should return chunks of data as strings or nil to signal end of data.

If sink is given, it should be a function and would be called with a string to be written sequentially to the desired output. On sucess, rdiff.signature() returns true, on error it returns nil and an error code.

If sink is not given, rdiff.signature() returns the signature as a Lua string, and nil and an error code on failure.

rdiff.delta (signature, srcB, [,sink])

Creates the delta needed to transform fileA into fileB. Both signature and srcB are data sources read sequentially. The delta is either output on sink or returned as a Lua string.

rdiff.patch (delta, srcA [,sink])

Applies the given delta into the base file. Both delta and srcA parameters are data sources; but srcA must provide random-access. If it's given as a function, it will be called with two number parameters specifying a position and length, respectively. It should read a block of data and return it as a string.

Examples

Data in strings

Assuming the original data is in strA and strB, this code creates the signature for strA, calculates the delta from strA to strB, and recreates strB from strA.

sig_str   = rdiff.signature (strA)
delta_str = rdiff.delta (sig_str, strB)
new_str   = rdiff.patch (delta_str, strA)

Data in files

Data Source/Sink

This function creates a simple file-based data source from a given open file. Note that it provides sequential data when called without parameters, and random-access when called with pos and len.

function fsrc (f)
	return function (pos, len)
		if pos == nil and len == nil then
			return f:read (4096)
		else
			f:seek ("set", pos)
			return f:read (len)
		end
	end
end

This is the corresponding data sink creator: we don't need any random-access sink, so it's much simpler.

function fsink (f)
	return function (data) f:write (data) end
end

Create signature File

fA = io.open (fileAname, "rb")
fsig = io.open (signaturename, "wb")
assert (rdiff.signature (fsrc (fA), fsink (fsig)))
fA:close()
fsig:close()

Create delta File

fsig = io.open (signaturename, "rb")
fB = io.open (fileBname, "rb")
fdelta = io.open (deltaname, "wb")
assert (rdiff.delta (fsrc (fsig), fsrc (fB), fsink(fdelta)))

fsig:close()
fB:close ()
fdelta:close()

Patch base File

fdelta = io.open (deltaname, "rb")
fA = io.open (fileAname, "rb")
fnew = io.open (newfilename, "wb")
assert (rdiff.patch (fsrc(fdelta), fsrc(fA), fsink(fnew)))

fnew:close()
fA:close()
fdelta:close ()

Copyright © 2006 Javier Guerra G. All rights reserved.