SaschaMester
/
dak
spiegel van https://ftp-master.debian.org/git/dak.git


			
							123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570571572573574575576577578579580581582583584585586587588589590591592593594595596597598599600601602603604605606607608609610611612613614615616617618619620621622623624625626627628629630631632633634635636637638639640641642643644645646647648649650651652653654655656657658659660661662663664665666667668669670671672673674675676677678679680681682683684685686687688689690691692693694695696697698699700701702703704705706707708709710
							<!doctype debiandoc system>
<!-- -*- mode: sgml; mode: fold -*- -->
<book>
<title>DSync File List Format</title>

<author>Jason Gunthorpe <email>jgg@debian.org</email></author>
<version>$Id: filelist.sgml,v 1.4 1999/11/15 07:59:49 jgg Exp $</version>

<abstract>
</abstract>

<copyright>
Copyright &copy; Jason Gunthorpe, 1998-1999.
<p>
DSync and this document are free software; you can redistribute them and/or
modify them under the terms of the GNU General Public License as published
by the Free Software Foundation; either version 2 of the License, or (at your
option) any later version.

<p>
For more details, on Debian GNU/Linux systems, see the file
/usr/doc/copyright/GPL for the full license.
</copyright>

<toc sect>

<chapt>Introduction
<!-- Purpose		                                               {{{ -->
<!-- ===================================================================== -->
<sect>Purpose
<p>
The DSync file list is a crucial part of the DSync system, it provides the 
client with access to a list of files and file attributes for all the files
in a directory tree. Much information is compacted into the per-file structure
that may be used by the client in reconstructing the directory tree. In spirit
it is like the common ls-lR files that mirrors have, but in practice it is 
radically different, most striking is that it is stored in a compacted binary
format and may optionally contain MD5 hashes.

<p>
The file list for a directory tree may be either dynamically generated by the
server or generated only once like the ls-lR files. In fact with a static
file list it is possible to use the <em>rsync method</> to transfer only the
differences in the list which is a huge boon for sites with over 50000 files 
in their directory trees

<p>
Internally the file list is stored as a series of directory blocks in no set 
order. Each block has a relative path from the base to the directory itself
and a list of all files in that directory. Things are not stored recursively
so that the client can have fixed memory usage when managing the list.
Depending on how the generator is configured the order of the directories
may be breadth first or depth first, or perhaps completely random. The client
should make no assumptions about the ordering of anything in the file.

<p>
Since the list may be generated on the fly by the server it is necessary for 
it to be streamable. To this effect there will be no counts or sizes that
refer to anything outside of the current record. This assures that the 
generator will be able to build a file list without negligable server side 
overhead. Furthermore a focus is placed on making things as small as possible,
to this end usefull items like record length indicators are omitted. This 
does necessarily limit the ability to handle format changes.
                                                                  <!-- }}} -->

<chapt>Structure
<!-- Data Stream	                                               {{{ -->
<!-- ===================================================================== -->
<sect>Data Stream
<p>
The data stream is encoded as a series of variable length numbers, fixed 
length numbers and strings. The use of variable length number encoding
was chosen to accomidate sites with over 100000 files, mostly below 16k,
using variable length encoding will save approximately 400k of data and still
allow some items that are very large.

<p> 
Numbers are coded as a series of bytes of non-fixed length, the highest bit
of each byte is 1 if the next byte is part of this number. Bytes are ordered
backwards from the least significant to the most significant inorder to 
simplify decoding, any omitted bits can be assumed to be 0. Clients should
decode into their largest type and fatally error if a number expands to
larger than that. All numbers are positive.

<p>
Strings are coded in pascal form, with a length number preceeding a series
of 8 bit characters making up the string. The strings are coded in UTF.

<p>
The first records in the file should be a header record followed by any
include/exclude records to indicate how the list was generated. Following
that is the actual file list data.

<p>
The records all have the same form, they start with an 8 bit tag value and
then have the raw record data after. The main header has a set of flags for
all of the records types, these flags are used to designate optional portions
of the record. For instance a 
file record may not have a md5 hash or uid/gid values, those would be marked 
off in the flags. Generally every non-critical value is optional. The records 
and their tags are as follows:

<list>
<item> 0 - Header
<item> 1 - Directory Marker
<item> 2 - Directory Start
<item> 3 - Directory End
<item> 4 - Normal File
<item> 5 - Symlink
<item> 6 - Device Special
<item> 7 - Directory
<item> 8 - Include/Exclude
<item> 9 - User Map
<item> 10 - Group Map
<item> 11 - Hard Link
<item> 12 - End Marker
<item> 13 - RSync Checksums
<item> 14 - Aggregate File
<item> 15 - RSync End
</list>

<p>
The header record is placed first in the file followed by Directory records
and then by a number of file type records. The Directory Start/End are used 
to indicate which directory the file records are in. The approach is to 
create a bundle of file type records for each directory that are stored
non-recursively. The directory marker records are used with depth-first
traversal to create unseen directories with the proper permissions.
								  <!-- }}} -->
<!-- Header		                                               {{{ -->
<!-- ===================================================================== -->
<sect>Header
<p>
The header is the first record in the file and contains some information about
what will follow.
<example>
   struct Header
   {
      uint8 Tag;             // 0 for the header
      
      uint32 Signature;
      uint16 MajorVersion;
      uint16 MinorVersion;
      number Epoch;

      uint8 FlagCount;
      uint32 Flags[12];
   };
</example>
<taglist>
<tag>Signature<item>
This field should contain the hex value 0x97E78AB which designates the file
as a DSync file list. Like all numbers it should be stored in network byte
order.

<tag>MajorVersion
<tag>MinorVersion<item>
These two fields designate the revision of the format. The major version
should be increased if an incompatible change is made to the structure of
the file, otherwise the minor version should reflect any changes. The current
major/minor is 0 and 0. Compatibility issues are discussed later on.

<tag>Epoch<item>
Inorder to encode time in a single 32 bit signed integer the format uses a 
shifting epoch. Epoch is set to a time in seconds from the unix
epoch. All other times are relative to this time.
In this way we can specify any date 68 years in either direction from any 
possible time. Doing so allows us to encode time using only 32 bits. The
generator should either error or truncate if a time value exceeds this 
representation. This does impose the limitation that the difference between
the lowest stored date and highest stored date must be no more than 136 years.

<tag>FlagCount<item>
This designates the number of items in the flag array.

<tag>Flags<item>
Each possible record type has a flag value that is used to indicate what
items the generator emitted. There is no per-record flag in order to save
space. The flag array is indexed by the record ID.

</taglist>
								  <!-- }}} -->
<!-- Directory Marker	                                               {{{ -->
<!-- ===================================================================== -->
<sect>Directory Marker, Directory Start and Directory
<p>
The purpose of the directory marker record is to specify directories that
must be created before a directory start record can be processed. It is needed
to ensure the correct permissions and ownership are generated while the
contents are in transfer.

<p>
A Directory Start record serves to indicate a change of directory. All further
file type records will refer to the named directory until a Directory End
record is processed marking the final modification for this directory. It is
not possible to nest directory start directives, in fact a Directory Start
record implies a Directory End record for the previosly Started Directory

<p>
The plain directory record is a file type record that refers to a directory
file type. All of these record types describe the same thing used in different
contexts so share the same structure.

<example>
   struct DirMarker
   {
      uint8 Tag;             // 1, 2 or 7 for the header
      
      uint32 ModTime;
      uint16 Permissions;
      number User;
      number Group;
      string Path;
   };
</example>
<taglist>
<tag>Flags [from the header]<item>
Optional portions of the structure are Permissions (1&lt;&lt;0) and user/group
(1&lt;&lt;1). The bit is set to 1 if they are present.

<tag>ModTime<item>
This is the number of seconds since the file list epoch, it is the modification
date of the directory.

<tag>Permissions<item>
This is the standard unix permissions in the usual format.

<tag>User
<tag>Group<item>
These are the standard unix user/group for the directory. They are indirected
through the user/group maps described later on.

<tag>Path<item>
The path from the base of the file list to the directory this record describes.
However ordinary directory types have a single name relative to the last
Directory Start record.
</taglist>
								  <!-- }}} -->
<!-- Directory End	                                               {{{ -->
<!-- ===================================================================== -->
<sect>Directory End
<p>
The purpose of the directory end marker is to signafy that their will be no 
more file type records from this directory. Directory Start and Directory
End records must be paired. The intent of this record is to allow future 
expansion, NOT to allow recursive directory blocks. A Directory Start
record will imply a Directory End record if the previous was not terminated.

<p>
There are no data members, it is the basic 1 item record. If the data stream
terminates with an open directory block it is assumed to be truncated and
an error issued.

								  <!-- }}} -->
<!-- Normal File	                                               {{{ -->
<!-- ===================================================================== -->
<sect>Normal File
<p>
A normal file is a simple, regular file. It has the standard set of unix 
attributes and an optional MD5 hash for integrity checking.
<example>
   struct NormalFile
   {
      uint8 Tag;             // 4
      
      uint32 ModTime;
      uint16 Permissions;
      number User;
      number Group;
      string Name;
      number Size;
      uint128 MD5;
   };
</example>
<taglist>
<tag>Flags [from the header]<item>
Optional portions of the structure are Permissions (1&lt;&lt;0), user/group
(1&lt;&lt;1), and MD5 (1&lt;&lt;2). The bit is set to 1 if they are present.

<tag>ModTime<item>
This is the number of seconds since the file list epoch, it is the modification
date of the file.

<tag>Permissions<item>
This is the standard unix permissions in the usual format.

<tag>User
<tag>Group<item>
These are the standard unix user/group for the directory. They are indirected
through the user/group maps described later on. 

<tag>Name<item>
The name of the item. It should have no pathname components and is relative
to the last Directory Start record.

<tag>MD5<item>
This is a MD5 hash of the file.

<tag>Size<item>
This is the size of the file in bytes.
</taglist>
								  <!-- }}} -->
<!-- Symlink		                                               {{{ -->
<!-- ===================================================================== -->
<sect>Symlink
<p>
This encodes a normal unix symbolic link. Symlinks do not have permissions
or size, but do have optional ownership.
<example>
   struct Symlink
   {
      uint8 Tag;             // 5

      uint32 ModTime;
      number User;
      number Group;
      string Name;
      uint8 Compression;
      string To;
   };
</example>
<taglist>
<tag>Flags [from the header]<item>
Optional portions of the structure are, user/group
(1&lt;&lt;0). The bit is set to 1 if they are present.

<tag>ModTime<item>
This is the number of seconds since the file list epoch, it is the modification
date of the file.

<tag>User
<tag>Group<item>
These are the standard unix user/group for the directory. They are indirected
through the user/group maps described later on. 

<tag>Name<item>
The name of the item. It should have no pathname components and is relative
to the last Directory Start record.

<tag>Compression<item>
Common use of symlinks makes them very easy to compress, the compression
byte allows this. It is an 8 bit byte with the first 7 bits representing an 
unsigned number and the 8th bit as being a flag. The first 7 bits describe
how many bytes of the last symlink should be prepended to To and if the 8th 
bit is set then Name is appended to To.

<tag>To<item>
This is the file the symlink is pointing to. It is an absolute string taken
as is. The client may perform checking on it if desired. The string is 
compressed as described in the Compression field.
</taglist>
                                                                  <!-- }}} -->
<!-- Device Special	                                               {{{ -->
<!-- ===================================================================== -->
<sect>Device Special
<p>
Device Special records encode unix device special files, which have a major
and a minor number corrisponding to some OS specific attribute. These also
encode fifo files, anything that can be created by mknod.
<example>
   struct DeviceSpecial
   {
      uint8 Tag;             // 6
      
      uint32 ModTime;
      uint16 Permissions;
      number User;
      number Group;
      number Dev;
      string Name;
   };
</example>
<taglist>
<tag>Flags [from the header]<item>
Optional portions of the structure areuser/group
(1&lt;&lt;0). The bit is set to 1 if they are present.

<tag>ModTime<item>
This is the number of seconds since the file list epoch, it is the modification
date of the file.

<tag>Permissions<item>
This non-optional field is used to encode the type of device and the 
creation permissions.

<tag>Dev<item>
This is the OS specific 'dev_t' field for mknod.

<tag>Major
<tag>Minor<item>
These are the OS dependent device numbers.

<tag>Name<item>
The name of the item. It should have no pathname components and is relative
to the last Directory Start record.

<tag>To<item>
This is the file the symlink is pointing to.
</taglist>
                                                                  <!-- }}} -->
<!-- Include/Exclude	                                               {{{ -->
<!-- ===================================================================== -->
<sect>Include and Exclude
<p>
The include/exclude list used to generate the file list is encoded after
the header record. It is stored as an ordered set of include/exclude records
acting as a filter. If no record matches then the pathname is assumed to 
be included otherwise the first matching record decides.

<example>
   struct IncludeExclude
   {
      uint8 Tag;             // 8
      
      uint8 Type;
      string Pattern;
   };
</example>
<taglist>
<tag>Flags [from the header]<item>
None defined.

<tag>Type<item>
This is the sort of rule, presently 1 is an include rule and 2 is an exclude
rule.

<tag>Pattern<item>
This is the textual pattern used for matching.

</taglist>
                                                                  <!-- }}} -->
<!-- User/Group Map	                                               {{{ -->
<!-- ===================================================================== -->
<sect>User/Group Map
<p>
In order to properly transfer users and groups the names are converted from
a local number into a file list number and a number to name mapping. When
the remote side reads the file list it directs all UID/GID translations
through the mapping to create the real names and then does a local lookup.
This also provides some compressesion in the file list as large UIDs are
converted into smaller values through the mapping.

<p>
The generator is expected to emit these records at any place before the IDs
are actually used.
<example>
   struct NameMap
   {
      uint8 Tag;             // 9,10
      
      number FileID;
      number RealID;
      string Name;
   };
</example>
<taglist>
<tag>Flags [from the header]<item>
Optional portions of the structure are RealID (1&lt;&lt;0).

<tag>FileID<item>
This is the ID used internally in the file list, it should be monotonically
increasing each time a Map record is created so that it is small and unique.

<tag>RealID<item>
This is the ID used in the filesystem on the generating end. This information
maybe used if the user selected to regenerate IDs without translation.
</taglist>
                                                                  <!-- }}} -->
<!-- Hard Link		                                               {{{ -->
<!-- ===================================================================== -->
<sect>Hard Link
<p>
A hard link record is used to record a file that is participating in a hard
link. The only information we know about the link is the inode and device 
on the local machine, so we store this information. The client will have to
reconstruct the linkages if possible.

<example>
   struct HardLink
   {
      uint8 Tag;             // 11
      
      uint32 ModTime;
      number Serial;
      uint16 Permissions;
      number User;
      number Group;
      string Name;
      number Size;
      uint128 MD5;
   };
</example>
<taglist>
<tag>Flags [from the header]<item>
Optional portions of the structure are Permissions (1&lt;&lt;0), user/group
(1&lt;&lt;1), and MD5 (1&lt;&lt;2). The bit is set to 1 if they are present.

<tag>ModTime<item>
This is the number of seconds since the file list epoch, it is the modification
date of the file.

<tag>Serial<item>
This is the unique ID number for the hardlink. It is composed from the
device inode pair in a generator dependent way. The exact nature of the
value is unimportant, only that two hard link records with the same serial
should be linked together. It is recommended that the generator compress
hard link serial numbers into small monotonically increasing IDs.

<tag>Permissions<item>
This is the standard unix permissions in the usual format.

<tag>User
<tag>Group<item>
These are the standard unix user/group for the directory. They are indirected
through the user/group maps described later on. 

<tag>Name<item>
The name of the item. It should have no pathname components and is relative
to the last Directory Start record.

<tag>MD5<item>
This is a MD5 hash of the file.

<tag>Size<item>
This is the size of the file in bytes.
</taglist>
								  <!-- }}} -->
<!-- End Marker		                                               {{{ -->
<!-- ===================================================================== -->
<sect>End Marker
<p>
The End Marker is the final record in the stream, if it is missing the stream
is assumed to be incomplete.
<example>
   struct Trailer
   {
      uint8 Tag;             // 12 for the header
      
      uint32 Signature;
   };
</example>
<taglist>
<tag>Signature<item>
This field should contain the hex value 0xBA87E79 which is designed to 
prevent a correputed stream as begin a legitimate end marker.
</taglist>
								  <!-- }}} -->

<!-- RSync Checksums	                                               {{{ -->
<!-- ===================================================================== -->
<sect>RSync Checksums
<p>
The checksum record contains the list of checksums for a file and represents
the start of a RSync description block which may contain RSync Checksums,
a Normal File entry or Aggregate Files records.
<example>
   struct RSyncChecksums
   {
      uint8 Tag;             // 13
      
      number BlockSize;
      number FileSize;
      uint160 Sums[ceil(FileSize/BlockSize)];
   };
</example>
<taglist>
<tag>BlockSize<item>
The size of each block in the stream in bytes.

<tag>FileSize<item>
The total size of the the file in bytes.

<tag>Sums<item>
The actual checksum data. The format has the lower 32 bytes as the weak 
checksum and the upper 128 as the strong checksum.
</taglist>
								  <!-- }}} -->
<!-- Aggregate File	                                               {{{ -->
<!-- ===================================================================== -->
<sect>Aggregate File
<p>
If the generator was given a list of included files this record will be 
emitted after the rsync checksum record, once for each file. The given 
paths are files that are likely to contain fragments of the larger file.
<example>
   struct AggregateFile
   {
      uint8 Tag;             // 14 for this record
      
      string File;
   };
</example>
<taglist>
<tag>File<item>
The stored filename.
</taglist>
								  <!-- }}} -->
<!-- RSync End	  		                                       {{{ -->
<!-- ===================================================================== -->
<sect>RSync End
<p>
The purpose of the directory end marker is to signafy that the RSync data
is finished. RSync blocks begin with the RSync checksum record, then are
typically followed by a Normal File record describing the name and attributes
of the file and then optionally followed by a set of Aggregate File records.

<p>
There are no data members, it is the basic 1 item record. If the data stream
terminates with an open block it is assumed to be truncated and an error 
issued.
								  <!-- }}} -->

<chapt>The Client
<!-- Handling Compatibility                                            {{{ -->
<!-- ===================================================================== -->
<sect>Handling Compatibility
<p>
The format has no provision for making backwards compatible changes, even
minor ones. What was provided is a way to make a generator that is both
forwards and backwards compatible with clients, this is done by disabling
generation of unsupported items and masking them off in the flags.

<p>
To deal with this a client should examine the header and determine if it has
a suitable major version, the minor version should largely be ignored. The
client should then examine the flags values and for all records it understands
ensure that no bits are masked on that it does not understand. Records that
it cannot handle should be ignored at this point. When the client is
parsing it should abort if it hits a record it does not support.
								  <!-- }}} -->
<!-- Client Requirements  	                                       {{{ -->
<!-- ===================================================================== -->
<sect>Client Requirements
<p>
The client attempting to verify syncronisity of a local file tree and a
tree destribed in a file list must do three things, look for extra local files,
manage the UID/GID mappings and maintain a map of hardlinks. These items 
corrispond to the only necessary memory usage on the client.

<p>
It is expected that the client will use the timestamp, size and possibly
MD5 hash to match the local file against the remote one to decide if it 
should be retrieved.

<p>
Hardlinks are difficult to handle, but represent a very usefull feature. The
client should track all hard links until they are associated with a local
file+inode, then all future links to that remote inode can be recreated 
locally.
								  <!-- }}} -->

<chapt>RSync Method
<!-- Overview                                                          {{{ -->
<!-- ===================================================================== -->
<sect>Overview
<p>
The <em>rsync method</> was invented by Andrew Tridgell and originally 
implemented in the rsync program. DSync has a provision to make use of the 
<em>rsync method</> for transfering differences between files effeciently,
however the implemention is not as bandwidth efficient as what the rsync 
program uses, emphasis is placed on generator efficiency. 

<p>
Primarily the <em>rsync method</> makes use of a series of weak and strong
block checksums for each block in a file. Blocks are a uniform size and
are uniformly distributed about the source file. In order to minimize server
loading the checksum data is generated for the file on the server and then 
sent to the client - this might optionally be done from a cached file. The
client is responsible for performing the checksumming and searching on its
end.

<p>
In contrast rsync has the client send its checksums to the server and the
server sends back commands to reconstruct the file. This is more bandwidth
efficient because only one round trip is required and there is a higher chance
that more blocks will be matched and not need to be sent to the client.

<p>
Furthermore a feature designed for use by CD images is provided where a file
can be specified as the aggregation of many smaller files. The aggregated
files are specified only by giving the file name. The client is expected to 
read the file (probably from the network) and perform checksum searching 
against the provided table.

								  <!-- }}} -->
<!-- CD Images                                                         {{{ -->
<!-- ===================================================================== -->
<sect>CD Images
<p>
The primary and most complex use of the rsync data is for forming CD images
on the fly from a mirror and a CD source. This is extremly usefull beacause 
CD images take up alot of space and bandwidth to mirror, while they are 
mearly aggregates of (possibly) already mirrored data. Using checksums
and a file listing allows the CD image to be reconstructed from any mirror
and reduces the loading on primary CD image servers.

<p>
The next use of checksums is to 'freshen' a CD image during development. If
a image is already present that contains a subset of the required data the 
checksums generally allow a large percentage of that data to be reused.

<p>
Since the client is responsible for reconstruction and checksum searching it
is possible to perform in place reconstruction and in place initial generation
that does not require a (large!) temporary file.
								  <!-- }}} -->


</book>