123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570571572573574575576577578579580581582583584585586587588589590591592593594595596597598599600601602603604605606607608609610611612613614615616617618619620621622623624625626627628629630631632633634635636637638639640641642643644645646647648649650651652653654655656657658659660661662663664665666667668669670671672673674675676677678679680681682683684685686687688689690691692693694695696697698699700701702703704705706707708709710 |
- <!doctype debiandoc system>
- <!-- -*- mode: sgml; mode: fold -*- -->
- <book>
- <title>DSync File List Format</title>
- <author>Jason Gunthorpe <email>jgg@debian.org</email></author>
- <version>$Id: filelist.sgml,v 1.4 1999/11/15 07:59:49 jgg Exp $</version>
- <abstract>
- </abstract>
- <copyright>
- Copyright © Jason Gunthorpe, 1998-1999.
- <p>
- DSync and this document are free software; you can redistribute them and/or
- modify them under the terms of the GNU General Public License as published
- by the Free Software Foundation; either version 2 of the License, or (at your
- option) any later version.
- <p>
- For more details, on Debian GNU/Linux systems, see the file
- /usr/doc/copyright/GPL for the full license.
- </copyright>
- <toc sect>
- <chapt>Introduction
- <!-- Purpose {{{ -->
- <!-- ===================================================================== -->
- <sect>Purpose
- <p>
- The DSync file list is a crucial part of the DSync system, it provides the
- client with access to a list of files and file attributes for all the files
- in a directory tree. Much information is compacted into the per-file structure
- that may be used by the client in reconstructing the directory tree. In spirit
- it is like the common ls-lR files that mirrors have, but in practice it is
- radically different, most striking is that it is stored in a compacted binary
- format and may optionally contain MD5 hashes.
- <p>
- The file list for a directory tree may be either dynamically generated by the
- server or generated only once like the ls-lR files. In fact with a static
- file list it is possible to use the <em>rsync method</> to transfer only the
- differences in the list which is a huge boon for sites with over 50000 files
- in their directory trees
- <p>
- Internally the file list is stored as a series of directory blocks in no set
- order. Each block has a relative path from the base to the directory itself
- and a list of all files in that directory. Things are not stored recursively
- so that the client can have fixed memory usage when managing the list.
- Depending on how the generator is configured the order of the directories
- may be breadth first or depth first, or perhaps completely random. The client
- should make no assumptions about the ordering of anything in the file.
- <p>
- Since the list may be generated on the fly by the server it is necessary for
- it to be streamable. To this effect there will be no counts or sizes that
- refer to anything outside of the current record. This assures that the
- generator will be able to build a file list without negligable server side
- overhead. Furthermore a focus is placed on making things as small as possible,
- to this end usefull items like record length indicators are omitted. This
- does necessarily limit the ability to handle format changes.
- <!-- }}} -->
- <chapt>Structure
- <!-- Data Stream {{{ -->
- <!-- ===================================================================== -->
- <sect>Data Stream
- <p>
- The data stream is encoded as a series of variable length numbers, fixed
- length numbers and strings. The use of variable length number encoding
- was chosen to accomidate sites with over 100000 files, mostly below 16k,
- using variable length encoding will save approximately 400k of data and still
- allow some items that are very large.
- <p>
- Numbers are coded as a series of bytes of non-fixed length, the highest bit
- of each byte is 1 if the next byte is part of this number. Bytes are ordered
- backwards from the least significant to the most significant inorder to
- simplify decoding, any omitted bits can be assumed to be 0. Clients should
- decode into their largest type and fatally error if a number expands to
- larger than that. All numbers are positive.
- <p>
- Strings are coded in pascal form, with a length number preceeding a series
- of 8 bit characters making up the string. The strings are coded in UTF.
- <p>
- The first records in the file should be a header record followed by any
- include/exclude records to indicate how the list was generated. Following
- that is the actual file list data.
- <p>
- The records all have the same form, they start with an 8 bit tag value and
- then have the raw record data after. The main header has a set of flags for
- all of the records types, these flags are used to designate optional portions
- of the record. For instance a
- file record may not have a md5 hash or uid/gid values, those would be marked
- off in the flags. Generally every non-critical value is optional. The records
- and their tags are as follows:
- <list>
- <item> 0 - Header
- <item> 1 - Directory Marker
- <item> 2 - Directory Start
- <item> 3 - Directory End
- <item> 4 - Normal File
- <item> 5 - Symlink
- <item> 6 - Device Special
- <item> 7 - Directory
- <item> 8 - Include/Exclude
- <item> 9 - User Map
- <item> 10 - Group Map
- <item> 11 - Hard Link
- <item> 12 - End Marker
- <item> 13 - RSync Checksums
- <item> 14 - Aggregate File
- <item> 15 - RSync End
- </list>
- <p>
- The header record is placed first in the file followed by Directory records
- and then by a number of file type records. The Directory Start/End are used
- to indicate which directory the file records are in. The approach is to
- create a bundle of file type records for each directory that are stored
- non-recursively. The directory marker records are used with depth-first
- traversal to create unseen directories with the proper permissions.
- <!-- }}} -->
- <!-- Header {{{ -->
- <!-- ===================================================================== -->
- <sect>Header
- <p>
- The header is the first record in the file and contains some information about
- what will follow.
- <example>
- struct Header
- {
- uint8 Tag; // 0 for the header
-
- uint32 Signature;
- uint16 MajorVersion;
- uint16 MinorVersion;
- number Epoch;
- uint8 FlagCount;
- uint32 Flags[12];
- };
- </example>
- <taglist>
- <tag>Signature<item>
- This field should contain the hex value 0x97E78AB which designates the file
- as a DSync file list. Like all numbers it should be stored in network byte
- order.
- <tag>MajorVersion
- <tag>MinorVersion<item>
- These two fields designate the revision of the format. The major version
- should be increased if an incompatible change is made to the structure of
- the file, otherwise the minor version should reflect any changes. The current
- major/minor is 0 and 0. Compatibility issues are discussed later on.
- <tag>Epoch<item>
- Inorder to encode time in a single 32 bit signed integer the format uses a
- shifting epoch. Epoch is set to a time in seconds from the unix
- epoch. All other times are relative to this time.
- In this way we can specify any date 68 years in either direction from any
- possible time. Doing so allows us to encode time using only 32 bits. The
- generator should either error or truncate if a time value exceeds this
- representation. This does impose the limitation that the difference between
- the lowest stored date and highest stored date must be no more than 136 years.
- <tag>FlagCount<item>
- This designates the number of items in the flag array.
- <tag>Flags<item>
- Each possible record type has a flag value that is used to indicate what
- items the generator emitted. There is no per-record flag in order to save
- space. The flag array is indexed by the record ID.
- </taglist>
- <!-- }}} -->
- <!-- Directory Marker {{{ -->
- <!-- ===================================================================== -->
- <sect>Directory Marker, Directory Start and Directory
- <p>
- The purpose of the directory marker record is to specify directories that
- must be created before a directory start record can be processed. It is needed
- to ensure the correct permissions and ownership are generated while the
- contents are in transfer.
- <p>
- A Directory Start record serves to indicate a change of directory. All further
- file type records will refer to the named directory until a Directory End
- record is processed marking the final modification for this directory. It is
- not possible to nest directory start directives, in fact a Directory Start
- record implies a Directory End record for the previosly Started Directory
- <p>
- The plain directory record is a file type record that refers to a directory
- file type. All of these record types describe the same thing used in different
- contexts so share the same structure.
- <example>
- struct DirMarker
- {
- uint8 Tag; // 1, 2 or 7 for the header
-
- uint32 ModTime;
- uint16 Permissions;
- number User;
- number Group;
- string Path;
- };
- </example>
- <taglist>
- <tag>Flags [from the header]<item>
- Optional portions of the structure are Permissions (1<<0) and user/group
- (1<<1). The bit is set to 1 if they are present.
- <tag>ModTime<item>
- This is the number of seconds since the file list epoch, it is the modification
- date of the directory.
- <tag>Permissions<item>
- This is the standard unix permissions in the usual format.
- <tag>User
- <tag>Group<item>
- These are the standard unix user/group for the directory. They are indirected
- through the user/group maps described later on.
- <tag>Path<item>
- The path from the base of the file list to the directory this record describes.
- However ordinary directory types have a single name relative to the last
- Directory Start record.
- </taglist>
- <!-- }}} -->
- <!-- Directory End {{{ -->
- <!-- ===================================================================== -->
- <sect>Directory End
- <p>
- The purpose of the directory end marker is to signafy that their will be no
- more file type records from this directory. Directory Start and Directory
- End records must be paired. The intent of this record is to allow future
- expansion, NOT to allow recursive directory blocks. A Directory Start
- record will imply a Directory End record if the previous was not terminated.
- <p>
- There are no data members, it is the basic 1 item record. If the data stream
- terminates with an open directory block it is assumed to be truncated and
- an error issued.
- <!-- }}} -->
- <!-- Normal File {{{ -->
- <!-- ===================================================================== -->
- <sect>Normal File
- <p>
- A normal file is a simple, regular file. It has the standard set of unix
- attributes and an optional MD5 hash for integrity checking.
- <example>
- struct NormalFile
- {
- uint8 Tag; // 4
-
- uint32 ModTime;
- uint16 Permissions;
- number User;
- number Group;
- string Name;
- number Size;
- uint128 MD5;
- };
- </example>
- <taglist>
- <tag>Flags [from the header]<item>
- Optional portions of the structure are Permissions (1<<0), user/group
- (1<<1), and MD5 (1<<2). The bit is set to 1 if they are present.
- <tag>ModTime<item>
- This is the number of seconds since the file list epoch, it is the modification
- date of the file.
- <tag>Permissions<item>
- This is the standard unix permissions in the usual format.
- <tag>User
- <tag>Group<item>
- These are the standard unix user/group for the directory. They are indirected
- through the user/group maps described later on.
- <tag>Name<item>
- The name of the item. It should have no pathname components and is relative
- to the last Directory Start record.
- <tag>MD5<item>
- This is a MD5 hash of the file.
- <tag>Size<item>
- This is the size of the file in bytes.
- </taglist>
- <!-- }}} -->
- <!-- Symlink {{{ -->
- <!-- ===================================================================== -->
- <sect>Symlink
- <p>
- This encodes a normal unix symbolic link. Symlinks do not have permissions
- or size, but do have optional ownership.
- <example>
- struct Symlink
- {
- uint8 Tag; // 5
- uint32 ModTime;
- number User;
- number Group;
- string Name;
- uint8 Compression;
- string To;
- };
- </example>
- <taglist>
- <tag>Flags [from the header]<item>
- Optional portions of the structure are, user/group
- (1<<0). The bit is set to 1 if they are present.
- <tag>ModTime<item>
- This is the number of seconds since the file list epoch, it is the modification
- date of the file.
- <tag>User
- <tag>Group<item>
- These are the standard unix user/group for the directory. They are indirected
- through the user/group maps described later on.
- <tag>Name<item>
- The name of the item. It should have no pathname components and is relative
- to the last Directory Start record.
- <tag>Compression<item>
- Common use of symlinks makes them very easy to compress, the compression
- byte allows this. It is an 8 bit byte with the first 7 bits representing an
- unsigned number and the 8th bit as being a flag. The first 7 bits describe
- how many bytes of the last symlink should be prepended to To and if the 8th
- bit is set then Name is appended to To.
- <tag>To<item>
- This is the file the symlink is pointing to. It is an absolute string taken
- as is. The client may perform checking on it if desired. The string is
- compressed as described in the Compression field.
- </taglist>
- <!-- }}} -->
- <!-- Device Special {{{ -->
- <!-- ===================================================================== -->
- <sect>Device Special
- <p>
- Device Special records encode unix device special files, which have a major
- and a minor number corrisponding to some OS specific attribute. These also
- encode fifo files, anything that can be created by mknod.
- <example>
- struct DeviceSpecial
- {
- uint8 Tag; // 6
-
- uint32 ModTime;
- uint16 Permissions;
- number User;
- number Group;
- number Dev;
- string Name;
- };
- </example>
- <taglist>
- <tag>Flags [from the header]<item>
- Optional portions of the structure areuser/group
- (1<<0). The bit is set to 1 if they are present.
- <tag>ModTime<item>
- This is the number of seconds since the file list epoch, it is the modification
- date of the file.
- <tag>Permissions<item>
- This non-optional field is used to encode the type of device and the
- creation permissions.
- <tag>Dev<item>
- This is the OS specific 'dev_t' field for mknod.
- <tag>Major
- <tag>Minor<item>
- These are the OS dependent device numbers.
- <tag>Name<item>
- The name of the item. It should have no pathname components and is relative
- to the last Directory Start record.
- <tag>To<item>
- This is the file the symlink is pointing to.
- </taglist>
- <!-- }}} -->
- <!-- Include/Exclude {{{ -->
- <!-- ===================================================================== -->
- <sect>Include and Exclude
- <p>
- The include/exclude list used to generate the file list is encoded after
- the header record. It is stored as an ordered set of include/exclude records
- acting as a filter. If no record matches then the pathname is assumed to
- be included otherwise the first matching record decides.
- <example>
- struct IncludeExclude
- {
- uint8 Tag; // 8
-
- uint8 Type;
- string Pattern;
- };
- </example>
- <taglist>
- <tag>Flags [from the header]<item>
- None defined.
- <tag>Type<item>
- This is the sort of rule, presently 1 is an include rule and 2 is an exclude
- rule.
- <tag>Pattern<item>
- This is the textual pattern used for matching.
- </taglist>
- <!-- }}} -->
- <!-- User/Group Map {{{ -->
- <!-- ===================================================================== -->
- <sect>User/Group Map
- <p>
- In order to properly transfer users and groups the names are converted from
- a local number into a file list number and a number to name mapping. When
- the remote side reads the file list it directs all UID/GID translations
- through the mapping to create the real names and then does a local lookup.
- This also provides some compressesion in the file list as large UIDs are
- converted into smaller values through the mapping.
- <p>
- The generator is expected to emit these records at any place before the IDs
- are actually used.
- <example>
- struct NameMap
- {
- uint8 Tag; // 9,10
-
- number FileID;
- number RealID;
- string Name;
- };
- </example>
- <taglist>
- <tag>Flags [from the header]<item>
- Optional portions of the structure are RealID (1<<0).
- <tag>FileID<item>
- This is the ID used internally in the file list, it should be monotonically
- increasing each time a Map record is created so that it is small and unique.
- <tag>RealID<item>
- This is the ID used in the filesystem on the generating end. This information
- maybe used if the user selected to regenerate IDs without translation.
- </taglist>
- <!-- }}} -->
- <!-- Hard Link {{{ -->
- <!-- ===================================================================== -->
- <sect>Hard Link
- <p>
- A hard link record is used to record a file that is participating in a hard
- link. The only information we know about the link is the inode and device
- on the local machine, so we store this information. The client will have to
- reconstruct the linkages if possible.
- <example>
- struct HardLink
- {
- uint8 Tag; // 11
-
- uint32 ModTime;
- number Serial;
- uint16 Permissions;
- number User;
- number Group;
- string Name;
- number Size;
- uint128 MD5;
- };
- </example>
- <taglist>
- <tag>Flags [from the header]<item>
- Optional portions of the structure are Permissions (1<<0), user/group
- (1<<1), and MD5 (1<<2). The bit is set to 1 if they are present.
- <tag>ModTime<item>
- This is the number of seconds since the file list epoch, it is the modification
- date of the file.
- <tag>Serial<item>
- This is the unique ID number for the hardlink. It is composed from the
- device inode pair in a generator dependent way. The exact nature of the
- value is unimportant, only that two hard link records with the same serial
- should be linked together. It is recommended that the generator compress
- hard link serial numbers into small monotonically increasing IDs.
- <tag>Permissions<item>
- This is the standard unix permissions in the usual format.
- <tag>User
- <tag>Group<item>
- These are the standard unix user/group for the directory. They are indirected
- through the user/group maps described later on.
- <tag>Name<item>
- The name of the item. It should have no pathname components and is relative
- to the last Directory Start record.
- <tag>MD5<item>
- This is a MD5 hash of the file.
- <tag>Size<item>
- This is the size of the file in bytes.
- </taglist>
- <!-- }}} -->
- <!-- End Marker {{{ -->
- <!-- ===================================================================== -->
- <sect>End Marker
- <p>
- The End Marker is the final record in the stream, if it is missing the stream
- is assumed to be incomplete.
- <example>
- struct Trailer
- {
- uint8 Tag; // 12 for the header
-
- uint32 Signature;
- };
- </example>
- <taglist>
- <tag>Signature<item>
- This field should contain the hex value 0xBA87E79 which is designed to
- prevent a correputed stream as begin a legitimate end marker.
- </taglist>
- <!-- }}} -->
- <!-- RSync Checksums {{{ -->
- <!-- ===================================================================== -->
- <sect>RSync Checksums
- <p>
- The checksum record contains the list of checksums for a file and represents
- the start of a RSync description block which may contain RSync Checksums,
- a Normal File entry or Aggregate Files records.
- <example>
- struct RSyncChecksums
- {
- uint8 Tag; // 13
-
- number BlockSize;
- number FileSize;
- uint160 Sums[ceil(FileSize/BlockSize)];
- };
- </example>
- <taglist>
- <tag>BlockSize<item>
- The size of each block in the stream in bytes.
- <tag>FileSize<item>
- The total size of the the file in bytes.
- <tag>Sums<item>
- The actual checksum data. The format has the lower 32 bytes as the weak
- checksum and the upper 128 as the strong checksum.
- </taglist>
- <!-- }}} -->
- <!-- Aggregate File {{{ -->
- <!-- ===================================================================== -->
- <sect>Aggregate File
- <p>
- If the generator was given a list of included files this record will be
- emitted after the rsync checksum record, once for each file. The given
- paths are files that are likely to contain fragments of the larger file.
- <example>
- struct AggregateFile
- {
- uint8 Tag; // 14 for this record
-
- string File;
- };
- </example>
- <taglist>
- <tag>File<item>
- The stored filename.
- </taglist>
- <!-- }}} -->
- <!-- RSync End {{{ -->
- <!-- ===================================================================== -->
- <sect>RSync End
- <p>
- The purpose of the directory end marker is to signafy that the RSync data
- is finished. RSync blocks begin with the RSync checksum record, then are
- typically followed by a Normal File record describing the name and attributes
- of the file and then optionally followed by a set of Aggregate File records.
- <p>
- There are no data members, it is the basic 1 item record. If the data stream
- terminates with an open block it is assumed to be truncated and an error
- issued.
- <!-- }}} -->
- <chapt>The Client
- <!-- Handling Compatibility {{{ -->
- <!-- ===================================================================== -->
- <sect>Handling Compatibility
- <p>
- The format has no provision for making backwards compatible changes, even
- minor ones. What was provided is a way to make a generator that is both
- forwards and backwards compatible with clients, this is done by disabling
- generation of unsupported items and masking them off in the flags.
- <p>
- To deal with this a client should examine the header and determine if it has
- a suitable major version, the minor version should largely be ignored. The
- client should then examine the flags values and for all records it understands
- ensure that no bits are masked on that it does not understand. Records that
- it cannot handle should be ignored at this point. When the client is
- parsing it should abort if it hits a record it does not support.
- <!-- }}} -->
- <!-- Client Requirements {{{ -->
- <!-- ===================================================================== -->
- <sect>Client Requirements
- <p>
- The client attempting to verify syncronisity of a local file tree and a
- tree destribed in a file list must do three things, look for extra local files,
- manage the UID/GID mappings and maintain a map of hardlinks. These items
- corrispond to the only necessary memory usage on the client.
- <p>
- It is expected that the client will use the timestamp, size and possibly
- MD5 hash to match the local file against the remote one to decide if it
- should be retrieved.
- <p>
- Hardlinks are difficult to handle, but represent a very usefull feature. The
- client should track all hard links until they are associated with a local
- file+inode, then all future links to that remote inode can be recreated
- locally.
- <!-- }}} -->
- <chapt>RSync Method
- <!-- Overview {{{ -->
- <!-- ===================================================================== -->
- <sect>Overview
- <p>
- The <em>rsync method</> was invented by Andrew Tridgell and originally
- implemented in the rsync program. DSync has a provision to make use of the
- <em>rsync method</> for transfering differences between files effeciently,
- however the implemention is not as bandwidth efficient as what the rsync
- program uses, emphasis is placed on generator efficiency.
- <p>
- Primarily the <em>rsync method</> makes use of a series of weak and strong
- block checksums for each block in a file. Blocks are a uniform size and
- are uniformly distributed about the source file. In order to minimize server
- loading the checksum data is generated for the file on the server and then
- sent to the client - this might optionally be done from a cached file. The
- client is responsible for performing the checksumming and searching on its
- end.
- <p>
- In contrast rsync has the client send its checksums to the server and the
- server sends back commands to reconstruct the file. This is more bandwidth
- efficient because only one round trip is required and there is a higher chance
- that more blocks will be matched and not need to be sent to the client.
- <p>
- Furthermore a feature designed for use by CD images is provided where a file
- can be specified as the aggregation of many smaller files. The aggregated
- files are specified only by giving the file name. The client is expected to
- read the file (probably from the network) and perform checksum searching
- against the provided table.
- <!-- }}} -->
- <!-- CD Images {{{ -->
- <!-- ===================================================================== -->
- <sect>CD Images
- <p>
- The primary and most complex use of the rsync data is for forming CD images
- on the fly from a mirror and a CD source. This is extremly usefull beacause
- CD images take up alot of space and bandwidth to mirror, while they are
- mearly aggregates of (possibly) already mirrored data. Using checksums
- and a file listing allows the CD image to be reconstructed from any mirror
- and reduces the loading on primary CD image servers.
- <p>
- The next use of checksums is to 'freshen' a CD image during development. If
- a image is already present that contains a subset of the required data the
- checksums generally allow a large percentage of that data to be reused.
- <p>
- Since the client is responsible for reconstruction and checksum searching it
- is possible to perform in place reconstruction and in place initial generation
- that does not require a (large!) temporary file.
- <!-- }}} -->
- </book>
|