A multiplatform and efficient archive format with portable ANSI C library and ZZZip application

ZZZip Archive Format

The name is a pun on comics, where "Zzz" bubbles show that someone is sleeping. Files in an archive can't be used as-is: they are "sleeping", and you first have to "wake them up" by extracting the archive.

  • src/lib is a dependency-free ANSI C library, an SDK with an easy-to-use API for handling the format (compressors and decompressors included)
  • src/ZZZip is a multiplatform front-end application to create and extract archives (Windows, Linux, MacOS)
WARNING: the library is finished and fully functional, but the application is still a work in progress!

Why do we need yet another archive format?

Because the existing ones aren't good enough, or they can't be extended for new OSes.

  • cpio: clearly designed with only POSIX in mind, it requires fields which simply don't exist on many OSes (i-node, uid and gid for example). It also does not store the file type, only the mode, which is OS dependent. Since cpio does not store which OS the archive was created on, this means that despite what they say, cpio isn't really cross-platform. It is also not possible to store OS-specific attributes with the entries beyond the standard POSIX attributes (for example, it can't store xattr, ACLs, etc).

  • tar: similar issues as with cpio, clearly designed with only POSIX in mind. Plus point for having an OS-independent file type field, but it still requires many fields which don't exist on many OSes, and it lacks a way to store specific fields which do exist. The file name limit of 100 bytes (without PAX extensions) is very limiting these days. The spec also fails to define many things, a standardized way of storing ACLs for example. It is also not possible to store OS-specific attributes with the entries (without PAX extensions, which are non-portable; GNU tar isn't the same as Solaris tar, for example).

  • rar: proprietary, with incompatible versions. Out of the question. I don't even know if it's capable of storing devices, xattr or ACLs, for example.

  • zip: don't get me started on this one. Plus point that it has flexible OS-specific fields, but other than that its one and only advantage is that it's supported on many, many platforms. No standardized support for symlinks, device files, xattr etc.

  • 7-zip: looks like a good idea at first, but if you go deeper, you'll realize it's a terrible choice. The SDK is badly written and difficult to port and integrate, and the file format is overcomplicated and full of bad design choices (much of the criticism of xz applies to 7-zip too). Not being transmission error proof is one of its biggest sins. It was designed with only Windows in mind, so forget storing POSIX or OS-specific attributes.

Features of ZZZip

  • ZZZip app can extract all the other common formats too: zip, zip64, tar / cpio (with gz, bz, xz, zstd), rar (up to ver 5.0), 7z, arj (*)
  • easy-to-use, easy-to-integrate SDK with a dependency-free ANSI C library, which works with non-seekable streams too
  • unlike all the others, the ZZZip format was designed with multiple platforms and interoperability in mind from the start
  • supports both full archive compression (like tar and cpio) and per-entity compression (like zip, rar, 7-zip)
  • extensible by design, so new fields can be added later without breaking compatibility
  • all file attributes that OSes share in common are mandatory, but nothing else (file size and OS-independent type, modification date)
  • all the other fields are optional, can store mime type, icon, xattr, ACL lists etc. in a standardized way
  • all the optional fields were designed for interoperability, but there's also a way to store non-portable OS-specific attributes
  • can use multiple filters in chained mode (like ASCII newline conversion for text files or BCJ for executables before compression)
  • can use military grade encryption filters (chainable, you can use triple AES for example)
  • simplicity is one of its goals, no outdated features (seriously, who needs floppy disk splitting in the 21st century?)

(*) - note: arj isn't used any more, but there are still lots of .arj files on the internet for old DOS sources and programs.
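
To picture the chained filter mode from the feature list, here is a minimal sketch in ANSI C. It is not the libzzz API: the `filter_fn` type, `crlf_to_lf`, `rle_pack` and `run_chain` are hypothetical names invented for this illustration, and the toy run-length packer only stands in for a real compressor. It shows a newline conversion filter feeding its output into a compression step, which is the idea behind chaining.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical filter signature (NOT the libzzz API): reads n bytes from
 * `in`, writes the result to `out`, returns the number of bytes written. */
typedef size_t (*filter_fn)(const unsigned char *in, size_t n,
                            unsigned char *out);

/* Text filter: convert CRLF line endings to LF (output never grows). */
size_t crlf_to_lf(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t i, o = 0;
    for (i = 0; i < n; i++) {
        if (in[i] == '\r' && i + 1 < n && in[i + 1] == '\n')
            continue;               /* drop the CR of a CRLF pair */
        out[o++] = in[i];
    }
    return o;
}

/* Toy "compressor": run-length encoding as (count, byte) pairs.
 * Worst case the output is twice as long as the input. */
size_t rle_pack(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t i = 0, o = 0;
    while (i < n) {
        unsigned char c = in[i];
        size_t run = 1;
        while (i + run < n && in[i + run] == c && run < 255)
            run++;
        out[o++] = (unsigned char)run;
        out[o++] = c;
        i += run;
    }
    return o;
}

/* Apply `count` filters in order, alternating between `out` and `tmp`.
 * Both buffers must be large enough for the worst case of every stage,
 * and must not overlap `in`. The final result always ends up in `out`. */
size_t run_chain(const filter_fn *chain, size_t count,
                 const unsigned char *in, size_t n,
                 unsigned char *out, unsigned char *tmp)
{
    const unsigned char *src = in;
    unsigned char *dst = out;
    size_t i;
    for (i = 0; i < count; i++) {
        n = chain[i](src, n, dst);
        src = dst;
        dst = (dst == out) ? tmp : out;
    }
    if (src != out)
        memcpy(out, src, n);        /* last stage wrote elsewhere */
    return n;
}
```

Calling `run_chain` with the chain `{ crlf_to_lf, rle_pack }` first normalizes the line endings and then packs the result, so the compression step only ever sees the filtered data. Extraction would apply the inverse filters in reverse order.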

Limitations

Taking a quick look at the limitations, you can see how superior ZZZip is to other existing formats:

  • archive size: 2^127 (libzzz only implements 2^63 for now)
  • file size: 2^127 (libzzz only implements 2^63 for now)
  • file name maximum length: 2^15 - 1 (32k, in bytes, which could mean fewer characters with multi-byte UNICODE)
  • file attributes maximum size: 2^17 - 48 (ca. 128k)
  • file time precision: second (in OS-independent format) or nanosecond (with timestamps)
  • file dates: from AD 1-01-01 up to AD 65535-12-31 (technically unlimited)
  • link target size: 2^127 (libzzz only implements 2^63 for now, technically unlimited)
  • uid and gid size: 2^64
  • device major and minor size: 2^32
  • number of files in the archive: 2^64
  • number of chained encryption filters: 2^16 - 13 (libzzz only supports 8 for now)
  • number of chained compression filters: 2^4 - 1 (libzzz only supports 0, 1 or 2 for now, for performance reasons)
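
Because the file name limit is given in bytes, the number of characters that fit depends on the encoding. The sketch below assumes UTF-8 names and works the arithmetic out in ANSI C. The constant and function names (`ZZZ_NAME_MAX_BYTES`, `ZZZ_ATTR_MAX_BYTES`, `utf8_codepoints`, `name_fits`) are made up for this example and are not taken from libzzz.

```c
#include <stddef.h>
#include <string.h>

/* Limits from the list above; the macro names are hypothetical. */
#define ZZZ_NAME_MAX_BYTES ((1UL << 15) - 1)    /* 32767  */
#define ZZZ_ATTR_MAX_BYTES ((1UL << 17) - 48)   /* 131024 */

/* Count UTF-8 code points in a NUL-terminated string by skipping
 * continuation bytes (those of the form 10xxxxxx). */
size_t utf8_codepoints(const char *s)
{
    size_t count = 0;
    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    return count;
}

/* The limit applies to the encoded length in bytes, not to characters. */
int name_fits(const char *name)
{
    return strlen(name) <= ZZZ_NAME_MAX_BYTES;
}
```

A UTF-8 code point takes at most 4 bytes, so even in the worst case a name of 8191 characters (32767 / 4, rounded down) still fits, while a pure ASCII name can use all 32767.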

License

Both the libzzz library and the portable multiplatform ZZZip application are licensed under the terms of the MIT license.

Authors

  • bzip2: Julian R. Seward
  • xz: Igor Pavlov and Lasse Collin
  • zlib: Mark Adler
  • zstd: Yann Collet
  • 7z: Igor Pavlov
  • rar: DrMcCoy (dmc)
  • arj: Robert Jung and Mark Adler
  • zip: bzt (no PKWARE source used)
  • ZZZip: bzt

Cheers, bzt