|
- LZMA SDK 4.65
- -------------
- LZMA SDK provides the documentation, samples, header files, libraries,
- and tools you need to develop applications that use LZMA compression.
- LZMA is the default and general compression method of the 7z format
- in the 7-Zip compression program (www.7-zip.org). LZMA provides a high
- compression ratio and very fast decompression.
- LZMA is an improved version of the famous LZ77 compression algorithm.
- It was improved to maximize the compression ratio while keeping
- decompression speed high and memory requirements for decompression low.
- LICENSE
- -------
- LZMA SDK is written and placed in the public domain by Igor Pavlov.
- LZMA SDK Contents
- -----------------
- LZMA SDK includes:
- - ANSI-C/C++/C#/Java source code for LZMA compressing and decompressing
- - Compiled file->file LZMA compressing/decompressing program for Windows system
- UNIX/Linux version
- ------------------
- To compile the C++ version of the file->file LZMA encoder/decoder, go to the directory
- C++/7zip/Compress/LZMA_Alone
- and call make to recompile it:
- make -f makefile.gcc clean all
- In some UNIX/Linux versions you must compile LZMA with static libraries.
- To compile with static libraries, you can use
- LIB = -lm -static
- Files
- ---------------------
- lzma.txt - LZMA SDK description (this file)
- 7zFormat.txt - 7z Format description
- 7zC.txt - 7z ANSI-C Decoder description
- methods.txt - Compression method IDs for .7z
- lzma.exe - Compiled file->file LZMA encoder/decoder for Windows
- history.txt - history of the LZMA SDK
- Source code structure
- ---------------------
- C/ - C files
- 7zCrc*.* - CRC code
- Alloc.* - Memory allocation functions
- Bra*.* - Filters for x86, IA-64, ARM, ARM-Thumb, PowerPC and SPARC code
- LzFind.* - Match finder for LZ (LZMA) encoders
- LzFindMt.* - Match finder for LZ (LZMA) encoders for multithreaded encoding
- LzHash.h - Additional file for LZ match finder
- LzmaDec.* - LZMA decoding
- LzmaEnc.* - LZMA encoding
- LzmaLib.* - LZMA Library for DLL calling
- Types.h - Basic types for other .c files
- Threads.* - The code for multithreading.
- LzmaLib - LZMA Library (.DLL for Windows)
-
- LzmaUtil - LZMA Utility (file->file LZMA encoder/decoder).
- Archive - files related to archiving
- 7z - 7z ANSI-C Decoder
- CPP/ - CPP files
- Common - common files for C++ projects
- Windows - common files for Windows related code
- 7zip - files related to 7-Zip Project
- Common - common files for 7-Zip
- Compress - files related to compression/decompression
- Copy - Copy coder
- RangeCoder - Range Coder (special code of compression/decompression)
- LZMA - LZMA compression/decompression in C++
- LZMA_Alone - file->file LZMA compression/decompression
- Branch - Filters for x86, IA-64, ARM, ARM-Thumb, PowerPC and SPARC code
- Archive - files related to archiving
- Common - common files for archive handling
- 7z - 7z C++ Encoder/Decoder
- Bundles - Modules that are bundles of other modules
-
- Alone7z - 7zr.exe: Standalone version of 7z.exe that supports only 7z/LZMA/BCJ/BCJ2
- Format7zR - 7zr.dll: Reduced version of 7za.dll: extracting/compressing to 7z/LZMA/BCJ/BCJ2
- Format7zExtractR - 7zxr.dll: Reduced version of 7zxa.dll: extracting from 7z/LZMA/BCJ/BCJ2.
- UI - User Interface files
-
- Client7z - Test application for 7za.dll, 7zr.dll, 7zxr.dll
- Common - Common UI files
- Console - Code for console archiver
- CS/ - C# files
- 7zip
- Common - some common files for 7-Zip
- Compress - files related to compression/decompression
- LZ - files related to LZ (Lempel-Ziv) compression algorithm
- LZMA - LZMA compression/decompression
- LzmaAlone - file->file LZMA compression/decompression
- RangeCoder - Range Coder (special code of compression/decompression)
- Java/ - Java files
- SevenZip
- Compression - files related to compression/decompression
- LZ - files related to LZ (Lempel-Ziv) compression algorithm
- LZMA - LZMA compression/decompression
- RangeCoder - Range Coder (special code of compression/decompression)
- The C/C++ source code of the LZMA SDK is part of the 7-Zip project.
- 7-Zip source code can be downloaded from 7-Zip's SourceForge page:
- http://sourceforge.net/projects/sevenzip/
- LZMA features
- -------------
- - Variable dictionary size (up to 1 GB)
- - Estimated compressing speed: about 2 MB/s on 2 GHz CPU
- - Estimated decompressing speed:
- - 20-30 MB/s on 2 GHz Core 2 or AMD Athlon 64
- - 1-2 MB/s on 200 MHz ARM, MIPS, PowerPC or other simple RISC
- - Small memory requirements for decompressing (16 KB + DictionarySize)
- - Small code size for decompressing: 5-8 KB
- The LZMA decoder uses only integer operations and can be
- implemented on any modern 32-bit CPU (or on a 16-bit CPU with some conditions).
- Some critical operations that affect the speed of LZMA decompression:
- 1) 32*16 bit integer multiply
- 2) Mispredicted branches (the penalty mostly depends on pipeline length)
- 3) 32-bit shift and arithmetic operations
- The speed of LZMA decompression mostly depends on CPU speed.
- Memory speed matters little. But if your CPU has a small data cache,
- the overall weight of memory speed will increase slightly.
- How To Use
- ----------
- Using LZMA encoder/decoder executable
- --------------------------------------
- Usage: LZMA <e|d> inputFile outputFile [<switches>...]
- e: encode file
- d: decode file
- b: Benchmark. There are two tests: compressing and decompressing
- with the LZMA method. The benchmark shows a rating in MIPS (million
- instructions per second). The rating value is calculated from the
- measured speed and is normalized to Intel Core 2 results.
- The benchmark also checks for possible hardware errors (RAM
- errors in most cases). It uses these settings:
- (-a1, -d21, -fb32, -mfbt4). Only the -d parameter can be changed.
- You can also change the number of iterations. Example for 30 iterations:
- LZMA b 30
- The default number of iterations is 10.
- <Switches>
-
- -a{N}: set compression mode 0 = fast, 1 = normal
- default: 1 (normal)
- -d{N}: set dictionary size - [0, 30], default: 23 (8 MB)
- The maximum value for dictionary size is 1 GB = 2^30 bytes.
- Dictionary size is calculated as DictionarySize = 2^N bytes.
- To decompress a file compressed by the LZMA method with dictionary
- size D = 2^N, you need about D bytes of memory (RAM).
- -fb{N}: set number of fast bytes - [5, 273], default: 128
- Usually a bigger number gives a slightly better compression ratio
- and slower compression.
- -lc{N}: set number of literal context bits - [0, 8], default: 3
- Sometimes lc=4 gives a gain for big files.
- -lp{N}: set number of literal pos bits - [0, 4], default: 0
- The lp switch is intended for periodic data when the period is
- equal to 2^N. For example, for 32-bit (4-byte) periodic data
- you can use lp=2. If you change the lp switch, it's often better
- to set lc0.
- -pb{N}: set number of pos bits - [0, 4], default: 2
- The pb switch is intended for periodic data
- when the period is equal to 2^N.
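- To illustrate what lc, lp and pb control, here is a minimal C sketch with hypothetical helper
- names (not an SDK API). It assumes the context formulas used in LzmaDec.c: the literal coder
- index combines the low lp bits of the current position with the high lc bits of the previous
- byte, and pb takes the low bits of the position for the other coding decisions.

```c
#include <stdint.h>

/* Index of the literal coder used for the next byte (hypothetical helper):
 * the low lp bits of the position, followed by the high lc bits of the
 * previous byte. lc is in [0, 8], lp is in [0, 4]. */
static uint32_t literal_state(uint64_t pos, uint8_t prev_byte, unsigned lc, unsigned lp)
{
    uint32_t pos_bits = (uint32_t)(pos & ((1u << lp) - 1));
    uint32_t prev_bits = (uint32_t)prev_byte >> (8 - lc);
    return (pos_bits << lc) + prev_bits;
}

/* Position state used for match/literal decisions (hypothetical helper):
 * the low pb bits of the position. pb is in [0, 4]. */
static uint32_t pos_state(uint64_t pos, unsigned pb)
{
    return (uint32_t)(pos & ((1u << pb) - 1));
}
```

- With lp=2, the bytes at offsets 0, 1, 2 and 3 of each 4-byte word get separate literal
- coders, which is why lp=2 suits 32-bit periodic data. There are 2^(lc + lp) literal coders in total.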
- -mf{MF_ID}: set Match Finder. Default: bt4.
- Algorithms from the hc* group don't provide a good compression
- ratio, but they often work pretty fast in combination with
- fast mode (-a0).
- Memory requirements depend on dictionary size
- (parameter "d" in the table below).
- MF_ID   Memory           Description
- bt2     d * 9.5 + 4 MB   Binary Tree with 2 bytes hashing.
- bt3     d * 11.5 + 4 MB  Binary Tree with 3 bytes hashing.
- bt4     d * 11.5 + 4 MB  Binary Tree with 4 bytes hashing.
- hc4     d * 7.5 + 4 MB   Hash Chain with 4 bytes hashing.
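- As a rough worked example of the memory table, the C sketch below (hypothetical function, not
- part of the SDK) evaluates "d * factor + 4 MB" for a given MF_ID and -d{N} value. For the
- defaults (bt4, -d23, so d = 8 MB) it gives 8 MB * 11.5 + 4 MB = 96 MB.

```c
#include <stdint.h>
#include <string.h>

/* Rough encoder memory estimate from the match finder table:
 * memory = d * factor + 4 MB, with d = 2^N bytes (the -d{N} switch).
 * Factors are stored doubled (19 = 9.5 * 2) to keep the arithmetic integral.
 * Returns 0 for an unknown MF_ID. */
static uint64_t mf_memory_estimate(const char *mf_id, unsigned n)
{
    static const struct { const char *id; unsigned factor_x2; } table[] = {
        { "bt2", 19 }, { "bt3", 23 }, { "bt4", 23 }, { "hc4", 15 }
    };
    uint64_t d = (uint64_t)1 << n;
    size_t i;
    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
        if (strcmp(mf_id, table[i].id) == 0)
            return d * table[i].factor_x2 / 2 + ((uint64_t)4 << 20);
    return 0;
}
```

- Decompressing the same stream needs only about d bytes for the dictionary plus the decoder state.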
- -eos: write End Of Stream marker. By default LZMA doesn't write
- the eos marker, since the LZMA decoder knows the uncompressed size
- stored in the .lzma file header.
- -si: Read data from stdin (it will write End Of Stream marker).
- -so: Write data to stdout
- Examples:
- 1) LZMA e file.bin file.lzma -d16 -lc0
- compresses file.bin to file.lzma with 64 KB dictionary (2^16=64K)
- and 0 literal context bits. -lc0 reduces the memory requirements
- for decompression.
- 2) LZMA e file.bin file.lzma -lc0 -lp2
- compresses file.bin to file.lzma with settings suitable
- for 32-bit periodic data (for example, ARM or MIPS code).
- 3) LZMA d file.lzma file.bin
- decompresses file.lzma to file.bin.
- Compression ratio hints
- -----------------------
- Recommendations
- ---------------
- To increase the LZMA compression ratio, it's desirable to have
- aligned data (if possible) and to order the data so that code is
- grouped in one place and data is grouped in another place (this is
- better than interleaving code, data, code, data, ...).
- Filters
- -------
- You can increase the compression ratio for some data types by using
- special filters before compressing. For example, it's possible to
- increase the compression ratio by 5-10% for code for these CPU ISAs:
- x86, IA-64, ARM, ARM-Thumb, PowerPC, SPARC.
- You can find the C source code of these filters in the C/Bra*.* files.
- You can check the compression ratio gain of these filters with the
- following 7-Zip commands (example for ARM code):
- No filter:
- 7z a a1.7z a.bin -m0=lzma
- With filter for little-endian ARM code:
- 7z a a2.7z a.bin -m0=arm -m1=lzma
- It works as follows:
- Compressing = Filter_encoding + LZMA_encoding
- Decompressing = LZMA_decoding + Filter_decoding
- The compression and decompression speed of these filters is very high,
- so they do not increase decompression time much.
- Moreover, they reduce the LZMA_decoding time,
- since the compression ratio with filtering is higher.
- These filters convert CALL (calling procedure) instructions
- from relative offsets to absolute addresses, so such data becomes more
- compressible.
- For some ISAs (for example, MIPS) no gain can be obtained from such a filter.
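- The conversion can be sketched in C for ARM code. This is an illustrative version modelled on
- the approach described above (the real filters are in C/Bra*.*); the function name is
- hypothetical, and it assumes little-endian ARM BL instructions (top byte 0xEB) carrying
- 24-bit word offsets relative to PC = address + 8.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert ARM BL targets in place (hypothetical helper).
 * encoding != 0: relative offset -> absolute address (before LZMA_encoding).
 * encoding == 0: absolute address -> relative offset (after LZMA_decoding).
 * ip is the address of data[0] and is assumed to be a multiple of 4.
 * Returns the number of bytes scanned (a multiple of 4). */
static size_t arm_bl_convert(uint8_t *data, size_t size, uint32_t ip, int encoding)
{
    size_t i;
    if (size < 4)
        return 0;
    size -= 4;
    ip += 8; /* ARM PC reads as instruction address + 8 */
    for (i = 0; i <= size; i += 4) {
        if (data[i + 3] == 0xEB) { /* BL with condition "always" */
            uint32_t src = ((uint32_t)data[i + 2] << 16) |
                           ((uint32_t)data[i + 1] << 8) | data[i];
            uint32_t dest;
            src <<= 2; /* word offset -> byte offset */
            if (encoding)
                dest = ip + (uint32_t)i + src;
            else
                dest = src - (ip + (uint32_t)i);
            dest >>= 2; /* back to a 24-bit word value */
            data[i + 2] = (uint8_t)(dest >> 16);
            data[i + 1] = (uint8_t)(dest >> 8);
            data[i] = (uint8_t)dest;
        }
    }
    return i;
}
```

- After encoding, every call to the same function carries the same bytes wherever it appears,
- which is what makes the data more compressible; decoding with the same ip restores the
- original bytes exactly.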
- LZMA compressed file format
- ---------------------------
- Offset  Size  Description
- 0       1     Special LZMA properties (lc, lp, pb in encoded form)
- 1       4     Dictionary size (little endian)
- 5       8     Uncompressed size (little endian). -1 means unknown size
- 13            Compressed data
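- A minimal C sketch of reading this header. The struct and function names are hypothetical.
- It assumes the properties byte encodes (pb * 5 + lp) * 9 + lc, so the default lc=3, lp=0,
- pb=2 is stored as 0x5D.

```c
#include <stdint.h>

typedef struct {
    unsigned lc, lp, pb;
    uint32_t dict_size;
    uint64_t unpack_size; /* UINT64_MAX (-1) means "unknown size" */
} LzmaHeaderInfo;

/* Decode the 13-byte .lzma header (hypothetical helper).
 * Returns 0 on success, -1 if the properties byte is out of range. */
static int parse_lzma_header(const uint8_t h[13], LzmaHeaderInfo *out)
{
    unsigned d = h[0];
    int i;
    if (d >= 9 * 5 * 5) /* pb would exceed 4 */
        return -1;
    out->lc = d % 9;
    d /= 9;
    out->lp = d % 5;
    out->pb = d / 5;
    out->dict_size = 0; /* bytes 1..4, little endian */
    for (i = 0; i < 4; i++)
        out->dict_size |= (uint32_t)h[1 + i] << (8 * i);
    out->unpack_size = 0; /* bytes 5..12, little endian */
    for (i = 0; i < 8; i++)
        out->unpack_size |= (uint64_t)h[5 + i] << (8 * i);
    return 0;
}
```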
- ANSI-C LZMA Decoder
- ~~~~~~~~~~~~~~~~~~~
- Please note that interfaces for ANSI-C code were changed in LZMA SDK 4.58.
- If you want to use the old interfaces, you can download a previous version of the LZMA SDK
- from the sourceforge.net site.
- To use ANSI-C LZMA Decoder you need the following files:
- 1) LzmaDec.h + LzmaDec.c + Types.h
- LzmaUtil/LzmaUtil.c is an example application that uses these files.
- Memory requirements for LZMA decoding
- -------------------------------------
- The stack usage of the LZMA decoding function for local variables is not
- larger than 200-400 bytes.
- The LZMA decoder uses a dictionary buffer and an internal state structure.
- The internal state structure consumes approximately
- state_size = (4 + 1.5 * 2^(lc + lp)) KB
- By default (lc=3, lp=0), state_size = 16 KB.
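- To see where this approximation comes from, the C sketch below computes the exact size of the
- decoder's probability array. It assumes the constants from LzmaDec.c in this SDK version
- (1846 base probabilities plus 768 per literal coder, with 2^(lc + lp) literal coders) and
- 2 bytes per probability, or 4 bytes with _LZMA_PROB32. The function name is hypothetical.

```c
#include <stdint.h>

/* Bytes used by the decoder's probability array (hypothetical helper).
 * prob_size is 2 by default, 4 if _LZMA_PROB32 is defined. */
static uint32_t lzma_probs_bytes(unsigned lc, unsigned lp, unsigned prob_size)
{
    uint32_t num_probs = 1846u + (768u << (lc + lp));
    return num_probs * prob_size;
}
```

- For lc=3, lp=0 this gives (1846 + 6144) * 2 = 15980 bytes, i.e. about 16 KB.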
- How To decompress data
- ----------------------
- LZMA Decoder (ANSI-C version) now supports 2 interfaces:
- 1) Single-call Decompressing
- 2) Multi-call State Decompressing (zlib-like interface)
- You must use an external allocator:
- Example:
- void *SzAlloc(void *p, size_t size) { p = p; return malloc(size); }
- void SzFree(void *p, void *address) { p = p; free(address); }
- ISzAlloc alloc = { SzAlloc, SzFree };
- The p = p; statement only suppresses compiler warnings about the unused parameter p.
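- If the SDK calls alloc->Alloc(alloc, size), i.e. passes the allocator object itself as p, a
- custom allocator can keep its own state. The C sketch below assumes that calling convention and
- uses a local mirror of the ISzAlloc layout (SzAllocMirror and CountingAlloc are hypothetical
- names, not SDK types) to count live allocations, which can help detect leaks.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical mirror of the ISzAlloc layout: two callbacks that receive
 * the allocator object itself as their first argument. */
typedef struct {
    void *(*Alloc)(void *p, size_t size);
    void (*Free)(void *p, void *address);
} SzAllocMirror;

/* The callback table must be the FIRST member, so p can be cast back. */
typedef struct {
    SzAllocMirror funcs;
    size_t num_allocs; /* number of live allocations */
} CountingAlloc;

static void *CountingAlloc_Alloc(void *p, size_t size)
{
    CountingAlloc *a = (CountingAlloc *)p;
    void *mem = malloc(size);
    if (mem != NULL)
        a->num_allocs++;
    return mem;
}

static void CountingAlloc_Free(void *p, void *address)
{
    CountingAlloc *a = (CountingAlloc *)p;
    if (address != NULL)
        a->num_allocs--;
    free(address);
}
```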
- Single-call Decompressing
- -------------------------
- When to use: RAM->RAM decompressing
- Compile files: LzmaDec.h + LzmaDec.c + Types.h
- Compile defines: no defines
- Memory Requirements:
- - Input buffer: compressed size
- - Output buffer: uncompressed size
- - LZMA Internal Structures: state_size (16 KB for default settings)
- Interface:
- int LzmaDecode(Byte *dest, SizeT *destLen, const Byte *src, SizeT *srcLen,
- const Byte *propData, unsigned propSize, ELzmaFinishMode finishMode,
- ELzmaStatus *status, ISzAlloc *alloc);
- In:
- dest - output data
- destLen - output data size
- src - input data
- srcLen - input data size
- propData - LZMA properties (5 bytes)
- propSize - size of propData buffer (5 bytes)
- finishMode - It matters only if decoding reaches the output limit (*destLen).
- LZMA_FINISH_ANY - Decode just destLen bytes.
- LZMA_FINISH_END - The stream must be finished after (*destLen).
- You can use LZMA_FINISH_END when you know that the
- current output buffer covers the last bytes of the stream.
- alloc - Memory allocator.
- Out:
- destLen - processed output size
- srcLen - processed input size
- Output:
- SZ_OK
- status:
- LZMA_STATUS_FINISHED_WITH_MARK
- LZMA_STATUS_NOT_FINISHED
- LZMA_STATUS_MAYBE_FINISHED_WITHOUT_MARK
- SZ_ERROR_DATA - Data error
- SZ_ERROR_MEM - Memory allocation error
- SZ_ERROR_UNSUPPORTED - Unsupported properties
- SZ_ERROR_INPUT_EOF - It needs more bytes in input buffer (src).
- If the LZMA decoder sees the end marker before reaching the output limit, it returns SZ_OK,
- and the output value of destLen will be less than the output buffer size limit.
- You can use multiple checks to test data integrity after full decompression:
- 1) Check the result and the "status" variable.
- 2) Check that output(destLen) = uncompressedSize, if you know the real uncompressedSize.
- 3) Check that output(srcLen) = compressedSize, if you know the real compressedSize.
- You must use the correct finish mode in that case.
- Multi-call State Decompressing (zlib-like interface)
- ----------------------------------------------------
- When to use: file->file decompressing
- Compile files: LzmaDec.h + LzmaDec.c + Types.h
- Memory Requirements:
- - Buffer for input stream: any size (for example, 16 KB)
- - Buffer for output stream: any size (for example, 16 KB)
- - LZMA Internal Structures: state_size (16 KB for default settings)
- - LZMA dictionary (dictionary size is encoded in LZMA properties header)
- 1) Read the LZMA properties (5 bytes) and the uncompressed size (8 bytes, little-endian) into header:
- unsigned char header[LZMA_PROPS_SIZE + 8];
- ReadFile(inFile, header, sizeof(header));
- 2) Allocate CLzmaDec structures (state + dictionary) using LZMA properties
- CLzmaDec state;
- LzmaDec_Constr(&state);
- res = LzmaDec_Allocate(&state, header, LZMA_PROPS_SIZE, &g_Alloc);
- if (res != SZ_OK)
- return res;
- 3) Initialize the CLzmaDec structure before each new LZMA stream, then call LzmaDec_DecodeToBuf in a loop:
- LzmaDec_Init(&state);
- for (;;)
- {
- ...
- int res = LzmaDec_DecodeToBuf(CLzmaDec *p, Byte *dest, SizeT *destLen,
- const Byte *src, SizeT *srcLen, ELzmaFinishMode finishMode);
- ...
- }
- 4) Free all allocated structures
- LzmaDec_Free(&state, &g_Alloc);
- For full code example, look at C/LzmaUtil/LzmaUtil.c code.
- How To compress data
- --------------------
- Compile files: LzmaEnc.h + LzmaEnc.c + Types.h +
- LzFind.c + LzFind.h + LzFindMt.c + LzFindMt.h + LzHash.h
- Memory Requirements:
- - (dictSize * 11.5 + 6 MB) + state_size
- The LZMA encoder can use two memory allocators:
- 1) alloc - for small arrays.
- 2) allocBig - for big arrays.
- For example, you can use Large RAM Pages (2 MB) in the allocBig allocator for
- better compression speed. Note that Windows has a poor implementation of
- Large RAM Pages.
- It's OK to use the same allocator for alloc and allocBig.
- Single-call Compression with callbacks
- --------------------------------------
- See C/LzmaUtil/LzmaUtil.c as an example.
- When to use: file->file compressing
- 1) You must implement callback structures for these interfaces:
- ISeqInStream
- ISeqOutStream
- ICompressProgress
- ISzAlloc
- static void *SzAlloc(void *p, size_t size) { p = p; return MyAlloc(size); }
- static void SzFree(void *p, void *address) { p = p; MyFree(address); }
- static ISzAlloc g_Alloc = { SzAlloc, SzFree };
- CFileSeqInStream inStream;
- CFileSeqOutStream outStream;
- inStream.funcTable.Read = MyRead;
- inStream.file = inFile;
- outStream.funcTable.Write = MyWrite;
- outStream.file = outFile;
- 2) Create CLzmaEncHandle object;
- CLzmaEncHandle enc;
- enc = LzmaEnc_Create(&g_Alloc);
- if (enc == 0)
- return SZ_ERROR_MEM;
- 3) Initialize the CLzmaEncProps properties:
- CLzmaEncProps props;
- LzmaEncProps_Init(&props);
- Then you can change some properties in that structure.
- 4) Send LZMA properties to LZMA Encoder
- res = LzmaEnc_SetProps(enc, &props);
- 5) Write encoded properties to header
- Byte header[LZMA_PROPS_SIZE + 8];
- size_t headerSize = LZMA_PROPS_SIZE;
- UInt64 fileSize;
- int i;
- res = LzmaEnc_WriteProperties(enc, header, &headerSize);
- fileSize = MyGetFileLength(inFile);
- for (i = 0; i < 8; i++)
- header[headerSize++] = (Byte)(fileSize >> (8 * i));
- MyWriteFileAndCheck(outFile, header, headerSize);
- 6) Call encoding function:
- res = LzmaEnc_Encode(enc, &outStream.funcTable, &inStream.funcTable,
- NULL, &g_Alloc, &g_Alloc);
- 7) Destroy LZMA Encoder Object
- LzmaEnc_Destroy(enc, &g_Alloc, &g_Alloc);
- If a callback function returns an error code, LzmaEnc_Encode also returns that code.
- Single-call RAM->RAM Compression
- --------------------------------
- Single-call RAM->RAM Compression is similar to Compression with callbacks,
- but you provide pointers to buffers instead of pointers to stream callbacks:
- HRes LzmaEncode(Byte *dest, SizeT *destLen, const Byte *src, SizeT srcLen,
- CLzmaEncProps *props, Byte *propsEncoded, SizeT *propsSize, int writeEndMark,
- ICompressProgress *progress, ISzAlloc *alloc, ISzAlloc *allocBig);
- Return code:
- SZ_OK - OK
- SZ_ERROR_MEM - Memory allocation error
- SZ_ERROR_PARAM - Incorrect parameter
- SZ_ERROR_OUTPUT_EOF - output buffer overflow
- SZ_ERROR_THREAD - errors in multithreading functions (only for Mt version)
- LZMA Defines
- ------------
- _LZMA_SIZE_OPT - Enable some optimizations in LZMA Decoder to get smaller executable code.
- _LZMA_PROB32 - It can increase the speed on some 32-bit CPUs, but memory usage for
- some structures will be doubled in that case.
- _LZMA_UINT32_IS_ULONG - Define it if int is 16-bit on your compiler and long is 32-bit.
- _LZMA_NO_SYSTEM_SIZE_T - Define it if you don't want to use size_t type.
- C++ LZMA Encoder/Decoder
- ~~~~~~~~~~~~~~~~~~~~~~~~
- The C++ LZMA code uses COM-like interfaces, so if you want to use it,
- you may want to study the basics of COM/OLE.
- The C++ LZMA code is just a wrapper over the ANSI-C code.
- C++ Notes
- ~~~~~~~~~~~~~~~~~~~~~~~~
- If you use some C++ code folders in 7-Zip (for example, C++ code for .7z handling),
- you must make sure that you handle the "new" operator correctly.
- 7-Zip can be compiled with MSVC 6.0, which doesn't throw an exception from the "new" operator.
- So 7-Zip uses "CPP\Common\NewHandler.cpp", which redefines the "new" operator:
- void *operator new(size_t size)
- {
- void *p = ::malloc(size);
- if (p == 0)
- throw CNewException();
- return p;
- }
- If you use a version of MSVC that throws an exception from the "new" operator, you can compile
- without "NewHandler.cpp", so the standard exception will be used. Actually, some 7-Zip code
- catches any exception in internal code and converts it to an HRESULT code,
- so you don't need to catch CNewException if you call COM interfaces of 7-Zip.
- ---
- http://www.7-zip.org
- http://www.7-zip.org/sdk.html
- http://www.7-zip.org/support.html
|