LZMA SDK 4.65
-------------

LZMA SDK provides the documentation, samples, header files, libraries,
and tools you need to develop applications that use LZMA compression.

LZMA is the default and general compression method of the 7z format
in the 7-Zip compression program (www.7-zip.org). LZMA provides a high
compression ratio and very fast decompression.

LZMA is an improved version of the well-known LZ77 compression algorithm.
It was improved to maximize the compression ratio while keeping
decompression fast and the memory requirements for decompressing low.

LICENSE
-------

LZMA SDK is written and placed in the public domain by Igor Pavlov.


LZMA SDK Contents
-----------------

LZMA SDK includes:

  - ANSI-C/C++/C#/Java source code for LZMA compressing and decompressing
  - Compiled file->file LZMA compressing/decompressing program for Windows system

UNIX/Linux version
------------------
To compile the C++ version of the file->file LZMA encoder, go to the directory
CPP/7zip/Compress/LZMA_Alone
and call make to recompile it:
  make -f makefile.gcc clean all

On some UNIX/Linux versions you must compile LZMA with static libraries.
To compile with static libraries, you can use
LIB = -lm -static

Files
-----
lzma.txt     - LZMA SDK description (this file)
7zFormat.txt - 7z Format description
7zC.txt      - 7z ANSI-C Decoder description
methods.txt  - Compression method IDs for .7z
lzma.exe     - Compiled file->file LZMA encoder/decoder for Windows
history.txt  - history of the LZMA SDK

Source code structure
---------------------

C/  - C files
        7zCrc*.*   - CRC code
        Alloc.*    - Memory allocation functions
        Bra*.*     - Filters for x86, IA-64, ARM, ARM-Thumb, PowerPC and SPARC code
        LzFind.*   - Match finder for LZ (LZMA) encoders
        LzFindMt.* - Match finder for LZ (LZMA) encoders for multithreading encoding
        LzHash.h   - Additional file for LZ match finder
        LzmaDec.*  - LZMA decoding
        LzmaEnc.*  - LZMA encoding
        LzmaLib.*  - LZMA Library for DLL calling
        Types.h    - Basic types for other .c files
        Threads.*  - The code for multithreading.

    LzmaLib  - LZMA Library (.DLL for Windows)

    LzmaUtil - LZMA Utility (file->file LZMA encoder/decoder).

    Archive - files related to archiving
      7z     - 7z ANSI-C Decoder

CPP/ - CPP files

  Common  - common files for C++ projects
  Windows - common files for Windows related code

  7zip    - files related to 7-Zip Project

    Common   - common files for 7-Zip

    Compress - files related to compression/decompression

      Copy         - Copy coder
      RangeCoder   - Range Coder (special code of compression/decompression)
      LZMA         - LZMA compression/decompression on C++
      LZMA_Alone   - file->file LZMA compression/decompression
      Branch       - Filters for x86, IA-64, ARM, ARM-Thumb, PowerPC and SPARC code

    Archive - files related to archiving

      Common   - common files for archive handling
      7z       - 7z C++ Encoder/Decoder

    Bundles    - Modules that are bundles of other modules

      Alone7z           - 7zr.exe: Standalone version of 7z.exe that supports only 7z/LZMA/BCJ/BCJ2
      Format7zR         - 7zr.dll: Reduced version of 7za.dll: extracting/compressing to 7z/LZMA/BCJ/BCJ2
      Format7zExtractR  - 7zxr.dll: Reduced version of 7zxa.dll: extracting from 7z/LZMA/BCJ/BCJ2.

    UI        - User Interface files

      Client7z - Test application for 7za.dll, 7zr.dll, 7zxr.dll
      Common   - Common UI files
      Console  - Code for console archiver

CS/ - C# files
  7zip
    Common   - some common files for 7-Zip
    Compress - files related to compression/decompression
      LZ           - files related to LZ (Lempel-Ziv) compression algorithm
      LZMA         - LZMA compression/decompression
      LzmaAlone    - file->file LZMA compression/decompression
      RangeCoder   - Range Coder (special code of compression/decompression)

Java/  - Java files
  SevenZip
    Compression    - files related to compression/decompression
      LZ           - files related to LZ (Lempel-Ziv) compression algorithm
      LZMA         - LZMA compression/decompression
      RangeCoder   - Range Coder (special code of compression/decompression)

C/C++ source code of LZMA SDK is part of the 7-Zip project.
7-Zip source code can be downloaded from 7-Zip's SourceForge page:

  http://sourceforge.net/projects/sevenzip/

LZMA features
-------------
  - Variable dictionary size (up to 1 GB)
  - Estimated compressing speed: about 2 MB/s on 2 GHz CPU
  - Estimated decompressing speed:
      - 20-30 MB/s on 2 GHz Core 2 or AMD Athlon 64
      - 1-2 MB/s on 200 MHz ARM, MIPS, PowerPC or other simple RISC
  - Small memory requirements for decompressing (16 KB + DictionarySize)
  - Small code size for decompressing: 5-8 KB

LZMA decoder uses only integer operations and can be
implemented on any modern 32-bit CPU (or on a 16-bit CPU with some conditions).

Some critical operations that affect the speed of LZMA decompression:
  1) 32*16 bit integer multiply
  2) Mispredicted branches (penalty mostly depends on pipeline length)
  3) 32-bit shift and arithmetic operations

The speed of LZMA decompressing mostly depends on CPU speed.
Memory speed matters much less. But if your CPU has a small data cache,
the overall weight of memory speed will slightly increase.

How To Use
----------

Using LZMA encoder/decoder executable
-------------------------------------

Usage:  LZMA <e|d> inputFile outputFile [<switches>...]

  e: encode file

  d: decode file

  b: Benchmark. There are two tests: compressing and decompressing
     with LZMA method. Benchmark shows rating in MIPS (million
     instructions per second). Rating value is calculated from
     measured speed and it is normalized with Intel's Core 2 results.
     Also Benchmark checks possible hardware errors (RAM
     errors in most cases). Benchmark uses these settings:
     (-a1, -d21, -fb32, -mfbt4). You can change only -d parameter.
     Also you can change the number of iterations. Example for 30 iterations:
       LZMA b 30
     Default number of iterations is 10.

<Switches>

  -a{N}:  set compression mode 0 = fast, 1 = normal
          default: 1 (normal)

  -d{N}:  set dictionary size - [0, 30], default: 23 (8 MB)
          The maximum value for dictionary size is 1 GB = 2^30 bytes.
          Dictionary size is calculated as DictionarySize = 2^N bytes.
          To decompress a file compressed by the LZMA method with dictionary
          size D = 2^N you need about D bytes of memory (RAM).

  -fb{N}: set number of fast bytes - [5, 273], default: 128
          Usually a bigger number gives a slightly better compression ratio
          and a slower compression process.

  -lc{N}: set number of literal context bits - [0, 8], default: 3
          Sometimes lc=4 gives gain for big files.

  -lp{N}: set number of literal pos bits - [0, 4], default: 0
          The lp switch is intended for periodical data when the period is
          equal to 2^N. For example, for 32-bit (4 bytes)
          periodical data you can use lp=2. Often it's better to set -lc0
          if you change the lp switch.

  -pb{N}: set number of pos bits - [0, 4], default: 2
          The pb switch is intended for periodical data
          when the period is equal to 2^N.

  -mf{MF_ID}: set Match Finder. Default: bt4.
              Algorithms from the hc* group don't provide a good compression
              ratio, but they often work pretty fast in combination with
              fast mode (-a0).

              Memory requirements depend on dictionary size
              (parameter "d" in table below).

               MF_ID     Memory                   Description

                bt2    d *  9.5 + 4MB  Binary Tree with 2 bytes hashing.
                bt3    d * 11.5 + 4MB  Binary Tree with 3 bytes hashing.
                bt4    d * 11.5 + 4MB  Binary Tree with 4 bytes hashing.
                hc4    d *  7.5 + 4MB  Hash Chain with 4 bytes hashing.

  -eos:   write End Of Stream marker. By default LZMA doesn't write
          the eos marker, since the LZMA decoder knows the uncompressed size
          stored in the .lzma file header.

  -si:    Read data from stdin (it will write End Of Stream marker).
  -so:    Write data to stdout

Examples:

1) LZMA e file.bin file.lzma -d16 -lc0

compresses file.bin to file.lzma with 64 KB dictionary (2^16=64K)
and 0 literal context bits. -lc0 reduces memory requirements
for decompression.


2) LZMA e file.bin file.lzma -lc0 -lp2

compresses file.bin to file.lzma with settings suitable
for 32-bit periodical data (for example, ARM or MIPS code).

3) LZMA d file.lzma file.bin

decompresses file.lzma to file.bin.

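As a rough illustration of the -d and -mf descriptions above, the following
C sketch turns a -d{N} value into a dictionary size and applies the memory
formulas from the match finder table. The function names are hypothetical and
are not part of the SDK; the factors are copied from the table and give
estimates only.

```c
#include <stdint.h>
#include <string.h>

/* Dictionary size in bytes for the -d{N} switch: 2^N, with N in [0, 30].
   Returns 0 for an out-of-range N. (Hypothetical helper, not SDK code.) */
uint64_t dict_size_from_switch(unsigned n)
{
    return (n <= 30) ? ((uint64_t)1 << n) : 0;
}

/* Estimated memory in bytes for a match finder ID, following the table
   above (d * factor + 4 MB). Returns 0 for an unknown ID. */
uint64_t match_finder_memory(const char *mf_id, uint64_t dict_size)
{
    const uint64_t overhead = (uint64_t)4 << 20;  /* 4 MB */
    uint64_t twice_factor;  /* factor * 2 so it stays an integer: 9.5 -> 19 */

    if (strcmp(mf_id, "bt2") == 0)      twice_factor = 19;  /*  9.5 */
    else if (strcmp(mf_id, "bt3") == 0) twice_factor = 23;  /* 11.5 */
    else if (strcmp(mf_id, "bt4") == 0) twice_factor = 23;  /* 11.5 */
    else if (strcmp(mf_id, "hc4") == 0) twice_factor = 15;  /*  7.5 */
    else return 0;

    return dict_size * twice_factor / 2 + overhead;
}
```

Under these formulas the default settings (-d23 with bt4) come out at
8 MB * 11.5 + 4 MB = 96 MB.
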

Compression ratio hints
-----------------------

Recommendations
---------------

To increase the compression ratio for LZMA compressing it's desirable
to have aligned data (if possible), and also to order the data so that
code is grouped in one place and data is grouped in another place
(that is better than mixing them: code, data, code, data, ...).


Filters
-------
You can increase the compression ratio for some data types by using
special filters before compressing. For example, it's possible to
increase the compression ratio by 5-10% for code for these CPU ISAs:
x86, IA-64, ARM, ARM-Thumb, PowerPC, SPARC.

You can find C source code of such filters in C/Bra*.* files.

You can check the compression ratio gain of these filters with the
following 7-Zip commands (example for ARM code):
No filter:
  7z a a1.7z a.bin -m0=lzma

With filter for little-endian ARM code:
  7z a a2.7z a.bin -m0=arm -m1=lzma

It works in this manner:
Compressing   = Filter_encoding + LZMA_encoding
Decompressing = LZMA_decoding + Filter_decoding

Compressing and decompressing speed of such filters is very high,
so it will not increase decompressing time too much.
Moreover, it reduces decompression time for LZMA_decoding,
since compression ratio with filtering is higher.

These filters convert CALL (calling procedure) instructions
from relative offsets to absolute addresses, so such data becomes more
compressible.

For some ISAs (for example, for MIPS) it's impossible to get gain from such a filter.

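The idea behind these filters can be sketched for the x86 CALL instruction
(opcode 0xE8 followed by a 32-bit little-endian relative offset). The code
below is a simplified illustration written for this text, not the code from
C/Bra*.*: the function name and the ip parameter (the assumed address of the
first byte) are hypothetical, and it converts every 0xE8 byte it meets without
the extra checks a real filter may apply.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified sketch of a branch filter for x86 CALL (opcode 0xE8).
   encoding != 0: relative offset  -> absolute address
   encoding == 0: absolute address -> relative offset
   ip is the assumed address of data[0]. Not the SDK's implementation. */
void call_filter_sketch(uint8_t *data, size_t size, uint32_t ip, int encoding)
{
    size_t i = 0;

    while (i + 5 <= size)
    {
        uint32_t v, next;

        if (data[i] != 0xE8)
        {
            i++;
            continue;
        }

        /* 32-bit little-endian operand that follows the opcode */
        v = (uint32_t)data[i + 1]
          | ((uint32_t)data[i + 2] << 8)
          | ((uint32_t)data[i + 3] << 16)
          | ((uint32_t)data[i + 4] << 24);

        /* x86 relative offsets are counted from the next instruction */
        next = ip + (uint32_t)i + 5;
        v = encoding ? v + next : v - next;

        data[i + 1] = (uint8_t)v;
        data[i + 2] = (uint8_t)(v >> 8);
        data[i + 3] = (uint8_t)(v >> 16);
        data[i + 4] = (uint8_t)(v >> 24);

        i += 5;  /* skip the operand so both directions visit the same bytes */
    }
}
```

Two calls to the same procedure from different places carry different relative
offsets but the same absolute address, so after encoding their operand bytes
repeat, which gives the LZ stage longer matches. Calling the function again
with encoding = 0 restores the original bytes.
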

LZMA compressed file format
---------------------------
Offset Size Description
  0     1   Special LZMA properties (lc, lp, pb in encoded form)
  1     4   Dictionary size (little endian)
  5     8   Uncompressed size (little endian). -1 means unknown size
 13         Compressed data

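A minimal sketch of reading the 13-byte header described in the table above.
The structure and function names are hypothetical. The table does not spell
out how lc, lp and pb are packed into the first byte; the sketch assumes the
packing (pb * 5 + lp) * 9 + lc, which is how the SDK decoder is understood to
unpack it, so check LzmaDec.c before relying on it.

```c
#include <stdint.h>

typedef struct
{
    unsigned lc, lp, pb;    /* unpacked from the properties byte */
    uint32_t dict_size;     /* dictionary size in bytes */
    uint64_t unpack_size;   /* UINT64_MAX (all bits set, i.e. -1) = unknown */
} LzmaHeaderSketch;

/* Parses the 13-byte .lzma header. Returns 0 on success, or -1 if the
   properties byte is outside the range the assumed packing allows. */
int parse_lzma_header(const uint8_t h[13], LzmaHeaderSketch *out)
{
    unsigned d = h[0];
    int i;

    /* Assumed packing: d = (pb * 5 + lp) * 9 + lc */
    if (d >= 9 * 5 * 5)
        return -1;
    out->lc = d % 9;
    d /= 9;
    out->lp = d % 5;
    out->pb = d / 5;

    /* Offset 1: dictionary size, 4 bytes, little endian */
    out->dict_size = 0;
    for (i = 0; i < 4; i++)
        out->dict_size |= (uint32_t)h[1 + i] << (8 * i);

    /* Offset 5: uncompressed size, 8 bytes, little endian */
    out->unpack_size = 0;
    for (i = 0; i < 8; i++)
        out->unpack_size |= (uint64_t)h[5 + i] << (8 * i);

    return 0;
}
```

With the default settings (lc=3, lp=0, pb=2) the assumed packing gives a
properties byte of 0x5D.
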

ANSI-C LZMA Decoder
~~~~~~~~~~~~~~~~~~~

Please note that interfaces for ANSI-C code were changed in LZMA SDK 4.58.
If you want to use the old interfaces you can download a previous version of
LZMA SDK from the sourceforge.net site.

To use ANSI-C LZMA Decoder you need the following files:
1) LzmaDec.h + LzmaDec.c + Types.h
LzmaUtil/LzmaUtil.c is an example application that uses these files.


Memory requirements for LZMA decoding
-------------------------------------

Stack usage of the LZMA decoding function for local variables is not
larger than 200-400 bytes.

LZMA Decoder uses a dictionary buffer and an internal state structure.
The internal state structure consumes
  state_size = (4 + (1.5 << (lc + lp))) KB
by default (lc=3, lp=0), state_size = 16 KB.

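The state_size formula can be checked with a few lines of C. This is only a
sketch of the arithmetic above, expressed in bytes (1.5 KB = 1536 bytes); the
function names are hypothetical, and the total adds the dictionary size as in
the "16 KB + DictionarySize" figure from the features list.

```c
#include <stdint.h>

/* state_size = (4 + (1.5 << (lc + lp))) KB, returned in bytes. */
uint32_t lzma_state_size(unsigned lc, unsigned lp)
{
    return 4 * 1024 + ((uint32_t)1536 << (lc + lp));
}

/* Approximate decoder memory: internal state plus the dictionary buffer. */
uint64_t lzma_decoder_memory(unsigned lc, unsigned lp, uint32_t dict_size)
{
    return (uint64_t)lzma_state_size(lc, lp) + dict_size;
}
```

For the defaults this gives 4096 + 1536 * 8 = 16384 bytes, matching the 16 KB
stated above; with -lc0 it drops to 4096 + 1536 = 5632 bytes.
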

How To decompress data
----------------------

LZMA Decoder (ANSI-C version) now supports 2 interfaces:
1) Single-call Decompressing
2) Multi-call State Decompressing (zlib-like interface)

You must use an external allocator:
Example:
void *SzAlloc(void *p, size_t size) { p = p; return malloc(size); }
void SzFree(void *p, void *address) { p = p; free(address); }
ISzAlloc alloc = { SzAlloc, SzFree };

The p = p; statement is there only to suppress compiler warnings
about the unused parameter.


Single-call Decompressing
-------------------------
When to use: RAM->RAM decompressing
Compile files: LzmaDec.h + LzmaDec.c + Types.h
Compile defines: no defines
Memory Requirements:
  - Input buffer: compressed size
  - Output buffer: uncompressed size
  - LZMA Internal Structures: state_size (16 KB for default settings)

Interface:
  int LzmaDecode(Byte *dest, SizeT *destLen, const Byte *src, SizeT *srcLen,
      const Byte *propData, unsigned propSize, ELzmaFinishMode finishMode,
      ELzmaStatus *status, ISzAlloc *alloc);
  In:
    dest     - output data
    destLen  - output data size
    src      - input data
    srcLen   - input data size
    propData - LZMA properties (5 bytes)
    propSize - size of propData buffer (5 bytes)
    finishMode - It has meaning only if the decoding reaches output limit (*destLen).
         LZMA_FINISH_ANY - Decode just destLen bytes.
         LZMA_FINISH_END - Stream must be finished after (*destLen).
                           You can use LZMA_FINISH_END, when you know that
                           current output buffer covers last bytes of stream.
    alloc    - Memory allocator.

  Out:
    destLen  - processed output size
    srcLen   - processed input size

  Output:
    SZ_OK
      status:
        LZMA_STATUS_FINISHED_WITH_MARK
        LZMA_STATUS_NOT_FINISHED
        LZMA_STATUS_MAYBE_FINISHED_WITHOUT_MARK
    SZ_ERROR_DATA - Data error
    SZ_ERROR_MEM  - Memory allocation error
    SZ_ERROR_UNSUPPORTED - Unsupported properties
    SZ_ERROR_INPUT_EOF - It needs more bytes in input buffer (src).

  If LZMA decoder sees end_marker before reaching output limit, it returns OK result,
  and output value of destLen will be less than output buffer size limit.

  You can use multiple checks to test data integrity after full decompression:
    1) Check Result and "status" variable.
    2) Check that output(destLen) = uncompressedSize, if you know real uncompressedSize.
    3) Check that output(srcLen) = compressedSize, if you know real compressedSize.
       You must use correct finish mode in that case.


Multi-call State Decompressing (zlib-like interface)
----------------------------------------------------

When to use: file->file decompressing
Compile files: LzmaDec.h + LzmaDec.c + Types.h

Memory Requirements:
 - Buffer for input stream: any size (for example, 16 KB)
 - Buffer for output stream: any size (for example, 16 KB)
 - LZMA Internal Structures: state_size (16 KB for default settings)
 - LZMA dictionary (dictionary size is encoded in LZMA properties header)

1) Read LZMA properties (5 bytes) and uncompressed size (8 bytes, little-endian) to header:
   unsigned char header[LZMA_PROPS_SIZE + 8];
   ReadFile(inFile, header, sizeof(header));

2) Allocate CLzmaDec structures (state + dictionary) using LZMA properties

  CLzmaDec state;
  LzmaDec_Construct(&state);
  res = LzmaDec_Allocate(&state, header, LZMA_PROPS_SIZE, &g_Alloc);
  if (res != SZ_OK)
    return res;

3) Init LzmaDec structure before any new LZMA stream, and call LzmaDec_DecodeToBuf in a loop

  LzmaDec_Init(&state);
  for (;;)
  {
    ...
    int res = LzmaDec_DecodeToBuf(CLzmaDec *p, Byte *dest, SizeT *destLen,
        const Byte *src, SizeT *srcLen, ELzmaFinishMode finishMode);
    ...
  }

4) Free all allocated structures
  LzmaDec_Free(&state, &g_Alloc);

For a full code example, look at C/LzmaUtil/LzmaUtil.c.


How To compress data
--------------------

Compile files: LzmaEnc.h + LzmaEnc.c + Types.h +
LzFind.c + LzFind.h + LzFindMt.c + LzFindMt.h + LzHash.h

Memory Requirements:
  - (dictSize * 11.5 + 6 MB) + state_size

Lzma Encoder can use two memory allocators:
1) alloc - for small arrays.
2) allocBig - for big arrays.

For example, you can use Large RAM Pages (2 MB) in the allocBig allocator for
better compression speed. Note that Windows has a bad implementation of
Large RAM Pages.
It's OK to use the same allocator for alloc and allocBig.


Single-call Compression with callbacks
--------------------------------------

Check C/LzmaUtil/LzmaUtil.c as an example.

When to use: file->file compressing

1) You must implement callback structures for interfaces:
ISeqInStream
ISeqOutStream
ICompressProgress
ISzAlloc

static void *SzAlloc(void *p, size_t size) { p = p; return MyAlloc(size); }
static void SzFree(void *p, void *address) { p = p; MyFree(address); }
static ISzAlloc g_Alloc = { SzAlloc, SzFree };

  CFileSeqInStream inStream;
  CFileSeqOutStream outStream;

  inStream.funcTable.Read = MyRead;
  inStream.file = inFile;
  outStream.funcTable.Write = MyWrite;
  outStream.file = outFile;

2) Create CLzmaEncHandle object:

  CLzmaEncHandle enc;

  enc = LzmaEnc_Create(&g_Alloc);
  if (enc == 0)
    return SZ_ERROR_MEM;

3) Initialize CLzmaEncProps properties:

  CLzmaEncProps props;
  LzmaEncProps_Init(&props);

  Then you can change some properties in that structure.

4) Send LZMA properties to LZMA Encoder

  res = LzmaEnc_SetProps(enc, &props);

5) Write encoded properties to header

    Byte header[LZMA_PROPS_SIZE + 8];
    size_t headerSize = LZMA_PROPS_SIZE;
    UInt64 fileSize;
    int i;

    res = LzmaEnc_WriteProperties(enc, header, &headerSize);
    fileSize = MyGetFileLength(inFile);
    for (i = 0; i < 8; i++)
      header[headerSize++] = (Byte)(fileSize >> (8 * i));
    MyWriteFileAndCheck(outFile, header, headerSize);

6) Call encoding function:
      res = LzmaEnc_Encode(enc, &outStream.funcTable, &inStream.funcTable,
        NULL, &g_Alloc, &g_Alloc);

7) Destroy LZMA Encoder Object
  LzmaEnc_Destroy(enc, &g_Alloc, &g_Alloc);

If a callback function returns some error code, LzmaEnc_Encode also returns that code.

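Step 5 above stores the file size as 8 little-endian bytes after the encoded
properties. The following self-contained sketch shows just that byte layout,
without calling the SDK: PROPS_SIZE stands in for LZMA_PROPS_SIZE (5 bytes),
and build_lzma_header is a hypothetical helper that takes already encoded
properties as input.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PROPS_SIZE 5  /* stand-in for LZMA_PROPS_SIZE */

/* Fills header with the encoded properties followed by file_size as
   8 little-endian bytes. Returns the number of bytes written (13). */
size_t build_lzma_header(uint8_t header[PROPS_SIZE + 8],
                         const uint8_t props[PROPS_SIZE],
                         uint64_t file_size)
{
    size_t pos = PROPS_SIZE;
    int i;

    memcpy(header, props, PROPS_SIZE);
    for (i = 0; i < 8; i++)
        header[pos++] = (uint8_t)(file_size >> (8 * i));  /* low byte first */
    return pos;
}
```

The result has the same 13-byte layout as the table in "LZMA compressed file
format": 1 properties byte, 4 bytes of dictionary size, 8 bytes of
uncompressed size.
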

Single-call RAM->RAM Compression
--------------------------------

Single-call RAM->RAM Compression is similar to Compression with callbacks,
but you provide pointers to buffers instead of pointers to stream callbacks:

HRes LzmaEncode(Byte *dest, SizeT *destLen, const Byte *src, SizeT srcLen,
  CLzmaEncProps *props, Byte *propsEncoded, SizeT *propsSize, int writeEndMark,
  ICompressProgress *progress, ISzAlloc *alloc, ISzAlloc *allocBig);

Return code:
  SZ_OK               - OK
  SZ_ERROR_MEM        - Memory allocation error
  SZ_ERROR_PARAM      - Incorrect parameter
  SZ_ERROR_OUTPUT_EOF - output buffer overflow
  SZ_ERROR_THREAD     - errors in multithreading functions (only for Mt version)

LZMA Defines
------------

_LZMA_SIZE_OPT - Enable some optimizations in LZMA Decoder to get smaller executable code.

_LZMA_PROB32   - It can increase the speed on some 32-bit CPUs, but memory usage for
                 some structures will be doubled in that case.

_LZMA_UINT32_IS_ULONG  - Define it if int is 16-bit on your compiler and long is 32-bit.

_LZMA_NO_SYSTEM_SIZE_T - Define it if you don't want to use size_t type.

C++ LZMA Encoder/Decoder
~~~~~~~~~~~~~~~~~~~~~~~~
C++ LZMA code uses COM-like interfaces. So if you want to use it,
you can study the basics of COM/OLE.
C++ LZMA code is just a wrapper over the ANSI-C code.


C++ Notes
~~~~~~~~~
If you use some C++ code folders in 7-Zip (for example, C++ code for .7z handling),
you must check that you work correctly with the "new" operator.
7-Zip can be compiled with MSVC 6.0, which doesn't throw an exception from the "new" operator.
So 7-Zip uses "CPP\Common\NewHandler.cpp" that redefines the "new" operator:

void *operator new(size_t size)
{
  void *p = ::malloc(size);
  if (p == 0)
    throw CNewException();
  return p;
}

If you use an MSVC version that throws an exception from the "new" operator, you can
compile without "NewHandler.cpp", so the standard exception will be used. Actually some
code of 7-Zip catches any exception in internal code and converts it to an HRESULT code.
So you don't need to catch CNewException if you call COM interfaces of 7-Zip.

---

http://www.7-zip.org
http://www.7-zip.org/sdk.html
http://www.7-zip.org/support.html