Parchive

From Wikipedia, the free encyclopedia

Parchive
File extension: .par, .par2, .p??
Type of format: forward error correction

Parchive (or parity volume set archive) is a forward error correction-style system that can be applied to one or more files to allow recovery when data is lost or corrupted. When used with collections of files, some missing files can be regenerated.

Overview

PAR and PAR2 files hold parity data computed from the binary content of a set of files. The name PAR comes from "parity". These files can be used to reconstruct files that are damaged or missing from the set. For example, if a downloaded archive was split into 47 files and one of them is missing, it is enough to download a PAR file for that archive; a PAR program can then reconstruct the missing file.

History

Usenet newsgroups were originally designed for informal conversations and therefore were not designed to be a reliable transmission medium. Another limitation which was acceptable for conversations (before the advent of Unicode) was that messages were normally fairly short in length and limited to 7-bit ASCII text.

To move 8-bit binary data through a 7-bit channel, and so use Usenet to transfer binary files, various encodings were devised, such as uuencoding and Base64. Later Usenet software allowed 8-bit Extended ASCII, which made newer methods such as yEnc possible.

While the data transmission problem was solved, the unreliable nature of Usenet remained. In 2001, Tobias Rieper and Stefan Wehlus proposed the Parity Volume Set specification 1.0.[1] By transmitting extra data and using Reed-Solomon error correction, an end user can rebuild missing data from an incomplete download.

Versions

There are incompatibilities between versions 1 and 2 of the file format specification.

For version 1, given files f1, f2, ..., fn, the Parchive consists of an index file (f.par) and a number of "parity volumes" (f.p01, f.p02, etc.). If exactly one original file is missing (for example, f2), it can be recreated from all of the other original files and any one of the parity volumes. Likewise, two missing files can be recreated from any two of the parity volumes, and so forth.
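The single-missing-file case can be illustrated with plain XOR parity. This is a deliberate simplification and not the Parchive algorithm: Parchive uses Reed-Solomon codes, which is what allows several parity volumes to recover several missing files. The function names are hypothetical, and the sketch assumes the length of the missing file is known.

```python
from functools import reduce


def xor_parity(files: list[bytes]) -> bytes:
    """XOR the files together byte by byte, zero-padding shorter ones."""
    size = max(len(f) for f in files)
    padded = [f.ljust(size, b"\x00") for f in files]
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*padded))


def recover_missing(survivors: list[bytes], parity: bytes, length: int) -> bytes:
    """Rebuild the one missing file from the surviving files and the parity."""
    # XOR-ing the parity with every surviving file cancels them out,
    # leaving only the (zero-padded) missing file.
    return xor_parity(survivors + [parity])[:length]


f1, f2, f3 = b"first file", b"second", b"the third file"
parity = xor_parity([f1, f2, f3])
rebuilt = recover_missing([f1, f3], parity, len(f2))  # f2 was lost
```

With a single XOR parity, a second missing file cannot be recovered; that is the gap the Reed-Solomon parity volumes fill.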

The "index files" (*.par in version 1 and *.par2 in version 2) are not needed to recover any data. They consist solely of hashes that identify the target files, and their content is duplicated in every parity volume. Because an index file is small, it offers the quickest way to check a set for errors and to see whether any parity volumes are required at all. This was most useful in version 1, where the parity volumes were much larger than the index.
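The kind of check an index makes possible can be sketched as follows. The in-memory index layout (a mapping from file name to hex digest), the choice of MD5, and the function names are assumptions made for illustration; this does not read or write the actual PAR file format.

```python
import hashlib
from pathlib import Path


def build_index(directory: Path, names: list[str]) -> dict[str, str]:
    """Record one hash per target file (hypothetical index layout)."""
    return {n: hashlib.md5((directory / n).read_bytes()).hexdigest() for n in names}


def check_set(directory: Path, index: dict[str, str]) -> dict[str, str]:
    """Classify every indexed file as 'ok', 'damaged' or 'missing'."""
    status = {}
    for name, expected in index.items():
        path = directory / name
        if not path.is_file():
            status[name] = "missing"
        elif hashlib.md5(path.read_bytes()).hexdigest() != expected:
            status[name] = "damaged"
        else:
            status[name] = "ok"
    return status
```

If every entry comes back as "ok", no parity volumes need to be fetched; otherwise the number of "missing" and "damaged" entries indicates how much recovery data is required.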

The biggest limitation found in real-world use of Parchives was that a single bit error in a file forced the algorithm to discard the entire file. To improve on this, a second version of Parchive was created that slices all source files into much smaller blocks. These blocks can be thought of as Parchive files in their own right: if enough blocks are present, all other blocks can be recreated.
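The benefit of slicing can be seen in a small sketch: with one hash per block, a single flipped bit invalidates only the block that contains it. The block size, the use of MD5, and the function names are illustrative assumptions rather than details of the PAR2 format.

```python
import hashlib


def block_hashes(data: bytes, block_size: int) -> list[str]:
    """Hash each fixed-size block of the data separately."""
    return [
        hashlib.md5(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def damaged_blocks(data: bytes, expected: list[str], block_size: int) -> list[int]:
    """Return the indices of blocks that are absent or do not match."""
    actual = block_hashes(data, block_size)
    return [
        i for i, digest in enumerate(expected)
        if i >= len(actual) or actual[i] != digest
    ]


original = bytes(range(256)) * 4          # 1024 bytes, 16 blocks of 64
expected = block_hashes(original, 64)
corrupted = bytearray(original)
corrupted[200] ^= 0x01                    # flip one bit inside block 3
bad = damaged_blocks(bytes(corrupted), expected, 64)
```

Here only one of the 16 blocks has to be regenerated, whereas a whole-file scheme would have to treat the entire file as lost.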

PAR2 files generally use this naming/extension system: filename.vol000+01.PAR2, filename.vol001+02.PAR2, filename.vol003+04.PAR2, filename.vol007+06.PAR2, etc. The number after "vol" is the index of the first recovery block in the file, and the +01, +02, etc. indicates how many recovery blocks the file contains. If the index file of a download shows that 4 blocks are missing, the easiest way to repair the files is to download filename.vol003+04.PAR2. However, because any recovery block can stand in for any missing block, filename.vol007+06.PAR2 is also sufficient.
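Choosing which volumes to download can be sketched as below. The sketch assumes the two numbers in the name are the index of the first recovery block and the block count; the regular expression, the function names, and the selection strategy (smallest single volume that suffices, otherwise largest volumes first) are illustrative assumptions, not part of any Parchive tool.

```python
import re

VOLUME_RE = re.compile(r"\.vol(\d+)\+(\d+)\.par2$", re.IGNORECASE)


def parse_volume(name: str) -> tuple[int, int]:
    """Return (first_block_index, block_count) from a volume file name."""
    match = VOLUME_RE.search(name)
    if match is None:
        raise ValueError(f"not a PAR2 recovery volume: {name}")
    return int(match.group(1)), int(match.group(2))


def pick_volumes(names: list[str], blocks_needed: int) -> list[str]:
    """Pick volumes whose recovery blocks cover the number of missing blocks."""
    by_size = sorted(names, key=lambda n: parse_volume(n)[1])
    # Prefer the smallest single volume that is large enough on its own.
    for name in by_size:
        if parse_volume(name)[1] >= blocks_needed:
            return [name]
    # Otherwise combine volumes, largest first, until enough blocks are held.
    chosen, total = [], 0
    for name in reversed(by_size):
        chosen.append(name)
        total += parse_volume(name)[1]
        if total >= blocks_needed:
            return chosen
    raise ValueError("not enough recovery blocks available")


volumes = [
    "filename.vol000+01.PAR2",
    "filename.vol001+02.PAR2",
    "filename.vol003+04.PAR2",
    "filename.vol007+06.PAR2",
]
choice = pick_volumes(volumes, 4)
```

For 4 missing blocks this selects filename.vol003+04.PAR2, matching the example in the text.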

Limitations

  • Neither Parchive version supports directory trees. To create, validate, or repair files in a directory hierarchy, the user must process the files in each directory separately or first combine them in some way, for example in an archive.

Other uses

Parchive files can be used for purposes other than Usenet transmission.
  • The DAR backup program baras uses PAR or PAR2 to ensure robust backups.
  • When using inexpensive CD-R media, a user can insert additional redundancy by burning Parchives with the data.
  • Some Parchive software will split a single source file into multiple smaller files to get around restrictions such as the FAT32 maximum file size. Even with 0% redundancy, the software can still reassemble the smaller parts into the original file.
  • A user who backs up monthly to DVDs but weekly to a hard disk using a compression program (such as WinRAR) can create parity volumes for the RAR files, so that the more recent backups can be recovered after a partial hard disk failure.
  • Parity files can be used to verify that files have not been corrupted by viruses or bad sectors on a hard disk, and to repair them if they have; for example, PAR files can be created for directories that contain family photo albums.
  • An alternative to PAR, ICE ECC, is available for the Microsoft Windows platform, and should also work under *nix via WINE. ICE ECC uses Reed-Solomon codes as well, but stores the results in a format incompatible with Parchive. Parchive can only handle files from a single directory, while ICE ECC can natively handle nested subdirectories without first encapsulating the original files in a ZIP- or RAR-like format.


Other uses of the file extension

A PAR file can also be a deployable SAS Portlet file for the SAS Information Delivery Portal. The nomenclature follows that of JAR, WAR, and EAR files. Such a file is simply a ZIP archive.
