pzip: A Compression Utility for Finite Projective Planes

Description

This package consists of two C programs designed to run under linux: pzip (for compressing) and punzip (for decompressing) data storage files for finite projective planes (expressed as incidence matrices, lists of lines through each point, or complete sets of mutually orthogonal Latin squares). Compression attained exceeds that possible with general compression utilities such as gzip (see Performance) since we take advantage of the special nature of this type of incidence information.

History

Usage

pzip n f m filename

punzip filename

where

n ∈ {2, 3, ..., 255} is the order of the plane

f ∈ {i, l, m} is the input format (incidence matrix, list of incidences, or mutually orthogonal Latin squares).
Incidence matrices are lists of 0's and 1's (whitespace characters, including blanks and end-of-line characters, if present, are ignored).
Lists consist of n2+n+1 lists of n+1 integers each, separated by whitespace characters; each list represents the set of lines incident with a given point. Lines in these lists are denoted by integer values 0, 1, 2, ..., n2+n (or 1, 2, 3, ..., n2+n+1; but please be consistent).
Mutually orthogonal Latin squares (i.e. MOLS) are listed using any set of n symbols (each symbol represented by a single ASCII character) and whitespace characters, if present, are ignored.

m ∈ {e, i} is the mode of recovery for the plane for purposes of decompression (exact or isomorph-only). Replacing the plane by an isomorph (a copy isomorphic to the original) allows for a slightly greater rate of compression. If an exact copy of the original plane is required, use the exact mode. In the case of Latin squares, the original set of symbols will be preserved in exact mode; but in isomorph-only mode, they will be replaced by a standard set of characters 0, 1, 2, ..., 9, a, b, c, ...

filename = name of file, in the current directory, containing the projective plane

Compression using pzip stores the compressed plane in the current directory in a new file named filename.pz and decompression using punzip will send the extracted plane to the standard output (by default, the display terminal).

Performance

We compare typical file sizes for projective planes. File size varies slightly for different planes of the same order; and even for isomorphic copies of the same plane; so sizes listed are only approximate. Execution time required for compression/decompression is minimal.

Storage requirements for planes of order 11:

original file
gzipped original

pzipped, exact mode

pzipped, isomorph-only mode
incidence matrix
17 KB
2 KB
264 bytes
64 bytes
list of line sets
5 KB
2 KB
264 bytes
64 bytes
10 MOLS(11)
1.3 KB
250 bytes
115 bytes
64 bytes

Storage requirements for planes of order 25:

original file
gzipped original

pzipped, exact mode

pzipped, isomorph-only mode
incidence matrix
410KB
23 KB
2300 bytes
950 bytes
list of line sets
63 KB
23 KB
2300 bytes
950 bytes
24 MOLS(25)
15 KB
9 KB
1300 bytes
950 bytes

Storage requirements for planes of order 49: (no one would bother here with incidence matrices!)

original file
gzipped original

pzipped, exact mode

pzipped, isomorph-only mode
list of line sets
550 KB
210 KB
12 KB
6 KB
48 MOLS(49)
110 KB
81 KB
8 KB
6 KB

For Desarguesian planes stored (in any format) with points and lines ordered systematically, gzip actually does better than the average cases tabulated above (almost as well as pzip!). But pzip is always the leader; usually by a long shot.

Download/Install

  1. Download pzip.tar into your favourite directory under linux. (File size 30 KB; check the md5sum or sha256sum.)
  2. Execute tar -xvf pzip.tar
  3. Enter the directory ./pzip and execute make
  4. I'd be interested in hearing from you if you make use of pzip

Description of Algorithm

Elementary... If the input format is an incidence matrix or a list of line sets, this is first converted to n−1 MOLS(n) (using O(n3log(n)) bits of storage). Now process the MOLS one at a time, line by line, entry by entry. Each entry is constrained (by the symbols previously read) to lie in some k-subset of the n symbols, where 1 ≤ kn ; it can therefore be represented using log(k) bits instead of the naïve log(n) bits. These bits are written to filename.pz along with any additional information required to reconstruct the original file (including the value of n, format specifications, choice of permutations of rows and columns used to convert matrices to standard form, etc.)

To Do

If you email me to bug me about these issues, I might get version 2.0 ready sooner.

 Under Construction


/ revised September, 2009