From a108poon@cdf.toronto.edu Fri Dec 2 21:33:51 1994 Newsgroups: comp.compression Path: solitaire.cv.nrao.edu!hearst.acc.Virginia.EDU!caen!math.ohio-state.edu!howland.reston.ans.net!pipex!uunet!utcsri!cdf.toronto.edu!news From: Poon Jacob Tin Hang Subject: Re: silly question:hqx compressed files In-Reply-To: <3b91tc$m55@ixnews1.ix.netcom.com> X-Sender: a108poon@eddie Content-Type: TEXT/PLAIN; charset=US-ASCII Message-ID: Sender: news@cdf.toronto.edu (Usenet News) Nntp-Posting-Host: eddie Organization: University of Toronto Computing Disciplines Facility References: <3b8g7g$e3e@potogold.rmii.com> <3b91tc$m55@ixnews1.ix.netcom.com> Mime-Version: 1.0 Date: Sun, 27 Nov 1994 15:19:16 GMT Lines: 21 On 27 Nov 1994, Gerald Lee wrote: > I just downloaded a file w/ the extension .hqx from a friendly ftp site, > but I am unable to figure what program I need to decomp it. I looked > through some grps and a couple FAQ's but, there's no mention of .hqx. > Is it new? What will I need to expand it, and where should I look for > the appropriate program? Thanks much in advance! > > > -- > ------------------------------------------------------------ > Gerald Lee "sleep and jumbo frets...the best!" > gbones@ix.netcom.com > ------------------------------------------------------------ > > The .hqx file belongs to an application called BinHex 4.0 origially created in Mac platform years ago. It is NOT a compression, but it is an encoding program like uuencode. To extract .hqx files, you need a file named binhex13.zip (which can also create .hqx files) available in /pub/msdos/mac directory in oak.oaklad.edu ftp site. From rrajala@tnclus.tele.nokia.fi Fri Dec 2 21:35:07 1994 Path: solitaire.cv.nrao.edu!hearst.acc.Virginia.EDU!caen!newsxfer.itd.umich.edu!gatech!swrinde!pipex!uunet!news.tele.fi!news.csc.fi!nokia.fi!tnclus!rrajala From: rrajala@tnclus.tele.nokia.fi (Risto Rajala NTC/SWS/CPC Kalkki2 puh 90-511 5843) Newsgroups: comp.compression Subject: Re: Reed Solomon code needed Date: 28 Nov 94 15:44:02 EET Organization: Nokia Telecommunications. Lines: 869 Message-ID: <1994Nov28.154402.1@tnclus> References: <3bapvg$iau$1@mhadg.production.compuserve.com> NNTP-Posting-Host: tne02.tele.nokia.fi In article <3bapvg$iau$1@mhadg.production.compuserve.com>, Andrew Powell <100330.1701@CompuServe.COM> writes: > I am looking for source code for the Reed Solomon ECC algorithm. Here is R-S-Code articles collected and posted by Tom Moore two years ago. Risto Rajala, rrajala@tnclus.tele.nokia.fi -------------------------------------------------------------------------------- Subject: REED-SOLOMON PACKAGE From: tmoore@ewu.UUCP (Tom Moore) The collection of Reed-Solomon goodies. The following collection should help you with the theory and actual code behind REED-SOLOMON Codes. The included files are responses sent back to me by request. If you use anything, please give credit to the proper authors. As of yet, I have NOT written anything to do with RS (But I will shortly........) Tom Moore {...uunet!isc-br!ewu!tmoore} Eastern Washington University {...tmoore%ewu@uunet.uu.net} Head Consultant, Computer Graphics Lab (509) 623-4235 --------------------------------------------------------------------------- # This is a shell archive. Remove anything before this line, # then unpack it by saving it in a file and typing "sh file". # # Wrapped by cg5!tmoore on Fri Nov 15 14:24:06 PST 1991 # Contents: rs.galois rs.rply.1 rs.rply.2 rs.rply.3 echo x - rs.galois sed 's/^@//' > "rs.galois" <<'@//E*O*F rs.galois//' @From: colin@array.UUCP (Colin Plumb) Subject: REED/SOLOMON -Galois fields Date: 13 Nov 91 02:31:18 GMT Organization: Array Systems Computing, Inc., Toronto, Ontario, CANADA tmoore@ewu.uucp writes: >There has got to be someone out there who has messed with REED/SOLOMON >error correcting codes. I know, I know, you all are interested in >getting avery little bit of redundancy out of text, and here I come >wanting to put some back in. It may not sit will with you, but hey, >I need some help here, so how about a little mail to me. Okay, time for the old tutorial... First, Galois fields. In number theory, we have a thing called a "group". This is a bunch of things I'll call numbers, and an operation, usually written as addition or multiplication. The axioms of a group are, roughly: For each a, and b in the group, a+b is in the group There is an identity element, often written e, such that a+e = e + a = a. For each a in the group, there is an inverse -a so a + -a = 0. For all a, b and c, (a+b)+c = a+(b+c) Some groups are commutative, or Abelian (after the mathematician Abel), in which case you also have a+b = b+a. Everything I'll work with here is, in fact, Abelian, although there are structures that aren't. An example of a group is the integers, or the integers modulo 10, with normal addition. 5+5 = 0, so -5 = 5, and so on. Fancier than that is a ring, which has addition and multiplication. Multiplication also stays within the ring, and it distributes: a*(b+c) = (a*b)+(a*c). There's also a multiplicative identity, usually written as 1. a*1 = 1*a = a. The integers are a ring, as are the integers modulo 10. Fancier still is a field, which has multiplicative inverses. For every a other than 0 (the additive inverse), there is a multiplicative inverse a', such that a*a' = a'*a = 1. The integers are not a field, although the rationals are. So are the integers modulo any prime number. Modulo 10, 5 does not have a multiplicative inverse: times anything, it's either 5 or 0. Modulo 9 (also not a prime), 3 does not have a multiplicative inverse. But modulo 11, everything has a multiplicative inverse. (This property comes up in the prime-testing algorithms that public-key cryptography has produced so much excitement in, but I'm drifting...) Anyway, it turns out that given a ring, you can define primes. And a ring, modulo a prime, produces a field. (There are actually come complications involved in prime vs. irreducible, but I'll ignore them - this theory is *not* designed to let you pass your exam tomorrow; it's the bare minimum motivation for what's to follow.) So, anyway, we now have the material for producing rings of "nice" sizes. 2 is prime, so we can make a field of size 2, but it's not big enough to do much with. And no other prime is a power of 2, as is suitable to work with on binary computers. Galois to the rescue! Given a ring, we can define polynomials over that ring. You know, like 3x^2 + 2x + 1. Basically, it's just a list of numbers in the ring (e.g. 1, 2, 3 for the above - equivalent to 1, 2, 3, 0, 0, 0,...). There are a few more things you can do if the numbers are from a field, and it turns out we'll be using a field, so I'll stop making the distinction. Now, we can multiply polynomials, and add them, and so on. Furthermore, we have an additive identity, additive inverses, a multiplicative identity, and so on. It turns out to be a ring. So, let's take as our basic field the integers modulo 2. I.e. 0 and 1. 0+0 = 1+1 = 0, 0+1 = 1+0 = 1, 0*0 = 0*1 = 1*0 = 0, 1*1 = 1. We can now form polynomials over the integers modulo 2, in which each power of x (like, say, x^12) is either present (coefficient of 1) or absent (coefficient of 0). These can be expressed quite nicely as bit strings. E.g. 10001000000100001 is the polynomial x^16+x^12+x^5+1. This polynomial can be factored as 1111000000011111 * 11, x^15+x^14+x^13+x^12+x^4+x^3+x^2+x+1 * x+1. Both of these are prime - they cannot be expressed as the product of smaller polynomials. Now, observe that adding polynomials involves adding corresponding coefficients, modulo 2. This is also known as the xor operation. Also observe that each polynomial is its own additive inverse, so subtracting is the same as adding. And multiplication is just shift-and-add, like usual, except it's shift-and-xor. Now, given primes, we can form the polynomials modulo a prime. Let's take 1011, x^3+x+1, a polynomial which happens to be prime. We can take any given polynomial (say, 11111, x^4+x^3+x^2+x+1), and subtract multiples of 1011 until it has degree less than 3. First, subtract x times 1011, i.e. 10 times 1011, or 10110: 11111 10110 ===== 1001 Then, subtract 1 times 1011: 1001 1011 ===== 0010 Thus, x^4+x^3+x^2+x+1 modulo x^3+x+1 is x. If you think for a bit, you'll realize that exactly the polynomials with degree less than the modulus are the possible remainders. Other than that, any coefficients are possible. So if the modulus is of degree n (highest coefficient is x^n), then there are 2^n possible remainders, corresponding to all possible combinations of x^0, x^1, ..., x^(n-1). These are written as all possible n-bit strings. Now, we're getting somewhere! We have numbers we can work with (it turns out that multiplicative inverses exist, if the modulus is irreducible, which is prime plus a little bit that I won't get into), and we have a nice equivalnece with bit-strings. So, let's take some degree-8 polynomial and use it as the modulus, so all possible bytes are our numbers. Now, consider what would happen if we were to store, after a bunch of bytes a0, a1, ... a(n-1), their sum s (using this field, which means we xor them). The total would be a0+...+a(n-1)+s = a0+...+a(n-1)+a0+...+a(n-1) = 0. This is basically parity. If an error hit one of the bytes, say an error e in position k, so the received data is b0 = a0, b1 = a1, ..., bk = ak+e, ... b(n-1) = a(n-1), bn = s. Then the sum wouldn't add up, and we'd get, instead of 0, e. Now, this is an error-detection code, but isn't much better than parity. It is, in fact, 8-way interleaved parity. But we can get better. Now, let's come up with a second sum character, t. t = a0*x^0 + a1*x^1 + a2*x^2 + ... + a(n-1)*x^(n-1). Now, if an error occurs, then summing t on the received side will give us, not the received t, but t+e*x^k. We cah subtract off the received t to get e*x^k. Now, we have e and e*x^k. Since e is not 0 (if it was, there'd be no error), it has a multiplicative inverse, e'. We can multiply this by e*x^k to get x^k. As long as n is low enough that each possible k has a different x^k (there are 255 such possibilities for an 8-bit code), then from this we can recover the position, k, of the error. Since we know the error pattern and the location, we can fix it. Bingo, error correction! (If the error is in the user data, then x^k is non-zero, so both e and e*x^k are non-zero. If one or the other is zero, then the error is in the coresponding check byte and can be ignored. If both are zero, there was no error.) This is the simplest single-error-correcting Reed-Solomon code. It turns out that by adding more check bytes, computed by u = a0*x^0+a1*x^2+a2*x^4+..., v = a0*x^0+a1*x^3+a2*x^6+..., etc. and solving more complex equations to find the solutions, we can determine the patterns and locations of multiple errors. For example, suppose we have 4 check bytes and 2 errors. e1 at location k1 and e2 at location k2. We can derive, from the check bytes, e1+e2, e1*x^k1+e2*x^k2, e1*x^(2*e1)+e2*x^(2*k2) and e1*x^(3*e1)+e2*x^(3*k2). From these four equations, we can derive the four unknowns e1, k1, e2, and k2 we need to fix the error. See various journal articles and textbooks for efficient ways to solve these systems of equations. With r check bytes, you can detect r errors, correct r/2 errors, or any combination. (Say, correct up to r/3 errors and detect up to 2r/3.) In fact, even with more errors, you'll *probably* detect them, but it's just not absolutely guaranteed. What CD's do is a bit more complicated... they have two levels of Reed-Solomon ECC. First, they scatter the data around on the disc so a burst error will take a bit out of a bunch of blocks rather than totally destroying a few blocks. Then, they arrange the block in a grid and apply error correction codes to the rows. If there are just one or two errors, they repair them, but if there are a bunch, they mark the whole row as bad, and don't try to correct it. Next, they apply error correction down the columns. Because any rows with errors are already marked bad, we don't have to derive the error locations k1, k2, k3, etc. and so with the number of check bytes we have (which determines the number of equations), we can have more errors e1, e2,... without having too many unknowns. (It's also easier to solve the equations for the errors than the locations, so this makes the implementation easier.) This results in really spectacular error-correction ability, over a large block, without having to deal with hundreds of equations in hundreds of unknowns, which would make CD players ridiculously expensive. (Remember, you can't slow down the music when an error occurs - that's just as bad as no correction at all.) That was a *really* quick summary, but the essentials are there. -- -Colin @//E*O*F rs.galois// chmod u=rw,g=,o= rs.galois echo x - rs.rply.1 sed 's/^@//' > "rs.rply.1" <<'@//E*O*F rs.rply.1//' @From: kosmach@mot.com To: tmoore%ewu@uunet.uu.net Subject: REED/SOLOMON Cc: kosmach@ukraine.noname If you have no experience with error correcting codes, then I think it may be a little difficult to start with RS codes. But anyway, RS codes are a general case of BCH codes (i.e., RS codes were discovered first and then they were generalized to BCH codes). So, any literature you find about nonbinary BCH codes can be applied to RS codes. A good starter book on coding theory is: ERROR CONTROL CODING: FUNDAMENTALS AND APPLICATIONS written by Shu Lin and Daniel J. Costello, Jr. In particular, Chapter 6, discusses the encoding and decoding of binary and nonbinary BCH codes along with RS codes. It should be very simple to write the encoder. The hard part is determining the type of decoder you want (i.e., do I want a very fast algorithm, am I worried about memory, how large are the codewords, how much parity do I need, and do I need to correct erasures). If you have any questions, let me know. Jim Kosmach Research Engineer Motorola Corporate Research Center kosmach@ukraine.corp.mot.com @//E*O*F rs.rply.1// chmod u=rw,g=,o= rs.rply.1 echo x - rs.rply.2 sed 's/^@//' > "rs.rply.2" <<'@//E*O*F rs.rply.2//' @From: rpw3@rigden.wpd.sgi.com (Rob Warnock) Subject: Re: REED/SOLOMON Rob Warnock, MS-1L/515 rpw3@sgi.com Silicon Graphics, Inc. (415)335-1673 2011 N. Shoreline Blvd. Mountain View, CA 94039-7311 ========================================================================= ============= attachment 1 of 2 ========================================= ========================================================================= @From: rpw3@rigden.wpd.sgi.com Subject: ECC bibliocraphy Organization: Silicon Graphics Inc., Mountain View, CA Here's my bibliography for ECC: - Lin & Costello, "Error Control Coding: Fundamentals and Applications" (Prentice-Hall) - Peterson & Weldon, "Error Correcting Codes" (MIT Press) - Blahut, "Theory and Practice of Error Control Codes" (Addison-Wesley) - Berlekamp, "Algebraic Coding Theory" (Aegean Park Press) But for beginners, good books are: - Hamming, "Coding and Information Theory" (Prentice-Hall) - Arazi, "A Commonsense Approach to the Theory of Error-Correcting Codes" (MIT Press) Hamming's little book is an easy start, but only goes through single-error correction. Lin & Costello is a pretty good cross-section, and has some good stuff about Reed-Solomon codes used in disks & tapes. For example, on pp.525-531 they discuss the particularly simple single-error correcting shortened (174,171) R-S code over GF(256) used in the IBM 3370 disk. [You can also use it in the unshortened (255,252) form.] This R-S code can easily be done in software, with small table lookups. [I posted some rough, untested pseudo-C for it in this newsgroup back in July...] ========================================================================= ============= attachment 2 of 2 ========================================= ========================================================================= @From: rpw3@rigden.wpd.sgi.com Subject: Re: Compression is great, but, robustness? Organization: Silicon Graphics Inc., Mountain View, CA madler@nntp-server.caltech.edu (Mark Adler) writes: +--------------- | Bill Davidsen notes: | >> You can, of course, put error detection and correction in the data, | >> using something simple like Hamming code, or larger and more complex | >> like Fire codes. +--------------- Fire codes are totally obsolete these days. Their miscorrection rates are quite high compared to the available alternatives. +--------------- | Better still are Reed-Solomon codes. The algebra is a little complicated, | but the run-time of a good implementation is pretty fast. They can correct | rather long error bursts and are naturals for computer data streams--you | can use eight bit symbols in the algebra which correspond to the bytes in | the data stream. +--------------- Particularly simple is the shortened (174,171) R-S code over GF(256) used in the IBM 3370 disk. [You can also use it in the unshortened (255,252) form.] Each of the three checksum bytes is computed independently, rather than as the remainder of one large polynomial, computed (roughly) as: for (s0 = s1 = sm1 = i = 0; i < BLOCK_SIZE; ++i) { s0 = s0 ^ input[i]; s1 = GF_mult_by_alpha[ s1 ^ input[i] ]; sm1 = GF_mult_by_alpha_inverse[sm1 ^ input[i] ]; } where "GF_mult_by_alpha" is a lookup table to multiply an element of GF(256) by the constant "alpha", the chosen primitive element. The two tables are 256 bytes each. [These checksums can also be computed in hardware at the speed of one clock per byte, with a simple PAL per checksum byte (3).] Correction is just as simple, but needs more memory to store the division table (since both operands are variables). You start by computing the "syndrome" bytes the same was as the checksum bytes, but this time the transmitted checksum bytes are folded in too. then, assuming just one byte in error, the location of the byte in error is given by: err_i = GF_log_base_alpha[ GF_divide[s1][s0] ]; where "GF_log_base_alpha" is another 256-byte table and "GF_divide" is a 65,536-byte table. Given the location "err_i" of the error, correction is then simply a matter of: input[err_i] ^= s0; (The "sm1" term is actually redundant, but can be used to help protect against miscorrections that can occur when there are multiple errors. It's cheap insurance.) Although this code corrects only one symbol (byte) in error, interleaving multiple blocks can improve the burst performance almost as well as one likes. [The IBM 3370 uses 3:1 interleaving to correct any burst confined to three consecutive bytes.] Still, it can only correct one burst. But the cost is really low... +--------------- | The guys from JPL would tell you this for no beers at all! +--------------- Reference: Lin & Costello, "Error Control Coding: Fundamentals & Applications", (Prentice-Hall 1983), pp.525-531. @//E*O*F rs.rply.2// chmod u=rw,g=,o= rs.rply.2 echo x - rs.rply.3 sed 's/^@//' > "rs.rply.3" <<'@//E*O*F rs.rply.3//' @From: "samsung@tigger.jvnc.net -- Advanced Media Laboratory of Samsung Electronics 1009 Lenox Dr. Lawrenceville" Subject: Reed Solomon Code Here is some code that came down the net some time ago. I works really well. I have put this code into a Khoros routine. Steve Jaffe samsung@tigger.jvnc.net Received: from munnari.OZ.AU by tigger.jvnc.net (5.65/1.34) id AA17750; Wed, 26 Jun 91 02:11:52 -0400 Received: from augean.ua.oz.au by munnari.oz.au with SMTP (5.64+1.3.1+0.50) id AA22760; Wed, 26 Jun 1991 16:11:30 +1000 (from simon@augean.ua.oz.au) Date: Wed, 26 Jun 1991 16:11:30 +1000 @From: simon@augean.ua.oz.au Message-Id: <9106260611.22760@munnari.oz.au> Received: by augean (5.61+IDA+MU/4.8N) id AA27938; Wed, 26 Jun 1991 15:32:11 +0930 To: ASTERIAN@BNR.CA, Tom.Lyon@Eng.Sun.COM, brianc@ecn.purdue.edu, curiger@iis.ethz.ch, danbl@arcturis.cna.tek.com, dlee@hplabsb.hpl.hp.com, eho@ee.ubc.ca, gd@erg.sri.com, hptasins@sdcc6.UCSD.EDU, jpc@avdms8.msfc.nasa.gov, jsun@sunpix.East.Sun.COM, jw7348@mexico.medtronic.com, kahrs@alice.att.com, lovell@s1.elec.uq.oz.au, magore@watserv1.uwaterloo.ca, nieto@trantor.harris-atd.com, plx!plxsun!ming@Sun.COM, probert%cs@hub.ucsb.edu, rand@zeta.sps.mot.com, rchann@cebnet.CEB.McMaster.CA, rjk@sequent.com, robert@wiliki.eng.hawaii.edu, ron@chopin.udel.edu, ron@monu6.cc.monash.edu.au, rtu@bsac.berkeley.edu, samsung@tigger.jvnc.net, sdw@hpsadsdw.hp.com, srm@matrx.matrix.com, tmb@ai.mit.edu, tony@comperex.cx.oz.au, vchin@kean.ucs.mun.ca Subject: Re: Reed-Solomon code To all those people who requested the Reed-Solomon encoder/decoder program, here it is. I had to dig it out of my archive, and having posted that I had it without checking it was all okay, discovered to my chagrin that it dumped core when run after being compiled using cc -O on our new Sparc 2. (Okay on cc with no optimize, and also on gcc with or without optimize, and ran heaps of times on our old Dec and Sun boxes - funny how these bugs can hide for years!). So after a night of reviewing my coding theory so I could follow the program (it's been a while...), I found and fixed the bug (you guessed it, reading one past the end of an array to give a compiler dependent segmentation violation!). At the same time I put in a couple more comments so it hopefully won't be too hard to follow with the aid of a reference text such as those mentioned in the comments (can't recall the titles off-hand, just the authors, but they are standard texts). Anyhow, do what you will with it. It may or may not suit your needs, but its one way of doing the job (I have no idea of it's speed compared to other techniques or implementations). Hopefully it may save someone some work! Simon ----------------------- Cut here - code follows ------------------------------ /* rs.c */ /* This program is an encoder/decoder for Reed-Solomon codes. Encoding is in systematic form, decoding via the Berlekamp iterative algorithm. In the present form , the constants mm, nn, tt, and kk=nn-2tt must be specified (the double letters are used simply to avoid clashes with other n,k,t used in other programs into which this was incorporated!) Also, the irreducible polynomial used to generate GF(2**mm) must also be entered -- these can be found in Lin and Costello, and also Clark and Cain. The representation of the elements of GF(2**m) is either in index form, where the number is the power of the primitive element alpha, which is convenient for multiplication (add the powers modulo 2**m-1) or in polynomial form, where the bits represent the coefficients of the polynomial representation of the number, which is the most convenient form for addition. The two forms are swapped between via lookup tables. This leads to fairly messy looking expressions, but unfortunately, there is no easy alternative when working with Galois arithmetic. The code is not written in the most elegant way, but to the best of my knowledge, (no absolute guarantees!), it works. However, when including it into a simulation program, you may want to do some conversion of global variables (used here because I am lazy!) to local variables where appropriate, and passing parameters (eg array addresses) to the functions may be a sensible move to reduce the number of global variables and thus decrease the chance of a bug being introduced. This program does not handle erasures at present, but should not be hard to adapt to do this, as it is just an adjustment to the Berlekamp-Massey algorithm. It also does not attempt to decode past the BCH bound -- see Blahut "Theory and practice of error control codes" for how to do this. Simon Rockliff, University of Adelaide 21/9/89 26/6/91 Slight modifications to remove a compiler dependent bug which hadn't previously surfaced. A few extra comments added for clarity. Appears to all work fine, ready for posting to net! Notice -------- This program may be freely modified and/or given to whoever wants it. A condition of such distribution is that the author's contribution be acknowledged by his name being left in the comments heading the program, however no responsibility is accepted for any financial or other loss which may result from some unforseen errors or malfunctioning of the program during use. Simon Rockliff, 26th June 1991 */ #include #include #define mm 4 /* RS code over GF(2**4) - change to suit */ #define nn 15 /* nn=2**mm -1 length of codeword */ #define tt 3 /* number of errors that can be corrected */ #define kk 9 /* kk = nn-2*tt */ int pp [mm+1] = { 1, 1, 0, 0, 1} ; /* specify irreducible polynomial coeffts */ int alpha_to [nn+1], index_of [nn+1], gg [nn-kk+1] ; int recd [nn], data [kk], bb [nn-kk] ; void generate_gf() /* generate GF(2**mm) from the irreducible polynomial p(X) in pp[0]..pp[mm] lookup tables: index->polynomial form alpha_to[] contains j=alpha**i; polynomial form -> index form index_of[j=alpha**i] = i alpha=2 is the primitive element of GF(2**mm) */ { register int i, mask ; mask = 1 ; alpha_to[mm] = 0 ; for (i=0; i>= 1 ; for (i=mm+1; i= mask) alpha_to[i] = alpha_to[mm] ^ ((alpha_to[i-1]^mask)<<1) ; else alpha_to[i] = alpha_to[i-1]<<1 ; index_of[alpha_to[i]] = i ; } index_of[0] = -1 ; } void gen_poly() /* Obtain the generator polynomial of the tt-error correcting, length nn=(2**mm -1) Reed Solomon code from the product of (X+alpha**i), i=1..2*tt */ { register int i,j ; gg[0] = 2 ; /* primitive element alpha = 2 for GF(2**mm) */ gg[1] = 1 ; /* g(x) = (X+alpha) initially */ for (i=2; i<=nn-kk; i++) { gg[i] = 1 ; for (j=i-1; j>0; j--) if (gg[j] != 0) gg[j] = gg[j-1]^ alpha_to[(index_of[gg[j]]+i)%nn] ; else gg[j] = gg[j-1] ; gg[0] = alpha_to[(index_of[gg[0]]+i)%nn] ; /* gg[0] can never be zero */ } /* convert gg[] to index form for quicker encoding */ for (i=0; i<=nn-kk; i++) gg[i] = index_of[gg[i]] ; } void encode_rs() /* take the string of symbols in data[i], i=0..(k-1) and encode systematically to produce 2*tt parity symbols in bb[0]..bb[2*tt-1] data[] is input and bb[] is output in polynomial form. Encoding is done by using a feedback shift register with appropriate connections specified by the elements of gg[], which was generated above. Codeword is c(X) = data(X)*X**(nn-kk)+ b(X) */ { register int i,j ; int feedback ; for (i=0; i=0; i--) { feedback = index_of[data[i]^bb[nn-kk-1]] ; if (feedback != -1) { for (j=nn-kk-1; j>0; j--) if (gg[j] != -1) bb[j] = bb[j-1]^alpha_to[(gg[j]+feedback)%nn] ; else bb[j] = bb[j-1] ; bb[0] = alpha_to[(gg[0]+feedback)%nn] ; } else { for (j=nn-kk-1; j>0; j--) bb[j] = bb[j-1] ; bb[0] = 0 ; } ; } ; } ; void decode_rs() /* assume we have received bits grouped into mm-bit symbols in recd[i], i=0..(nn-1), and recd[i] is index form (ie as powers of alpha). We first compute the 2*tt syndromes by substituting alpha**i into rec(X) and evaluating, storing the syndromes in s[i], i=1..2tt (leave s[0] zero) . Then we use the Berlekamp iteration to find the error location polynomial elp[i]. If the degree of the elp is >tt, we cannot correct all the errors and hence just put out the information symbols uncorrected. If the degree of elp is <=tt, we substitute alpha**i , i=1..n into the elp to get the roots, hence the inverse roots, the error location numbers. If the number of errors located does not equal the degree of the elp, we have more than tt errors and cannot correct them. Otherwise, we then solve for the error value at the error location and correct the error. The procedure is that found in Lin and Costello. For the cases where the number of errors is known to be too large to correct, the information symbols as received are output (the advantage of systematic encoding is that hopefully some of the information symbols will be okay and that if we are in luck, the errors are in the parity part of the transmitted codeword). Of course, these insoluble cases can be returned as error flags to the calling routine if desired. */ { register int i,j,u,q ; int elp[nn-kk+2][nn-kk], d[nn-kk+2], l[nn-kk+2], u_lu[nn-kk+2], s[nn-kk+1] ; int count=0, syn_error=0, root[tt], loc[tt], z[tt+1], err[nn], reg[tt+1] ; /* first form the syndromes */ for (i=1; i<=nn-kk; i++) { s[i] = 0 ; for (j=0; j error */ s[i] = index_of[s[i]] ; } ; if (syn_error) /* if errors, try and correct */ { /* compute the error location polynomial via the Berlekamp iterative algorithm, following the terminology of Lin and Costello : d[u] is the 'mu'th discrepancy, where u='mu'+1 and 'mu' (the Greek letter!) is the step number ranging from -1 to 2*tt (see L&C), l[u] is the degree of the elp at that step, and u_l[u] is the difference between the step number and the degree of the elp. */ /* initialise table entries */ d[0] = 0 ; /* index form */ d[1] = s[1] ; /* index form */ elp[0][0] = 0 ; /* index form */ elp[1][0] = 1 ; /* polynomial form */ for (i=1; i0)) q-- ; /* have found first non-zero d[q] */ if (q>0) { j=q ; do { j-- ; if ((d[j]!=-1) && (u_lu[q]0) ; } ; /* have now found q such that d[u]!=0 and u_lu[q] is maximum */ /* store degree of new elp polynomial */ if (l[u]>l[q]+u-q) l[u+1] = l[u] ; else l[u+1] = l[q]+u-q ; /* form new elp(x) */ for (i=0; i >tt errors and cannot solve */ for (i=0; itt hence cannot solve */ for (i=0; i no errors: output received codeword */ for (i=0; i References: <3b7usa$sen@pita.cs.huji.ac.il> NNTP-Posting-Host: callisto.nt.tuwien.ac.at In article <3b7usa$sen@pita.cs.huji.ac.il> writes: > Hello all, > > I'm writing (or at least trying to write) a new compression > algorithm/program, and like many others, I'm using hash tables > to access strings from the input. > I've experimented a few days, but wasn't able to find anything > satisfactory, so instead of re-inventing the wheel, I ask for > your kind suggestions: What hash funcition is well suited for > this (i.e from strings of varied length to integers). > If ineterst is shown, I'll post a summary of replies. > > Thanx, > > --Eitan. Take a look at the article " Fast Hashing of Variable Length Text Strings " of Peter K. Pearson in Communications of the ACM , June 1990 volume 33 number 6. AFAIN, it is the most proper thing for you. I am writing some compression programs too, so hoping we can exchange some ideas ? All the bests, Khanh Nguyen Vienna University of Technology From cwrea@tuzo.erin Fri Dec 2 21:36:38 1994 Newsgroups: comp.compression Path: solitaire.cv.nrao.edu!hearst.acc.Virginia.EDU!caen!newsxfer.itd.umich.edu!gatech!swrinde!cs.utexas.edu!utnut!erin.utoronto.ca!tuzo.erin!cwrea From: cwrea@tuzo.erin (Chris W. Rea [UL]) Subject: Re: Hash function for compression algorithms Message-ID: Sender: news@credit.erin.utoronto.ca (Usenet News) Nntp-Posting-Host: tuzo.erin Reply-To: cwrea@tuzo.erin.utoronto.ca Organization: Erindale College, University of Toronto, Canada References: <3b7usa$sen@pita.cs.huji.ac.il> Date: Tue, 29 Nov 1994 22:54:09 GMT Lines: 53 In article <3b7usa$sen@pita.cs.huji.ac.il>, Eitan Frachtenberg wrote: > >I'm writing (or at least trying to write) a new compression >algorithm/program, and like many others, I'm using hash tables >to access strings from the input. >I've experimented a few days, but wasn't able to find anything >satisfactory, so instead of re-inventing the wheel, I ask for >your kind suggestions: What hash funcition is well suited for >this (i.e from strings of varied length to integers). >If ineterst is shown, I'll post a summary of replies. This is one I use for a project I'm currently working on: /* HashWord takes W, the ASCII string we want to hash ** S, the length of string W (ie: strlen(W)) ** R, choose the next R'th value to resolve a collision */ ULONG HashWord(UBYTE *W, UBYTE S, UWORD R) { ULONG HashValue=0; UWORD Seed[3]={0,0,0}, i=0; Seed[0]=R+119; Seed[1]=R*727+R*R; Seed[2]=5033-R*R*R; seed48(Seed); for(i=0; i References: NNTP-Posting-Host: nmerha3f.bnr.ca Keywords: JPEG, Tiff Originator: jbotte@nmerha3f Xref: solitaire.cv.nrao.edu comp.compression:14966 alt.binaries.pictures:21113 In article , swlau@cs.cuhk.hk (Lau Siu Wah) writes: |> Does someone know if there is a program to convert compressed image |> in TIFF format to JPEG format? |> |> Clarence. The best PC based system for doing this sort of thing is a program called Graphics Workshop from a company called Alchemy Mindworks. It is shareware and is available on many FidoNet type systems. I do not know if there is an FTP site for it. I get asked this a lot... is there any FTP site that would be willing to carry this package? I would be willing to "upload" it regularly. In addition to converting >from one graphics format to another, the program also allows you to view, crop, scale, dither, and a bunch of other image processing operations. It is free to try out, but you would eventually have to pay for it (there are worldwide "distributors"). If someone volunteers an FTP site, perhaps this could become a FAQ? Jim #include // My opinions are my own, // but you have have them if you like. From ts@uwasa.fi Sun Dec 4 22:54:21 1994 Path: solitaire.cv.nrao.edu!hearst.acc.Virginia.EDU!caen!newsxfer.itd.umich.edu!gatech!swrinde!pipex!sunic!news.funet.fi!zippo.uwasa.fi!uwasa.fi!ts From: ts@uwasa.fi (Timo Salmi) Newsgroups: comp.compression Subject: Re: LHA Latest version Date: 2 Dec 1994 03:21:57 GMT Organization: University of Vaasa Lines: 29 Message-ID: <3bm3sl$d4p@zippo.uwasa.fi> References: <3bl422$6hm@gold.interlog.com> NNTP-Posting-Host: uwasa.fi In article <3bl422$6hm@gold.interlog.com> richard@panchax.gryphon.com writes: :What and where is the latest lha. Trying to fine the FAQ on rtfm but it :was to busy to accept anonymous connections. 44416 Aug 1 1991 garbo.uwasa.fi:/pc/arcers/lha213.exe lha213.exe LHa compression for .lzh files, tight, in English 70656 Nov 24 1992 garbo.uwasa.fi:/pc/arcers/lha255b.exe lha255b.exe LHa compression for .lzh files, Kanji docs, development vers 2849 Apr 20 1993 garbo.uwasa.fi:/pc/arcers/lha255b.txt lha255b.txt LHA 2.55B READ.ME translation by Masaoki Kobayashi 96999 Jan 11 1993 garbo.uwasa.fi:/unix/arcers/lha101u.tar.Z lha101u.tar.Z Lha for .lzh compression on Unix 90174 Nov 9 11:03 garbo.uwasa.fi:/pc/doc-net/faqc9411.zip faqc9411.zip Frequently Asked Questions of comp.compression, J.Gailly 98684 Dec 1 10:20 garbo.uwasa.fi:/pc/INDEX.ZIP INDEX.ZIP The annotated list of garbo.uwasa.fi MsDos files, zipped All the best, Timo .................................................................. Prof. Timo Salmi Co-moderator of comp.archives.msdos.announce Moderating at garbo.uwasa.fi anonymous FTP archives 193.166.120.5 Faculty of Accounting & Industrial Management; University of Vaasa Internet: ts@uwasa.fi BBS +(358)-61-3170972; FIN-65101, Finland From ts@uwasa.fi Sun Dec 4 22:54:30 1994 Path: solitaire.cv.nrao.edu!hearst.acc.Virginia.EDU!caen!hookup!swrinde!pipex!sunic!news.funet.fi!zippo.uwasa.fi!uwasa.fi!ts From: ts@uwasa.fi (Timo Salmi) Newsgroups: comp.compression Subject: Re: GZIP Date: 2 Dec 1994 11:34:00 GMT Organization: University of Vaasa Lines: 20 Message-ID: <3bn0n8$mif@zippo.uwasa.fi> References: <3bm2os$csp@freenet.unbc.edu> NNTP-Posting-Host: uwasa.fi In article <3bm2os$csp@freenet.unbc.edu> ad467@freenet.unbc.edu (Claude Pitre) writes: :I'm looking for GZIP for DOS. Can someone either tell me where I can get :it by ftp or email it to me uuencoded. 117185 Aug 19 10:25 garbo.uwasa.fi:/pc/unix/gzip124.zip gzip124.zip GNU zip for .(g)z files, don't confuse with (pk)zip 98684 Dec 1 10:20 garbo.uwasa.fi:/pc/INDEX.ZIP INDEX.ZIP The annotated list of garbo.uwasa.fi MsDos files, zipped 90174 Nov 9 11:03 garbo.uwasa.fi:/pc/doc-net/faqc9411.zip faqc9411.zip Frequently Asked Questions of comp.compression, J.Gailly All the best, Timo .................................................................. Prof. Timo Salmi Co-moderator of comp.archives.msdos.announce Moderating at garbo.uwasa.fi anonymous FTP archives 193.166.120.5 Faculty of Accounting & Industrial Management; University of Vaasa Internet: ts@uwasa.fi BBS +(358)-61-3170972; FIN-65101, Finland From pecampbe@mtu.edu Sun Dec 4 22:55:35 1994 Path: solitaire.cv.nrao.edu!hearst.acc.Virginia.EDU!caen!sol.ctr.columbia.edu!news.mtu.edu!news.mtu.edu!not-for-mail From: pecampbe@mtu.edu (Paul E. Campbell) Newsgroups: comp.compression,comp.programming Subject: Re: File Structure for Disk Doubler Followup-To: comp.compression,comp.programming Date: 2 Dec 1994 17:04:38 -0500 Organization: Michigan Technological University Lines: 62 Message-ID: <3bo5lm$97r@metlab4.my> References: NNTP-Posting-Host: metlab4.my.mtu.edu X-Newsreader: TIN [version 1.2 PL1] Xref: solitaire.cv.nrao.edu comp.compression:15080 comp.programming:12910 Art Pollard (pollarda@uhunix3.uhcc.Hawaii.Edu) wrote: : I am familier with the Log Structured File System (Sprite) which has been : used in one study (by DEC) to create a compressed file system. However, : the resources for this method are too great for a mere PC. There are : several other approaches simular to this with which I am familier : however, they all have memory requirements which are too great for PCs or : would fragment too quickly to support the high sequential access : across blocks needed while reading a file efficiently. This is what I worked out when I had the time and desire to build a log structured file system (I no longer have the time)... First, you need to do a fast compression routine. Second, on average, most systems use say 8-16K block sizes for good compression. 4K is really too small. Third, the way the LFS structures work is pretty good assuming you have that 1 megabyte free to use for those purposes. So this is my proposed alternative: First, the segment idea is sound. Instead of keeping it in memory, do this: Continuously ADD to the current segment as you get stuff to write to it. This may entail more writing than the other methods (oh well...). Trust your system at least enough to do some caching. On the first pass, don't worry about following a block size requirement religiously. Store each new chunk of data sequentially in the log. Store EVERYTHING in 8-16K chunks. Pack in the data as close to your arbitrary size as possible. Put a CRC on each chunk. That way, when you have to restart (someone shuts off the machine), just start reading in the last segment you marked as in use (in your "superblock" equivalent). Keep reading blocks until you fail to get a good CRC. Even put file pointers in these "chunks". Pointers to "sectors" or other arbitrary pieces of files look like this: ,, That way, you can find the right block to read with: Offset=*segment size+Block start When writing, the rule for whether to write to the current block or start a new one is simple: Known: The size of the block so far The desired block size The size of the data to write out (no compression) If (Desired block size-Current size)>(size needed), append it Then when you finish your write, stick on the extra bits for "end of block" and CRC. When you write again, you start on the current sector where the "end of block" section is at, but overwrite that part (you end up rewriting a few bytes but that doesn't matter too much). Caching helps reduce the number of writes and rewrites you'd normally end up doing. As far as DOS's usual "FAT's" or the equivalent structures in Unix systems, you don't use them. You supply both OS's with as much of a legitimate view of the system (given their myopic view of the world) as possible and handle everything else yourself. The advantage of this system is that you get compression, the advantages of a log structured system, and low memory overhead. From marknel@utdallas.edu Sun Dec 4 22:55:50 1994 Path: solitaire.cv.nrao.edu!hearst.acc.Virginia.EDU!darwin.sura.net!haven.umd.edu!news.umbc.edu!europa.eng.gtefsd.com!howland.reston.ans.net!torn!nott!bcarh189.bnr.ca!bcarh8ac.bnr.ca!corpgate!news.utdallas.edu!marknel From: marknel@utdallas.edu (Mark R Nelson) Newsgroups: comp.compression Subject: Re: Any Recommended Books? Date: 4 Dec 1994 03:41:20 GMT Organization: The University of Texas at Dallas Lines: 20 Message-ID: <3brdp0$gv4@news.utdallas.edu> References: NNTP-Posting-Host: csclass.utdallas.edu NNTP-Posting-User: marknel X-Newsreader: TIN [version 1.2 PL2] Keith Williams (csi083@coventry.ac.uk) wrote: : Someone mentioned a book called "The Data Compression Book" by Nelson.. : on this usegroup earlier... I'm having problems tracking it down and : would appreciate a few more details like publisher etc... _The Data Compression Book_ is published by M&T Books. ISBN 1-55851-216-0. M&T is now owned by Henry Holt, and they don't seem to do any direct sales, so you have to get it through a bookstore. Sorry, but I don't know anything about who carries it in the UK. If you are at the end of your rope and can't find it I can supply mail order/phone dealer numbers from the US. You should also look for _Text Compression_ by Bell, Cleary, and Witten >from Prentice Hall, ISBN 0-13-911991-4. The best reference to references is the comp.compression FAQ, available in the normal places, such as rtfm.mit.edu. Mark Nelson marknel@utdallas.edu From ivanh@iconz.co.nz Fri Dec 9 21:14:58 1994 Path: solitaire.cv.nrao.edu!hearst.acc.Virginia.EDU!darwin.sura.net!tulane!ames!waikato!status.gen.nz!iconz.co.nz!ivanh From: ivanh@iconz.co.nz (Ivan) Newsgroups: comp.compression Subject: Re: Comparison of image compressions Date: Fri, 9 Dec 1994 06:13:35 UNLISTED Organization: Desktop Publishing Services Lines: 37 Message-ID: References: <3c764i$ice@nms.telepost.no> NNTP-Posting-Host: ivanh.iconz.co.nz X-Newsreader: Trumpet for Windows [Version 1.0 Rev B final beta #4] In article <3c764i$ice@nms.telepost.no> hans2@spacetec.no (Hans Sandsdalen) writes: >I am lokking for an article which compares different image compression >techniques, still images. This article may answer your questions. Do not know about Dr. Dobb though apart from that his journels are available on oour local BBS frequently. Path: status.gen.nz!waikato!wupost!simtel.coast.net!msdos-ann-request From: w8sdz@SimTel.Coast.NET (Keith Petersen) Newsgroups: comp.archives.msdos.announce Subject: Info on ARC, ARJ, LZH, ZIP, ZOO, LBR, Compressed & Squeezed files Message-ID: <9410191526.kp18796@SimTel.Coast.NET> Date: Wed, 19 Oct 1994 15:26:37 GMT Expires: 2 Nov 94 20:00:00 GMT Followup-To: comp.archives.msdos.d Sender: msdos-ann-request@simtel.coast.net Organization: SimTel, the Coast to Coast Software Repository (tm) Approved: w8sdz@SimTel.Coast.NET Lines: 188 [File: /SimTel/msdos/starter/00-files.doc Revised: Sept. 15, 1994] All about ARC, ARJ, LZH, ZIP, ZOO, LBR, Compressed and Squeezed files Some of the files available from SimTel, the Coast to Coast Software Repository (tm), have been transformed by using one of the standard freely-distributable utilities that either SQueezes, LiBRaries, ARChives, ARJs, LZHs, ZIPs, or ZOOs files. Some files in other collections have been compressed. ____________________________ Ivan Hyslop Desktop Publishing Services. Howick, Auckland, New Zealand. E-MAIL: ivanh@iconz.co.nz From marknel@utdallas.edu Fri Dec 9 22:30:26 1994 Path: solitaire.cv.nrao.edu!hearst.acc.Virginia.EDU!darwin.sura.net!gatech!howland.reston.ans.net!pipex!bnr.co.uk!bcarh8ac.bnr.ca!corpgate!news.utdallas.edu!marknel From: marknel@utdallas.edu (Mark R Nelson) Newsgroups: comp.compression Subject: Re: PKZIP Runtime Library? Date: 9 Dec 1994 21:54:57 GMT Organization: The University of Texas at Dallas Lines: 20 Distribution: world Message-ID: <3cajnh$83f@news.utdallas.edu> References: <1994Nov24.092605.527@pat.uwe.ac.uk> <786013380snz@jaburt.demon.co.uk> <3bj67l$ova$1@rosebud.ncd.com> NNTP-Posting-Host: csclass.utdallas.edu NNTP-Posting-User: marknel X-Newsreader: TIN [version 1.2 PL2] : Does anybody know of an archiving package or routines for un-archiving useable : from a Windows C program? Whether it's C, Pascal, object, library or .DLL is : no problem. It's also unimportant that the file format be .ZIP or any other : particular format. If you do, please drop me a line ASAP. Thanks!!! Greenleaf Software makes a product called ArchiveLib. It is a C++ class library, but also has a complete interface for C programmers. It has both 16 bit and 32 bit Windows DLLs, along with VB, C, C++, and TPW example programs. If you want more information, you can call Greenleaf Technical Sales at 800-523-9830 or 214-248-2561. Fax 214-248-7830. There is also a small amount of benchmark and demo information in the PCVENB area of CompuServe, but nothing available via ftp at this time. Mark Nelson Greenleaf Software marknel@utdallas.edu From marmisa@tid.tid.es Wed Dec 14 08:54:47 1994 Path: solitaire.cv.nrao.edu!hearst.acc.Virginia.EDU!caen!usenet.cis.ufl.edu!usenet.eel.ufl.edu!news.mathworks.com!europa.eng.gtefsd.com!newsxfer.itd.umich.edu!gatech!howland.reston.ans.net!news.sprintlink.net!EU.net!goya!tid.tid.es!tid.tid.es!not-for-mail From: marmisa@tid.tid.es (Luis Jose Marmisa Gazo) Newsgroups: comp.compression Subject: Re: Improvement of LZW compression. Date: 13 Dec 1994 20:54:34 +0100 Organization: Telefonica I+D Lines: 42 Message-ID: <3cku5q$p8f@tid.tid.es> References: <1994Dec9.143135.351@ign.fr> NNTP-Posting-Host: tid.tid.es X-Newsreader: TIN [version 1.2 PL2] Corine Plazanet (plazanet@sturm.ign.fr) wrote: : I've coded LZW compression algorithm for images, text, fax, exe. I would you to decrease the time processing. : Does anybody know about algorithms to improve LZW or where I can find any? : I heard about H-coding, what about you? : Thanks, : Corinne PLAZANET : IGN. Hi Corinne: I am not an expert in the compression field, but I can suggest to you two fast algorithms: 1) V.42bis from ITU. This is the standard for data compression in modems. An 8 bits microprocessor (Z80) is able to compress / uncompress a bidirectional 14.400 bps data link. I believe that V.42bis is a LZW (LZ78). 2) I had some experience with lzrw3-a from Ross Williams. This is the fastest algorithm in the comparison table of the compression FAQ. However, it is clear that this algoritm is not the best compressing. 'C' sources are available by ftp in: - ftp.adelaide.edu.au:/pub/compression/lzrw3-a.c [129.127.40.3] lzrw3-a is not a LZW algorithm (I think), it's a LZ77. I tested this algorithm on a 16 bits MC68000 microprocessor both for increase the data transfer speed (19200 bps) and for file compression. I got good results. I hope that this information could be useful to you. Best regards, Luis Marmisa (marmisa@tid.es) P.S.: There are some problems with your e-mail address. I can not send you any message. From nevries@aip.nl Fri Dec 16 01:25:08 1994 Path: solitaire.cv.nrao.edu!hearst.acc.Virginia.EDU!portal.gmu.edu!europa.eng.gtefsd.com!howland.reston.ans.net!EU.net!sun4nl!aipnl!nevries From: nevries@aip.nl (Nico de Vries [AIP-NL]) Newsgroups: comp.compression Subject: Re: Improvement of LZW compression. Distribution: world Message-ID: <787391473snx@aip.nl> References: <1994Dec9.143135.351@ign.fr> Date: Wed, 14 Dec 94 07:51:13 GMT Organization: AIP-NL (Ad Infinitum Programs) Lines: 24 plazanet@sturm.ign.fr writes in article <1994Dec9.143135.351@ign.fr>: > > I've coded LZW compression algorithm for images, text, fax, exe. I > would you to decrease the time processing. > Does anybody know about algorithms to improve LZW or where I can find any? > I heard about H-coding, what about you? FTP garbo.uwasa.fi:/pc/programming/lds_11.zip contains 2 sources which are faster than LZW at compression: Finish and LZRW. It also contains the source AR002 which is probably faster at decompressing that your LZW implementation, but slower at compressing. OTOH AR002 has a much better compression ratio than LZW. > Thanks, > > > Corinne PLAZANET > IGN. Take care, Nico de Vries. -- nevries@aip.nl [AIP-NL, UltraCompressor II development] From hle1@delphi.com Mon Dec 26 07:58:03 1994 Path: solitaire.cv.nrao.edu!hearst.acc.Virginia.EDU!caen!math.ohio-state.edu!howland.reston.ans.net!news2.near.net!news.delphi.com!usenet From: hle1@delphi.com Newsgroups: comp.compression Subject: Re: max. huffman code length? Date: Sun, 25 DEC 94 21:40:18 -0500 Organization: Delphi (info@delphi.com email, 800-695-4005 voice) Lines: 17 Message-ID: <5Y7U3Bi.hle1@delphi.com> References: <3da2u2$1ahr@rs18.hrz.th-darmstadt.de> NNTP-Posting-Host: bos1e.delphi.com X-To: Ulrich Graef Ulrich Graef writes: >> is there any way to calculate the maximal codelength one will get with >> dynamic huffman? (2^16 characters of 8 bit) > >Which data and which dynamic Huffman method (See my last post!)? > >If you use the first bit to decide whether to code or to copy the data, >you can limit the size of the code to > > 2^16 * 8 + 1 The Huffman method is somewhat related to the minimum description length (MDL) and minimum message length (MML) methods which is why I am joining in. I have been searching for specific detailed examples of how to code these methods in a spreadsheet. Thanks in advance for any advice you may have. Harry Edwards hle1@delphi.com From graef@iti.informatik.th-darmstadt.de Wed Dec 28 00:35:46 1994 Path: solitaire.cv.nrao.edu!hearst.acc.Virginia.EDU!caen!zip.eecs.umich.edu!newsxfer.itd.umich.edu!gatech!swrinde!pipex!uunet!zib-berlin.de!news.th-darmstadt.de!iti.informatik.th-darmstadt.de!graef From: graef@iti.informatik.th-darmstadt.de (Ulrich Graef) Newsgroups: comp.compression Subject: Re: max. huffman code length? Date: 27 Dec 1994 17:28:42 GMT Organization: TH Darmstadt, FG Systemprogrammierung Lines: 53 Distribution: world Message-ID: <3dpisa$eqp@rs18.hrz.th-darmstadt.de> References: <3da2u2$1ahr@rs18.hrz.th-darmstadt.de> <5Y7U3Bi.hle1@delphi.com> NNTP-Posting-Host: spelunke.iti.informatik.th-darmstadt.de In article <5Y7U3Bi.hle1@delphi.com>, hle1@delphi.com writes: > Ulrich Graef writes: > > >> is there any way to calculate the maximal codelength one will get with > >> dynamic huffman? (2^16 characters of 8 bit) > > > >Which data and which dynamic Huffman method (See my last post!)? > > > >If you use the first bit to decide whether to code or to copy the data, > >you can limit the size of the code to > > > > 2^16 * 8 + 1 > > The Huffman method is somewhat related to the minimum description length > (MDL) and minimum message length (MML) methods which is why I am joining in. > I have been searching for specific detailed examples of how to code > these methods in a spreadsheet. Thanks in advance for any advice you > may have. Harry Edwards hle1@delphi.com I think, the methods to generate Huffmann code with dynamic adaption to the message source are not easy to program in a spreadsheet. This is because you have to build a Huffman tree to be able to generate your binary Huffman prefix-code _and_ you must update the tree in periodic intervals to adapt (dynamically) to the probabilies in your message source. This is hard to program with good performance, even if you have a programming language like C or Pascal. I think the most spreadsheets do not contain features which you need including: - tree representation and manipulation. - iterative processing of data (to perform the coding process and the dynamic model update) Does your spreadsheet allow this (without heavy add-ins)? If yes, which one do you use? If you want to use the spreadsheet anyway, you should consider arithmetic coding. Especially when you only want to compute the minimal representation length it would be very helpful. But the need to use iterative processing of data does not disappear. Greetings, Uli -- Ulrich Graef | home: 06155 62493 int: +49 6155 62493 Lichtenbergweg 11 | office: 06103 752 364 int: +49 6103 752 359 D-64347 Griesheim +------------------------------------------------- Germany | e-mail: graef@iti.informatik.th-darmstadt.de From roe2@ellis.uchicago.edu Wed Dec 28 10:16:09 1994 Newsgroups: comp.compression Path: solitaire.cv.nrao.edu!hearst.acc.Virginia.EDU!caen!msunews!uchinews!ellis!roe2 From: roe2@ellis.uchicago.edu (Cave Newt) Subject: Re: GZIP file format Message-ID: <1994Dec28.062253.4076@midway.uchicago.edu> Sender: news@uchinews.uchicago.edu (News System) Reply-To: roe2@midway.uchicago.edu Organization: University of Chicago References: <3d9ple$dai@sun4.bham.ac.uk> Date: Wed, 28 Dec 1994 06:22:53 GMT Lines: 21 phillipm@eee.bham.ac.uk (Mark Phillips) writes: >I have an application (running under DOS) which generates large >files and I wish to compress them to the PKZIP format on-line, that is, >as they are created, so that they may be transfered and decompressed on >a UNIX platform with PKUNZIP at a later date. > >Is it possible to obtain the PKZIP file format specification (including >a specification of the variant of the Lempel Zif compression algorithm >PKZIP uses ) or, better still, the source code for PKZIP ? Obviously, >I am prepared to pay a reasonable fee for the latter. Obviously you're a bit confused here (gzip has almost nothing to do with PKZIP/Zip), but free source for zip/unzip is available from ftp.uu.net in /pub/archiving/zip. If you build the source code into your app you might be able to pull this off, but anything running under DOS has a huge strike against it already. Forget about "pipes," unless you're running in a DOS box under OS/2 and incorporate the named variety... Greg Roelofs Info-ZIP From scott@santarosa.edu Fri Dec 30 12:18:24 1994 Path: solitaire.cv.nrao.edu!hearst.acc.Virginia.EDU!caen!spool.mu.edu!agate!library.ucla.edu!csulb.edu!nic-nac.CSU.net!floyd!floyd!not-for-mail From: scott@santarosa.edu (Scott Doty) Newsgroups: comp.compression,comp.dsp Subject: Re: Compression algorithms for DSP over SLIP/PPP Followup-To: comp.compression,comp.dsp Date: 29 Dec 1994 23:54:35 -0800 Organization: Santa Rosa Junior College, Santa Rosa, CA (US) Lines: 41 Message-ID: <3e0ebr$mpj@floyd.santarosa.edu> References: <3dikiu$cpr@News1.mcs.com> NNTP-Posting-Host: floyd.santarosa.edu X-Newsreader: TIN [version 1.2 PL2] Xref: solitaire.cv.nrao.edu comp.compression:15530 comp.dsp:13619 Adam W. Dace (thekind@myhost.subdomain.domain) wrote: [snip] > ztalk uses GSM compression, and I've been wondering if there's anything > better out there that I can get specs on *freely*. I've poked through the > FAQ here, albeit a bit quickly and haven't really found much of use. > I'm by no means a good C programmer nor have I played with DSP much until > recently. If anyone could suggest whether GSM is the way to go, or not > please let me know. There are two LPC libraries that I've got to work under Linux: _[ snippet 1 ]_ Ron Frederick (frederic@parc.xerox.com) of Xerox PARC, Palo Alto, CA, contributed the LPC codec which is based on an implementation done by Ron Zuckerman (ronzu@isu.comm.mot.com) of Motorola which was posted to the Usenet group comp.dsp on 26 June 1992. _ _ _ _ ftp from beta.xerox.com, /pub/net-research/lpc.tar.Z 616 bytes/second, appears to run real-time on a 386/33. Speech quality is...er..."usable." :) _[ snippet 2]_ U.S. Department of Defense LPC-10 2400 bps Voice Coder Release 1.0 October 1993 _ _ _ _ ftp from dspsun.eas.asu.edu, in /pub/speech Better audio quality, but seems to be more CPU intensive. I've used the former to implement a general-purpose audio tool, which will compress and transmit (using TCP) audio from a ulaw source (including /dev/audio). I use this to listen to audiocasts over a PPP link. (ftp.santarosa.edu, pub/Linux/vot/zvot-0.81.tar.gz). Hopefully, this signal, and not noise... -Scott From john@pc.xs4all.nl Sat Dec 31 13:26:36 1994 Path: solitaire.cv.nrao.edu!hearst.acc.Virginia.EDU!caen!math.ohio-state.edu!howland.reston.ans.net!news.sprintlink.net!EU.net!sun4nl!hacktic!pc.xs4all.nl!pc.xs4all.nl!john From: john@pc.xs4all.nl (Jan-Pieter Cornet) Newsgroups: comp.compression Subject: Re: compressor for word-aligned data? Message-ID: <123094173545HNR.1.4@pc.xs4all.nl> Date: Fri, 30 Dec 94 17:35:45 +0100 References: Organization: Jan-Pieter Cornet at home, A Jacobsstr 8, 2642 AD Pijnacker, Netherlands X-Newsreader: HNR 1.4 by Kafka & The Dude Lines: 24 arndt@marvin.physik.uni-giessen.de (Arndt Brenschede) once said: > we are dealing with large data-files from containing essentially > 12-bit adc data. The files are word-aligned, that means, 16 bit are used > instead of 12. It would be nice > sometimes to compress these files, but gzip (and probably most > other compressers) does no good job. Try UC2 in the "tight/multimedia" mode (UC A -TT arch file ...). When your file is indeed better compressed in 16-bit mode, UC2 should auto-detect this and produce a tighter archive. If you want something on another platform than DOS/Windows/OS2, your best bet is to write a filter (actually, two filters with opposite effects) that pre-processes your file into something that compresses better. Examples: take word-wise delta values, or re-order the bytes in your file from 0 1 2 3 4 5 ... to: 0 2 4 ... 1 3 5 ... Good luck, =====BEGIN FRACTAL-COMPRESSED SIGNATURE===== | Jan-Pieter Cornet !PGP0XA4E77CCB/KVC=1FCBE41048A009550F68867928EB8DDF | =====END FRACTAL-COMPRESSED SIGNATURE===== ;-) My v2.6 decompressor (out soon!) will expand this to a 72 minutes MPEG movie!