Trade-offs in Entropy Coding

Well, I suspect no one takes me all that seriously about compression, partly because I tend to use slightly unusual designs and tend to focus more on speed than on ratio.

There are a variety of entropy coders, with different trade-offs:

  • Static Huffman
    • Basically the “gold standard” among entropy coders that use a normal bitstream.
    • Pros:
      • Generally has pretty fast encode/decode speeds.
      • Theoretically optimal among codes that assign a whole number of bits to each symbol.
    • Cons:
      • Requires transmitting / initializing / … a Huffman table (non-zero cost).
      • For chaotic streams (such as small messages over UDP), ensuring that the decoder has a matching table can become problematic.
      • Moreover, the space cost of a Huffman table can be a problem for small messages, as can the constant overhead of initializing the tables.
  • Adaptive Huffman
    • Usually unfavorable, as it tends to be very slow compared with most other options.
    • The tree needs to be adjusted for every symbol.
    • Does not lend itself to a lookup-table-driven decoder.
  • Rice Coding
    • Useful, but largely forgotten.
    • Pros
      • Very little constant overhead.
      • Is easy to make adaptive, allowing a simple single-pass encoder.
      • Can achieve similar speeds to static Huffman.
    • Cons
      • Natively only matches geometric-like symbol distributions.
      • Absent length-limiting hacks, has the potential for excessively long symbols.
    • Tricky Hacks
      • Length limiting to avoid extremely long symbols.
      • Reorganizing symbols can allow for more Huffman-like behavior.
        • Dynamic symbol swapping can be fast, effective, and is adaptive.
        • Will generally call this category “Rice+STF” (a rough sketch follows after this list).
  • Bitwise Range Coding
    • Compresses well, but is not particularly fast.
    • Pros:
      • Adaptive
      • Can beat Huffman variants in terms of compression.
      • Still faster than Adaptive Huffman.
      • Can be used with much the same flexibility as a raw bitstream.
    • Cons
      • Slower than Huffman and Rice
      • Slower than a raw bitstream (if used as a bitstream).
    • Tricky Hacks
      • Can be bolted onto the back end of a Huffman or Rice coder, improving compression slightly at less speed cost than using it directly.
      • Sadly, the compression gains of using it over a raw bitstream are frequently not high enough to make its speed costs worthwhile.
  • Symbol Oriented Range Coding
    • Compresses well, but is also not exactly fast.
    • Cons (vs Bitwise range coding)
      • Doesn’t really compress any better
      • Much more complex and often slower
      • Much less flexible about how it is used.
  • ANS and friends
    • A sort of magic unicorn I don’t really understand that well.
    • Pros
      • Speed can be more competitive with that of Static Huffman.
      • Compression is more like that of a Range Coder.
    • Cons
      • I personally can’t really make much sense of how it works.
      • Requires encoding symbols in reverse order.
      • Seems to have some restrictions similar to symbol-oriented range coders.
      • In my past tests, its high speeds only seemed to materialize on certain hardware.
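
For concreteness, below is a minimal sketch of roughly what an adaptive Rice coder with dynamic symbol swapping (“Rice+STF” in the above sense) could look like. The bit-writer interface, the escape-based length limit, and the k adaptation rule here are illustrative assumptions rather than the exact scheme used in any particular codec.

/*
 * Adaptive Rice coder with swap-towards-front symbol ranking ("Rice+STF"
 * in the sense used above). Illustrative sketch only; names, constants,
 * and the adaptation rule are assumptions made for this example.
 */
#include <stdint.h>
#include <stddef.h>

#define NSYMS 256
#define ESC_Q 24    /* max unary quotient before escaping to raw bits */

typedef struct {
    uint8_t *buf;   /* output buffer, assumed large enough */
    size_t   pos;   /* current byte position */
    int      bit;   /* bits used in the current byte, 0..7 */
} BitWriter;

static void bw_put_bit(BitWriter *bw, int b)
{
    if (bw->bit == 0) bw->buf[bw->pos] = 0;
    bw->buf[bw->pos] |= (uint8_t)((b & 1) << (7 - bw->bit));
    if (++bw->bit == 8) { bw->bit = 0; bw->pos++; }
}

static void bw_put_bits(BitWriter *bw, uint32_t v, int n)   /* MSB first */
{
    while (n-- > 0) bw_put_bit(bw, (int)((v >> n) & 1));
}

typedef struct {
    int     k;             /* current Rice parameter */
    uint8_t sym[NSYMS];    /* rank -> symbol */
    uint8_t rank[NSYMS];   /* symbol -> rank */
} RiceStf;

static void rs_init(RiceStf *rs)
{
    rs->k = 3;
    for (int i = 0; i < NSYMS; i++) {
        rs->sym[i]  = (uint8_t)i;
        rs->rank[i] = (uint8_t)i;
    }
}

/* Rice-code 'val' with parameter k; if the unary prefix would get
 * excessively long, emit ESC_Q one-bits as an escape followed by the
 * value as a raw 16-bit field (the length-limiting hack). */
static void rice_encode(BitWriter *bw, uint32_t val, int k)
{
    uint32_t q = val >> k;
    if (q >= ESC_Q) {
        bw_put_bits(bw, (1u << ESC_Q) - 1, ESC_Q);
        bw_put_bits(bw, val, 16);       /* assumes val fits in 16 bits */
        return;
    }
    while (q-- > 0) bw_put_bit(bw, 1);  /* unary quotient */
    bw_put_bit(bw, 0);                  /* terminator */
    bw_put_bits(bw, val, k);            /* k low-order bits */
}

/* Encode one byte: Rice-code its current rank, nudge k, then swap the
 * symbol one rank towards the front so frequent symbols get small ranks. */
static void rs_encode_byte(RiceStf *rs, BitWriter *bw, uint8_t s)
{
    int r = rs->rank[s];
    rice_encode(bw, (uint32_t)r, rs->k);

    /* crude adaptation: quotient 0 hints k is too big, >1 that it is too small */
    uint32_t q = (uint32_t)r >> rs->k;
    if (q == 0 && rs->k > 0)      rs->k--;
    else if (q > 1 && rs->k < 12) rs->k++;

    if (r > 0) {                  /* swap-towards-front update */
        uint8_t other   = rs->sym[r - 1];
        rs->sym[r - 1]  = s;
        rs->sym[r]      = other;
        rs->rank[s]     = (uint8_t)(r - 1);
        rs->rank[other] = (uint8_t)r;
    }
}

A matching decoder simply mirrors this: it reads the unary quotient (watching for the escape), reconstructs the rank, and applies the same k update and swap, so nothing resembling a Huffman table ever needs to be transmitted.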

Relative Trade-offs:

  • Between Huffman and Range Coding
    • I have only rarely seen ratio differences much over about 10%.
    • Cases like high symbol skew (where range coding fares well) can usually be addressed more cheaply via other means.
    • A slowdown of typically 2-4x isn’t usually worth a 5-10% gain in compression.
  • Between Huffman and Rice+STF
    • Huffman generally compresses better and is faster than Rice+STF past a certain message size.
    • Rice+STF can allow for a simpler encoder/decoder though, and is faster for small payloads.
      • It actually originated as a cheaper alternative to Deflate-style Huffman-coded Huffman tables.
      • In some cases, the space savings of not transmitting a Huffman table with the data may offset its less-than-optimal coding, putting it ahead.
  • Huffman or Rice + Bitwise Range Coder Backend
    • Can regain some of the compression advantages of range coding (a rough sketch of such a bitwise range coder follows after this list).
    • Has a less steep impact on codec speed,
      • Given that the range coder deals only with the already-compressed data.
    • A drawback is that one now has the code complexity of a Huffman or Rice coder on top of that of the range coder, plus potentially other unique funkiness (such as interleaving multiple bitstreams to allow for multiple RC contexts, etc.).
  • Not using any entropy coding
    • Typically the fastest option.
    • Typically the worst compression
      • Because, you know, raw bytes.
    • May still be a good option where speed is a lot more important than compression ratio.
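
As a rough illustration of the sort of bitwise range coder being referred to above (and of what a range-coded backend would wrap around a Huffman or Rice bitstream), below is an encoder-side sketch along the general lines of LZMA-style binary range coders. The 12-bit probabilities, the adaptation shift, and the carry/cache handling are common choices assumed for illustration, not details of any specific codec discussed here.

/*
 * Bitwise (binary) range coder, encoder side only. Each call encodes one
 * bit against an adaptive probability; in a "backend" configuration the
 * bits of an already Huffman- or Rice-coded stream would be fed through
 * this with context-selected probabilities.
 */
#include <stdint.h>
#include <stddef.h>

#define PROB_BITS   12                       /* probabilities in [0, 4096) */
#define PROB_INIT   (1u << (PROB_BITS - 1))  /* initial p(bit == 0) = 0.5 */
#define ADAPT_SHIFT 5                        /* adaptation speed */
#define RC_TOP      (1u << 24)               /* renormalization threshold */

typedef struct {
    uint64_t low;         /* low end of the interval (room for a carry) */
    uint32_t range;       /* current interval width */
    uint8_t  cache;       /* byte held back until any carry is resolved */
    uint64_t cache_size;  /* pending 0xFF bytes awaiting a carry decision */
    uint8_t *out;         /* output buffer, assumed large enough */
    size_t   pos;
} RangeEnc;

static void re_init(RangeEnc *rc, uint8_t *out)
{
    rc->low = 0;   rc->range = 0xFFFFFFFFu;
    rc->cache = 0; rc->cache_size = 1;
    rc->out = out; rc->pos = 0;
}

/* Emit the top byte of 'low', propagating any carry into held-back bytes. */
static void re_shift_low(RangeEnc *rc)
{
    if ((uint32_t)rc->low < 0xFF000000u || (rc->low >> 32) != 0) {
        uint8_t carry = (uint8_t)(rc->low >> 32);
        do {
            rc->out[rc->pos++] = (uint8_t)(rc->cache + carry);
            rc->cache = 0xFF;
        } while (--rc->cache_size != 0);
        rc->cache = (uint8_t)(rc->low >> 24);
    }
    rc->cache_size++;
    rc->low = (rc->low & 0x00FFFFFFu) << 8;
}

/* Encode one bit against *prob, the adaptive probability that bit == 0. */
static void re_encode_bit(RangeEnc *rc, uint16_t *prob, int bit)
{
    uint32_t bound = (rc->range >> PROB_BITS) * (*prob);
    if (bit == 0) {
        rc->range = bound;
        *prob += (uint16_t)(((1u << PROB_BITS) - *prob) >> ADAPT_SHIFT);
    } else {
        rc->low   += bound;
        rc->range -= bound;
        *prob -= (uint16_t)(*prob >> ADAPT_SHIFT);
    }
    while (rc->range < RC_TOP) {   /* renormalize, emitting bytes */
        rc->range <<= 8;
        re_shift_low(rc);
    }
}

static void re_flush(RangeEnc *rc)
{
    for (int i = 0; i < 5; i++)    /* push out the remaining bytes */
        re_shift_low(rc);
}

The probabilities themselves (one uint16_t per context, initialized to PROB_INIT) live entirely outside the coder, which is part of why the bitwise form is more flexible than the symbol-oriented variety.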

 

It can be noted that most of my older designs used static Huffman, but many of my newer ones have used Rice+STF variants. Generally, the relative speed, simplicity, and flexibility have been enough to offset the arguably “bad” compression.

If the total ratio difference between a “bad” Rice-coded backend and a much slower range-coded backend is maybe 20%, but the result is a 400% speed difference, is the trade worthwhile? That probably depends on whether size or speed is more relevant.

Both may save significantly over the raw uncompressed data, which is usually where most of the gains lie. It often seems more sensible to invest a lot of the time/effort budget in areas which have a more direct effect on overall compression (such as any filtering/quantization/transform steps in image/video compression, or the effectiveness of the LZ matching and similar for LZ77 compressors).

For example, in an image codec, one may find that the choice of DC predictor, or how deltas or coefficients are represented, matters a lot more than the specific choice of entropy-coder back-end or similar.

Though, granted, I remain sort of an outcast here.

 
