PIKApp/plug-ins/file-dds/README.dxt

The following notes are from an email I received from Fabian 'ryg' Giesen. They
should help answer some questions regarding the code found in dxt.c

---

mul8bit: This formula is equivalent to (a*b)/255 for a fairly large set of
values (didn't bother finding out the exact bounds). It's a fairly well-known
trick also used in other parts of the PIKA codebase (among others) and was
first documented by Jim Blinn, I think. A good reference is his book "Dirty
Pixels".
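
As an illustration (not part of the original email), here is a minimal sketch
of the trick in the form usually attributed to Blinn. The function body is an
assumption about what mul8bit looks like; dxt.c may differ in detail. The
helper names mul8bit_ref and mul8bit_mismatches are made up for this example.
The brute-force check compares the shift-based version against a rounded
division by 255 for all 8-bit inputs.

```c
/* Approximates (a * b) / 255 with rounding, using shifts instead of a
 * division. Assumed form of the trick; dxt.c may differ in detail. */
static int mul8bit(int a, int b)
{
    int t = a * b + 128;
    return (t + (t >> 8)) >> 8;
}

/* Reference: (a * b) / 255 rounded to the nearest integer. */
static int mul8bit_ref(int a, int b)
{
    return (2 * a * b + 255) / 510;
}

/* Counts the (a, b) pairs in 0..255 for which the two versions disagree. */
static int mul8bit_mismatches(void)
{
    int a, b, bad = 0;

    for (a = 0; a < 256; a++)
        for (b = 0; b < 256; b++)
            if (mul8bit(a, b) != mul8bit_ref(a, b))
                bad++;
    return bad;
}
```

For inputs in 0..255 the two versions agree everywhere, so the "fairly large
set of values" at least covers the whole 8-bit range.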

---

lerp_rgb: The expression computed is exactly equivalent to mul8bit(a[i],255-f)
+ mul8bit(b[i],f) - I just verified that by brute force for -255 <= b[i]-a[i]
<= 255 because I couldn't be bothered to find a derivation for this :) . You
customarily use a factor between 0 and 256 inclusive for LERPing if you can, but
normal DXT blocks have colors placed at 1/3 and 2/3 between the two
interpolated colors. 255 is divisible by 3, so lerp_rgb can later be used in
eval_colors to determine the result of

  a*(1/3) + b*(2/3)  and  a*(2/3) + b*(1/3)

exactly, which is nice :)
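
To make the 1/3 and 2/3 factors concrete (this sketch is not from the original
email): on a 0..255 scale, 1/3 is the factor 85 and 2/3 is 170. The code below
uses the two-multiply form quoted above rather than whatever lerp_rgb does
internally. The names lerp8, color_two_thirds_a and color_one_third_a are
hypothetical, and mul8bit is the assumed Blinn-style form.

```c
/* Assumed form of mul8bit: (a * b) / 255 with rounding. */
static int mul8bit(int a, int b)
{
    int t = a * b + 128;
    return (t + (t >> 8)) >> 8;
}

/* Blends a toward b by f/255, with f in 0..255 (two-multiply form). */
static int lerp8(int a, int b, int f)
{
    return mul8bit(a, 255 - f) + mul8bit(b, f);
}

/* a*(2/3) + b*(1/3): the factor is 255/3 = 85. */
static int color_two_thirds_a(int a, int b)
{
    return lerp8(a, b, 85);
}

/* a*(1/3) + b*(2/3): the factor is 2*255/3 = 170. */
static int color_one_third_a(int a, int b)
{
    return lerp8(a, b, 170);
}
```

For example, with a = 30 and b = 60 the two interpolated values come out as
40 and 50.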

---

dither_block: This is just Floyd-Steinberg dithering. Distributing the error
terms to the adjacent pixels for each source pixel is the customary variant to
write this, but since blocks are so small, it's nearly all boundary
cases; "gathering" the error terms per source pixel turned out to be simpler.
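
A rough sketch of the "gathering" formulation for a single channel of a 4x4
block (not from the original email, and simplified compared to dxt.c): each
pixel collects the weighted errors of its already-processed neighbours
instead of pushing its own error forward. The function names and the 5-bit
quantizer are assumptions made for this example.

```c
#include <stdint.h>

static int clamp255(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

/* Nearest 5-bit level, expanded back to 8 bits by bit replication. */
static int quant5(int v)
{
    int q = (v * 31 + 127) / 255;
    return (q << 3) | (q >> 2);
}

/* Floyd-Steinberg on one channel of a 4x4 block, written as a gather. */
static void dither_channel_4x4(const uint8_t src[16], uint8_t dst[16])
{
    int err[16];
    int x, y;

    for (y = 0; y < 4; y++) {
        for (x = 0; x < 4; x++) {
            int i = y * 4 + x;
            int acc = 0, want;

            if (x > 0)
                acc += 7 * err[i - 1];      /* left neighbour        */
            if (y > 0) {
                acc += 5 * err[i - 4];      /* neighbour above       */
                if (x > 0)
                    acc += err[i - 5];      /* upper-left neighbour  */
                if (x < 3)
                    acc += 3 * err[i - 3];  /* upper-right neighbour */
            }

            want = clamp255(src[i] + acc / 16);
            dst[i] = (uint8_t) quant5(want);
            err[i] = want - dst[i];
        }
    }
}
```

The boundary handling reduces to the three `if` tests, which is the
simplification the paragraph above refers to.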

---

match_colors_block: This includes a few tricks. We want to map each source
color to its nearest representable color (index), using the euclidean distance
as a metric.

The obvious, brute-force way is to just compare squared distances to the 4
representable colors for each source pixel (using, for example,
color_distance); this requires a lot of arithmetic operations.

Instead, the code uses the fact that the 4 colors lie on a line in RGB space
(only approximately in truth, since we have discrete steps). It's a well-known
fact in geometry that if P is a point in 3D space and L is a line, and Q is
the point on L closest to P, then (P-Q) is orthogonal to the direction of
L. So (in R3 at least), we can simply determine Q by projecting P onto the
direction vector of L, which is done by the 16 dot products in the first for
loop. Since the RGB values have discrete steps in reality, this is just an
approximation, but a quite good one.

The loop after that determines where the 4 actually representable colors lie
along the line. After that, you simply need to determine which of those 4
values your current pixel's dot product is closest to. Instead of doing a
bunch of comparisons per pixel, the code computes the points along the line
at which the decision would change (that's c0pt, halfpt and c3pt). This would
still require 3 comparisons; by testing the middle point - which is halfpt -
first, one point is always excluded from consideration, which reduces the
number of compares to two in all cases. No big deal, but hey, why not :)
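
The projection and the two-compare selection can be sketched for a single
pixel as follows (not from the original email; the real code handles all 16
pixels of a block and optionally dithers). The function name match_index and
the color layout are assumptions: c[0] and c[1] are the endpoints, c[2] is
2/3 c[0] + 1/3 c[1], and c[3] is 1/3 c[0] + 2/3 c[1].

```c
#include <stdint.h>

/* Returns the DXT1 index (0..3) of the palette color nearest to px,
 * measured along the line through the two endpoints. */
static int match_index(const uint8_t px[3], const uint8_t c[4][3])
{
    int dir[3], stops[4];
    int dot, c0pt, halfpt, c3pt, k;

    /* direction of the line through the endpoints */
    for (k = 0; k < 3; k++)
        dir[k] = c[0][k] - c[1][k];

    /* where the pixel and the 4 palette colors lie along that line */
    dot = px[0] * dir[0] + px[1] * dir[1] + px[2] * dir[2];
    for (k = 0; k < 4; k++)
        stops[k] = c[k][0] * dir[0] + c[k][1] * dir[1] + c[k][2] * dir[2];

    /* decision boundaries; the order along the line is 1, 3, 2, 0 */
    c0pt   = (stops[1] + stops[3]) / 2;
    halfpt = (stops[3] + stops[2]) / 2;
    c3pt   = (stops[2] + stops[0]) / 2;

    /* test the middle boundary first: two compares in every case */
    if (dot < halfpt)
        return dot < c0pt ? 1 : 3;
    return dot < c3pt ? 2 : 0;
}
```

With endpoints white and black, a mid-dark gray such as (90,90,90) lands on
index 3 and a lighter gray such as (160,160,160) on index 2.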

Similarly, instead of dithering full RGB values, I just dither the dot product
values. Again, by my experiments this works just as well and reduces the
amount of work significantly.

---

optimize_colors_block: This first determines min/max/mean for r,g,b and the
covariance matrix for the color distribution. The latter is used to determine
the principal component (=eigenvector with largest eigenvalue) of that color
distribution, or the direction along which the colors in the block vary most
(in layman's terms) - the eigenvector is determined using simple power
iteration (a standard technique). That iteration needs a seed vector; I just
use (max_r-min_r,max_g-min_g,max_b-min_b), which works well in practice. If
the iteration converges to a vector with very small magnitude (or zero), which
can happen sometimes, it just defaults to an approximation of the YCbCr Y
vector (scaled appropriately to make sure no precision is lost with the dot
products).
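
A minimal floating-point sketch of the power iteration with the fallback
(not from the original email). The function name, the packed covariance
layout, the 1e-6 threshold and the Rec. 601 luma weights are all assumptions
for this example; dxt.c may use different scaling and a different iteration
count.

```c
static float absf(float v)
{
    return v < 0.0f ? -v : v;
}

/* cov holds the symmetric 3x3 covariance as {rr, rg, rb, gg, gb, bb}.
 * seed is the start vector, e.g. (max_r-min_r, max_g-min_g, max_b-min_b).
 * iters should be at least 1. */
static void principal_axis(const float cov[6], const float seed[3],
                           int iters, float out[3])
{
    float v[3] = { seed[0], seed[1], seed[2] };
    float mag = 0.0f;
    int i;

    for (i = 0; i < iters; i++) {
        /* multiply the current vector by the covariance matrix */
        float r = v[0] * cov[0] + v[1] * cov[1] + v[2] * cov[2];
        float g = v[0] * cov[1] + v[1] * cov[3] + v[2] * cov[4];
        float b = v[0] * cov[2] + v[1] * cov[4] + v[2] * cov[5];

        /* rescale by the largest component to keep values bounded */
        mag = absf(r);
        if (absf(g) > mag) mag = absf(g);
        if (absf(b) > mag) mag = absf(b);
        if (mag < 1e-6f)
            break;
        v[0] = r / mag;
        v[1] = g / mag;
        v[2] = b / mag;
    }

    if (mag < 1e-6f) {
        /* degenerate case: fall back to luma weights (assumed values) */
        v[0] = 0.299f;
        v[1] = 0.587f;
        v[2] = 0.114f;
    }

    out[0] = v[0];
    out[1] = v[1];
    out[2] = v[2];
}
```

For a block whose colors vary mostly in red, the result converges toward
(1, 0, 0); for a block with no variation at all, the luma fallback is used.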

This is then used as an initial approximation for the direction of the line
through RGB color space that is used to select colors for that block. It
simply uses the two most extreme points along that axis as the two colors
stored in the block.

---

refine_block: This takes a block and a chosen set of color indices, and tries
to determine the optimal endpoints for these indices (i.e. the full process
is: take block, use color distribution to get rough estimate of optimal
direction, assign color indices accordingly, use these to get better
endpoints, assign indices again). The computation just solves a least-squares
system to minimize the square error between the actual pixels and the
interpolated colors (solving for the two extremal colors). The least-squares
computation turns out to boil down to solving a 2x2 system of linear equations
for each of the RGB color channels; the actual solution is computed using
Cramer's rule.
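
For one color channel, the least-squares step can be sketched like this (not
from the original email, and without the "prods"/"akku" optimizations). Each
pixel is modeled as (w0*A + w1*B)/3, where A and B are the unknown endpoints
and w0, w1 are the weights implied by its index. The function name
refine_channel and the weight table are assumptions for this example.

```c
#include <stdint.h>

/* Solves for the endpoints A (*end0) and B (*end1) that minimize the
 * squared error for one channel. Returns 1 on success, 0 if the 2x2
 * system is degenerate. */
static int refine_channel(const uint8_t px[16], const uint8_t idx[16],
                          double *end0, double *end1)
{
    /* weight of endpoint 0, in thirds, for DXT1 indices 0..3 */
    static const int w0_of[4] = { 3, 0, 2, 1 };
    long xx = 0, yy = 0, xy = 0, at0 = 0, at1 = 0, det;
    int i;

    for (i = 0; i < 16; i++) {
        int w0 = w0_of[idx[i] & 3];
        int w1 = 3 - w0;

        xx  += w0 * w0;
        yy  += w1 * w1;
        xy  += w0 * w1;
        at0 += w0 * 3 * px[i];
        at1 += w1 * 3 * px[i];
    }

    /* normal equations: xx*A + xy*B = at0 and xy*A + yy*B = at1 */
    det = xx * yy - xy * xy;
    if (det == 0)
        return 0;

    /* Cramer's rule */
    *end0 = (double) (at0 * yy - at1 * xy) / (double) det;
    *end1 = (double) (at1 * xx - at0 * xy) / (double) det;
    return 1;
}
```

If the pixels were generated exactly from endpoints 200 and 50, the solver
returns those two values again; if every pixel has the same index, the
determinant is zero and the function reports the degenerate case.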

The code is somewhat weird (especially the "prods"/"akku" thing), but that's
just to reduce the amount of computation done (this piece of code is a hot
spot, so it's worth it).

The (!yy || !xx || xx * yy == xy*xy) part checks whether the system of linear
equations is degenerate. After pondering about this some months ago, I found
out that the only case in which this can ever happen is when all pixels of the
source block get mapped to the same color value. But that case can be handled
better in any case, by just using the single-color lookup tables. I've
attached the new version of refine_block using that observation - it both
increases performance (a bit) and image quality, so it's pretty neat.

---

encode_alpha_block_DXT5: The only thing that may not be obvious is the index
computation. This just uses some two's complement arithmetic and bit shuffling
to avoid computing

  7 * (in_alpha - min_alpha) / (max_alpha - min_alpha)

which would be more expensive. (The extra calc with idx is just because of the
weird DXT color numbering).
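
For reference, here is the straightforward version that the bit tricks avoid,
followed by the remapping to the DXT5 index order (not from the original
email). In the 8-value mode, index 0 is the maximum, index 1 is the minimum,
and indices 2..7 step from the maximum down to the minimum. The function name
is hypothetical, the rounding term is an addition of this sketch, and
max_alpha > min_alpha is assumed.

```c
/* Maps an alpha value to its DXT5 index, assuming max_alpha > min_alpha
 * and min_alpha <= in_alpha <= max_alpha. */
static int alpha_index_reference(int in_alpha, int min_alpha, int max_alpha)
{
    int range = max_alpha - min_alpha;

    /* linear position: 0 at min_alpha, 7 at max_alpha, rounded */
    int pos = (7 * (in_alpha - min_alpha) + range / 2) / range;

    /* DXT5 numbering: 7 -> 0, 0 -> 1, 6 -> 2, 5 -> 3, ..., 1 -> 7 */
    int idx = -pos & 7;
    idx ^= (2 > idx);
    return idx;
}
```

The last two statements are the "extra calc with idx": negating modulo 8
reverses the order, and the XOR swaps the two special cases 0 and 1.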

---

Some more notes on the general flow:

The computation without dithering is just as I explained in the part about
refine_block:
1. Calc initial endpoints directly from block colors
2. Determine color indices for these endpoints
3. Optimize endpoints given color indices
4. Determine new color indices given optimized endpoints

With dithering, there's a twist: The first two steps are done using a version
of the block dithered to colors that are representable using 16-bit 565 RGB
values. I've found that this significantly improves visual quality with
dithering; if you don't do this, the colors inside a block typically vary too
little for dithering to be useful. This process decreases objective quality
but typically looks notably better.

The single-color match (omatch5/omatch6) trick: If the block only contains a
single color (or, for the improved version of refine_block, if the color
values are sufficiently close to all map to the same index), an optimal
solution can be used instead.

You normally want solid-color blocks to map to solid-color blocks (because
dithering patterns are very obvious otherwise). This means that all color
indices for the block are going to be identical, i.e. all 0, 1, 2 or 3. All-0
is symmetrical to all-1 (with the endpoints flipped), and all-2 is symmetrical
to all-3 (again with the endpoints flipped). So you only need to consider
all-0 or all-2 indices for the block. Furthermore, all-0 means that the first
endpoint specified in the block gets used for all pixels; you can get the same
result by setting both endpoints to the same value and using index 2 for
everything.

In short, you can always set all indices to 2 without sacrificing any quality
whatsoever. For any of the color components R,G,B, you then want to determine
5-bit or 6-bit values a and b such that

  expand[a]*(2/3) + expand[b]*(1/3) is as close as possible to R/G/B

and that's exactly what's in omatch5 (for 5-bit values) and omatch6 (for 6-bit
values).
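
Such a table can be built by exhaustive search, roughly as follows (not from
the original email). The function name build_omatch, the table layout and the
tie-breaking (first match wins) are assumptions for this example; the tables
in dxt.c may have been generated with a different error measure.

```c
#include <stdint.h>
#include <stdlib.h>

/* Expands a 5-bit or 6-bit value to 8 bits by bit replication. */
static int expand5(int v) { return (v << 3) | (v >> 2); }
static int expand6(int v) { return (v << 2) | (v >> 4); }

/* For every 8-bit target, finds the pair (a, b) for which
 * expand[a]*(2/3) + expand[b]*(1/3) is closest to the target.
 * bits must be 5 or 6. table[t][0] is a, table[t][1] is b. */
static void build_omatch(uint8_t table[256][2], int bits)
{
    int size = 1 << bits;
    int target, a, b;

    for (target = 0; target < 256; target++) {
        int best_err = 256;

        for (a = 0; a < size; a++) {
            for (b = 0; b < size; b++) {
                int ea = (bits == 5) ? expand5(a) : expand6(a);
                int eb = (bits == 5) ? expand5(b) : expand6(b);
                int err = abs((2 * ea + eb) / 3 - target);

                if (err < best_err) {
                    table[target][0] = (uint8_t) a;
                    table[target][1] = (uint8_t) b;
                    best_err = err;
                }
            }
        }
    }
}
```

A single-color block is then encoded by looking up the endpoints per channel
(5 bits for red and blue, 6 bits for green) and setting every index to 2.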

If you use the 3-color+transparency mode of DXT1, you need separate versions
for omatch5/omatch6 for this case, since the interpolated value is exactly
halfway between the two endpoints instead of 1/3 of the way along. But I
recommend against that mode, because even if your top-level mipmap has 1-bit
transparency, the lower mip levels will have more than 2 distinct alpha values,
and the DXT mode is selected per texture. That's why my original code doesn't
support the 3-color mode of DXT1 at all: I don't think it's useful in practice.

Not sure if all of this is useful to you or not, but I guess it might make
sense to at least put this in a separate text file or whatever, because
otherwise it's really quite hard to see what's going on in some places.

Cheers,
-Fabian "ryg" Giesen