Heresy II - Why 4D Homogeneous Transform/Clip/Project is Wasteful

John Hilton

4.94/5 (31 votes)

Mar 25, 2010

CPOL

14 min read

67109

757

See why your GPU is getting so hot.

Download NewClip3D.zip - 3.02 KB

Introduction

Practically every 3D graphics system today is based on 4D homogeneous transform/clip/project calculations. This technology has proven itself over the decades as 3D graphics has matured. Some non-homogeneous systems came and went and they handled orthographic or parallel views more efficiently than homogeneous systems but failed to deliver the goods when it came to perspective views. By today’s standards homogeneous math has won.

Yet a little unexplored feature of the clipping algorithm appears to have been overlooked as the various transform/clip/project implementations were developed. For some reason no one seems to have considered using the x = 0 and y = 0 planes as two of the six clipping planes. Direct3D was architected with a z = 0 clipping plane, presumably for the small efficiencies it provides over OpenGL’s corresponding z = -w clipping plane.

Reworking the math for the x = 0 and y = 0 clipping planes, plus a few other tricks, and comparing a 3D based algorithm with 4D homogeneous computations actually reveals the proposed 3D algorithm as being significantly more efficient for both parallel and perspective views. Perhaps it is time to revisit the transform/clip/project technology for it to take the next evolutionary step.

Homogeneous Math in a Nutshell

Homogeneous mathematics is a very elegant technology for dealing with 3D vectors and their transformations. Homogeneous coordinates are 4D vectors created by appending a ‘w’ component of 1 to a 3D vector. For instance, the (x y z) point (1.2 3.4 5.6) becomes the (x y z w) point (1.2 3.4 5.6 1.0). To revert back to a 3D point divide by w, for example, (1.2 2.4 48.0 2.0) becomes (0.6 1.2 24.0). Common 3D transformations can all be implemented with a 4x4 matrix multiplication…

Translate – add a translation vector
Rotate – take 3 dot products with the rotated coordinate system’s x, y and z axes
The rotated x, y and z axes are (xx xy xz), (yx yy yz) and (zx zy zz).
Scale – multiply each coordinate by a factor
Skew (shear) – add scaled values of another coordinate. For example, skewing x and y in z…
Perspective projection – scale x, and y by the relative distance from the eye
If you hold up a sheet of glass and close one eye then trace the image of the real world seen through the sheet the traced points are ( x•f(z) y•f(z) ) where f(z)=EyeZ/(EyeZ-z) and EyeZ is the distance from the sheet of glass to your eye. The coordinate system for these values has the origin at the point on the glass closest to your eye, +X is right, +Y up and +Z back.

Transforming the 4D vector down to 2D…

Any number of 4x4 homogeneous matrices can be multiplied together to combine any sequence of transformations into a single matrix. Graphics systems commonly create matrices to map an object’s native 3D coordinates into a lighting space, where lighting computations are performed, then, using another matrix, map the lighting space coordinates into clipping space where clipping operations are performed.

Remember, what is important are the changes made to the vectors. How these changes are made is less important although this does affect efficiency. Homogeneous math is simply a set of techniques that can be used to implement these transformational changes.

4D Homogeneous Transform/Clip/Project

Viewing is the mapping of items within a view volume to the screen. For a rectangular image the view volume is either a rectangular prism or a truncated rectangular pyramid, a.k.a. a frustum. Items outside the view volume are ignored, items inside the view volume are rendered and items crossing the boundaries of the view volume are clipped to the boundaries and any remaining portion is then rendered.

OpenGL^®, Direct3D^® and other 3D graphics APIs all describe two types of projections; orthographic and perspective. There is actually considerable, but unrecognized, confusion surrounding these projections. In particular, the perspective projection’s origin is chosen, for what seem to be logical reasons, the eyepoint. The eyepoint happens to be a singularity that forces the lower right homogeneous projection matrix value to zero. Orthographic projections have a lower right value of 1 so this leads to the incorrect conclusion that the two projections are fundamentally different. In fact the orthographic projection is simply a special case of the more general perspective projection. See my October 2009 article, A New Perspective on Viewing, for more details.

Homogeneous clipping takes an object’s 3D points then, using an homogeneous matrix multiply, transforms the points into homogeneous clipping space. The homogeneous OpenGL view volume is a cube with opposing 3D corners at (-w -w -w w) to (w w w w) which corresponds to the 3D cube at (-1 -1 -1) to (1 1 1). The surfaces of this cube lie in the planes, using homogeneous formulae, x = -w, y = -w, z = -w, x = w, y = w and z = w. The simplicity and symmetry of these homogeneous plane formulae lead to consistent clipping implementations and this provides the tremendous appeal for homogeneous clipping.

These simple formulae give the impression that homogeneous clipping treats orthographic and perspective views in the same way. It does but only because the homogenous math ends up with w being comparable to 1, for parallel views, or comparable to z for perspective views. The corresponding 3D clippings planes are x = ±1 and y = ±1 for parallel and x = ±z and y = ±z for perspective.

Practically all graphic items are essentially reduced to just line segments that are clipped in the homogeneous clipping space. Since a line segment may cross any number of clipping planes, from none to six, the clipping algorithm needs to determine which clipping planes are crossed. These are simple comparison, or subtraction, operations such as;
x<w y<w z<w x>-w y>-w z>-w
Each comparison determines which side of the clipping plane the point is on and, with some bit-mask logic, the line segment can be trivially accepted, if totally inside the view volume, or rejected, if totally outside. See any of the general graphics textbooks for the details.

Line segments that cross one or more clipping planes need to be trimmed to these clipping planes. Trimming will determine the portion of the line segment, if any, that is within the view volume. Calculating all three coordinate values for each 3D point of intersection is not as efficient as calculating a parametric value, being the relative distance along the line, which is the t parameter in the formula
Parametric equation

These relative distances are compared to determine the actual portion, if any, of the line segment within the view volume. If there is a trimmed line segment within the view volume the relative distance(s) can be used to calculate the trimmed 4D homogeneous line segment.

For a line passing through all six clipping planes the relative distance is calculated using, for instance;

dw = w1-w0
a = (x1-x0)+dw
b = (y1-y0)+dw
dz = z1-z0
tLft = (-x0-w0) / a
tRgt = (x1-w1) / a
tBtm = (-y0-w0) / b
tTop = (y1-w1) / b
tBck = (z1-w1) / (dw-dz)
tFrt = -z0 / dz

Considerable logic is being left out of this discussion. This logic can be found in the general graphics text books.

Once the one or two bounding t parameters are known the clipped 4D coordinates are calculated based on the vector formula

Parametric equation

The clipped line segment’s endpoints are then transformed to canonical 3D space by dividing by w to yield (xc/wc, yc/wc, zc/wc). These are further scaled and translated into pixel and depth buffer space where the geometry is turned into pixel colour and depth buffer values.

New 3D Transform/Clip/Project

A regular space is defined as a space where angles are correct. World space is defined to be a regular space and, in general, objects are defined in regular object spaces. Non-uniform scaling and skewing transformations map a regular space into a non-regular space where angles between lines are not correct. For real world transformations only translations and rotations are valid as real world solid objects do not scale or skew.

The line clipping calculation is valid in both regular and non-regular spaces as the calculation is based on a parametric distance along the line segment. This means clipping space can be chosen to optimize the clipping calculations. Homogeneous clipping does just that by selecting a cube, for OpenGL, and a square prism, for DirectX. In the same way 3D clipping is free to select any suitable view volume. The shape proposed here uses the three clipping planes passing through the origin; x = 0, y = 0 and z = 0; the front clipping plane is z = 1 and the top and right clipping planes are x=1-iez*z and y=1-iez*z where iez=1/EyeZ. The clipping space eyepoint is (0 0 EyeZ). A parallel projection has EyeZ at infinity so iez = 0.

Side view of view volume

Figure 1. Side view of the clipping space view volume.

Any rectangular frustum or prism view volume in world space can be rotated, non-uniformly scaled, skewed and translated using a 3x3 matrix multiply and adding a 3D translation vector to map the world space view volume to the proposed clipping space view volume.

For each point the following values are calculated;

OneMinusIezZ = 1 - iez * z
RgtOffset = x - OneMinusIez
TopOffset = y - OneMinusIez
FrtOffset = z - 1;

The 3D parametric line clipping calculations for a line passing through all six planes are, for instance;

dx = x1 - x0
dy = y1 - y0
dz = z1 - z0
dRgt = Rgt1 - Rgt0
dTop = Top1 - Top0
tLft = -x0/dx
tRgt = -RgtOffset0/dRgt
tBtm = -y0/dy
tTop = -TopOffset0/dTop
tBck = -z0/dz
tFrt = -FrtOffset0/dz

With the one or two bounding t parameters the vector calculation provides the 3D clipped point…

Parametric equation

The 3D point is now projected to canonical space using

Projected point

What about Z?

Hmmm. Zed or Zee is a bit of an oddity. What really needs to be done with z? Well, from a display hardware point of view z is used for occlusion. A z-buffer is filled with depth values during rendering to determine if a new pixel is in front of the existing pixel. Line rasterizing algorithms are very efficient and use simple adders to draw a line. It would be great to rasterize the depth values in the same way. That means, for each pixel generated, the depth calculation would simply add a consistent delta z or delta depth value.

Now if you take the display space line created by incrementing, say, the pixel x coordinate and adding the appropriate delta z value to the z coordinate and map this line back to the clipping space view volume using a simple scaling of the z depth value you end up with a curve, not a straight line. (The curve does appear as a straight line when viewed from the eyepoint.) To map a line in display space to a line in clipping space requires a non-linear scaling in the z direction.

NewClip3D/WrldClpCnDsplySpc600x170.GIF

Figure 2. World space to clipping space to canonical space to display space

So what is this non-linear ‘squishing’ calculation? Homogeneous math to the rescue! A by-product of the homogeneous mapping of the view space view volume to canonical homogeneous space is the required non-linear z direction squishing. See figure 3. Unfortunately there is a side effect; resolution in the z direction varies with distance and depends on the ratio of the front and back clipping plane distances from the eyepoint. Too big a ratio and the bulk of the clipping space view volume gets squashed in hard into the front of the canonical and display space view volumes leaving very poor z-buffer resolution near the back clipping plane. This resolution issue is well documented in the graphics literature.

Perspective z squish

Figure 3. Clipping space to canonical space z ‘squishing’

Figure 3 shows the z direction squishing for iez = 1/1.5. The reverse squish calculation is

NewClip3D/ZcanonCalc2.GIF

The display space z coordinate list
(0 0.125 0.25 0.375 0.5 0.625 0.75 0.875 1.0)
maps to the canonical space list
(0 0.3 0.5 0.643 0.75 0.833 0.9 0.955 1.0)

Another more recent solution to the non-linear z direction issue is w-buffering which, instead of adding a consistent depth value during rendering, calculates the non-linear depth value to match an iterative consistent delta z in clipping space. The calculation requires a division for each and every pixel but the occlusion performance is significantly improved allowing a reduction in the number of depth buffer bits and/or a much greater ratio between the front and back clipping plane distances. W-buffering is gaining popularity as the cost of the additional computations is outweighed by the savings in depth buffer RAM.

The main point of this section is to point out why the same calculation is required for z as with homogeneous transformations. The second point is that either z-buffering or w-buffering can be used with the proposed transform/project/clip technique. There may be some extra optimizations for w-buffering but this needs further investigation.

Comparing Transform/Clip/Project Computations

Now for some gory detail…

4D Homogeneous		Proposed 3D method
Transform (x y z 1)* [4x4]	12 *, 12 ±	(x y z) * [3x3] + (tx ty tz)	9 *, 9 ±
Trivial accept/ reject test x>-w y>-w z>-w x<w y<w z<w	6 ±	a=1-iez*z RgtOff=x-a TopOff=y-a x<0 y<0 z<0 z>1 RgtOff>0 TopOff>0	1 * 2 ± See note 2
Worst case clipping dz = z0-z1 dw = w0-w1 a = (x0-x1)+dw b = (y0-y1)+dw tlft = (-x0-w0) / a trgt = (x0-w0) / a tbtm = (-y0-w0) / b ttop = (y0-w0) / b tfrt = -z0 / dz tbck = (z0-w0) / (dw-dz) tEnter=max(tltf,tbtm,tbck) tExit=min(trgt,ttop,tfrt) tEnter<tExit Pin = P0 + dP * tEnter Pout = P0 + dP * tExit	6 ± 6 ± 6 / 5 ± 8 *, 8 ±	dx=x0-x1 dy=y0-y1 dz=z0-z1 dRgt=RgtOff0-RgtOff1 dTop=TopOff0-TopOff1 tLft=-x0/dx tRgt=-RgtOff/dRgt tBtm=-y0/dy tTop=-TopOff/dTop tBck=-z0/dz tFrt=(1-z1)/dz tEnter=max(tltf,tbtm,tbck) tExit=min(trgt,ttop,tfrt) tEnter<tExit Pin = P0 + dP * tin Pout = P0 + dP * tout	5 ± 6 / 5 ± (t comp) 6 *, 6 ±
Perspective and parallel project xcan = xc / wc ycan = yc / wc zcan = zc / wc	3 / (trivial) 6 / (clipped)	Perspective project (parallel is optimized away) xcan = xc / a ycan = yc / a zcan = zc / a	3 / (trivial) 6 / (clipped)
Totals for trivial accept/ reject line segments	3 / 12 * 18 ±		3 / 10 * (-17%) 11 ± (-39%)
Totals for worst case line segments crossing all six clipping planes	9 / 20 * 43 ±		9 / (same) 16 * (-20%) 27 ± (-37%)

Notes:

The analysis is based on items like polylines and triangle fans where each new point defines a new graphic item. Items like line segments would double many operations.
In hardware implementations comparisons and arithmetic operations with 0 and 1 and negations can be highly optimized and so are not counted.

For a given view the savings depend on how many clipping operations are required and this depends on the view. So the lower limit for savings are 17% reduction in multiplies and 37% reduction in add/subtracts. Parallel projections eliminate a further 3 divides and 1 multiply.

The iez perspective value can be optimized for some systems, such as low cost hand held game devices, to provide a set of values so as to simplify the multiply to a few add/subtract/shifts.

Pros and Cons

Clearly the real benefits of this optimized transform/clip/project algorithm derive from the reduced computations. The only possible drawback so far conceived is the ability of homogeneous matrices to combine two or more perspective projections. Personally I have never come across this in practice. Does anyone know of any 3D program that actually combines two or more perspective transformations or similar weird combinations?

From a theoretical degrees-of-freedom perspective a 4x4 homogeneous transformation has 15 degrees of freedom to map any six planar-sided volume to, say, a cube. A review of “A New Perspective on Viewing” shows 3 rotate, 3 translate and 7 view volume shape values cover all types of views. The ZNear and ZFar values can be further optimized to ZHalfDepth so a total of 12 parameters or degrees of freedom are required for 3D viewing. The proposed transform/clip/project technique employs 13 independent parameters. To illustrate one type of restriction a truncated wedge shaped view volume could be mapped by a 4x4 homogeneous matrix but not by the proposed technique. I have never seen or even heard of this shape being hinted at in any graphics literature or 3D programs.

The Code

A single C++ file with a main() entry point is provided. It only implements the clipping algorithm in clipping space (of course!). The program reads the eyepoint and a list of 3D points, one per line, from standard input and outputs the clipped vectors or the text "Rejected" to standard output.

Conclusion

The proposed 3D transform/clip/project technique reduces the number of computations for 3D graphics tranform/clip/project. Any 3D computational efficiency leads to faster frame rates and/or reduced power usage, both worthwhile outcomes. Lower power means longer life batteries for portable devices and a reduced carbon footprint for the planet.

Still, has something has been overlooked? If so, who will point it out?

References

Computer Graphics: Principles and Practice, 2nd Edition, 1990, Foley et al.
Mathematics for 3D Game Programming and Computer Graphics, 2nd Edition, 2004, Lengyel
http://www.opengl.org/resources/faq/technical/depthbuffer.htm
A New Perspective on Viewing

History

March 24, 2010: Initial post.
March 24, 2010: Fixed typo "eiz" to "iez" and fixed 4th reference link.
February 21, 2014: Fixed pseudo code formatting.