O'Rourke Review

Joseph O'Rourke (Northampton, MA, U.S.A.), Computing Reviews, 1994

Kanatani is in the grip of a vision, a view of computer vision that he communicates brilliantly in this monograph. He interprets all geometric computation arising in vision as aspects of what he aptly names ``computational projective geometry.'' He starts with a model placing the origin of a three-dimensional space at the lens of a camera, with the image plane at z = f, where f is the camera focal length. Every space point projects to an image point P, which is represented by the unit vector from the origin pointing toward P. This ``N-vector'' (his terminology) m uses normalized homogeneous coordinates. Similarly, a space line projects to an image line l, which is represented by a unit N-vector n normal to the plane through the origin that intersects the image plane at l. He formulates a wide variety of vision computations in terms of these N-vectors. By the end of the book, he has convincingly made the case for his viewpoint as being comprehensive and superior to more ad hoc approaches.
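To make the N-vector representation concrete, here is a minimal sketch under the camera model just described (origin at the lens, image plane at z = f). The function names are my own, not the book's, and the line formula is my derivation from the model: an image line a*x + b*y + c = 0 lies in the plane through the origin with normal (a, b, c/f). A point lies on a line exactly when their N-vectors are orthogonal.

```python
import math

def normalize(v):
    """Scale a 3-vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)

def dot(u, v):
    """Euclidean dot product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))

def point_n_vector(x, y, f):
    """N-vector of image point (x, y): the unit vector from the
    origin toward (x, y, f). Hypothetical helper name."""
    return normalize((x, y, f))

def line_n_vector(a, b, c, f):
    """N-vector of image line a*x + b*y + c = 0 on the plane z = f:
    the unit normal of the plane through the origin containing the
    line. Assumed formula: normal proportional to (a, b, c / f)."""
    return normalize((a, b, c / f))

f = 2.0                                # assumed focal length
m = point_n_vector(1.0, 3.0, f)        # image point (1, 3)
n = line_n_vector(1.0, -1.0, 2.0, f)   # image line x - y + 2 = 0
incidence = dot(m, n)                  # zero when the point is on the line
```

Because the point (1, 3) satisfies x - y + 2 = 0, `incidence` vanishes up to rounding error; the same dot product serves as the incidence test for any point and line.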

This book's coverage can be partitioned roughly into three parts: pure projective geometry, phrased in the author's idiosyncratic terminology; three-dimensional interpretation and reconstruction; and statistical analysis under simple noise models. Typical results in the projective geometry sections include a beautiful characterization of two orthogonal space lines: their vanishing points are conjugate in the image plane (p. 55). (Two N-vectors are conjugate if their dot product is zero.) This characterization permits testing for orthogonality by examining vanishing points in the image plane. A related result is that three mutually orthogonal space lines (for example, lines determined by the edges incident to a corner of a box) have a deep relationship to a particular conic, whose properties permit reconstruction of the lines' space orientation from relationships in the image plane (p. 237).
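The orthogonality test can be sketched in a few lines. This is an illustration of the stated result, not code from the book, and it assumes the same camera model (image plane at z = f), under which a space line with direction (dx, dy, dz), dz nonzero, has its vanishing point at (f*dx/dz, f*dy/dz). The helper names and sample directions are hypothetical.

```python
import math

def normalize(v):
    """Scale a 3-vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)

def vanishing_point(d, f):
    """Image-plane vanishing point of space lines with direction d.
    Assumes d[2] != 0 (the lines are not parallel to the image plane)."""
    dx, dy, dz = d
    return (f * dx / dz, f * dy / dz)

def are_conjugate(p, q, f, tol=1e-9):
    """True if the N-vectors of image points p and q have zero dot
    product, i.e. the points are conjugate in the review's sense."""
    m1 = normalize((p[0], p[1], f))
    m2 = normalize((q[0], q[1], f))
    return abs(sum(a * b for a, b in zip(m1, m2))) < tol

f = 1.5                   # assumed focal length
d1 = (1.0, 2.0, 1.0)      # sample space directions (illustrative)
d2 = (1.0, -1.0, 1.0)     # d1 . d2 = 0, so orthogonal to d1
d3 = (1.0, 0.0, 1.0)      # d1 . d3 = 2, so not orthogonal to d1

vp1, vp2, vp3 = (vanishing_point(d, f) for d in (d1, d2, d3))
orthogonal_pair = are_conjugate(vp1, vp2, f)      # True
non_orthogonal_pair = are_conjugate(vp1, vp3, f)  # False
```

The test uses only image-plane measurements and the focal length; the space directions appear here solely to generate sample vanishing points.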

The cluster of topics centering on three-dimensional interpretation---stereo, motion parallax, orthogonal frame reconstruction, structure from motion, and optic flow---is given a pleasantly consistent treatment. Thus, three-dimensional camera rotation is characterized by the same ``epipolar'' equation (p. 157) that described stereo two chapters earlier (p. 96). Kanatani's viewpoint is high enough to enable the broad picture to be discerned.

One gem in this material is Horn's ``critical surface'' (p. 181), which characterizes the configurations of feature points that lead to ambiguous three-dimensional reconstruction. It is established that this surface is a hyperboloid of one sheet (or some degenerate version thereof). That the same result holds for optic flow (p. 219) follows easily from Kanatani's consistent formulation.

The statistical analyses consist primarily of long calculations of covariance matrices for the computation of, for example, vanishing points (p. 298) or reconstructed camera rotations (p. 308). The book also contains a useful section on a hypothesis-and-test paradigm, with hypothesis credibility expressed in terms of chi-square values.

The mathematical prerequisites are stiff: for example, Lagrange multipliers and tensors are used without apology. The material is presented in an austere style, with sparse motivation and few examples. I recommend reading the ``Bibliographical Notes'' relegated to the end of each chapter before the chapter they follow, to set the historical context. The exercises all request mathematical derivations, and all are solved in an appendix. Although I can imagine a graduate course based on the book, it is priced beyond the reach of most individuals. The typography is beautiful and virtually typo-free (I found only one typo), and the book is printed on such heavy bond that more than once I found myself attempting to separate what turned out to be a single page.