
Differential Scene Flow from Light Field Gradients

International Journal of Computer Vision

Abstract

This paper presents novel techniques for recovering 3D dense scene flow, based on differential analysis of 4D light fields. The key enabling result is a per-ray linear equation, called the ray flow equation, that relates 3D scene flow to 4D light field gradients. The ray flow equation is invariant to 3D scene structure and applicable to a general class of scenes, but is under-constrained (3 unknowns per equation). Thus, additional constraints must be imposed to recover motion. We develop two families of scene flow algorithms by leveraging the structural similarity between ray flow and optical flow equations: local ‘Lucas–Kanade’ ray flow and global ‘Horn–Schunck’ ray flow, inspired by corresponding optical flow methods. We also develop a combined local–global method by utilizing the correspondence structure in the light fields. We demonstrate high-precision 3D scene flow recovery for a wide range of scenarios, including rotation and non-rigid motion. We analyze the theoretical and practical performance limits of the proposed techniques via the light field structure tensor, a \(3 \times 3\) matrix that encodes the local structure of light fields. We envision that the proposed analysis and algorithms will lead to the design of future light-field cameras that are optimized for motion sensing, in addition to depth sensing.




Notes

  1. Structure tensors have been studied and defined in different ways in the light field community (e.g., Neumann et al. 2004). Here, the structure tensor is defined via the light field gradients with respect to the 3D motion, and is thus a \(3\times 3\) matrix.

  2. Although the structure tensor theoretically has rank 2, the ratio \(\frac{\lambda _1}{\lambda _2}\) of the largest and second largest eigenvalues can be large. This is because the eigenvalue corresponding to Z motion depends on the range of (uv) coordinates, which is limited by the size of the light field window. Therefore, a sufficiently large window size is required for motion recovery.

References

  • Adelson, E. H., & Wang, J. Y. A. (1992). Single lens stereo with a plenoptic camera. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 14(2), 99–106.


  • Alexander, E., Guo, Q., Koppal, S., Gortler, S., & Zickler, T. (2016). Focal flow: Measuring distance and velocity with defocus and differential motion. In European conference on computer vision (ECCV) (pp. 667–682). Heidelberg: Springer.

  • Aujol, J. F., Gilboa, G., Chan, T., & Osher, S. (2006). Structure-texture image decomposition-modeling, algorithms, and parameter selection. International Journal of Computer Vision (IJCV), 67(1), 111–136. https://doi.org/10.1007/s11263-006-4331-z.


  • Black, M. J., & Anandan, P. (1996). The robust estimation of multiple motions: Parametric and piecewise-smooth flow fields. Computer Vision and Image Understanding, 63(1), 75–104.


  • Bok, Y., Jeon, H. G., & Kweon, I. S. (2017). Geometric calibration of micro-lens-based light field cameras using line features. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 39(2), 287–300.


  • Brox, T., Bruhn, A., Papenberg, N., & Weickert, J. (2004). High accuracy optical flow estimation based on a theory for warping. In European conference on computer vision (ECCV) (Vol. 3024, pp. 25–36). Heidelberg: Springer.

  • Bruhn, A., Weickert, J., & Schnörr, C. (2005). Lucas/Kanade meets Horn/Schunck: Combining local and global optic flow methods. International Journal of Computer Vision (IJCV), 61(3), 211–231.


  • Chandraker, M. (2014a). On shape and material recovery from motion. In European conference on computer vision (ECCV) (pp. 202–217). Heidelberg: Springer.


  • Chandraker, M. (2014b). What camera motion reveals about shape with unknown BRDF. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2171–2178). Washington: IEEE.

  • Chandraker, M. (2016). The information available to a moving observer on shape with unknown, isotropic BRDFs. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 38(7), 1283–1297.


  • Dansereau, D. G., Mahon, I., Pizarro, O., & Williams, S. B. (2011). Plenoptic flow: Closed-form visual odometry for light field cameras. In IEEE/RSJ international conference on intelligent robots and systems (IROS) (pp. 4455–4462). Washington: IEEE.

  • Dansereau, D. G., Schuster, G., Ford, J., & Wetzstein, G. (2017). A wide-field-of-view monocentric light field camera. In IEEE conference on computer vision and pattern recognition (CVPR). Washington: IEEE.

  • Gottfried, J. M., Fehr, J., & Garbe, C. S. (2011). Computing range flow from multi-modal kinect data. In International symposium on visual computing (pp. 758–767). Heidelberg: Springer.

  • Hasinoff, S. W., Durand, F., & Freeman, W. T. (2010). Noise-optimal capture for high dynamic range photography. In: IEEE conference on computer vision and pattern recognition (CVPR) (pp. 553–560). IEEE.

  • Haussecker, H. W., & Fleet, D. J. (2001). Computing optical flow with physical models of brightness variation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6), 661–673.


  • Heber, S., & Pock, T. (2014). Scene flow estimation from light fields via the preconditioned primal–dual algorithm. In X. Jiang, J. Hornegger, & R. Koch (Eds.), Pattern recognition (pp. 3–14). Cham: Springer International.


  • Horn, B. K., & Schunck, B. G. (1981). Determining optical flow. Artificial Intelligence, 17(1–3), 185–203.


  • Hung, C. H., Xu, L., & Jia, J. (2013). Consistent binocular depth and scene flow with chained temporal profiles. International Journal of Computer Vision (IJCV), 102(1–3), 271–292.


  • Jaimez, M., Souiai, M., Gonzalez-Jimenez, J., & Cremers, D. (2015). A primal-dual framework for real-time dense rgb-d scene flow. In IEEE international conference on robotics and automation (ICRA) (pp. 98–104). Washington: IEEE.

  • Jo, K., Gupta, M., & Nayar, S. K. (2015). SpeDo: 6 DOF ego-motion sensor using speckle defocus imaging. In IEEE international conference on computer vision (ICCV) (pp. 4319–4327). Washington: IEEE.

  • Johannsen, O., Sulc, A., & Goldluecke, B. (2015). On linear structure from motion for light field cameras. In IEEE international conference on computer vision (ICCV) (pp. 720–728). Washington: IEEE.

  • Letouzey, A., Petit, B., & Boyer, E. (2011). Scene flow from depth and color images. In British machine vision conference (BMVC) (pp. 46–56). BMVA Press.

  • Levoy, M., & Hanrahan, P. (1996). Light field rendering. In SIGGRAPH conference on computer graphics and interactive techniques (pp. 31–42). New York: ACM.

  • Li, Z., Xu, Z., Ramamoorthi, R., & Chandraker, M. (2017). Robust energy minimization for BRDF-invariant shape from light fields. In IEEE conference on computer vision and pattern recognition (CVPR) (Vol. 1). Washington: IEEE.

  • Lucas, B. D., Kanade, T., et al. (1981). An iterative image registration technique with an application to stereo vision. In International joint conference on artificial intelligence (pp. 674–679). San Francisco: Morgan Kaufmann.

  • Ma, S., Smith, B. M., & Gupta, M. (2018). 3D scene flow from 4D light field gradients. In European conference on computer vision (ECCV) (Vol. 8, pp. 681–698). Springer International Publishing.

  • Navarro, J., & Garamendi, J. (2016). Variational scene flow and occlusion detection from a light field sequence. In International conference on systems, signals and image processing (IWSSIP) (pp. 1–4). Washington: IEEE.

  • Neumann, J., Fermuller, C., & Aloimonos, Y. (2003). Polydioptric camera design and 3d motion estimation. In IEEE conference on computer vision and pattern recognition (CVPR) (Vol. 2, pp. II–294). Washington: IEEE.

  • Neumann, J., Fermüller, C., & Aloimonos, Y. (2004). A hierarchy of cameras for 3d photography. Computer Vision and Image Understanding, 96(3), 274–293.


  • Ng, R., Levoy, M., Brédif, M., Duval, G., Horowitz, M., & Hanrahan, P. (2005). Light field photography with a hand-held plenoptic camera. Computer Science Technical Report CSTR, 2(11), 1–11.


  • Odobez, J. M., & Bouthemy, P. (1995). Robust multiresolution estimation of parametric motion models. Journal of Visual Communication and Image Representation, 6(4), 348–365.


  • Phong, B. T. (1975). Illumination for computer generated pictures. Communications of the ACM, 18(6), 311–317. https://doi.org/10.1145/360825.360839.


  • Ranjan, A., Jampani, V., Balles, L., Kim, K., Sun, D., Wulff, J., et al. (2018). Competitive collaboration: Joint unsupervised learning of depth, camera motion, optical flow and motion segmentation. arXiv:1805.09806 [cs]

  • Shi, J., & Tomasi, C. (1994). Good features to track. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 593–600). Washington: IEEE.

  • Smith, B., O’Toole, M., & Gupta, M. (2018). Tracking multiple objects outside the line of sight using speckle imaging. In IEEE conference on computer vision and pattern recognition (CVPR). IEEE.

  • Smith, B. M., Desai, P., Agarwal, V., & Gupta, M. (2017). CoLux: Multi-object 3d micro-motion analysis using speckle imaging. ACM Transactions on Graphics, 36(4), 1–12.


  • Srinivasan, P. P., Tao, M. W., Ng, R., & Ramamoorthi, R. (2015). Oriented light-field windows for scene flow. In IEEE international conference on computer vision (ICCV) (pp. 3496–3504). Washington: IEEE.

  • Sun, D., Roth, S., & Black, M. J. (2010). Secrets of optical flow estimation and their principles. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2432–2439). Washington: IEEE.

  • Sun, D., Sudderth, E. B., & Pfister, H. (2015). Layered RGBD scene flow estimation. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 548–556). Washington: IEEE.

  • Tao, M. W., Hadap, S., Malik, J., & Ramamoorthi, R. (2013). Depth from combining defocus and correspondence using light-field cameras. In IEEE international conference on computer vision (ICCV) (pp. 673–680). Washington: IEEE.

  • Vedula, S., Baker, S., Rander, P., Collins, R., & Kanade, T. (1999). Three-dimensional scene flow. In IEEE international conference on computer vision (ICCV) (Vol. 2, pp. 722–729). Washington: IEEE.

  • Vijayanarasimhan, S., Ricco, S., Schmid, C., Sukthankar, R., & Fragkiadaki, K. (2017). SfM-Net: Learning of structure and motion from video. arXiv:1704.07804 [cs]

  • Wang, T. C., Chandraker, M., Efros, A. A., & Ramamoorthi, R. (2016). SVBRDF-invariant shape and reflectance estimation from light-field cameras. In IEEE conference on computer vision and pattern recognition (CVPR) (pp. 5451–5459). Washington: IEEE.

  • Wanner, S., & Goldluecke, B. (2014). Variational light field analysis for disparity estimation and super-resolution. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 36(3), 606–619.


  • Wedel, A., Rabe, C., Vaudrey, T., Brox, T., Franke, U., & Cremers, D. (2008). Efficient dense scene flow from sparse or dense stereo data. In European conference on computer vision (ECCV) (pp. 739–751). Heidelberg: Springer.

  • Yin, Z., & Shi, J. (2018). GeoNet: Unsupervised learning of dense depth, optical flow and camera pose. In IEEE/CVF conference on computer vision and pattern recognition (pp. 1983–1992). Salt Lake City, UT: IEEE. https://doi.org/10.1109/CVPR.2018.00212, https://ieeexplore.ieee.org/document/8578310/

  • Zhang, Y., Li, Z., Yang, W., Yu, P., Lin, H., & Yu, J. (2017). The light field 3d scanner. In IEEE international conference on computational photography (ICCP) (pp. 1–9). Washington: IEEE.


Author information


Corresponding author

Correspondence to Mohit Gupta.

Additional information

Communicated by Yair Weiss.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This research was funded by ONR Grant No. N00014-16-1-2995 and the Defense Advanced Research Projects Agency (DARPA) REVEAL program.

Electronic supplementary material

Below are the links to the electronic supplementary material.

Supplementary material 1 (mp4 23017 KB)

Supplementary material 2 (pdf 1402 KB)

Appendices

Appendix A: Proof of Result 2

Result 2 (Rank of structure tensor) The structure tensor \(\mathbf {S}\) of a local 4D light field window has three possible ranks: 0, 2, and 3. These correspond to scene patches with no texture (smooth regions), an edge, and 2D texture, respectively.

Proof

We first show the three cases of rank 0, 2 and 3, and then we prove that the structure tensor cannot be rank 1.

Since \(\text {rank}(\mathbf {S})=\text {rank}(\mathbf {A}^T\mathbf {A})=\text {rank}(\mathbf {A})\), we only need to consider the rank of the \(n\times 3\) matrix \(\mathbf {A}\):

$$\begin{aligned} \mathbf {A}=\begin{bmatrix}\mathbf {A}_1&\mathbf {A}_2&\mathbf {A}_3\end{bmatrix} =\begin{bmatrix} L_{X1} & L_{Y1} & L_{Z1} \\ L_{X2} & L_{Y2} & L_{Z2} \\ \vdots & \vdots & \vdots \\ L_{Xn} & L_{Yn} & L_{Zn} \end{bmatrix}. \end{aligned}$$
(16)

Case 1: Smooth region. In this case, \(L_X = L_Y = L_Z = 0\) at all locations in the light field window. Therefore, all entries of \(\mathbf {A}\) are zero, resulting in a rank 0 structure tensor, with all three eigenvalues \(\lambda _1 = \lambda _2 = \lambda _3 = 0\). As a result, \(\mathbf {S}\) has a 3D null space, and no motion vector can be recovered reliably.

Case 2: Single step edge. Without loss of generality, suppose the light field window corresponds to a fronto-parallel scene patch with a vertical edge, i.e., \(L_Y = 0\) everywhere and thus \(\mathbf {A}_2=\mathbf {0}\).

Consider a point P on the edge. Consider two rays from P that are captured on two horizontally-separated sub-aperture images indexed by \((x_a,y)\) and \((x_b,y)\). Let the coordinates of the two rays be \((x_a,y,u_a,v_a)\) and \((x_b,y,u_b,v_b)\), and let the light field gradients at these two rays be \((L_{Xa},L_{Ya},L_{Za})\) and \((L_{Xb},L_{Yb},L_{Zb})\). Note that these two gradients are two rows in \(\mathbf {A}\). Recall that

$$\begin{aligned} L_{Z}=-\frac{u}{\varGamma }L_{X}-\frac{v}{\varGamma }L_{Y} . \end{aligned}$$
(17)

Since \(L_{Ya}=0\) and \(L_{Yb}=0\), we have,

$$\begin{aligned} L_{Za}=-\frac{u_a}{\varGamma }L_{Xa},\; L_{Zb}=-\frac{u_b}{\varGamma }L_{Xb}. \end{aligned}$$
(18)

Next, suppose there exists \(k\ne 0\) such that \(\mathbf {A}_3=k\mathbf {A}_1\). This implies

$$\begin{aligned} kL_{Xa}=-\frac{u_a}{\varGamma }L_{Xa},\;kL_{Xb}=-\frac{u_b}{\varGamma }L_{Xb}. \end{aligned}$$
(19)

By eliminating k from Eq. 19, we get \(u_a=u_b\). However, since the scene point has a finite depth, the disparity \(u_a - u_b\) is nonzero, i.e., \(u_a \ne u_b\). This contradiction means that no such k exists, so \(\mathbf {A}_1\) and \(\mathbf {A}_3\) are linearly independent and the rank of \(\mathbf {A}\) (and hence \(\mathbf {S}\)) is 2. As a result, \(\mathbf {S}\) has a 1D null space (only one eigenvalue \(\lambda _3 = 0\)), and a 2D family of motions (motion orthogonal to the edge) can be recovered.

Case 3: 2D texture. In general, \(\mathbf {A}_1\), \(\mathbf {A}_2\) and \(\mathbf {A}_3\) are nonzero and linearly independent. The structure tensor is full rank (rank \(=3\)), and the entire space of 3D motions is recoverable.

Now we show that the rank cannot be 1.

(Proof by contradiction) Assume there exists a 4D patch such that its corresponding matrix \(\mathbf {A}\) is rank 1.

First, \(\mathbf {A}_1\) and \(\mathbf {A}_2\) cannot both be zero: if they were, then by Eq. 17 all entries of \(\mathbf {A}_3\) would also be zero, resulting in a rank 0 matrix. Therefore \(\mathbf {A}_1\ne \mathbf {0}\) or \(\mathbf {A}_2\ne \mathbf {0}\).

Without loss of generality, assume \(\mathbf {A}_1 \ne 0\). Since \(\mathbf {A}\) is rank 1, there exists \(k,l\in \mathbb {R}\) such that

$$\begin{aligned} \mathbf {A}_2=k\mathbf {A}_1, \end{aligned}$$
(20)
$$\begin{aligned} \mathbf {A}_3=l\mathbf {A}_1. \end{aligned}$$
(21)

Let us pick a ray \(\mathbf {x_a}=(x_a,y_a,u_a,v_a)\) with light field gradient \((L_{Xa},L_{Ya},L_{Za})\) such that \(L_{Xa}\ne 0\); such an \(\mathbf {x_a}\) exists because \(\mathbf {A}_1\ne \mathbf {0}\). Note that this ray is captured by the sub-aperture image indexed by \((x_a,y_a)\). Assume the scene point corresponding to \(\mathbf {x_a}\) is observed in another sub-aperture image \((x_b,y_b)\) with \(y_a=y_b\); in other words, a sub-aperture image on the same horizontal line as \((x_a,y_a)\). Denote the corresponding ray as \(\mathbf {x_b}=(x_b,y_a,u_b,v_b)\) with light field gradient \((L_{Xb},L_{Yb},L_{Zb})\); \(L_{Xb}\) is also nonzero.

From Eq. 20 we know that \(L_{Ya}=kL_{Xa}\) and \(L_{Yb}=kL_{Xb}\). According to Eq. 17 we have

$$\begin{aligned} L_{Za}=-\frac{u_a}{\varGamma }L_{Xa}-\frac{v_a}{\varGamma }L_{Ya}=-\frac{u_a+kv_a}{\varGamma }L_{Xa}, \end{aligned}$$
(22)
$$\begin{aligned} L_{Zb}=-\frac{u_b}{\varGamma }L_{Xb}-\frac{v_b}{\varGamma }L_{Yb}=-\frac{u_b+kv_b}{\varGamma }L_{Xb}. \end{aligned}$$
(23)

From Eq. 21 we know that \(L_{Za}=l L_{Xa}\), \(L_{Zb}=l L_{Xb}\), and from Eqs. 22–23 we have

$$\begin{aligned} l=-\frac{u_a+kv_a}{\varGamma }=-\frac{u_b+kv_b}{\varGamma }. \end{aligned}$$
(24)

However, since \(x_a\ne x_b\) and \(y_a=y_b\), epipolar geometry gives \(u_a\ne u_b\) and \(v_a=v_b\). Therefore Eq. 24 cannot hold, which means our assumption is false, and \(\text {rank}(\mathbf {A})\) cannot be 1. \(\square \)
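To make the rank analysis concrete, the following minimal numpy sketch (ours, not from the paper) assembles \(\mathbf {A}\) from per-ray gradients, forms the structure tensor \(\mathbf {S}=\mathbf {A}^T\mathbf {A}\), and classifies a light field window according to Result 2. The \(9\times 9\) window, \(\varGamma = 1\), and the synthetic gradients in the edge example are illustrative assumptions.

```python
import numpy as np

def classify_window(L_X, L_Y, L_Z, eps=1e-6):
    """Classify a local 4D light field window by the rank of its
    structure tensor S = A^T A (Result 2). Inputs are flattened
    per-ray gradient arrays of equal length n."""
    A = np.stack([L_X, L_Y, L_Z], axis=1)           # n x 3 matrix of Eq. (16)
    S = A.T @ A                                     # 3 x 3 structure tensor
    lam = np.sort(np.linalg.eigvalsh(S))[::-1]      # eigenvalues, descending
    rank = int(np.sum(lam > eps * lam[0])) if lam[0] > eps else 0
    label = {0: "smooth region", 2: "edge", 3: "2D texture"}.get(rank, "degenerate")
    return S, lam, label

# Example: a vertical-edge window (Case 2), with L_Y = 0 everywhere and
# L_Z derived from L_X via the ray flow constraint of Eq. (17).
Gamma = 1.0
u = np.repeat(np.linspace(-1, 1, 9), 9)             # (u, v) grid of the window
v = np.tile(np.linspace(-1, 1, 9), 9)
L_X = np.random.default_rng(0).standard_normal(81)  # nonzero horizontal gradients
L_Y = np.zeros(81)
L_Z = -(u / Gamma) * L_X - (v / Gamma) * L_Y
_, lam, label = classify_window(L_X, L_Y, L_Z)
print(label, lam)                                   # expected: edge, lambda_3 ~ 0
```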

Appendix B: Implementation Details

B.1 Global Method

In Sect. 5, we introduced the global ‘Horn–Schunck’ ray flow method, which solves for the 3D scene motion by minimizing a functional:

$$\begin{aligned} E(\mathbf {V})&= \underbrace{E_D (\mathbf {V})}_{\text {Error term}} + \underbrace{E_S (\mathbf {V})}_{\text {Smoothness term}}, \quad \text {where}\\ E_D (\mathbf {V})&= \int _{\varOmega }\left( L_X V_X + L_Y V_Y + L_Z V_Z + L_t \right) ^2 dx\,dy\,du\,dv,\\ E_S (\mathbf {V})&= \int _{\varOmega } \left( \lambda |\nabla V_X|^2+\lambda |\nabla V_Y|^2+\lambda _Z|\nabla V_Z|^2 \right) dx\,dy\,du\,dv. \end{aligned}$$
(25)

This is a convex functional, and its minimum can be found by solving the Euler–Lagrange equations:

$$\begin{aligned} L_X(L_XV_X+L_YV_Y+L_ZV_Z)-\lambda \varDelta V_X&=-L_XL_t,\\ L_Y(L_XV_X+L_YV_Y+L_ZV_Z)-\lambda \varDelta V_Y&=-L_YL_t,\\ L_Z(L_XV_X+L_YV_Y+L_ZV_Z)-\lambda _Z\varDelta V_Z&=-L_ZL_t. \end{aligned}$$
(26)

These equations can be discretized into a sparse linear system, which is solved using successive over-relaxation (SOR).
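As a rough illustration, here is a minimal numpy sketch of this solver. The four-neighbor \((u,v)\) average used as a Laplacian surrogate, the periodic boundaries, the Jacobi-style over-relaxed sweep (an in-place Gauss–Seidel sweep would be true SOR), and all parameter values are our assumptions, not the authors' implementation; the optional V0 argument is used by the coarse-to-fine sketch in B.2.3.

```python
import numpy as np

def global_ray_flow(LX, LY, LZ, Lt, lam=0.1, lam_z=0.1, omega=1.5,
                    iters=200, V0=None):
    """Sketch of the global ('Horn-Schunck') ray flow solver (Eq. 26).
    LX, LY, LZ, Lt are 4D gradient arrays over (x, y, u, v); returns the
    per-ray motion field V = (V_X, V_Y, V_Z)."""
    g = np.stack([LX, LY, LZ])                      # per-ray gradient vector
    D = np.array([lam, lam, lam_z])[:, None, None, None, None]
    V = np.zeros_like(g) if V0 is None else V0.copy()
    for _ in range(iters):
        # mean of the four (u, v) neighbors as a discrete Laplacian surrogate
        # (np.roll implies periodic boundaries, for simplicity)
        Vbar = sum(np.roll(V, s, axis=ax) for ax in (3, 4) for s in (-1, 1)) / 4.0
        # per-ray 3x3 system (diag(D) + g g^T) V = diag(D) Vbar - g Lt,
        # solved in closed form via the Sherman-Morrison identity
        rhs = D * Vbar - g * Lt
        s_ = (g * rhs / D).sum(0) / (1.0 + (g * g / D).sum(0))
        V = (1 - omega) * V + omega * (rhs - g * s_) / D   # over-relaxed update
    return V
```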

B.2 Structure-Aware Global Method

In this section we describe the structure-aware global method, an enhanced version of the global method that adopts the enhancement techniques for the local and global methods discussed in Sect. 6 of the main paper.

B.2.1 Data Term

The data term is defined as:

$$\begin{aligned} E_D (\mathbf {V})=\int _{\varOmega _c}\sum _{\mathbf {x}_i\in \mathscr {P}(u,v)}h_i\rho _D((L_0(\mathbf {x_i})-L_1(\mathbf {w}(\mathbf {x_i},\mathbf {V})))^2)du\,dv, \end{aligned}$$
(27)

where \(\mathscr {P}(u,v)\) is the 2D plane defined in Equation (11) in the main paper.

Weighted 2D window. Notice that each ray in the 2D plane is assigned a different weight \(h_i\), given by

$$\begin{aligned} h_i=h(\mathbf {x_i},\mathbf {x_c})=h_g(\mathbf {x_i},\mathbf {x_c})\cdot h_o(\mathbf {x_i},\mathbf {x_c}), \end{aligned}$$
(28)
$$\begin{aligned} h_g(\mathbf {x_i},\mathbf {x_c}) = e^{-\frac{(x_i-x_c)^2+(y_i-y_c)^2+(u_i-u_c+\alpha (x_i-x_c))^2+(v_i-v_c+\alpha (y_i-y_c))^2}{\sigma _g^2}}, \end{aligned}$$
(29)
$$\begin{aligned} h_o(\mathbf {x_i},\mathbf {x_c}) = e^{-\frac{(d_{\alpha i}-d_{\alpha c})^2}{\sigma _o^2}}, \end{aligned}$$
(30)

where \(\mathbf {x_c}\) denotes the center ray of the window, and \(d_\alpha =1/\alpha \) is proportional to the actual depth of the scene point.

\(h_g\) defines a Gaussian weight based on the distance between \(\mathbf {x_i}\) and \(\mathbf {x_c}\) in the 2D plane. \(h_o\) defines an occlusion weight that penalizes the difference between the estimated disparities \(\alpha \) at \(\mathbf {x_i}\) and \(\mathbf {x_c}\). Notice that, because of occlusions, not all rays on \(\mathscr {P}(u,v)\) correspond to the same scene point as \(\mathbf {x}_c\): if the scene point corresponding to \(\mathbf {x}_i\) occludes, or is occluded by, the scene point corresponding to \(\mathbf {x}_c\), the two rays will have different values of \(\alpha \) and thus a small \(h_o\).
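As an illustration, the weights for one window could be computed as in the sketch below; the flattened-array interface, the use of the center ray's disparity for \(\alpha \) in Eq. 29, and the \(\sigma \) values are assumptions for the sketch.

```python
import numpy as np

def window_weights(x, y, u, v, alpha, c, sigma_g=1.0, sigma_o=0.02):
    """Sketch of the per-ray weights of Eqs. (28)-(30) for the rays on a
    2D plane P(u, v). x, y, u, v, alpha are flattened per-ray arrays
    (alpha assumed nonzero), and c indexes the center ray x_c."""
    # Gaussian proximity weight h_g (Eq. 29), measured along the sheared plane
    dist2 = ((x - x[c])**2 + (y - y[c])**2
             + (u - u[c] + alpha[c] * (x - x[c]))**2
             + (v - v[c] + alpha[c] * (y - y[c]))**2)
    h_g = np.exp(-dist2 / sigma_g**2)
    # occlusion weight h_o (Eq. 30): rays whose depth proxy d_alpha = 1/alpha
    # differs from the center ray's are down-weighted
    d_alpha = 1.0 / alpha
    h_o = np.exp(-(d_alpha - d_alpha[c])**2 / sigma_o**2)
    return h_g * h_o                                # h_i of Eq. (28)
```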

B.2.2 Smoothness Term

The smoothness term is defined as

$$\begin{aligned} E_S(\mathbf {V})=\int _{\varOmega }g(\mathbf {x})\left( \lambda \sum _{i=1}^2\rho _S(V_{X(i)}^2) +\lambda \sum _{i=1}^2\rho _S(V_{Y(i)}^2) +\lambda _Z\sum _{i=1}^2\rho _S(V_{Z(i)}^2)\right) du\,dv, \end{aligned}$$
(31)

where \(V_{X(i)}\) is short for \(\frac{\partial V_X}{\partial u^{(i)}}\) (for simplicity we denote \(u, v\) as \(u^{(1)},u^{(2)}\), respectively), and \(g(\mathbf {x})\) is a weight function that varies across the light field. The error term \(E_C(\mathbf {V})\) uses the warp function (Eq. 8 in the main paper).

Best practices from optical flow. We choose the penalty functions \(\rho _D\) and \(\rho _S\) to be the generalized Charbonnier penalty \(\rho (x^2)=(x^2+\epsilon ^2)^a\) with \(a=0.45\), as suggested in Sun et al. (2010).
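For reference, a direct transcription of this penalty and of the derivative \(\rho '\) that appears in the Euler–Lagrange equations below; the \(\epsilon \) value is an illustrative assumption.

```python
import numpy as np

def charbonnier(x2, a=0.45, eps=1e-3):
    # generalized Charbonnier penalty rho(x^2) = (x^2 + eps^2)^a
    return (x2 + eps**2) ** a

def charbonnier_prime(x2, a=0.45, eps=1e-3):
    # rho'(x^2), used as rho_D' and rho_S' in Eq. (36)
    return a * (x2 + eps**2) ** (a - 1.0)
```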

Weight function for the regularization term. The weight function \(g(\mathbf {x})\) consists of two parts, which are combined using a harmonic mean:

$$\begin{aligned} g(\mathbf {x}) = \frac{g_c(\mathbf {x})g_d(\mathbf {x})}{g_c(\mathbf {x})+g_d(\mathbf {x})}. \end{aligned}$$
(32)

Consistency between XY-motion and Z-motion. In practice we notice that motion discontinuities are preserved better in the XY-motion than in the Z-motion. To improve the accuracy of the Z-motion, we solve for the 3D motion \(\mathbf {V}\) in a two-step process. We first compute an initial estimate of the XY-motion, denoted \(\mathbf {U}=(U_X,U_Y)\). We then use \(\mathbf {U}\) to compute a weight map for the regularization term:

$$\begin{aligned} g_c(\mathbf {x}) = \frac{1}{1+(|\nabla U_X|^2+|\nabla U_Y|^2)/\sigma _c^2}, \end{aligned}$$
(33)

The full 3D motion \(\mathbf {V}\) is then computed in a second pass. Notice that \(g_c(\mathbf {x})\) is small where the gradient of \(\mathbf {U}\) is large; in other words, the regularization term contributes less to the total energy where there is a discontinuity in \(\mathbf {U}\).

Consistency between motion boundaries and depth boundaries. We also assume that motion boundaries are likely to align with depth boundaries. In other words, we assign a lower weight to points where the depth gradient is large:

$$\begin{aligned} g_d(\mathbf {x}) = \frac{1}{1+|\nabla d_\alpha |^2/\sigma _d^2}. \end{aligned}$$
(34)
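Putting Eqs. 32–34 together, the combined weight map could be computed as in the sketch below; using np.gradient as the discrete gradient operator on a 2D \((u,v)\) grid, and the \(\sigma \) values, are assumptions.

```python
import numpy as np

def regularization_weight(U_X, U_Y, d_alpha, sigma_c=0.1, sigma_d=0.1):
    """Sketch of the weight g(x) of Eqs. (32)-(34) on a 2D (u, v) grid,
    given the first-pass XY-motion (U_X, U_Y) and the depth proxy d_alpha."""
    gxu, gxv = np.gradient(U_X)
    gyu, gyv = np.gradient(U_Y)
    grad_U2 = gxu**2 + gxv**2 + gyu**2 + gyv**2
    g_c = 1.0 / (1.0 + grad_U2 / sigma_c**2)             # Eq. (33)
    gdu, gdv = np.gradient(d_alpha)
    g_d = 1.0 / (1.0 + (gdu**2 + gdv**2) / sigma_d**2)   # Eq. (34)
    return g_c * g_d / (g_c + g_d)                       # harmonic mean, Eq. (32)
```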

B.2.3 Optimization

The error term \(E_D(\mathbf {V})\) can be linearized as

$$\begin{aligned} E_D' (\mathbf {V})=\int _{\varOmega _c}\sum _{\mathbf {x}_i\in \mathscr {P}(u,v)}h_i\rho _D((L_{Xi}V_X+L_{Yi}V_Y+L_{Zi}V_Z+L_{ti})^2)du\,dv. \end{aligned}$$
(35)

Then the entire energy \(E'=E_D'+E_S\) can be minimized by solving the Euler–Lagrange equations:

$$\begin{aligned} \sum _{\mathbf {x}_i\in \mathscr {P}(u,v)}h_i\rho _D'L_X\delta _L-\lambda \sum _{i=1}^2\frac{\partial }{\partial u^{(i)}}\left( g\rho _S'(V_{X(i)})V_{X(i)}\right)&= -\sum _{\mathbf {x}_i\in \mathscr {P}(u,v)}h_i\rho _D'L_XL_t,\\ \sum _{\mathbf {x}_i\in \mathscr {P}(u,v)}h_i\rho _D'L_Y\delta _L-\lambda \sum _{i=1}^2\frac{\partial }{\partial u^{(i)}}\left( g\rho _S'(V_{Y(i)})V_{Y(i)}\right)&= -\sum _{\mathbf {x}_i\in \mathscr {P}(u,v)}h_i\rho _D'L_YL_t,\\ \sum _{\mathbf {x}_i\in \mathscr {P}(u,v)}h_i\rho _D'L_Z\delta _L-\lambda _Z\sum _{i=1}^2\frac{\partial }{\partial u^{(i)}}\left( g\rho _S'(V_{Z(i)})V_{Z(i)}\right)&= -\sum _{\mathbf {x}_i\in \mathscr {P}(u,v)}h_i\rho _D'L_ZL_t, \end{aligned}$$
(36)

where \(\rho _D'\) is short for \(\rho _D'((L_XV_X+L_YV_Y+L_ZV_Z+L_t)^2)\), and \(\delta _L=L_XV_X+L_YV_Y+L_ZV_Z\). Again, these equations are discretized and solved using SOR. The linearization step is then repeated in an iterative, multi-resolution (coarse-to-fine) framework.
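Structurally, this outer loop might look like the following sketch, which reuses the global_ray_flow sketch from B.1 as the inner linearized solve. It downsamples only the \((u,v)\) axes (assumed power-of-two), re-linearizes at each level, and omits the warp of Eq. 8 for brevity, so it illustrates the control flow rather than the authors' exact method.

```python
import numpy as np

def downsample_uv(L):
    # 2x average pooling over the (u, v) axes of a 4D light field
    return 0.25 * (L[..., ::2, ::2] + L[..., 1::2, ::2]
                   + L[..., ::2, 1::2] + L[..., 1::2, 1::2])

def multi_resolution_flow(L0, L1, Gamma=1.0, levels=3, solve=global_ray_flow):
    """Coarse-to-fine sketch over two consecutive light fields L0, L1
    with axes (x, y, u, v)."""
    pyramid = [(L0, L1)]
    for _ in range(levels - 1):
        pyramid.append(tuple(downsample_uv(L) for L in pyramid[-1]))
    V = None
    for A, B in reversed(pyramid):                   # coarsest to finest
        LX, LY = np.gradient(A, axis=(0, 1))         # gradients w.r.t. (x, y)
        u = np.linspace(-1, 1, A.shape[2])[None, None, :, None]
        v = np.linspace(-1, 1, A.shape[3])[None, None, None, :]
        LZ = -(u / Gamma) * LX - (v / Gamma) * LY    # L_Z via Eq. (17)
        Lt = B - A                                   # temporal derivative
        if V is not None:                            # upsample the coarse estimate
            V = np.repeat(np.repeat(V, 2, axis=3), 2, axis=4)
        V = solve(LX, LY, LZ, Lt, V0=V)
    return V
```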


About this article


Cite this article

Ma, S., Smith, B.M. & Gupta, M. Differential Scene Flow from Light Field Gradients. Int J Comput Vis 128, 679–697 (2020). https://doi.org/10.1007/s11263-019-01230-z
