Article, DOI: 10.1007/978-3-031-31435-3_25

Raw or Cooked? Object Detection on RAW Images

Published: 27 April 2023

Abstract

Images fed to a deep neural network have generally undergone several handcrafted image signal processing (ISP) operations, all of which have been optimized to produce visually pleasing images. In this work, we investigate the hypothesis that the intermediate representation of visually pleasing images is sub-optimal for downstream computer vision tasks compared to the RAW image representation. We suggest that the operations of the ISP should instead be optimized towards the end task, by learning the parameters of the operations jointly with the downstream model during training. We extend previous works on this topic and propose a new learnable operation that enables an object detector to achieve superior performance compared to both previous works and traditional RGB images. In experiments on the open PASCALRAW dataset, we empirically confirm our hypothesis.
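
As a rough illustration of what "learning the parameters of the operations" can mean, the sketch below treats a single gamma curve as a learnable ISP operation and fits its exponent by gradient descent. This is not the operation proposed in the paper: the gamma curve, the names `apply_gamma` and `fit_gamma`, the synthetic pixel values, and the squared-error objective are all assumptions chosen for illustration. In the setting the abstract describes, the loss would come from the object detector rather than from a fixed target curve.

```python
import math

def apply_gamma(pixels, gamma):
    """Hypothetical learnable ISP operation: element-wise gamma curve on values in (0, 1]."""
    return [p ** gamma for p in pixels]

def loss_and_grad(pixels, targets, gamma):
    """Mean squared error (a stand-in for a task loss) and its derivative w.r.t. gamma."""
    n = len(pixels)
    loss = 0.0
    grad = 0.0
    for p, t in zip(pixels, targets):
        y = p ** gamma
        err = y - t
        loss += err * err / n
        # d(p**gamma)/d(gamma) = p**gamma * ln(p)
        grad += 2.0 * err * y * math.log(p) / n
    return loss, grad

def fit_gamma(pixels, targets, gamma=1.0, lr=0.5, steps=2000):
    """Plain gradient descent on the single learnable parameter gamma."""
    for _ in range(steps):
        _, grad = loss_and_grad(pixels, targets, gamma)
        gamma -= lr * grad
    return gamma

# Synthetic linear "RAW" intensities and an assumed target response (exponent 0.45).
raw = [i / 64 for i in range(1, 65)]
targets = [p ** 0.45 for p in raw]

learned_gamma = fit_gamma(raw, targets)
final_loss, _ = loss_and_grad(raw, targets, learned_gamma)
```

The same pattern carries over to a real pipeline: the operation stays differentiable with respect to its parameters, so the gradient of the task loss can update it together with the network weights.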

Published In

Image Analysis: 22nd Scandinavian Conference, SCIA 2023, Sirkka, Finland, April 18–21, 2023, Proceedings, Part I.
Apr 2023
456 pages
ISBN:978-3-031-31434-6
DOI:10.1007/978-3-031-31435-3

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. Object Detection
  2. Image Signal Processing
  3. Machine Learning
  4. Deep Learning
