-
A scalable framework for annotating photovoltaic cell defects in electroluminescence images
Authors:
Urtzi Otamendi,
Inigo Martinez,
Igor G. Olaizola,
Marco Quartulli
Abstract:
The correct functioning of photovoltaic (PV) cells is critical to ensuring the optimal performance of a solar plant. Anomaly detection techniques for PV cells can result in significant cost savings in operation and maintenance (O&M). Recent research has focused on deep learning techniques for automatically detecting anomalies in electroluminescence (EL) images. Automated anomaly annotations can improve current O&M methodologies and help develop decision-making systems that extend the life cycle of PV cells and predict failures. This paper addresses the lack of anomaly segmentation annotations in the literature by proposing a combination of state-of-the-art data-driven techniques to create a Golden Standard benchmark. The proposed method stands out for (1) its adaptability to new PV cell types, (2) its cost-efficient fine-tuning, and (3) its use of public datasets to generate advanced annotations. The methodology has been validated by annotating a widely used dataset, reducing the annotation cost by 60%.
Submitted 15 December, 2022;
originally announced December 2022.
-
Integrating pre-processing pipelines in ODC based framework
Authors:
U. Otamendi,
I. Azpiroz,
M. Quartulli,
I. Olaizola
Abstract:
Using on-demand processing pipelines to generate virtual geospatial products helps optimize resource management and reduces both processing requirements and data storage space. Additionally, pre-processed products improve data quality for data-driven analytical algorithms, such as machine learning or deep learning models. This paper proposes a method for integrating virtual products built on open-source processing pipelines. To validate and evaluate this approach, we have integrated it into a geo-imagery management framework based on Open Data Cube (ODC). Specifically, we have performed three experiments that develop on-demand processing pipelines using multi-sensor remote sensing data, such as Sentinel-1 and Sentinel-2. These pipelines are integrated using open-source processing frameworks.
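The storage saving of a virtual product comes from keeping a recipe instead of pixels. The following is a minimal sketch of that idea, assuming hypothetical names (`OnDemandProduct`, `BandLoader`); it is not the Open Data Cube virtual-product API and not the authors' integration. The example recipe is NDVI, which for Sentinel-2 would typically take bands B04 (red) and B08 (near infrared).

```python
import numpy as np
from typing import Callable, Dict

# Hypothetical: a "band loader" returns a 2-D array only when called
# (for instance, after reading a tile from disk or object storage).
BandLoader = Callable[[], np.ndarray]


class OnDemandProduct:
    """Hypothetical virtual product: stores a recipe, not pixels.

    Nothing is read or computed until ``load`` is called, so the derived
    product never needs to be materialised in storage.
    """

    def __init__(self, inputs: Dict[str, BandLoader],
                 recipe: Callable[..., np.ndarray]) -> None:
        self.inputs = inputs
        self.recipe = recipe

    def load(self) -> np.ndarray:
        # Read the source bands lazily, then apply the processing recipe.
        bands = {name: loader() for name, loader in self.inputs.items()}
        return self.recipe(**bands)


def ndvi(red: np.ndarray, nir: np.ndarray) -> np.ndarray:
    """Normalised difference vegetation index; NaN where red + nir == 0."""
    red = red.astype(np.float64)
    nir = nir.astype(np.float64)
    total = nir + red
    out = np.full(total.shape, np.nan)
    # Divide only where the denominator is non-zero; other pixels stay NaN.
    np.divide(nir - red, total, out=out, where=total != 0)
    return out
```

Usage would look like `OnDemandProduct({"red": load_b04, "nir": load_b08}, ndvi).load()`, where `load_b04` and `load_b08` are hypothetical loaders. Chaining several recipes, such as a Sentinel-1 speckle filter followed by a composite, follows the same pattern.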
Submitted 4 October, 2022;
originally announced October 2022.
-
Geo-imagery management and statistical processing in a regional context using Open Data Cube
Authors:
U. Otamendi,
I. Azpiroz,
M. Quartulli,
I. Olaizola,
F. J. Perez,
D. Alda,
X. Garitano
Abstract:
We propose a methodology to manage and process remote sensing and geo-imagery data for non-expert users. The proposed system provides automated data ingestion and manipulation capabilities for analytical, data-driven purposes. In this paper, we describe the technological basis of the proposed method, the tool architecture, the inherent data flow, and its operation in a specific use case: providing statistical summaries of Sentinel-2 regions of interest corresponding to cultivated polygonal areas located in the Basque Country (Spain).
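Per-polygon statistical summaries are usually computed as zonal statistics. The sketch below is illustrative only and assumes that the cultivation polygons have already been rasterized onto the Sentinel-2 grid as an integer label image (for example with `rasterio.features.rasterize`), with 0 meaning "outside every polygon". The function name `zonal_stats` is hypothetical and is not taken from the described tool.

```python
import numpy as np
from typing import Dict


def zonal_stats(values: np.ndarray, labels: np.ndarray) -> Dict[int, Dict[str, float]]:
    """Summary statistics of a raster for each labelled region.

    values: 2-D raster (e.g., an NDVI composite); NaN marks missing pixels
            such as cloud-masked observations.
    labels: integer raster of the same shape; 0 = background,
            k > 0 = pixel belongs to polygon k.
    """
    if values.shape != labels.shape:
        raise ValueError("values and labels must have the same shape")
    # Keep only pixels that belong to a polygon and carry a valid value.
    valid = (labels > 0) & np.isfinite(values)
    v, lab = values[valid], labels[valid]
    stats: Dict[int, Dict[str, float]] = {}
    for region in np.unique(lab):
        sel = v[lab == region]
        stats[int(region)] = {
            "count": int(sel.size),
            "mean": float(sel.mean()),
            "std": float(sel.std()),
            "min": float(sel.min()),
            "max": float(sel.max()),
        }
    return stats
```

Regions whose pixels are all missing simply do not appear in the result, which keeps the summary honest about coverage instead of reporting a zero mean.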
Submitted 4 October, 2022;
originally announced October 2022.
-
Closed-Form Diffeomorphic Transformations for Time Series Alignment
Authors:
Iñigo Martinez,
Elisabeth Viles,
Igor G. Olaizola
Abstract:
Time series alignment methods call for highly expressive, differentiable, and invertible warping functions that preserve temporal topology, i.e., diffeomorphisms. Diffeomorphic warping functions can be generated by integrating velocity fields governed by an ordinary differential equation (ODE). Gradient-based optimization frameworks containing diffeomorphic transformations require computing the derivatives of the differential equation's solution with respect to the model parameters, i.e., sensitivity analysis. Unfortunately, deep learning frameworks typically lack automatic-differentiation-compatible sensitivity analysis methods, and implicit functions, such as the solution of an ODE, require particular care. Current solutions rely on adjoint sensitivity methods, ad hoc numerical solvers, or ResNet's Eulerian discretization. In this work, we present a closed-form expression for the ODE solution and its gradient under continuous piecewise-affine (CPA) velocity functions. We present a highly optimized CPU and GPU implementation of these results. Furthermore, we conduct extensive experiments on several datasets to validate the ability of our model to generalize to unseen data in joint time-series alignment. Results show significant improvements in terms of both efficiency and accuracy.
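Within a single cell, a CPA field has the form $v(x) = a x + b$, so $\dot x = a x + b$ has the exact solution $x(t) = e^{a t} x_0 + \tfrac{b}{a}\,(e^{a t} - 1)$ (or $x_0 + b t$ when $a = 0$), and the time to reach a cell boundary $x_b$ is $\ln\!\big(v(x_b)/v(x_0)\big)/a$. The sketch below chains these per-cell solutions to integrate across cells in 1-D. It is a simplified illustration under stated assumptions, not the authors' optimized CPU/GPU implementation: the domain is $[0, 1]$ with equally spaced cells, the field is parameterized (for simplicity) by its values at the cell knots, velocity is zero at both ends, and the closed-form gradient from the paper is omitted.

```python
import math
from typing import Sequence


def _flow_in_cell(x: float, t: float, a: float, b: float) -> float:
    """Exact solution of dx/dt = a*x + b after time t, starting from x."""
    if a == 0.0:
        return x + b * t
    return math.exp(a * t) * x + (b / a) * math.expm1(a * t)


def integrate_cpa(x: float, t: float, knot_velocities: Sequence[float]) -> float:
    """Closed-form integration of a 1-D CPA velocity field on [0, 1].

    knot_velocities: velocities at nc + 1 equally spaced knots (assumed
    parameterization); linear interpolation between knots yields a
    continuous piecewise-affine field. End velocities must be zero so that
    trajectories stay inside [0, 1].
    """
    v_knots = [float(v) for v in knot_velocities]
    nc = len(v_knots) - 1
    if nc < 1 or v_knots[0] != 0.0 or v_knots[-1] != 0.0:
        raise ValueError("need >= 2 knots with zero velocity at both ends")
    h = 1.0 / nc
    c = min(int(x / h), nc - 1)  # index of the cell containing x
    while t > 0.0:
        a = (v_knots[c + 1] - v_knots[c]) / h  # slope in cell c
        b = v_knots[c] - a * c * h             # intercept in cell c
        v = a * x + b
        if v == 0.0:
            return x  # fixed point of the flow
        # On a shared knot, switch to the cell lying in the direction of motion.
        if v < 0.0 and x <= c * h and c > 0:
            c -= 1
            continue
        if v > 0.0 and x >= (c + 1) * h and c < nc - 1:
            c += 1
            continue
        x_b = (c + 1) * h if v > 0.0 else c * h  # boundary we move toward
        v_b = a * x_b + b
        if v_b * v <= 0.0:
            # Velocity vanishes before (or at) the boundary: never leave the cell.
            return _flow_in_cell(x, t, a, b)
        # Hitting time of x_b, using the closed form for a != 0.
        t_hit = (x_b - x) / b if a == 0.0 else math.log(v_b / v) / a
        if t_hit >= t:
            return _flow_in_cell(x, t, a, b)
        x, t = x_b, t - t_hit
        c += 1 if v > 0.0 else -1
    return x
```

Because every step is an exact solution, negating the velocities and integrating for the same time recovers the starting point, which is the invertibility a diffeomorphic warp requires.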
Submitted 16 June, 2022;
originally announced June 2022.
-
A survey study of success factors in data science projects
Authors:
Iñigo Martinez,
Elisabeth Viles,
Igor G. Olaizola
Abstract:
In recent years, the data science community has pursued excellence and made significant research efforts to develop advanced analytics, focusing on solving technical problems at the expense of organizational and socio-technical challenges. According to previous surveys on the state of data science project management, there is a significant gap between technical and organizational processes. In this article, we present new empirical data from a survey of 237 data science professionals on the use of project management methodologies for data science. We provide additional profiling of the survey respondents' roles and their priorities when executing data science projects. Based on this survey study, the main findings are: (1) Agile data science lifecycle is the most widely used framework, but only 25% of the survey participants state that they follow a data science project methodology. (2) The most important success factors are precisely describing stakeholders' needs, communicating the results to end-users, and team collaboration and coordination. (3) Professionals who adhere to a project methodology place greater emphasis on the project's potential risks and pitfalls, version control, the deployment pipeline to production, and data security and privacy.
Submitted 17 January, 2022;
originally announced January 2022.
-
A novel method for error analysis in radiation thermometry with application to industrial furnaces
Authors:
Iñigo Martinez,
Urtzi Otamendi,
Igor G. Olaizola,
Roger Solsona,
Mikel Maiza,
Elisabeth Viles,
Arturo Fernandez,
Ignacio Arzua
Abstract:
Accurate temperature measurements are essential for the proper monitoring and control of industrial furnaces. However, measurement uncertainty is a risk for such a critical parameter. When using spectral-band radiation thermometry techniques, certain instrumental and environmental errors must be considered, such as the uncertainty in the emissivity of the target surface, reflected radiation from surrounding objects, or atmospheric absorption and emission, to name a few. Undesired contributions to measured radiation can be isolated using measurement models, also known as error-correction models. This paper presents a methodology for budgeting significant sources of error and uncertainty during temperature measurements in a petrochemical furnace scenario. A continuous monitoring system, aided by a deep-learning-based measurement correction model, is also presented to allow domain experts to analyze the furnace's operation in real time. To validate the proposed system's functionality, a real-world application case in a petrochemical plant is presented. The proposed solution demonstrates the viability of precise industrial furnace monitoring, thereby increasing operational security and improving the efficiency of such energy-intensive systems.
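To make the idea of an error-correction (measurement) model concrete, here is a minimal single-wavelength sketch. It assumes a generic textbook model, not the paper's error budget or its deep-learning correction: the thermometer is calibrated against a blackbody (emissivity 1), the target has emissivity $\varepsilon$, it reflects radiation from surroundings at $T_{surr}$, and the optical path has transmittance $\tau$ and emits at $T_{atm}$:

$$L_{meas} = \tau\,\big[\varepsilon\,L_b(T) + (1-\varepsilon)\,L_b(T_{surr})\big] + (1-\tau)\,L_b(T_{atm}),$$

where $L_b$ is Planck's law. Solving for $L_b(T)$ and inverting Planck's law gives the corrected temperature.

```python
import math

# Radiation constants in SI units.
C1L = 1.191042972e-16  # 2*h*c**2, W*m^2/sr (spectral radiance form)
C2 = 1.438776877e-2    # h*c/k_B, m*K


def planck_radiance(wavelength_m: float, temp_k: float) -> float:
    """Blackbody spectral radiance at one wavelength, W/(m^3*sr)."""
    return C1L / (wavelength_m ** 5 * math.expm1(C2 / (wavelength_m * temp_k)))


def planck_temperature(wavelength_m: float, radiance: float) -> float:
    """Inverse of planck_radiance at a fixed wavelength."""
    return C2 / (wavelength_m * math.log1p(C1L / (wavelength_m ** 5 * radiance)))


def corrected_temperature(t_apparent_k: float, wavelength_m: float,
                          emissivity: float, t_surroundings_k: float,
                          transmittance: float = 1.0,
                          t_atmosphere_k: float = None) -> float:
    """Correct a blackbody-calibrated single-band reading (assumed model).

    L_meas = tau*(eps*L(T) + (1 - eps)*L(T_surr)) + (1 - tau)*L(T_atm)
    """
    lam = wavelength_m
    # Radiance the instrument actually received (it reported t_apparent_k).
    l_meas = planck_radiance(lam, t_apparent_k)
    l_reflected = (1.0 - emissivity) * planck_radiance(lam, t_surroundings_k)
    l_path = 0.0
    if transmittance < 1.0:
        if t_atmosphere_k is None:
            raise ValueError("t_atmosphere_k is required when transmittance < 1")
        l_path = (1.0 - transmittance) * planck_radiance(lam, t_atmosphere_k)
    # Remove path emission and reflection, then undo emissivity.
    l_target = ((l_meas - l_path) / transmittance - l_reflected) / emissivity
    if l_target <= 0.0:
        raise ValueError("inputs imply non-positive target radiance")
    return planck_temperature(lam, l_target)
```

In a furnace, the surroundings are often hotter than the measured surface, so the reflected term can bias an uncorrected reading upward by tens of kelvin; uncertainty in $\varepsilon$ and $T_{surr}$ then propagates directly into the corrected value, which is why an error budget is needed.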
Submitted 10 January, 2022;
originally announced January 2022.
-
ArchABM: an agent-based simulator of human interaction with the built environment. $CO_2$ and viral load analysis for indoor air quality
Authors:
Iñigo Martinez,
Jan L. Bruse,
Ane M. Florez-Tapia,
Elisabeth Viles,
Igor G. Olaizola
Abstract:
Recent evidence suggests that SARS-CoV-2, the virus that caused a global pandemic in 2020, is predominantly transmitted via airborne aerosols in indoor environments. This calls for novel strategies when assessing and controlling a building's indoor air quality (IAQ). IAQ can generally be controlled by ventilation and/or policies that regulate human-building interaction. However, occupants use the rooms of a building in different ways, and it may not be obvious which measure or combination of measures leads to a cost- and energy-effective solution that ensures good IAQ across the entire building. Therefore, in this article, we introduce a novel agent-based simulator, ArchABM, designed to assist in creating new buildings or adapting existing ones by estimating adequate room sizes and ventilation parameters and by testing the effect of policies, while taking into account IAQ as a result of complex human-building interaction patterns. A recently published aerosol model was adapted to calculate time-dependent carbon dioxide ($CO_2$) and virus quanta concentrations in each room, as well as inhaled $CO_2$ and virus quanta for each occupant over a day as a measure of physiological response. Owing to its modular architecture, ArchABM is flexible regarding the aerosol model and the building layout: it allows implementing further models and any number and size of rooms, agents, and actions reflecting human-building interaction patterns. We present a use case based on a real floor plan and working schedules adopted in our research center. This study demonstrates how advanced simulation tools can contribute to improving IAQ across a building, thereby ensuring a healthy indoor environment.
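To illustrate how room occupancy produced by agents translates into time-dependent $CO_2$ concentration, the following sketch uses a generic well-mixed (single-box) room model. It is an assumption for illustration, not the aerosol model adapted in ArchABM, and it ignores virus quanta and inhaled doses. With $n$ occupants each emitting $g$ m³/h of $CO_2$, room volume $V$, and $\lambda$ air changes per hour, the concentration in ppm obeys $\dot C = n g \cdot 10^6 / V - \lambda\,(C - C_{out})$. For piecewise-constant occupancy, each interval has the exact solution $C(t) = C_{ss} + (C_0 - C_{ss})\,e^{-\lambda t}$ with $C_{ss} = C_{out} + n g \cdot 10^6 / (\lambda V)$.

```python
import math
from typing import List, Sequence, Tuple


def co2_well_mixed(schedule: Sequence[Tuple[float, int]], volume_m3: float,
                   air_changes_per_h: float, c_outdoor_ppm: float,
                   c_initial_ppm: float, co2_per_person_m3_h: float) -> List[float]:
    """CO2 concentration (ppm) at the end of each interval of a room schedule.

    schedule: (duration in hours, number of occupants) pairs, e.g. produced
    by an agent-based simulation. Occupancy and ventilation are assumed
    constant within each interval, so each step uses the exact solution
    C(t) = C_ss + (C_0 - C_ss) * exp(-ach * t).
    """
    if volume_m3 <= 0 or air_changes_per_h <= 0:
        raise ValueError("volume and air change rate must be positive")
    c = c_initial_ppm
    out = []
    for duration_h, occupants in schedule:
        # Steady state reached if this occupancy were held indefinitely.
        c_ss = c_outdoor_ppm + occupants * co2_per_person_m3_h * 1e6 / (
            air_changes_per_h * volume_m3)
        # Exponential relaxation toward the steady state.
        c = c_ss + (c - c_ss) * math.exp(-air_changes_per_h * duration_h)
        out.append(c)
    return out
```

All parameter values (per-person emission rate, outdoor concentration, air change rate) must be supplied by the user; none are implied by the abstract. Comparing runs with different `air_changes_per_h` or occupancy schedules is the kind of what-if analysis the simulator is meant to support.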
Submitted 14 January, 2022; v1 submitted 2 November, 2021;
originally announced November 2021.
-
Segmentation of cell-level anomalies in electroluminescence images of photovoltaic modules
Authors:
Urtzi Otamendi,
Iñigo Martinez,
Marco Quartulli,
Igor G. Olaizola,
Elisabeth Viles,
Werther Cambarau
Abstract:
In the operation & maintenance (O&M) of photovoltaic (PV) plants, the early identification of failures has become crucial to maintaining productivity and prolonging component life. Of all defects, cell-level anomalies can lead to serious failures and may affect surrounding PV modules in the long run. These fine defects are usually captured with high-spatial-resolution electroluminescence (EL) imaging. The difficulty of acquiring such images has limited the availability of data. For this work, multiple data resources and augmentation techniques have been used to overcome this limitation. Current state-of-the-art detection methods extract only low-level information from individual PV cell images, and their performance is conditioned by the available training data. In this article, we propose an end-to-end deep learning pipeline that detects, locates, and segments cell-level anomalies from entire photovoltaic modules via EL images. The proposed modular pipeline combines three deep learning techniques: 1. object detection (modified Faster R-CNN), 2. image classification (EfficientNet), and 3. weakly supervised segmentation (autoencoder). The modular nature of the pipeline allows the deep learning models to be upgraded as the state of the art improves and the pipeline to be extended with new functionalities.
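The modularity claim can be illustrated with a small orchestration sketch in which each stage is a plain callable, so a detector, classifier, or segmenter can be replaced without touching the others. This is a hypothetical structure, not the authors' code: `run_pipeline`, `CellResult`, and `grid_detector` are illustrative names, and the stand-in stages below replace the modified Faster R-CNN detector, the EfficientNet classifier, and the autoencoder-based segmenter described in the abstract.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (row0, col0, row1, col1), half-open, module pixels


@dataclass
class CellResult:
    box: Box
    label: str
    mask: Optional[np.ndarray] = None  # per-pixel anomaly mask, anomalous cells only


def run_pipeline(module_image: np.ndarray,
                 detect: Callable[[np.ndarray], List[Box]],
                 classify: Callable[[np.ndarray], str],
                 segment: Callable[[np.ndarray], np.ndarray],
                 normal_label: str = "normal") -> List[CellResult]:
    """Detect cells, classify each crop, and segment only anomalous ones."""
    results = []
    for box in detect(module_image):
        r0, c0, r1, c1 = box
        crop = module_image[r0:r1, c0:c1]
        label = classify(crop)
        # Segmentation is the most expensive step, so skip it for normal cells.
        mask = segment(crop) if label != normal_label else None
        results.append(CellResult(box, label, mask))
    return results


def grid_detector(rows: int, cols: int) -> Callable[[np.ndarray], List[Box]]:
    """Stand-in detector (hypothetical): splits a module into a rows x cols grid."""
    def detect(img: np.ndarray) -> List[Box]:
        h, w = img.shape[:2]
        return [(i * h // rows, j * w // cols, (i + 1) * h // rows, (j + 1) * w // cols)
                for i in range(rows) for j in range(cols)]
    return detect
```

With trained models wrapped as callables (for example, a classifier returning a defect class name and a segmenter thresholding an autoencoder's reconstruction error), the orchestration code stays unchanged.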
Submitted 14 January, 2022; v1 submitted 21 June, 2021;
originally announced June 2021.
-
Data Science Methodologies: Current Challenges and Future Approaches
Authors:
Iñigo Martinez,
Elisabeth Viles,
Igor G. Olaizola
Abstract:
Great research efforts in data science have been devoted to developing advanced analytics, improving data models, and cultivating new algorithms. However, few authors have addressed the organizational and socio-technical challenges that arise when executing a data science project: lack of vision and clear objectives, a biased emphasis on technical issues, a low level of maturity for ad hoc projects, and the ambiguity of roles in data science are among these challenges. Few methodologies that tackle this type of challenge have been proposed in the literature; some of them date back to the mid-1990s and consequently are not updated to the current paradigm and the latest developments in big data and machine learning technologies. In addition, even fewer methodologies offer a complete guideline across team, project, and data & information management. In this article, we explore the need for a more holistic approach to carrying out data science projects. We first review methodologies that have been presented in the literature for working on data science projects and classify them according to their focus: project, team, and data and information management. Finally, we propose a conceptual framework containing the general characteristics that a methodology for managing data science projects from a holistic point of view should have. This framework can be used by other researchers as a roadmap for the design of new data science methodologies or the updating of existing ones.
Submitted 14 January, 2022; v1 submitted 14 June, 2021;
originally announced June 2021.
-
Distributed mining of large scale remote sensing image archives on public computing infrastructures
Authors:
Luigi Mascolo,
Marco Quartulli,
Pietro Guccione,
Giovanni Nico,
Igor G. Olaizola
Abstract:
Earth Observation (EO) mining aims to support efficient access to and exploration of petabyte-scale spaceborne and airborne remote sensing archives that are currently expanding at rates of terabytes per day. A significant challenge is performing the analysis required by envisaged applications, such as process mapping for environmental risk management, in reasonable time. In this work, we address the problem of content-based image retrieval via example-based queries from EO data archives. In particular, we focus on the analysis of polarimetric SAR data, for which target decomposition theorems have proved fundamental in discovering patterns in data and characterizing ground scattering properties. To this end, we propose an interactive, region-oriented, content-based image mining system in which 1) unsupervised ingestion processes are distributed onto virtual machines in elastic, on-demand computing infrastructures; 2) archive-scale hierarchical content indexing is implemented in terms of a "big data" analytics cluster-computing framework; and 3) query processing amounts to traversing the generated binary tree index, computing distances that correspond to descriptor-based similarity measures between image groups and a query image tile. We describe in depth both the strategies and the actual implementations of the ingestion and indexing components, and we verify the approach through experiments carried out on the NASA/JPL UAVSAR full polarimetric data archive. We report the results of tests performed on computer clusters using a public Infrastructure-as-a-Service, evaluating the impact of cluster configuration on system performance. Results are promising for data mapping and information retrieval applications.
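Query processing by traversing a binary tree index can be sketched on a single machine as follows. This is an illustration under assumptions, not the paper's distributed implementation: descriptors are generic vectors (rather than polarimetric target-decomposition features), each node is split with a 2-means step (an assumed splitting rule), and the query descends greedily toward the nearer child centroid before computing exact distances inside one leaf. Greedy descent is fast but approximate, since the true nearest tile may sit in a sibling leaf. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

import numpy as np


@dataclass
class Node:
    centroid: np.ndarray
    indices: np.ndarray              # archive tiles stored under this node
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_index(desc: np.ndarray, indices: np.ndarray, leaf_size: int,
                rng: np.random.Generator, iters: int = 10) -> Node:
    """Recursively split descriptor vectors with 2-means into a binary tree."""
    pts = desc[indices]
    node = Node(pts.mean(axis=0), indices)
    if len(indices) <= leaf_size:
        return node
    # Initialise the two centres from two distinct random members.
    centers = pts[rng.choice(len(pts), size=2, replace=False)]
    for _ in range(iters):
        dist = ((pts[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        if assign.min() == assign.max():
            return node  # cannot split (e.g., duplicate descriptors): keep as leaf
        centers = np.stack([pts[assign == k].mean(axis=0) for k in (0, 1)])
    node.left = build_index(desc, indices[assign == 0], leaf_size, rng, iters)
    node.right = build_index(desc, indices[assign == 1], leaf_size, rng, iters)
    return node


def query_index(root: Node, desc: np.ndarray, q: np.ndarray,
                k: int = 1) -> Tuple[np.ndarray, np.ndarray]:
    """Greedy descent to one leaf, then exact distances inside that leaf."""
    node = root
    while node.left is not None and node.right is not None:
        d_left = np.sum((node.left.centroid - q) ** 2)
        d_right = np.sum((node.right.centroid - q) ** 2)
        node = node.left if d_left <= d_right else node.right
    d = np.linalg.norm(desc[node.indices] - q, axis=1)
    order = np.argsort(d)[:k]
    return node.indices[order], d[order]
```

In a distributed setting, the expensive part (descriptor extraction and the per-level splits) is what gets parallelized; the query itself touches only one root-to-leaf path.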
Submitted 17 January, 2015;
originally announced January 2015.
-
Trace transform based method for color image domain identification
Authors:
Igor G. Olaizola,
Marco Quartulli,
Julian Florez,
Basilio Sierra
Abstract:
Context categorization is a fundamental prerequisite for multi-domain multimedia content analysis applications in order to manage contextual information efficiently. In this paper, we introduce a new color image context categorization method (DITEC) based on the trace transform. The problem of reducing the dimensionality of the obtained trace transform signal is addressed through statistical descriptors that preserve the underlying information. These extracted features are highly discriminative for content categorization. The theoretical properties of the method are analyzed and validated experimentally on two different datasets.
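The trace transform generalizes the Radon transform: instead of always summing image values along each line $(\theta, p)$, it applies an arbitrary functional to them (with a sum it reduces to a discrete Radon transform). The sketch below computes a trace transform for a single grayscale channel with nearest-neighbour sampling, followed by a deliberately simple reduction to a fixed-length vector. The sampling scheme, default sizes, and the `summary_descriptor` reduction are assumptions for illustration; the functionals and statistical descriptors actually used by DITEC are those described in the paper.

```python
import math
from typing import Callable

import numpy as np


def trace_transform(img: np.ndarray, n_angles: int = 36, n_offsets: int = 64,
                    n_samples: int = 64,
                    functional: Callable[[np.ndarray], float] = np.sum) -> np.ndarray:
    """Apply `functional` to image values sampled along lines (theta, p).

    Line (theta, p): centre + p*(cos theta, sin theta) + s*(-sin theta, cos theta).
    Nearest-neighbour sampling; samples outside the image are dropped.
    Returns an array of shape (n_angles, n_offsets).
    """
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    radius = math.hypot(h, w) / 2.0  # every line through the image fits in this range
    thetas = np.linspace(0.0, np.pi, n_angles, endpoint=False)
    offsets = np.linspace(-radius, radius, n_offsets)
    s = np.linspace(-radius, radius, n_samples)
    out = np.zeros((n_angles, n_offsets))
    for i, th in enumerate(thetas):
        cos_t, sin_t = math.cos(th), math.sin(th)
        for j, p in enumerate(offsets):
            cols = np.rint(cx + p * cos_t - s * sin_t).astype(int)
            rows = np.rint(cy + p * sin_t + s * cos_t).astype(int)
            inside = (cols >= 0) & (cols < w) & (rows >= 0) & (rows < h)
            vals = img[rows[inside], cols[inside]]
            out[i, j] = functional(vals) if vals.size else 0.0
    return out


def summary_descriptor(tt: np.ndarray) -> np.ndarray:
    """Hypothetical reduction: mean and std of each angle's profile."""
    return np.concatenate([tt.mean(axis=1), tt.std(axis=1)])
```

For a color image, one option is to compute the transform per channel and concatenate the resulting descriptors; the reduction step is where the dimensionality drops from `n_angles * n_offsets` values to a short feature vector.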
Submitted 25 March, 2019; v1 submitted 19 August, 2012;
originally announced August 2012.
-
A review of EO image information mining
Authors:
Marco Quartulli,
Igor G. Olaizola
Abstract:
We analyze the state of the art of content-based retrieval in Earth observation image archives, focusing on complete systems that show promise for operational implementation. The different paradigms underlying the main system families are introduced. The approaches taken are analyzed, focusing in particular on the phases after primitive feature extraction. The solutions envisaged for issues related to feature simplification and synthesis, indexing, and semantic labeling are reviewed. The methodologies for query specification and execution are analyzed.
Submitted 19 June, 2012; v1 submitted 4 March, 2012;
originally announced March 2012.