Tom F. Hansen [1,2]

[1] Engineering geology and rock engineering, Norwegian Geotechnical Institute, Sandakerveien 140, 0484 Oslo, Norway

[2] Informatics Institute, University of Oslo, Blindern, 0316 Oslo, Norway

Unsupervised machine learning for data-driven classification of rock mass using drilling data
How can a data-driven system handle limitations in existing rock mass classification systems?

tom.frode.hansen@ngi.no; Arnstein Aarset

Abstract

Rock mass classification systems are crucial for assessing stability and risk in underground construction globally and guiding support and excavation design. However, systems developed primarily in the 1970s lack access to modern high-resolution data and advanced statistical techniques, limiting their effectiveness as decision-support systems. We first outline these limitations and then describe how a data-driven system, based on drilling data as detailed in this study, can overcome them. Using statistical information extracted from thousands of Measure While Drilling (MWD) values in one-meter sections of a full tunnel profile, which acts as a signature of the rock mass, we demonstrate that it is possible to form well-defined clusters that can act as a foundational basis for various rock mass classification systems. We reduced the dimensionality of 48-value vectors using nonlinear manifold learning techniques (UMAP) and linear principal component analysis (PCA) to enhance clustering. Unsupervised machine learning methods (HDBSCAN, Agglomerative Clustering, K-means) were employed to cluster the data, with hyperparameters optimised through multi-objective Bayesian optimisation for effective clustering. Using domain knowledge to add extra features to the core clusters of MWD data, we observed improved clustering and opportunities for system tuning. We structured and correlated these clusters with physical rock mass properties, including labels of rock type and rock quality, and analysed cumulative distributions of key MWD parameters for rock mass assessment to determine whether the clusters meaningfully differentiate rock masses. The ability of MWD data to form distinct rock mass clusters suggests substantial potential for future classification systems grounded in this objective, data-driven methodology, free from human bias.

keywords:

Rock mass classification, tunneling, measure while drilling, unsupervised machine learning, multi objective optimisation

Highlights

•

Natural clustering in Measure While Drilling data is demonstrated.
•

Clustering is optimised with multi-objective Bayesian optimisation.
•

Clustering is sensitive to feature sets and algorithms for dimension reduction and clustering.
•

Clusters are investigated and organised by physical features.
•

A system approach is sketched for clusters as a foundational basis for rock mass classification systems.

1 Introduction

Rock mass classification systems (RMCS) are widely used as decision support in rock tunnelling. The Q-system [1] and the RMR [2] system are among the most popular [3]. In Scandinavian drill and blast tunnelling, operations typically involve full profile blasting for 5 to 6 meters, followed by 1-2 hours of scaling to remove loose rock. Subsequently, the surface is washed and coated with fibre-reinforced shotcrete roughly an hour later. Once the rock is covered, further physical inspection is not possible. Within this brief period, the face engineer must inspect the rock surface, evaluate input variables (considering a range to accommodate uncertainty), compute a classification value, and assign a specific stability class. This classification directly influences the support class, which dictates the overall rock support strategy. Temporary and permanent supports are usually installed simultaneously, underscoring the need for precise rock mass classification to maintain tunnelling efficiency and stability. The focus is primarily on the newly exposed rock contour, where support systems are implemented, typically an area of $150\,m^{2}$ (calculated as $25\,m\text{ arclength}\times 6\,m$ ). Less emphasis is placed on the tunnel face. However, the focus increases in weaker rock masses where shorter lengths are standard, and advance support becomes critical. Although this description is specific to Scandinavian contexts, the findings and methodologies are equally applicable to other rock tunnelling regimes.

1.1 Limitations in existing systems

RMCSs have been essential for the rock engineering industry and have contributed to increased consistency in rock mass assessment and rock support worldwide for decades. Developed mainly in the 1970s, these systems predate modern data capture technologies such as comprehensive scan/image capturing of newly exposed rock surface profiles, Measure While Drilling (MWD) data, and geophysics from the excavation face. At that time, high-resolution datasets with extensive rock mass coverage and advanced statistical learning techniques were unavailable, and computational power was limited. Given the current advances in data availability and automation potential in Scandinavian tunnelling, these classification systems exhibit limitations that may result in suboptimal decisions. Modern data-driven approaches could address these limitations more effectively. The objective here is not to undermine the existing systems but to highlight the enhancements possible with today’s technology.

1.

Subjectivity in Assessment: These systems rely heavily on the subjective judgement of face engineers, leading to variations in the assessment and support decisions for identical rock conditions [4, 5, 6].
2.

Inconsistent Observations: Face engineers may focus inconsistently on specific features or areas, potentially overlooking variations in the rock mass, and different engineers may not perceive the rock exposed by the last blasting round in the same way [6, 4].
3.

Safety and Accessibility Limitations: High-risk conditions and physical barriers often prevent thorough inspection of exposed rock, particularly in large double-track tunnels. Several of these tunnels are inspected from the floor at a safe location, which leads to a poor assessment [7, 6].
4.

Constraints on Quantification: Quantifying representative values for the newly exposed rock mass, such as the Rock Quality Designation (RQD), within the limited time frame of a fast-paced tunnel cycle presents significant challenges. The RQD metric, which involves measuring core pieces longer than 10 cm for every meter of rock, is difficult to assess accurately under these conditions [8, 9].
5.

Conservative Over-Supporting: Typically, the poorest rock mass conditions dictate the support for the entire blasted area, which can lead to unnecessary over-support in better-quality sections [7, 6]
6.

Empirical Data Limitations: The original empirical data may not cover all geological conditions, construction geometries, or site-specific factors, resulting in an oversimplified approach to complex geologies [7, 6, 9]
7.

System Update Challenges: Updates to these systems are infrequent and labour-intensive, hindered by non-transparent processes and inherent biases. The process involves trial and error to adapt input configurations to experienced stability in existing sites [6, 7, 5].
8.

Complex Rules for Exceptions: The systems incorporate complicated rules and factors that adjust classifications for different scenarios, which can lead to errors if not applied correctly. For example, forgetting to multiply the Jn value (number of joint sets) by three in a tunnel junction can yield a rock support class that is too low [7, 6].
9.

Visual Assessment Limitations: Current systems focus on visually assessable rock mass, neglecting the stability of rock outside the immediate tunnel profile, which is crucial for overall stability [7, 6].
10.

Non-existing advance rock mass assessment: Existing systems cannot effectively assess the rock mass quality ahead of the excavation, making them less useful for decision support on advance support and excavation method. A rock core drilled from the face can be classified in an RMR or Q-class, but this process severely disrupts the efficient tunnel factory and yields only a point value. A system that describes advance support classes (face bolts, stability grouting, spiling bolts, etc.) from data ahead of the tunnel remains elusive [6, 7].
11.

Conservative Support: Rock support classes are defined by inspecting primarily stable conditions, mostly in civil infrastructure tunnels with a high safety factor and conservative rock support, which may not reflect the actual stability needs [7, 6].
12.

Mismatch in support classes: The combination of defined classes and linked rock support descriptions has been criticised for not reliably prescribing the right type of rock support [10, 7, 11].
13.

Failure Modes: Existing systems may not adequately consider various failure modes in their classifications, impacting the accuracy of rock support assessments [7, 6]

Existing rock mass classification systems face several limitations: they are challenging to update, inherently conservative, subject to user bias, lack sufficient details, are unsuitable for forecasting, and do not assess rock mass where support is most needed. Additionally, the conservative nature of the industry often places higher trust in the subjective assessments of experienced human experts over automated systems [6, 12]. This resistance to automation may hinder updates that could incorporate more systematic data collection. Transitioning some or all rock mass assessments to a transparent, reproducible, bias-free, and easily updatable data-driven decision support system could address these issues effectively.

1.2 Why do we need data-driven rock mass classification systems?

An accurate and objective understanding of rock mass stability is crucial to optimise rock support, blasting design, and excavation methods in tunnelling and mining. Rock materials are inherently heterogeneous, and the quality of the rock mass can vary significantly over short distances. However, historically, we have not been able to comprehensively describe this complexity to facilitate optimised decision-making in the fast-paced tunnel cycle. Consequently, rock mass quality is grouped into practical and meaningful target classes.

Existing classification systems are impractical in environments that are inaccessible to humans. These include hazardous areas of weak rock in current tunnelling projects, production mining environments operating close to a safety factor of 1.0, and extraterrestrial locations planned for future human bases. In these scenarios, automated systems are necessary for thorough rock mass assessments.

A finely tuned rock mass classification on a sufficiently small scale is essential for optimising decisions. Current broad-spectrum decisions are predominantly conservative, leading to the excessive use of steel and concrete for rock support. Moreover, there is a significant gap in our ability to assess rock mass quality ahead of the excavation, which is vital for planning advance support and selecting the appropriate excavation method. Addressing these challenges and the limitations outlined in Section 1.1 requires the development of new, more adaptive systems that can operate autonomously and provide accurate assessments in real time to enhance safety and efficiency in challenging and dynamic environments.

1.3 State of the art in research in data-driven rock mass classification

To date, no purely data-driven rock mass classification system for tunnelling and underground mining encompasses the following properties: (a) the capability to classify rock mass stability of the visually exposed rock mass and outside the tunnel profile, including the area ahead of the tunnel face; (b) independence from existing classification systems; (c) the use of comprehensive data, patterns and decisions from larger rock volumes than individual drillholes, such as entire blasting rounds or a 1 m slice of the tunnel; (d) practical classification of rock mass into stability classes that aid decision-making, beyond merely correlating mechanical properties like UCS, E-modulus, and single features.

The necessity for criterion (a) stems from the need to assess the stability of the rock mass surrounding and ahead of the tunnel face. Criterion (b) avoids inheriting limitations from previous systems. Criterion (c) acknowledges that the rock mass quality can vary significantly over short distances, so it is the combined signature of the rock mass for a larger volume that should inform practical decisions regarding rock support and excavation methods. Classifying from single drillholes can introduce so much noise and variation that the forecasts from all drillholes must pass through an extra interpretation step before a decision such as "should I install spiling bolts or not" can be made. Criterion (d) highlights that decisions on rock support and excavation methods cannot rely solely on single mechanical properties or features.

To our knowledge, the datasets derived from face seismics and Measure While Drilling (MWD) in hard rock tunnelling are the only ones providing the necessary spatial detail to effectively act as signatures of the rock mass beyond the tunnel profile. Several studies have linked high-resolution datasets to rock mass quality without explicitly aiming to develop a new data-driven classification system. We have categorised these studies into two groups based on their use of these datasets for forecasting purposes. Using geophysical data, Dickmann et al. [13] and Dickmann and Hecht-Méndez [14] have automated the characterisation of rock mass into stability zones ahead of the tunnel and linked rock support to ground treatment using tunnel seismic prediction (TSP). Sapronova et al. [15] employed unsupervised clustering to group similar data into clusters representing different geological conditions, subsequently labelled through supervised learning. These studies show promising results in characterising the rock mass ahead of the tunnel face. However, a complete rock mass classification system has yet to be developed, and accuracy remains to be improved. Moreover, geophysics at the excavation face requires human intervention and could therefore disrupt the drill and blast cycle considerably more than automatically collected MWD data.

Sapronova et al. [16] utilise correlation values between MWD features from single drillholes as feature vectors to predict Q-system classes, which contradicts points b and c regarding independence from existing systems and analysing larger rock volumes. Similarly, Hansen et al. [17] employ statistically derived MWD values from all drillholes in a blasting round to predict Q-classes, yet still breach point b by relying on existing classification systems. Fernández et al. [18] apply machine learning to single-hole MWD data to detect discontinuities using a calculated discontinuity index, thus contravening point c about using summarised data and decisions from larger rock volumes. van Eldert et al. [19] predict Q-values and rock support from single-hole MWD data, violating points b and c, although the study's calculation and visualisation of a fracture index from single holes, segmented manually into different rock mass quality zones, moves towards a purely data-driven system. However, segmentation must be automated to reduce human bias, and explicit support classes must be linked to fracture indices. He et al. [20] also use single-hole MWD data to predict UCS and friction angles with machine learning, infringing upon points c and d, which call for broader data integration and practical decision-making utility. Lastly, Galende-Hernández et al. [21] cluster MWD data and link the clusters to RMR values using expert-based fuzzy rules, breaching point b.

1.4 A foundation for data-driven rock mass classification systems

We can think of two natural strategies for optimising rock mass classification systems in a data-driven way: (a) employing modern data collection and learning techniques to improve the existing systems in various ways (e.g. improve assessment of input variables or extending the systems with new features), or (b) developing entirely new, purely data-driven systems. This study adopts the latter approach, hypothesising that rock mass can be intricately grouped, clustered, and classified using spatially extensive, high-resolution Measure While Drilling (MWD) data, which serves as a signature of the rock mass. Our objective is to address several limitations in existing systems in this context. MWD data is a cost-effective and easily retrievable data source in global tunnelling and mining operations [22], with the added advantage of not impacting the tunnel cycle. We analysed the natural clustering of MWD data, organised as tabular data samples for every meter of tunnel excavation across 15 hard rock tunnels, totalling 23,000 meters and involving approximately 500,000 blasting drillholes of infrastructure tunnel data:

•

Information extraction involved calculating six statistical features from about 5000 values for each of eight MWD parameters, yielding 48 values in total.
•

To enhance clustering, we reduced the dimensionality of the 48-value vectors using nonlinear manifold learning techniques such as UMAP and linear Principal Component Analysis (PCA).
•

We employed unsupervised machine learning techniques (HDBSCAN, Agglomerative Clustering, K-means) to identify natural groupings and structures within the data, creating clusters. We further optimised and explored the hyperparameters in these algorithms using multi-objective optimisation to ensure effective and meaningful clustering.
•

We mapped and structured the clusters according to the physical properties of the rock mass, such as rock type and quality, and the distributions of key MWD parameters essential for rock mass assessment and clustering to determine if the clustering meaningfully differentiated the rock mass.

The subsequent sections will outline the dataset, methods used, results, and their analysis. The methodology section covers dimension reduction, clustering, decision metrics, organising experiments, hyperparameter optimisation, and linking clusters to physical properties of the rock mass. The discussion section explores the implications of these results. Finally, the conclusion and outlook sections summarise our findings and suggest directions for future research.

2 Dataset

The dataset, detailed in Hansen et al. [23], comprises 23,277 derived samples from approximately 500,000 drillholes in 15 hard rock tunnels with varied geologies, originating from 4,202 blasting rounds. Each sample holds 48 MWD features and two geometric parameters. The primary rock types include Precambrian Gneisses, Permian Basalt and Granite, Permian Rhomb porphyry, and Cambro-Silurian shales, limestone, and claystone. Eight MWD parameters, obtained from the logged signals through normalisation or Root Mean Square (RMS) filtering, were used (Table 1): PenetrNorm, PenetrRMS, RotaPressNorm, RotaPressRMS, FeedPressNorm, HammerPressNorm, WaterflowNorm, and WaterflowRMS. For each 1 m section of the tunnel, encompassing roughly 5,000 sensor values from 120 drillholes for full face excavation, we calculated mean, median, standard deviation, variance, skewness, and kurtosis for each MWD parameter, resulting in 48 MWD feature values (six statistical metrics times eight parameters). Together with the two geometric parameters (overburden, tunnel width), this gives 50 feature values per sample. Fig. 1 visualises the collection and statistical extraction process. Fig. 2 displays the distribution of the median values for all MWD and geometric parameters. Table 1 lists the parameters and the abbreviations used in the study.
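The statistical extraction for one tunnel section can be illustrated with a minimal, dependency-free sketch. The feature naming scheme (`<parameter>_<statistic>`), the function names, and the use of population moments with excess kurtosis are assumptions made here for illustration; the study does not state which estimators were used.

```python
import statistics
from typing import Dict, List


def describe(values: List[float]) -> Dict[str, float]:
    """Six summary statistics for one MWD parameter in one 1 m section."""
    n = len(values)
    mean = statistics.fmean(values)
    # Central moments with a 1/n normalisation (assumed estimator).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "std": m2 ** 0.5,
        "variance": m2,
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        "kurtosis": m4 / m2 ** 2 - 3.0 if m2 > 0 else 0.0,  # excess kurtosis
    }


def section_signature(section: Dict[str, List[float]]) -> Dict[str, float]:
    """Flatten the per-parameter statistics into one feature vector."""
    features: Dict[str, float] = {}
    for param, values in section.items():
        for stat, value in describe(values).items():
            features[f"{param}_{stat}"] = value
    return features


# Toy section: two of the eight parameters, five readings each
# (a real section holds roughly 5,000 readings per parameter).
toy_section = {
    "PenetrNorm": [1.0, 2.0, 3.0, 4.0, 5.0],
    "HammerPressNorm": [2.0, 2.0, 2.0, 2.0, 10.0],
}
signature = section_signature(toy_section)
```

With all eight parameters present, the same function would return the 48-value MWD signature of the section.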

Table 1: MWD-parameters with abbreviations and their normalised/filtered forms

Original parameter name and unit	Abbreviation for normalised/filtered form	Description
Penetration rate (m/min)	PenetrNorm	Normalised penetration
Penetration rate (m/min)	PenetrRMS	RMS filtered penetration
Rotation pressure (bar)	RotaPressNorm	Normalised rotation pressure
Rotation pressure (bar)	RotaPressRMS	RMS filtered rotation pressure
Feeder pressure (bar)	FeedPressNorm	Normalised feeder pressure
Hammer pressure (bar)	HammerPressNorm	Normalised hammer pressure
Flushing water flow (l/min)	WaterflowNorm	Normalised waterflow
Flushing water flow (l/min)	WaterFlowRMS	RMS filtered waterflow

Figure 1: Collection process for MWD-data and extraction of statistical information

3 Methodology

The primary objective is to categorise rock masses into groups with similar properties using high-resolution, spatially distributed MWD-data. These groups are then assessed for distinct physical property signatures that could serve as a foundation for decision support systems in underground construction tasks such as rock support, excavation design or grouting effort. Furthermore, the clusters are investigated for alignment with the existing label sets of rock type and rock quality (Q-class). The initial critical step, which this study focuses on, is exploring the natural clustering in drilling data.

We explored various feature sets, dimensionality reduction techniques, clustering algorithms, and hyperparameter tuning to determine effective clustering. Scaling the feature vector is crucial for both dimensionality reduction and clustering, ensuring uniform contribution of features to the analysis. We tested several scaling methods, including MinMaxScaler, StandardScaler, and RobustScaler [24].
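To illustrate why the choice of scaler matters, the sketch below re-implements, for a single feature column and without dependencies, the default behaviour we understand the three scikit-learn scalers to have: min-max scaling to [0, 1], standardisation with the population standard deviation, and robust scaling with the median and interquartile range. The function names and the toy column are hypothetical.

```python
import statistics
from typing import List


def min_max_scale(col: List[float]) -> List[float]:
    """Map the column linearly onto [0, 1]."""
    lo, hi = min(col), max(col)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in col]


def standard_scale(col: List[float]) -> List[float]:
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.fmean(col)
    std = statistics.pstdev(col)
    return [(v - mean) / std if std else 0.0 for v in col]


def robust_scale(col: List[float]) -> List[float]:
    """Subtract the median and divide by the interquartile range."""
    q1, median, q3 = statistics.quantiles(col, n=4, method="inclusive")
    iqr = q3 - q1
    return [(v - median) / iqr if iqr else 0.0 for v in col]


# Toy column with one extreme reading, e.g. a pressure spike.
column = [1.0, 2.0, 3.0, 4.0, 100.0]
scaled_min_max = min_max_scale(column)
scaled_standard = standard_scale(column)
scaled_robust = robust_scale(column)
```

In this toy column the single outlier compresses the four ordinary readings into a narrow band under min-max scaling, whereas robust scaling keeps them evenly spread, which is one reason to compare scalers before clustering.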

3.1 Dimension reduction to improve clustering

Dimension reduction prior to clustering improves computational efficiency and clarity by reducing noise and redundancy, thereby enabling clearer and more significant groupings in the reduced feature space [25]. In addressing the complexities of rock mass categorisation, we have applied dimension reduction techniques. Specifically, we utilised UMAP (Uniform Manifold Approximation and Projection) [26] for its non-linear capabilities and PCA (Principal Component Analysis) [27] for linear reduction. These methods aim to preserve and highlight local and global data structures, facilitating pattern recognition and grouping by clustering algorithms. Common in fields like bioinformatics, this sequential approach of reduction followed by clustering accommodates heterogeneous data [28]. Aware of potential distortions in cluster relevance due to manifold learning [29], we also perform clustering directly on the MWD-features. In all experiments, we inspect in detail the distributions of the original MWD-features (not the dimension-reduced components) within each cluster and their alignment with existing dataset labels. Additionally, we employed UMAP to visualise the high-dimensional data and its clusters in 2D and 3D plots.

UMAP (Uniform Manifold Approximation and Projection) is a non-linear dimension reduction technique used in exploratory data analysis, visualisation, and machine learning. It can capture non-linear relationships in the data, which can be important for rock mass classification, and it aims to preserve both the local and the global structure of the data, making it particularly useful for clustering. UMAP works by constructing a high-dimensional graph representation of the data and then optimising a low-dimensional version of this graph to be as structurally similar as possible. This results in a low-dimensional data representation that maintains much of the original structure. Conversely, PCA is selected for its efficiency and simplicity in linear dimension reduction. It works by identifying the directions (principal components) that maximise the variance in the data, which, for a dataset with a vector length of 50, is an effective strategy for reducing dimensionality while retaining as much information as possible. PCA's strength lies in its ability to provide a clear overview of the data's linear structure, making it a suitable linear counterpart to UMAP in our study. Using both UMAP and PCA, we can explore the data from different perspectives and potentially improve the performance of the clustering algorithms.
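As a compact reminder of what "directions that maximise the variance" means, the standard PCA formulation can be written as follows. The notation is introduced here for illustration only and is not taken from the study: $X$ is the scaled and centred feature matrix with $n$ samples and $p$ features, $S$ its covariance matrix, $(\lambda_k, w_k)$ the eigenpairs of $S$, and $d$ the number of retained components.

```latex
\begin{align}
S &= \frac{1}{n-1} X^{\top} X, \\
w_1 &= \arg\max_{\lVert w \rVert = 1} w^{\top} S\, w, \qquad
S\, w_k = \lambda_k w_k, \quad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0, \\
Z &= X \, [\, w_1 \; \cdots \; w_d \,], \qquad
\text{explained variance ratio of component } k = \frac{\lambda_k}{\sum_{j=1}^{p} \lambda_j}.
\end{align}
```

The reduced representation $Z$ is what a clustering algorithm would receive in place of the full feature vectors.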

While other dimension reduction techniques exist, both linear (such as Linear Discriminant Analysis (LDA) and Singular Value Decomposition (SVD)) and non-linear (such as t-SNE and autoencoders) [27, 30, 31], we opted for UMAP and PCA due to their effectiveness, simplicity, and efficiency in handling the complexities of rock mass data. t-SNE, although powerful for manifold learning, is computationally intensive and challenging to tune and interpret. LDA depends on predefined class labels, which we aim to avoid as initial inputs. SVD is primarily utilised for matrix factorisation in contexts like natural language processing. Autoencoders, despite their capability for non-linear dimension reduction, demand intricate tuning and are complex to implement. Therefore, UMAP and PCA were selected for their balanced performance, ease of use, and interpretability, which align well with the objectives of our study.

3.2 Clustering

Following dimension reduction employing PCA, UMAP, and direct analysis without dimension reduction, we conducted clustering to understand the structural variations within the MWD signature of the rock mass. This approach is essential for identifying inherent groupings within the MWD data. We selected three distinct clustering algorithms based on their methodological diversity and applicability to our data type:

•

K-means Clustering: This algorithm was chosen for its efficiency in handling big data sets and simplicity, making it highly interpretable. K-means partitions the dataset into K distinct, non-overlapping clusters by minimising the variance within each cluster [32]. This method is particularly effective for identifying spherical clusters in feature space.
•

Agglomerative Clustering: As a hierarchical clustering technique, this algorithm was employed to provide insights into the possible hierarchical structure of the dataset [33]. It progressively merges pairs of clusters that minimally increase a given linkage distance. Agglomerative clustering is useful for our study as it allows the examination of cluster structures at different scales.
•

HDBSCAN: Selected as an advanced density-based algorithm, HDBSCAN extends DBSCAN by converting it into a hierarchical clustering algorithm and introducing a stability-based cluster selection technique. This choice is justified by its ability to handle variable cluster densities, which is crucial for datasets with complex spatial relationships like those found in geotechnical data [34]. HDBSCAN determines the core clusters based on the areas of high density and treats points in low-density regions as outliers. Unlike K-means, which forces every point into a cluster even if it does not logically belong to any, HDBSCAN allows points to remain unclassified if they do not fit well with any cluster (these points receive the cluster label -1). The ability of HDBSCAN to identify and separate outliers is beneficial in geotechnical data, where anomalous readings can indicate critical phenomena like unstable rock masses or unusual material properties. By effectively isolating these outliers, HDBSCAN can help focus attention on potential areas of concern that might require further investigation or monitoring, enhancing the safety and reliability of geotechnical assessments.

The clustering approach does not rely on a predefined number of clusters; instead, it employs the Silhouette, Davies-Bouldin, and Calinski-Harabasz scores to optimise the cluster count. This method ensures that the optimal number of clusters is determined based on the data characteristics, avoiding bias from preset values. The implementation varies across algorithms; for instance, HDBSCAN does not require specifying a cluster number, whereas K-means and agglomerative clustering initially require a set number of clusters. However, as detailed later, the number of clusters is subsequently optimised, mitigating any potential issues from initial settings.
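The idea of letting an internal score choose the cluster count can be sketched on toy data. The example below is a simplified stand-in, not the procedure used in the study: it scans a few values of K with a plain K-means and the Silhouette coefficient instead of multi-objective Bayesian optimisation, and it uses a deterministic farthest-point seeding chosen here only for reproducibility. All function names and the toy points are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def farthest_point_init(points: Sequence[Point], k: int) -> List[Point]:
    """Start at the first point, then repeatedly add the point farthest
    from all centres chosen so far (assumed seeding, for determinism)."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(math.dist(p, c) for c in centres)))
    return centres


def kmeans(points: Sequence[Point], k: int, iters: int = 20) -> List[int]:
    """Plain Lloyd iterations; returns one cluster index per point."""
    centres = farthest_point_init(points, k)
    labels: List[int] = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centres[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centres[c] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return labels


def silhouette(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Mean Silhouette coefficient; singleton clusters score 0 by convention."""
    clusters: Dict[int, List[Point]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Three well-separated toy groups of four points each.
points: List[Point] = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (10.0, 0.0), (10.0, 1.0), (11.0, 0.0), (11.0, 1.0),
    (5.0, 9.0), (5.0, 10.0), (6.0, 9.0), (6.0, 10.0),
]
scores = {k: silhouette(points, kmeans(points, k)) for k in range(2, 5)}
best_k = max(scores, key=scores.get)
```

On these toy points the scan selects three clusters. In a multi-objective setting, the other internal scores would be tracked alongside the Silhouette coefficient rather than maximising one score alone.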

Four distinct feature sets were utilised to analyse natural clustering tendencies in MWD data. Each set is identified by its name at the beginning of the sets described below.

•

All. All 48 MWD-parameters, plus the two geometric parameters overburden thickness and tunnel width (50 in total), to investigate in what way extra feature information, in addition to the MWD rock mass signature, impacts the clustering result.
•

MWD. All 48 MWD parameters are used to inspect clustering results when only the MWD signature is used.
•

MWD_rock. Using domain and data knowledge to reduce the feature set to 30 by removing the 12 features derived from the two WaterFlow parameters, which might not impact the rock mass stability, and the standard deviation of each remaining parameter, which is correlated with the variance.
•

MWD_median. Only the median of each of the eight MWD parameters, giving eight features, is analysed to determine whether it adequately represents the stability of the rock mass.
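The four feature sets can be expressed as column selections, which also serves as a check on the stated feature counts. The column naming scheme (`<parameter>_<statistic>`, `overburden`, `tunnel_width`) is hypothetical, and the MWD_rock selection reflects our reading of the description above.

```python
PARAMETERS = [
    "PenetrNorm", "PenetrRMS", "RotaPressNorm", "RotaPressRMS",
    "FeedPressNorm", "HammerPressNorm", "WaterflowNorm", "WaterflowRMS",
]
STATISTICS = ["mean", "median", "std", "variance", "skewness", "kurtosis"]
GEOMETRY = ["overburden", "tunnel_width"]  # hypothetical column names

# 8 parameters x 6 statistics = 48 MWD features.
mwd = [f"{p}_{s}" for p in PARAMETERS for s in STATISTICS]

feature_sets = {
    "All": mwd + GEOMETRY,
    "MWD": mwd,
    # Drop every water-flow feature and every standard-deviation feature.
    "MWD_rock": [c for c in mwd
                 if not c.startswith("Waterflow") and not c.endswith("_std")],
    "MWD_median": [c for c in mwd if c.endswith("_median")],
}
sizes = {name: len(cols) for name, cols in feature_sets.items()}
```

The resulting sizes are 50, 48, 30, and 8 features, matching the counts given for the four sets.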

3.3 Evaluating clustering results

The evaluation of clustering algorithms is a multifaceted process, presenting unique challenges compared to supervised learning. Unlike supervised learning, where precision and recall can directly measure performance, clustering evaluation must focus on the data’s intrinsic structure without relying on predefined labels. This involves assessing whether the algorithm effectively identifies meaningful groups within the data that reflect some form of ground truth or underlying assumptions about data similarity [24]. The evaluation incorporates established clustering scores, qualitative assessments, and visual inspections, demonstrating the thoroughness and complexity of the process.

3.3.1 Evaluating established clustering scores

Both label-based (external) and label-independent (internal) metrics were employed in the evaluation, with a focus on the latter to maintain independence from existing labels. External metrics require true labels: they compare the clustering output with an externally provided set of labels and thereby evaluate how well the clustering reproduces the known classification of the data. The external metrics included the Adjusted Rand Score and Adjusted Mutual Information Score.

Internal metrics do not require true labels and instead evaluate the quality of the clustering using the data itself. Internal metrics typically measure the compactness, separation, or density of the clusters formed by the model. The internal metrics used were the Silhouette coefficient (SC), Davies-Bouldin index (DBI), and Calinski-Harabasz index (CHI).

This balanced approach, incorporating both external and internal metrics, facilitated a thorough evaluation of the clustering algorithms. Below, we define the boundary values and briefly describe the interpretation of each score.

•
Silhouette Coefficient Score:
- –
  
  Boundary Values: The Silhouette Score ranges from -1 to 1.
- –
  
  Interpretation: This metric evaluates the consistency within clusters by comparing the distance between objects within the same cluster to the distance to objects in the nearest cluster [35]. A high value close to 1 indicates that objects are well-matched to their own cluster and distinct from neighbouring clusters, representing optimal clustering. In unsupervised learning, relying solely on one metric for evaluation is often insufficient. However, this study primarily uses the Silhouette score to rank results, as it differentiated slightly more clearly between superior and inferior clustering outcomes, particularly when compared against cluster visualisations.
•
Davies-Bouldin Index:
  – Boundary Values: The score begins at 0 and has no predefined upper limit.
  – Interpretation: This index measures the average ‘similarity’ between clusters, where similarity is the ratio of within-cluster distances to between-cluster distances [36]. Lower values indicate that clusters are well-separated and internally cohesive, with the optimal score being 0, suggesting minimal intra-cluster variation and maximal inter-cluster distinction.
•
Calinski-Harabasz Index (also known as the Variance Ratio Criterion):
  – Boundary Values: The score is non-negative and has no upper limit; higher scores indicate better clustering quality.
  – Interpretation: This score is the ratio of between-cluster dispersion to within-cluster dispersion, each summed over all clusters [37]. A higher score signifies dense and well-separated clusters, which is considered indicative of a good clustering structure.

Given the complexity of clustering evaluation, we combined these metrics using Pareto front analysis to compare different experimental results comprehensively, described in Section 3.5.
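As a minimal sketch of how the three internal scores behave, the example below computes them with the scikit-learn metric functions for a well-separated clustering and for a deliberately random grouping. The synthetic data, the helper name `internal_scores`, and the use of K-means are assumptions made for illustration only; they are not the MWD data or the optimised pipelines of this study.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)


def internal_scores(X, labels):
    """Return the three label-independent scores for one clustering."""
    return {
        "silhouette": silhouette_score(X, labels),  # higher is better, at most 1
        "davies_bouldin": davies_bouldin_score(X, labels),  # lower is better, at least 0
        "calinski_harabasz": calinski_harabasz_score(X, labels),  # higher is better
    }


# Synthetic stand-in for reduced feature vectors (not the study's data).
X, _ = make_blobs(n_samples=600, centers=4, cluster_std=0.6, random_state=0)

# A sensible clustering of the four blobs.
tight = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# A deliberately meaningless grouping for comparison.
rng = np.random.default_rng(0)
random_labels = rng.integers(0, 4, size=len(X))

scores_tight = internal_scores(X, tight)
scores_random = internal_scores(X, random_labels)
print(scores_tight)
print(scores_random)
```

All three scores should rank the K-means result above the random grouping, although each does so on a different scale, which is one reason for treating them as separate objectives rather than merging them into a single number.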

Each feature vector in the dataset is assigned two labels: rock type and a rock mass stability class from the Q-system [1], as determined by face engineers during tunnel excavation. Label-based metrics such as the Adjusted Rand Index and Adjusted Mutual Information were utilised to evaluate the alignment between clustering outcomes and the labels for rock type and rock mass stability. These metrics provide insights into the relevance of clusters to actual rock mass properties, aiding the development of a data-driven rock mass stability classification system. Although the clustering aims to identify groups without strictly adhering to these labels, which were not used as targets in the optimisation process described in Section 3.5, assessing how well these labels align with the clusters is crucial. This alignment helps to validate the clusters against real rock mass properties and facilitates acceptance by the academic community.

•
Adjusted Rand Score:
  – Boundary Values: The score has an upper bound of 1. Values near 0 correspond to the agreement expected by chance, and negative values indicate agreement worse than chance.
  – Interpretation: This metric assesses the similarity between two clusterings by considering all pairs of samples and counting the pairs that are assigned to the same or to different clusters in the predicted and true clusterings [38]. A score of 1 indicates perfect correspondence between the clustering labels and the true labels, implying an ideal match.
•
Adjusted Mutual Information Score:
  – Boundary Values: The score has an upper bound of 1. Random labellings score around 0, so slightly negative values can occur.
  – Interpretation: Adjusted Mutual Information (AMI) is an adjustment of the Mutual Information (MI) score that accounts for chance. It measures the agreement of the two assignments, ignoring permutations [39]. A score of 1 denotes perfect agreement between the clustering labels and the true labels, adjusted for chance, which suggests a flawless clustering output.
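The two label-based scores can be illustrated with a small sketch using the scikit-learn functions `adjusted_rand_score` and `adjusted_mutual_info_score`. The rock-type labels and cluster assignments below are hypothetical toy values, chosen only to show that both scores ignore how the cluster ids are numbered and fall to chance level or below when every cluster mixes the rock types.

```python
from sklearn.metrics import adjusted_mutual_info_score, adjusted_rand_score

# Hypothetical rock-type labels and two candidate clusterings (toy values).
rock_type = [
    "gneiss", "gneiss", "gneiss",
    "granite", "granite", "granite",
    "hornfels", "hornfels",
]
clusters_good = [2, 2, 2, 0, 0, 0, 1, 1]  # same grouping, different cluster ids
clusters_poor = [0, 1, 2, 0, 1, 2, 0, 1]  # every cluster mixes rock types

results = {}
for name, clusters in [("good", clusters_good), ("poor", clusters_poor)]:
    ari = adjusted_rand_score(rock_type, clusters)
    ami = adjusted_mutual_info_score(rock_type, clusters)
    results[name] = (ari, ami)
    print(f"{name}: ARI={ari:.2f}, AMI={ami:.2f}")
```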

We also assessed the Gini index [27], a measure not typically associated with clustering but indicative of how uniformly samples are distributed across clusters. The Gini index, commonly applied in decision trees to evaluate node purity, varies from 0 to 1. In the form used here, 0 signifies perfectly uniform cluster sizes, while values approaching 1 signify that a few clusters hold most of the samples. This aspect is not adequately captured by the metrics previously discussed: despite favourable clustering scores, the sample distribution across clusters can be highly uneven, which may not reflect the actual characteristics of the data. Experiments were excluded when the sample distribution was excessively skewed, often with one cluster containing most samples while others had very few. Conversely, values approaching 0 (nearly equally sized clusters) can also suggest suboptimal clustering performance, a potential issue in methods like K-means that prefer circular clusters [24].
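The following sketch shows one way to compute such an index, under the assumption that it is the standard Gini coefficient applied to the cluster sizes (0 for equally sized clusters, approaching 1 when one cluster dominates). The function name `gini_of_cluster_sizes` is hypothetical, and whether unclustered samples were counted as a group of their own is not stated in the text. With the cluster sizes listed for experiment 0 in Table 4, this reading gives 0.5, the value reported for that experiment in Table 2.

```python
import numpy as np


def gini_of_cluster_sizes(labels):
    """Gini coefficient of cluster sizes: 0 = equal sizes, near 1 = one dominant cluster."""
    labels = np.asarray(labels)
    _, sizes = np.unique(labels, return_counts=True)
    sizes = np.sort(sizes).astype(float)  # ascending order is required by the formula
    n = sizes.size
    ranks = np.arange(1, n + 1)
    return float(2.0 * np.sum(ranks * sizes) / (n * sizes.sum()) - (n + 1) / n)


# Cluster sizes of experiment 0 as listed in Table 4.
sizes_exp0 = [22, 405, 3680, 3863, 8445, 1489, 2169, 2933, 271]
labels_exp0 = np.repeat(np.arange(len(sizes_exp0)), sizes_exp0)
gini_exp0 = gini_of_cluster_sizes(labels_exp0)
print(round(gini_exp0, 2))  # 0.5

# Four equally sized clusters give a coefficient of zero.
labels_equal = np.repeat(np.arange(4), 250)
gini_equal = gini_of_cluster_sizes(labels_equal)
print(gini_equal)
```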

3.3.2 Evaluating other aspects not captured by clustering metrics

•

Number of unclustered samples. The HDBSCAN algorithm detects outliers not included in clusters. Experiments were excluded if outliers represented more than 10% of the samples, indicating suboptimal clustering. Non-clustered samples may result from incorrect configuration or genuine outliers in the dataset. Visualising the clusters can typically distinguish between these types.
•

Number of clusters and their sizes. The practical value of clustering diminishes if the number of clusters is excessively high or too low (three or fewer). Experiments with such outcomes were discarded, particularly when a single cluster encompassed nearly all samples, a scenario that the established cluster scores do not flag and may even reward with high values.
•

The compactness of clusters. The compactness of clusters indicates cluster quality and the potential for improvement by modifying configurations or adding features. For instance, a cluster that appears spread out and nearly divided into two suggests a need for refinement.

This qualitative evaluation addresses the limitations of established scores in reflecting the practical utility of clustering outcomes.
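The two countable exclusion rules above (share of unclustered samples and number of clusters) can be expressed as a simple screening function. This is a sketch under stated assumptions: the name `passes_screening` is hypothetical, unclustered samples are assumed to carry the label -1 as in HDBSCAN, and `max_clusters=20` is an arbitrary placeholder because the text gives no number for an "excessively high" cluster count.

```python
import numpy as np


def passes_screening(labels, max_noise_fraction=0.10, min_clusters=4, max_clusters=20):
    """Apply the exclusion rules to one clustering result.

    labels: cluster id per sample, with -1 marking unclustered samples.
    max_clusters is a hypothetical upper bound chosen for illustration.
    """
    labels = np.asarray(labels)
    noise_fraction = float(np.mean(labels == -1))
    n_clusters = int(np.unique(labels[labels != -1]).size)
    return noise_fraction <= max_noise_fraction and min_clusters <= n_clusters <= max_clusters


accepted = np.array([0] * 40 + [1] * 30 + [2] * 18 + [3] * 10 + [-1] * 2)    # 2% unclustered, 4 clusters
too_noisy = np.array([0] * 30 + [1] * 20 + [2] * 15 + [3] * 10 + [-1] * 25)  # 25% unclustered
too_few = np.array([0] * 90 + [1] * 6 + [2] * 4)                             # only 3 clusters
print(passes_screening(accepted), passes_screening(too_noisy), passes_screening(too_few))
# True False False
```

Cluster compactness is left out on purpose, since it was judged visually rather than by a threshold.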

3.3.3 Visual inspection of clustering

Visual inspection of clustering, a crucial part of our evaluation process, was facilitated by Plotly Express [40]. This tool allowed us to generate dynamic 2D and 3D plots, enabling detailed examination of the clustering through interactive manipulation such as rotation and zoom. Two plot variations were created: scatter points coloured by cluster value, and scatter points coloured by rock type or rock mass stability class. Both plot types employ UMAP dimension reduction to 2 or 3 dimensions, even though more than ten UMAP components were typically used for the clustering itself. Additional details, such as rock type, are displayed when hovering over the scatter points. The 3D plots provide a unique perspective, allowing for the evaluation of clustering effectiveness and the relationship between clusters, which is not possible with the metrics alone. It is important to note that these visual assessments are limited to three dimensions. Thus, when the metrics suggest good clustering performance, overlap observed in the 3D plots does not necessarily indicate poor clustering, because the clusters may be well separated in the higher-dimensional space in which the clustering was performed.

This comprehensive evaluation approach, combining established metrics, qualitative assessments, and visual inspections, facilitated a nuanced understanding of the clustering’s effectiveness in classifying rock mass stability, laying the groundwork for a novel data-driven classification system.

3.4 Organising the experimentation process

The experimentation process in this study is structured as a pipeline comprising scaling, dimension reduction, clustering, and evaluation, implemented in Python and hosted on GitHub. Apart from UMAP [26], all components, including scalers, dimension reduction algorithms, clustering algorithms, and metrics, are sourced from Scikit-learn [24].
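The pipeline structure can be sketched with scikit-learn alone. In this minimal example, PCA and Agglomerative Clustering stand in for the dimension reduction and clustering steps, the data are synthetic 48-value vectors rather than MWD features, and all parameter values are arbitrary assumptions rather than the optimised settings of this study. Scoring on the reduced representation is likewise an assumption of the sketch.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for 48-value feature vectors (not the study's data).
X, _ = make_blobs(n_samples=500, n_features=48, centers=5, random_state=0)

pipeline = make_pipeline(
    MinMaxScaler(),                          # scaling
    PCA(n_components=5, random_state=0),     # dimension reduction
    AgglomerativeClustering(n_clusters=5),   # clustering
)
labels = pipeline.fit_predict(X)

# Evaluation: score the clustering on the scaled and reduced representation.
X_reduced = pipeline[:-1].transform(X)
print(silhouette_score(X_reduced, labels))
```

Keeping the steps in one pipeline object means that a single set of hyperparameters fully describes an experiment, which simplifies logging and re-running it.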

The study conducted over a thousand experiments, primarily focusing on hyperparameter optimisation, with approximately 20% undergoing detailed scrutiny. For efficient management and inspection of these experiments, we utilised mlflow [41] and hydra [42]. Mlflow was used to store and visualise the results of the experiments, along with the most important parameters and artefact files of features, true labels, and cluster values. Hydra was used to organise and store the detailed configurations of the experiments and to make it easy to run the experiments with different configurations. All configuration values were validated using Pydantic [43] in each experiment. The experiments were launched from a Makefile, which is a simple way to run them efficiently and reproducibly [44]. Result plots and compilations were generated reproducibly by retrieving data from mlflow using unique experiment IDs. For further insights into scientific ML-based reproducible and transparent experimentation, see [45].

3.5 Hyperparameter optimisation

Optimising hyperparameters is essential for effective clustering, given the sensitivity of results to these settings in both the dimension reduction and the clustering algorithms. As outlined in Section 3.3, evaluating multiple metrics is crucial in unsupervised clustering. In this study, the Silhouette score, Davies-Bouldin index, and Calinski-Harabasz index were selected as objective scores for optimisation. We employed Bayesian multi-objective optimisation, using the Optuna package [46], to adjust the hyperparameters in several stages.

For hyperparameter sampling, we adopted the multi-objective version of the Tree-structured Parzen Estimator (MOTPE) [47]. MOTPE effectively approximates a Pareto front and is more efficient than evolutionary samplers like NSGA-II, which directly optimise a Pareto front. A Pareto front represents the set of solutions considered non-dominated, meaning no solution can improve one objective without worsening at least one other objective [48]. This concept is fundamental in multi-objective optimisation as it visually and numerically illustrates the trade-offs between competing objectives, helping decision-makers choose the most suitable solutions according to their preferences. We provide details on the optimisation process due to its significance.
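The notion of non-dominated solutions can be made concrete with a short sketch. The function `pareto_mask` is a hypothetical helper, not part of Optuna, and uses a plain pairwise comparison rather than the MOTPE machinery. For illustration it is applied to the (Silhouette, Davies-Bouldin, Calinski-Harabasz) triples of experiments 2, 3, 4 and 11 from Table 3; these come from different pipelines, whereas the Pareto fronts in this study were formed among the trials of a single optimisation run.

```python
import numpy as np


def pareto_mask(scores, maximise):
    """Boolean mask of non-dominated rows.

    scores: array of shape (n_trials, n_objectives).
    maximise: one bool per objective; False means the objective is minimised.
    """
    scores = np.asarray(scores, dtype=float)
    # Flip minimised objectives so that larger is better in every column.
    signed = np.where(np.asarray(maximise), scores, -scores)
    mask = np.ones(len(signed), dtype=bool)
    for i, row in enumerate(signed):
        at_least_as_good = np.all(signed >= row, axis=1)
        strictly_better = np.any(signed > row, axis=1)
        # A row is dominated if another row is no worse everywhere and better somewhere.
        mask[i] = not np.any(at_least_as_good & strictly_better)
    return mask


# (Silhouette, Davies-Bouldin, Calinski-Harabasz) for experiments 2, 3, 4, 11 in Table 3.
scores = [
    (0.54, 0.54, 14031),
    (0.52, 0.50, 18316),
    (0.45, 0.44, 9343),
    (0.23, 2.29, 974),
]
mask = pareto_mask(scores, maximise=[True, False, True])
print(mask.tolist())  # [True, True, True, False]
```

The first three rows each win on at least one objective and are therefore mutually non-dominated, while the last row is worse than the first on all three objectives.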

Approximating the Pareto Front: MOTPE extends the traditional TPE for multiobjective optimisation by defining vector dominance and using a nondomination ranking system. Observations are split into two groups via density functions based on their dominance relations to a reference set $Y^{*}$ , ensuring $p(y\succ Y^{*}\cup y\parallel Y^{*})=\gamma$ . Additionally, the Hypervolume Subset Selection Problem (HSSP) is utilised to select observations that maximise the hypervolume indicator, facilitating an effective and diverse approximation of the Pareto front.

Selection of Pareto Optimized Experiments: In MOTPE, the selection of experiments is guided by the Expected Hypervolume Improvement (EHVI) criterion, which prioritises candidate solutions based on their potential to extend the current Pareto front. During each optimisation iteration, candidates are sampled from a distribution modelled by $l(x)$ , and the one with the highest EHVI is selected for evaluation. This greedy approach ensures continuous refinement of the Pareto front by balancing exploration and exploitation throughout optimisation. MOTPE builds models to estimate the probability distribution of objective values given the hyperparameters and then samples new hyperparameters from areas expected to yield better objective values. The four most important parameters were defined for each algorithm (HDBSCAN, K-means, UMAP, etc.), and ranges of values were set up for the process of choosing new parameters.

1.

Sampling. The MOTPE sampler draws a new set of parameters from the hyperparameter space. This sampling is influenced by the past performance of parameter sets, aiming to explore regions with potentially better trade-offs between objectives.
2.

Evaluation. Using a parametric pipeline including scaling, dimension reduction and clustering, a new set of hyperparameters was tested in each iteration, and the three objective metrics were reported from each run. In line with the direction in which each metric improves, the Silhouette score was maximised, the Davies-Bouldin index minimised, and the Calinski-Harabasz index maximised.
3.

Updating. Based on the outcomes of the evaluations, Optuna updates its understanding of the hyperparameter space. For TPE, it involves updating the internal probabilistic models.
4.

Selection. In the next iteration, Optuna uses the updated information to sample new parameters again, effectively iterating towards better solutions over time. In multi-objective optimisation, the ‘better’ solution must consider the trade-offs between competing objectives.

When several Pareto-optimal solutions were found, the final set of hyperparameters for each pipeline was chosen with emphasis on the Silhouette score and the Calinski-Harabasz index.

3.6 Characterising the clusters

This study primarily investigates the natural clustering of MWD-data to establish a foundation for a data-driven rock mass classification system. The second step involves examining the physical properties of the rock mass within each cluster. Although this research initiates the process and demonstrates promising results and guidelines, further investigation into the properties of each cluster and refinement of cluster compositions are necessary to enhance the system’s industrial applicability. That is a task for future studies. The properties of each cluster have been analysed in multiple ways across three high-performing experiments:

•

Cumulative distribution function (CDF) plots are presented for each cluster for three MWD-parameters identified as important in earlier studies [49, 50]: normalised penetration, feeder pressure, and rotation pressure (torque). These parameters, which intuitively reflect the physical properties of the rock mass (e.g., higher penetration suggests softer rock, and higher feeder pressure indicates better rock quality), are analysed to determine if clustering effectively discriminates between rock masses with distinct physical characteristics. A good spread of distinct distributions suggests effective separation, whereas overlapping distributions imply less distinction.
•

A table is provided that lists the median values and sample counts of the three MWD-parameters across clusters, sorted by increasing normalised penetration.
•

To align with existing classification systems, rock types and Q-class labels, determined by majority vote, are assigned to each cluster. This alignment, while not a primary objective, aids in integrating the new system with established classifications and enhancing its acceptance.
•

In one experiment, clusters are sequenced based on the typical compressive strength of the assigned rock types (rock types are well aligned with the clusters in this experiment).
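The tabulation and labelling steps listed above can be sketched as a small helper that reports, for each cluster, the sample count, the median of one MWD-parameter, and the majority-vote label, ordered by increasing median. The function name `characterise_clusters` and all input values are hypothetical and serve only to show the mechanics.

```python
from collections import Counter

import numpy as np


def characterise_clusters(cluster_ids, penetration, rock_types):
    """Per-cluster sample count, median penetration and majority-vote rock type,
    ordered by increasing median penetration."""
    cluster_ids = np.asarray(cluster_ids)
    penetration = np.asarray(penetration, dtype=float)
    rock_types = np.asarray(rock_types)
    rows = []
    for cid in np.unique(cluster_ids):
        members = cluster_ids == cid
        majority_label, _ = Counter(rock_types[members].tolist()).most_common(1)[0]
        rows.append(
            {
                "cluster": int(cid),
                "n_samples": int(members.sum()),
                "median_penetration": float(np.median(penetration[members])),
                "label": majority_label,
            }
        )
    return sorted(rows, key=lambda row: row["median_penetration"])


# Hypothetical toy values, not taken from the study's data.
cluster_ids = [0, 0, 0, 1, 1, 1, 1, 2, 2]
penetration = [12.0, 14.0, 13.0, -40.0, -42.0, -45.0, -41.0, -5.0, -4.0]
rock_types = [
    "Rhomb_porphyry", "Rhomb_porphyry", "Granite",
    "Hornfels", "Hornfels", "Hornfels", "Granite",
    "Granite", "Granite",
]
table = characterise_clusters(cluster_ids, penetration, rock_types)
for row in table:
    print(row)
```

A majority vote hides how mixed a cluster is, so reporting the share of the winning label alongside it would be a natural extension.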

4 Results and analysis

This section is structured into three parts. First, it presents the outcomes of the hyperparameter optimisation process. Next, it details the clustering outcomes for different combinations of feature sets, dimension reduction techniques, and clustering algorithms and analyses the emerging patterns. The analysis is grouped into configurations leading to good or bad/questionable clustering. Finally, it explores the physical properties of each cluster and their association with tunnel decisions.

4.1 Optimising the pipeline of dimension reduction and clustering

Approximately 50 experiments were required for the three cluster metrics to converge, enabling the identification of optimal hyperparameters for each pipeline. Optimisation proved essential for generating informative clusters, as default algorithm settings were inadequate. The goal was to maximise the Silhouette and Calinski-Harabasz scores while minimising the Davies-Bouldin index. The best hyperparameters identified in each run were logged in mlflow and applied in the final clustering experiments. The results of this hyperparameter optimisation are displayed in an interactive Pareto plot using a Plotly function [40], implemented in the Optuna package [46]. The plot in Fig. 3 shows the trials for an ‘MWD’ feature set optimisation involving UMAP and Agglomerative Clustering. Increasing trial numbers are indicated by progressively darker shades of blue, and improved Pareto optimal solutions are marked in darker reds. This presentation method effectively compares experiment outcomes against all objectives and illustrates the typical scatter development from the optimisation process. The MOTPE-sampler methodically explores parameter configurations, gradually converging on a Pareto front of optimal solutions.

The optimisation process, along with reporting and plotting, was executed for each distinct combination of feature set, dimension reduction technique, and clustering algorithm, leading to a comprehensive database of experiments tracked in mlflow. The four most effective experiments for a combination, as determined by the Pareto optimisation process, had their hyperparameters and unique IDs recorded. Each experiment was then thoroughly examined for its clustering results, and at least one experiment per configuration is detailed in two consolidated tables, Table 2 and Table 3 , in Section 4.2. The optimised hyperparameters for three of these experiments are listed in Table 7 in Appendix A. Any parameters not specified revert to the algorithm’s default settings.

Optimisation of hyperparameters significantly enhanced the clustering results, as evidenced by improved scores in the clustering metrics for experiments 0, 3, and 7, detailed in Table 2. The SC, DBI, and CHI metrics showed marked improvements, particularly in pipelines using the HDBSCAN algorithm. Through optimisation, HDBSCAN pipelines transitioned from 25% unclustered samples and over 800 clusters to effective clustering. In contrast, pipelines with the agglomerative clustering algorithm exhibited scores for default parameters that were closer to those of optimised parameters. Additionally, the occasional higher CHI scores for default configurations underscore the necessity of optimising multiple metrics and conducting a comprehensive evaluation of various approaches.

4.2 Comparing clustering results for different experiment setups

Tables 2 and 3 present the results of the best-performing experiments, trained with optimised parameters. The results are grouped by feature set and then sorted by Silhouette score.

Table 2: Summary of clustering results for four different feature sets, grouped by feature set. Scores for default algorithm parameters are given in parentheses.

Id	Feature set	Num. features	Dim. red. alg.	Cluster alg.	Num. clusters	Num. dim. red. comp.	Num. not clustered samples	Gini index
0	all	50	umap	hdbscan	9(956)	12(2)	0(6140)	0.5(0.6)
1	all	50	umap	hdbscan	9	15	23	0.5
2	mwd	48	umap	aggl. clust.	6	7	0	0.55
3	mwd	48	umap	aggl. clust.	7(6)	6(2)	0(0)	0.57(0.24)
4	mwd	48	umap	hdbscan	5	12	23	0.66
5	mwd	48	umap	kmeans	7	10	0	0.2
6	mwd	48	umap	kmeans	3	4	0	0.36
7	mwd	48	umap	hdbscan	11(838)	3(2)	55(6053)	0.62(0.63)
8	mwd	48	umap	hdbscan	13	12	1195	0.44
9	mwd	48	pca	kmeans	10	2	0	0.22
10	mwd	48	umap	hdbscan	6	2	22	0.69
11	mwd	48	pca	hdbscan	3	5	1842	0.59
12	mwd	48	None	hdbscan	3	—	1	0.67
13	mwd_rock	30	pca	kmeans	6	2	0	0.22
14	mwd_rock	30	umap	hdbscan	11	4	175	0.68
15	mwd_rock	30	umap	hdbscan	9	15	1014	0.64
16	mwd_median	8	umap	hdbscan	6	11	23	0.74

Table 3: Summary of clustering metrics grouped by feature set and ordered by Silhouette score. The Adjusted Rand score is calculated for the label rock quality for the feature set ‘all’ and for the label rock type for the other feature sets. Scores for default algorithm parameters are given in parentheses.

Id	Feature set	Dim. red.	Cluster alg.	Silhouette	Davies Bouldin	Calinski Harabasz	Adjusted rand
0	all	umap	hdbscan	0.65(0.15)	0.39(1.61)	68134(64776)	0.02(0.002)
1	all	umap	hdbscan	0.65	0.39	61432	0.02
2	mwd	umap	aggl. clust.	0.54	0.54	14031	0.3
3	mwd	umap	aggl. clust.	0.52(0.48)	0.5(0.72)	18316(31696)	0.33(0.27)
4	mwd	umap	hdbscan	0.45	0.44	9343	0.08
5	mwd	umap	kmeans	0.45	0.91	22210	0.25
6	mwd	umap	kmeans	0.45	0.82	15279	0.17
7	mwd	umap	hdbscan	0.44(0.11)	1.09(1.46)	15670(94512)	0.48(0.04)
8	mwd	umap	hdbscan	0.37	1.12	20866	0.24
9	mwd	pca	kmeans	0.32	0.88	13380	0.09
10	mwd	umap	hdbscan	0.3	0.46	7627	0.11
11	mwd	pca	hdbscan	0.23	2.29	974	0.01
12	mwd	None	hdbscan	0.74	0.35	307	0.0
13	mwd_rock	pca	kmeans	0.37	0.85	17916	0.04
14	mwd_rock	umap	hdbscan	0.26	1.33	7635	0.29
15	mwd_rock	umap	hdbscan	0.18	2.76	9536	0.25
16	mwd_median	umap	hdbscan	0.36	0.42	6142	0.17

Configurations leading to good clustering.

We found that a min-max scaler, scaling each feature to the 0-1 range, worked better than the other scalers tested.

Considering the internal metrics (Silhouette score, Calinski-Harabasz index, and Davies-Bouldin index) and the visual appearance in plots, pipelines with UMAP dimension reduction followed by HDBSCAN or Agglomerative clustering work equally well, but only for the feature sets ‘All’ (50 features) and ‘MWD’ (48 features). However, in comparing HDBSCAN (experiment 7) and Agglomerative clustering (experiment 3), we want to point out the higher Adjusted Rand Score for HDBSCAN in experiment 7. Visualising the clusters in 3D plots, in Fig. 4 and Fig. 10 in Appendix B for experiment 0 using the ‘All’ (MWD and geometric features) feature set, and in Fig. 5 (experiment 7) and Fig. 11 in Appendix B (experiment 3) for the ‘MWD’ feature set, demonstrates a near-perfect match between the assigned cluster labels and the visually apparent clusters. Note that each feature set was reduced to three UMAP components to facilitate plotting, which explains why the scatter patterns in Fig. 5 and Fig. 11 in Appendix B differ from those in Fig. 4. In experiment 10, we employed two UMAP components for dimension reduction and visualisation. The resulting plot, shown in Fig. 6, displays a clear alignment between clustering results and visible clusters. However, the pattern of one large cluster alongside several smaller ones may not accurately represent the characteristics of this rock mass data. This assertion is further supported by lower performance scores.

The number and types of features significantly influence clustering outcomes. In the referenced plots, clustering quality appears nearly equal for ‘MWD’ and ‘All’, and slightly better for the ‘All’ set, which exhibits greater inter-cluster distance and cluster compactness. Including overburden thickness and tunnel width in the ‘All’ set improves all performance metrics. This underscores the critical role of feature selection and the potential for cluster refinement by adding features that characterise the rock mass to the MWD-feature base. When adding new features, it is crucial to ensure a causal relationship based on domain knowledge relevant to the problem.

Transitioning from the 48-feature MWD set to the 30-feature MWD_rock set generally results in slightly lower scores. The clustering quality visibly deteriorates, as evidenced by several clusters overlapping. Similarly, the MWD_median set, with only eight features, shows acceptable scores but poor clustering on visual inspection, worse than the 30-feature set. The Gini index for the smaller feature sets indicates high values, suggesting the presence of one or two large clusters. This observation suggests that while high scores can correlate with effective clustering in 3D visualisations, they do not necessarily guarantee it; visual inspection remains essential to validate the results.

HDBSCAN, in contrast to agglomerative clustering, identifies samples as unclustered if they are outliers. Different experimental setups yield varying numbers of these unclustered samples, typically few. Fig. 5a shows the unclustered points in blue among the light and darker green clusters. Future research should investigate these outliers to potentially uncover patterns relevant to assessing rock mass stability.

Configurations leading to bad or questionable clustering

Clustering results should not rely solely on the Silhouette score and Davies-Bouldin index; all three metrics are important. To illustrate this, we included experiment 12, which is notable for its lack of dimension reduction. Conducting 50 tests to optimise HDBSCAN without dimension reduction highlighted the necessity of this process for effective clustering. Despite yielding the best Silhouette and Davies-Bouldin values, this experiment identified only three clusters, one encompassing over 99% of the samples and two small ones. Similar outcomes were observed for K-means and agglomerative clustering without dimension reduction. Interestingly, experiment 10, with dimension reduction but only using two UMAP components, also demonstrates some of the same patterns. Reducing UMAP components from three to two generally worsens the results, as shown by the poor results obtained with default values (two components) for experiments 0 and 7.

Including the Calinski-Harabasz index as a metric results in the exclusion of experiment 12 and also experiment 11, which utilised PCA for dimension reduction and HDBSCAN for clustering. Nonetheless, experiment 9, which combined PCA with K-means, also maintained a high Calinski-Harabasz score. Despite generally favourable metrics in the tables, K-means often underperforms. This becomes apparent upon examining the plots, where multiple clusters overlap visually. The issue likely stems from K-means’ preference for circular clusters, as noted in MacQueen [32]. This clustering method misrepresents rock mass data, which does not naturally form circular clusters, by forcing points into such clusters even when they do not logically fit. The only metric indicating this issue in Table 2 is the Gini index. K-means shows low values (below 0.3) for this index, suggesting nearly equal-sized clusters. However, it is unrealistic for rock mass data to naturally segment into ten clusters of equal sample size.

4.3 Ordering, structuring and linking the clusters to physical properties of the rock mass

Tables 4, 5, and 6 present detailed results from experiments 0, 3, and 7. The clusters are arranged by increasing values of the Penetration MWD feature. Because the MWD features are scaled, it is their relative sizes rather than their absolute values that should be compared.

We observe distinct patterns for experiment 0 using the ‘All’ feature set. The number of samples in each cluster is evenly spread, correlating with a low Gini score. Each cluster’s median feature values are distinct, indicating differing physical properties. Had the median values been similar across multiple clusters, it would suggest poor clustering relative to rock mass properties. The overburden feature generally decreases as penetration increases, possibly due to reduced rock stress and more weathered near-surface rock (dayrock) in areas with lower overburden and higher penetration. Cluster 0, with only 22 samples, appears to be an outlier. The rock quality label from the Q-system is assigned to each cluster by majority vote. In contrast to the good discrimination seen in the MWD-features, the Q-classes do not follow the clusters, as seen from the B and C-class samples spread out over all clusters. The same poor alignment is visible in Fig. 4 and in the low Adjusted Rand score for experiment 0 in Table 3. This indicates that the natural clustering of the rock mass based on MWD features does not align with the Q-classes. Drawing on our domain knowledge, we have not assigned the rock type label to these clusters, since the overburden thickness and tunnel width features have no intuitive causal relationship with rock type. The cumulative distribution plots in Fig. 7 illustrate a broad distribution for feeder pressure, while penetration and overburden tend to form three groups, confirming the pattern observed in the median values analysis.

Table 4: Experiment 0. Cluster properties for the feature set ‘all’. The label is rock quality (Q-class). The clustering algorithm is HDBSCAN. Values for three important cluster features are given to relate the clusters to interpretable properties. Clusters are ordered by increasing normalised penetration.

cluster	Num samples	FeedPressNorm Median [bar]	PenetrNorm Median [m/min]	Overburden [m]	Label
0	22	28.83	-239.11	15.71	E2
6	405	-19.08	-40.64	20.52	B
1	3680	-1.22	-9.67	170.77	B
7	3863	3.94	-9.4	64.18	B
4	8445	1.08	-6.12	63.51	C
8	1489	-0.18	-5.55	20.52	C
5	2169	18.77	13.57	34.71	B
3	2933	-28.79	17.19	24.85	C
2	271	-32.65	22.8	23.93	C

For experiment 3, cluster 3 appears to be an outlier with only 23 samples. Here too, the median values are clearly separable for each cluster (see Table 5 and the CDFs in Fig. 8), with a tendency towards three groups for penetration and a good spread of values for feeder pressure and rotation pressure (torque). Rock types have been assigned to each cluster by majority vote. An Adjusted Rand score of 0.3 indicates a reasonable alignment with the rock type labels. The lowest penetration value for the strong Hornfels rock type and the highest penetration for the weaker Rhomb porphyry align well with the physical world.

Table 5: Experiment 3. Cluster properties for the feature set ‘mwd’. The label is rock type. The clustering algorithm is Agglomerative clustering. Dimension reduction is UMAP. Values for three cluster features are given to relate the clusters to interpretable properties. Clusters are ordered by increasing normalised penetration.

cluster	Num samples	FeedPressNorm Median [bar]	PenetrNorm Median [m/min]	RotaPressNorm Median [bar]	Label
3	23	28.4	-239.08	29.47	Granittic_gneiss
6	493	-4.25	-43.78	11.16	Hornfels
4	422	-18.67	-40.67	-4.03	Rhomb_porphyry
0	7477	-0.31	-10.93	0.88	Drammensgranite
5	8942	2.19	-4.83	3.61	Granittic_gneiss
1	2365	18.25	12.42	14.04	Granittic_gneiss
2	1823	-28.52	16.44	2.91	Rhomb_porphyry

Twenty-nine samples are not clustered in experiment 7 (see Table 6). These are visible as purple-coloured points in the scatter plot of clusters in Fig. 5 and are highlighted by the blue-coloured outlier distribution in the CDF plots of feeder pressure, penetration and rotation pressure in Fig. 9. What is apparent in experiment 7, compared to the other experiments, is the increased separation of cluster properties, seen in the median values for all MWD-features in Table 6 and the CDFs in Fig. 9. This indicates a clear separation of rock mass properties in the eleven defined clusters. We have assigned rock type labels to the clusters by majority vote. In Fig. 5b, we have coloured the scatter points with rock type labels, clearly illustrating the alignment of rock type to the clusters. This alignment is confirmed by an Adjusted Rand score of 0.48, the highest among the experiments (a score of 1.0 would be a perfect match between clusters and rock type). Such an alignment is also supported by the study of Hansen et al. [45], which forecasted rock type from MWD-data with high predictive accuracy (above 96%).

Table 6: Experiment 7. Cluster properties for the feature set ‘MWD’. The label is rock type. The clustering algorithm is HDBSCAN. Dimension reduction is UMAP. Values for three cluster features are given to relate the clusters to interpretable properties. Clusters are ordered by increasing normalised penetration.

cluster	Num samples	FeedPressNorm Median [bar]	PenetrNorm Median [m/min]	RotaPressNorm Median [bar]	Label
-1	29	27.54	-238.84	22.4	Granittic_gneiss
7	533	-4.53	-43.1	10.56	Hornfels
0	426	-18.67	-40.62	-4.0	Rhomb_porphyry
8	1149	-5.16	-19.17	-6.86	Granittic_gneiss
9	2928	3.59	-11.64	2.69	Drammensgranite
10	3318	-0.44	-7.25	3.28	Drammensgranite
6	8879	2.19	-4.76	3.61	Granittic_gneiss
3	91	1.08	-0.88	-10.0	Granittic_gneiss
1	1688	15.02	5.91	17.05	Granittic_gneiss
4	658	-14.68	13.04	2.43	Rhomb_porphyry
5	1148	-30.55	18.35	3.41	Blackshale
2	698	18.77	27.76	8.8	Augen_gneiss

As for experiment 3, the strong Hornfels is correctly assigned to a cluster with the lowest penetration and the next-to-highest torque. Regarding penetration, the big clusters follow an overall pattern from Hornfels to Granite, Gneiss and Shale, which seems like an intuitive order. Rhomb porphyry is a lava rock with distinct properties for different lava flows, and the Rhomb porphyry samples come from several lava flows, which can explain why this rock type appears with both low and high penetration. Augen gneiss is a special kind of gneiss, with eye-shaped megacrystals of feldspar in a matrix of quartz and other minerals, quite different from the classically layered gneisses in the other samples. Evaluation based on penetration alone is too simple for this strong rock type. Inspecting feeder pressure (note the pronounced CDF in Fig. 9) and rotation pressure, we see high values, reflecting the strength of this rock type. We can therefore explain its label on cluster number two.

5 Discussing the implications

In this section, we (a) outline methods for designing a rock mass classification system aimed at different objectives (stability, grouting effort, blastability) using the established clusters as a foundation, (b) explore a promising approach, and (c) detail the applications of the established concept.

5.1 Sketching a new system

MWD-data, represented as vectors of statistical metrics extracted from thousands of values, can be grouped into well-defined clusters. To develop a rock mass classification system, it is essential to thoroughly examine and understand the characteristics of each cluster in relation to the specific problem. The arrangement of classes may vary from weak to strong, or potentially along other dimensions, depending on the problem requirements. To ensure the clusters are relevant, you must tune them to your problem so that they exhibit the desired properties. The resulting clusters will vary depending on the selected features. To obtain meaningful clusters, you can:

Utilise a core set of MWD features that serve as a signature for the rock mass and naturally form effective clusters. Repeat the following steps until the clusters align with your problem in terms of cluster properties, cluster count, and cluster definition (well-defined and separable). Subsequently, label your clusters with descriptive names that reflect the categories in your new classification system.

1.

Incorporate features known to be important for the problem, ensuring they are easily measurable at the excavation site. Potential features include rock cover, tunnel width, measured water inflow in drillholes, distance to a parallel tunnel, soil cover, distance to a fault, point load value, and distance to a lake. By adding the easily collectable features of rock cover and tunnel width to the core MWD clusters in this study, we significantly enhanced the clustering in both visual and metric terms.

Further refinement may require additional features to achieve a sufficiently detailed cluster system, possibly to accommodate different failure mechanisms. The principal rock stress components are likely essential, given their significant impact on expected failure types. Unlike the time-consuming experiments needed to measure rock mechanical properties (E-modulus, UCS, Poisson’s ratio), which also face scaling issues, approximating rock stress might be feasible and sufficiently accurate for considering rock failure principles. The addition of overburden thickness, a critical variable in estimating vertical stress, exemplifies such a tuning step (see Section 4.2).

When clustering your data, you are not constrained by the requirements of supervised learning, where the model must generalise based on features present in both training and prediction phases. This allows the use of any features in your dataset to form clusters, even those not available during real-world prediction. For example, post-blasting metrics like overbreak can be included in clustering but are unsuitable for supervised learning.
2.

Perform the clustering using the algorithmic pipeline outlined in this study.
3.

Investigate the properties of the clusters by analysing the distributions of cluster features. Begin by examining the distribution of already logged values, such as joint sets or point load tests, if they are not included in the cluster feature vector. Subsequently, gather additional data from the sample sites in the dataset.
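The feature-building part of the steps above can be sketched as follows. The five summary statistics, the parameter names and the helper functions are illustrative assumptions (the actual composition of the 48-value vector is not repeated here); only the idea of appending rock cover and tunnel width to a core MWD signature, followed by min-max scaling, comes from the study. Rock cover serves as the simple proxy for vertical stress.

```python
from statistics import mean, median, pstdev


def section_signature(values):
    """Summarise the MWD values of one parameter in a one-metre section."""
    return [mean(values), median(values), pstdev(values), min(values), max(values)]


def build_feature_vector(mwd_section, rock_cover_m, tunnel_width_m):
    """Concatenate per-parameter signatures with problem-specific features."""
    vector = []
    for name in sorted(mwd_section):  # fixed parameter order across sections
        vector.extend(section_signature(mwd_section[name]))
    vector.extend([rock_cover_m, tunnel_width_m])
    return vector


def min_max_scale(vectors):
    """Scale every column to [0, 1]; constant columns map to 0."""
    columns = list(zip(*vectors))
    lows = [min(c) for c in columns]
    spans = [max(c) - min(c) for c in columns]
    return [[(v - lo) / span if span else 0.0
             for v, lo, span in zip(row, lows, spans)]
            for row in vectors]


# Hypothetical sections with two MWD parameters each.
section_a = {"penetration": [1.0, 2.0, 3.0], "feed_pressure": [40.0, 42.0, 44.0]}
section_b = {"penetration": [2.0, 2.5, 3.0], "feed_pressure": [50.0, 52.0, 57.0]}

vectors = [
    build_feature_vector(section_a, rock_cover_m=35.0, tunnel_width_m=10.0),
    build_feature_vector(section_b, rock_cover_m=80.0, tunnel_width_m=10.0),
]
scaled = min_max_scale(vectors)
print(len(vectors[0]), scaled[0][-2:], scaled[1][-2:])
```

Re-running the clustering on such extended vectors, and comparing cluster count, separation and properties after each change, corresponds to one pass through the loop described in the list.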

You then need to map actions to the clusters, in the same way that support classes are associated with rock mass classes in RMR and the Q-system. One method is to map empirical data on installed rock support from stable sites to the cluster label of each site (e.g. comparable to the support classes for stability classes A, B, C… in the Q-system). Alternatively, you could incorporate descriptions of installed support from stable sites into the cluster feature vector. This approach would allow clusters to encompass information about specific rock masses and the corresponding stable rock support. When a new rock mass is encountered during excavation, the appropriate rock support can be directly inferred from the cluster. Other methods may also be viable, but exploring these is beyond the scope of this study.

To implement the clustering-based system, you need to build a supervised learning model that assigns a new rock mass encountered during excavation to the correct cluster label. Sapronova et al. [15] outline a relevant approach. Typically, this model would be trained on the pre-blasting data used in the clustering. If the classification process identifies the rock mass as an outlier not covered by the system, the sample should be closely inspected. If no anomalies are found, this indicates a rock mass not included in the system, suggesting the need for a system update through reclustering.
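As a rough illustration of this assignment step, the sketch below uses a nearest-centroid rule with a distance threshold for outliers. This is a deliberately simple stand-in chosen for brevity, not the supervised model envisaged in this study; the threshold `max_distance` and the toy vectors are hypothetical.

```python
from math import dist


def fit_centroids(vectors, labels):
    """Mean feature vector per cluster; unclustered samples (-1) are skipped."""
    groups = {}
    for vector, label in zip(vectors, labels):
        if label != -1:
            groups.setdefault(label, []).append(vector)
    return {label: [sum(col) / len(col) for col in zip(*members)]
            for label, members in groups.items()}


def classify(vector, centroids, max_distance):
    """Return the nearest cluster label, or -1 if no centroid is close enough."""
    label, centroid = min(centroids.items(), key=lambda item: dist(vector, item[1]))
    return label if dist(vector, centroid) <= max_distance else -1


# Hypothetical scaled feature vectors with their cluster labels.
train_vectors = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.0, 0.8], [5.0, 5.0]]
train_labels = [0, 0, 1, 1, -1]
centroids = fit_centroids(train_vectors, train_labels)

print(classify([0.1, 0.1], centroids, max_distance=0.5))  # close to cluster 0
print(classify([3.0, 3.0], centroids, max_distance=0.5))  # far from every cluster
```

A sample returned as -1 would be the candidate for closer inspection and, if no anomaly is found, for reclustering.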

Using these principles, you can develop a data-driven system tailored to your specific problem. This study focuses on rock mass stability, primarily concerning permanent or advance rock support. Alternatively, another system could concentrate on blasting design, grouting effort (grouting volume, pumping time), or water leakage, selecting features and designing clusters accordingly. Following the successful alignment of rock type to clusters, as detailed in experiment 7 in Section 4.3, we describe a concrete approach below to initiate the design of an intuitive classification system.

5.2 A promising approach

The clear alignment between rock types and clusters suggests the potential for developing an intuitive rock mass classification system. Ground control engineers intuitively understand and accept the differences in required rock support between rock types, such as blocky versus tightly jointed granite, or a brittle, porous Rhomb porphyry versus a massive one. This understanding can extend to expectations of water problems or drill bit wear. By labelling the clusters (i.e. the rock mass classes) with descriptive names like "Massive Granite", "Blocky Granite", and "Tightly Jointed Granite", it becomes easier to determine the appropriate rock support for each class. Although this approach may result in a large number of classes, such descriptive labels are likely more intuitive and meaningful for engineers than existing alphanumeric systems. An engineer knows the implications of a blocky granite in terms of rock support, water leakage, etc., but may not fully understand what a Q-class C rock mass entails.

5.3 When and where to use the concept

In this study, we have clustered MWD-data, framed as vectors of statistical metrics extracted from thousands of values in one-meter sections of a full-face blasting round. Conceptually, the approach might be used on all kinds of MWD-data, from drilling boltholes, exploratory holes, grouting holes, and blasting holes, provided there is a sufficient quantity of MWD-values to extract the statistical vector that characterises the rock mass. For instance, the vector could be computed for $1\times 1\times 1$ meter cubes of data, $4\times 4\times 4$ meter cubes, a 1-meter long split profile section, or 1-meter sections in single holes. Determining the limits of this application is a subject for future research.
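The cube-based variant could be prototyped along the following lines. The sample layout `(x, y, z, value)`, the two summary statistics and the `min_values` cut-off are assumptions made for illustration; what counts as a sufficient quantity of MWD-values is, as noted, an open question.

```python
from collections import defaultdict
from math import floor
from statistics import mean, pstdev


def bin_into_cubes(samples, cube_size=1.0):
    """Group (x, y, z, value) MWD samples by the cube that contains them."""
    cubes = defaultdict(list)
    for x, y, z, value in samples:
        key = (floor(x / cube_size), floor(y / cube_size), floor(z / cube_size))
        cubes[key].append(value)
    return cubes


def cube_signatures(samples, cube_size=1.0, min_values=3):
    """Return {cube index: (mean, std)} for cubes holding enough values."""
    return {key: (mean(values), pstdev(values))
            for key, values in bin_into_cubes(samples, cube_size).items()
            if len(values) >= min_values}


# Hypothetical samples: position in metres and one MWD value.
samples = [
    (0.2, 0.1, 0.5, 10.0),
    (0.8, 0.9, 0.1, 12.0),
    (0.5, 0.5, 0.5, 14.0),
    (1.5, 0.2, 0.3, 30.0),
]

print(cube_signatures(samples, cube_size=1.0))  # sparse cubes are dropped
print(cube_signatures(samples, cube_size=4.0))  # all samples share one cube
```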

6 Conclusions

Rock mass classification systems are crucial for mapping risks and guiding support and excavation design globally. However, systems developed primarily in the 1970s lack access to modern high-resolution datasets and advanced statistical learning techniques, which limits their effectiveness as decision-support systems. We have demonstrated that a pipeline of dimension reduction and unsupervised machine learning can effectively form well-defined clusters using extracted statistical information from thousands of MWD-data values that represent the whole encountered rock mass for 1-meter sections in infrastructure tunnels. Such clusters can serve as a robust foundation for various rock mass classification systems. The study yields the following conclusions:

•

The pipeline of a min-max scaler, dimension reduction with UMAP, and clustering using HDBSCAN or Agglomerative Clustering provides effective clustering, as observed visually in a 3D scatter plot of three UMAP components, and demonstrated numerically for a range of cluster metrics.
•

Clustering efficiency depends on the number and type of features. Optimal results were obtained with the largest set of 50 features, which included two geometric features. The set of 48 values, including only MWD features, also showed effective clustering.
•

Multi-objective optimisation of algorithm parameters was crucial for achieving effective clustering. Over 1000 experiments were conducted, with most failing.
•

Evaluation of clustering effectiveness required rigorous assessment using internal cluster metrics (Silhouette Coefficient Score, Davies-Bouldin Index, Calinski-Harabasz Index), the Adjusted Rand Score, and the Gini-index, alongside the count of clusters and unclustered samples, complemented by a visual review in a 3D interactive plot.
•

The Gini-index may serve as a final evaluator when other scores are inconclusive, albeit with nuances. We discarded pipelines with K-means because their low Gini-index, indicating unnaturally even cluster sizes, was confirmed by visual inspection, although all other scores were favourable. Several experiments were also discarded due to excessively high Gini-index values, typically indicating one dominant cluster and a few smaller ones.
•

Effective clustering necessitates dimension reduction. The non-linear UMAP algorithm was successfully employed, whereas clustering without dimension reduction or using the linear PCA algorithm yielded poor results.
•

Concerns regarding the use of a manifold learning technique like UMAP, due to its tendency to occasionally create artificial clusters without physical significance, were disproved in this study. Clear, physically meaningful clusters were demonstrated, contrasting with the ineffectiveness of no dimension reduction or the linear PCA method.
•

Of the two label sets available, rock type aligned well with the defined clusters, unlike the rock mass quality labels (Q-class).
•

Clusters with a core of MWD-features, which act as a signature of the encountered rock mass, can be refined with domain-specific features like rock cover and tunnel width.
•

Analysis of feature distributions involved in clustering revealed that the physical properties of each cluster are specific and align well with a range from weak to strong rock types, indicating meaningful clustering.
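The role of the Gini-index as a check on cluster size balance can be made concrete with a short sketch. The formula below is the standard Gini coefficient applied to cluster sizes; that this study computed it in exactly this form is an assumption. The example sizes are the sample counts of the clusters in Table 6, excluding the unclustered samples.

```python
def gini_index(cluster_sizes):
    """Gini coefficient of cluster sizes: 0 for equal sizes, near 1 for one dominant cluster."""
    sizes = sorted(cluster_sizes)
    n = len(sizes)
    total = sum(sizes)
    # Rank-weighted sum over the sizes in ascending order.
    weighted = sum(rank * size for rank, size in enumerate(sizes, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Sample counts per cluster from Table 6 (cluster -1 excluded).
table_6_sizes = [533, 426, 1149, 2928, 3318, 8879, 91, 1688, 658, 1148, 698]

print(round(gini_index(table_6_sizes), 2))
print(gini_index([5, 5, 5, 5]))    # perfectly even sizes
print(gini_index([0, 0, 0, 10]))   # one cluster holds everything
```

Values near zero flag the unnaturally even partitions described for K-means, while values near one flag a single dominant cluster.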

7 Outlook and future research

The successful clustering of MWD-data into defined groups, the distinct physical properties of each group, the demonstrated feature tuning possibilities, and the alignment of these clusters with rock types suggest that the described methods could be a foundational basis for new data-driven rock mass classification systems, such as stability systems. Rather than developing a universal stability classification system that encompasses all variants—such as different failure systems (squeezing, low stress, swelling), rock mass categories (strong elastic-behaving rocks, weak plastic rocks), and tunnel geometries—it may be more effective to tailor systems to specific problems. This could be achieved by adapting the classification to the rock mass signature derived from MWD values and incorporating additional features relevant to the problem. The systems can be named accordingly, such as "MWD-system-hard-rock" or "MWD-system-squeeze".

Key areas for future research to further develop a data-driven classification system for rock mass stability include: (a1) closely examining the properties of each rock mass cluster, (a2) refining the clusters by incorporating relevant domain features to address specific problems, (a3) ensuring the clusters are organised meaningfully and align with the particular problem, and labelling them with appropriate stability names, (b) mapping actions to these clusters, and (c) training a supervised learning model to accurately classify new rock masses into the appropriate clusters.

Further research is motivated by the potential of such a system to address the limitations of existing classification systems, as outlined in the introduction. A data-driven approach based on MWD-data is likely to overcome several of these limitations:

•

Automated data collection from sensors provides high-resolution coverage of the entire rock mass without the need for human data assessment, addressing limitations such as 1-Human bias, 2-Inconsistent-assessment, 3-Hazardous-inspection, and 4-Non-representative-quantification.
•

Conceptually, the MWD-data signature can be established for small volumes, as long as there are enough MWD-values to extract a signature, say for a resolution of $1\times 1\times 1\ \mathrm{m}^{3}$, allowing for fine-grained assessment and targeted rock support, overcoming limitation 5-Not-finegrained-support.
•

Expanding the dataset with new samples (e.g., different geological conditions or tunnel uses) and retraining the cluster models is an efficient and transparent process, addressing limitations 6-Limited-case-studies and 7-Cumbersome-update.
•

Adapting the system for specific cases, such as mine junctions and tunnel openings, might be managed by adding predictive causal features to the dataset and rerunning the clustering, overcoming the limitation 8-Complex-exception-rules.
•

MWD-data for clustering is efficiently collected from all drillholes: from boltholes radially around the tunnel profile, from blasting holes ahead, and from exploratory holes ahead and outside of the profile, overcoming limitations 9-Only-visual-assessment and 10-No-advance-assessment.
•

A clustering approach simplifies the process of defining multiple classes, potentially allowing for more homogeneous rock mass material classification and targeted rock support with specific safety factors, avoiding a conservative approach with few classes, and overcoming limitations 11-Non-optimised-rock-support and 12-Correct-rock-support.
•

The flexibility to define multiple classes, add causal features, and tune classes to specific properties may also address limitation 13-Not-representative-failure-modes.

MWD is cost-effective and readily available in tunnelling and mining worldwide, making it a viable foundation for rock mass classification systems. The findings of this study require further validation, as summarised in this section, and detailed in Sections 4 and 5, particularly regarding the properties and structures of the established clusters. If confirmed, the implications for tunnelling and mining could be substantial. By following the procedures outlined in this study, which utilise core clusters based on rock mass signatures from MWD-data, it should be feasible to develop purely data-driven rock mass classification systems with specific targets like stability, grouting effort, or blastability. Predicting cluster labels from data obtained from long exploratory holes would enable rock mass classification several days in advance, enhancing planning capabilities. Such a decision support system would potentially be easily updatable, transparent, reproducible, and free from human bias. Optimised to address the limitations of existing systems, it could significantly impact the industry and society by optimising decisions, reducing the use of steel and concrete, enhancing safety by mitigating risks associated with complex geology, and increasing tunnelling efficiency.

8 Acknowledgement

The authors gratefully acknowledge Thorvald B. Wetlesen and Ivar Oppen from the tunnel software/hardware company Bever Control, which has facilitated access to the data from the clients Bane NOR, Statens Vegvesen, Nye Veier, and the contractor AF-Gruppen.

9 Ethics declarations

Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Funding
This research received no specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Contributions
Tom F. Hansen: Conceptualisation, Methodology, Software, Investigation, Data Curation, Visualisation, Writing — Original Draft. Arnstein Aarset: Conceptualisation, Writing — Review & Editing.

The use of generative AI in the writing process
While preparing this work, the authors used GPT-4 from OpenAI to improve the readability and language of some paragraphs in the text. After using this tool/service, the authors reviewed and edited the content as needed, and take full responsibility for the content of the publication.

References

Barton et al. [1974] Barton, N., Lien, R., Lunde, J.: Engineering classification of rock masses for the design of tunnel support. Rock Mechanics Felsmechanik Mécanique des Roches 6, 189–236 (1974) https://doi.org/10.1007/BF01239496
Bieniawski [1973] Bieniawski, Z.T.: Engineering classification of jointed rock masses. Civil Engineering = Siviele Ingenieurswese 1973(12), 335–343 (1973)
Erharter et al. [2023] Erharter, G., Hansen, T.F., Qi, S., Bar, N., Marcher, T.: A 2023 perspective on rock mass classification systems. In: Proceedings of the 15th ISRM Congress 2023 & 72nd Geomechanics Colloquium, Salzburg, Austria (2023)
Skretting et al. [2023] Skretting, E., Erharter, G., Chiu, J.K.Y.: Virtual reality based uncertainty assessment of rock mass characterization of tunnel faces. In: Proceedings of the 15th ISRM Congress 2023 & 72nd Geomechanics Colloquium, Salzburg, Austria (2023)
Şen and Sadagah [2003] Şen, Z., Sadagah, B.H.: Modified rock mass classification system by continuous rating. Engineering Geology 67(3), 269–280 (2003) https://doi.org/10.1016/S0013-7952(02)00185-0
Elmo and Stead [2021] Elmo, D., Stead, D.: The role of behavioural factors and cognitive biases in rock engineering. Rock Mechanics and Rock Engineering 54(5), 2109–2128 (2021) https://doi.org/10.1007/s00603-021-02385-3
Palmstrom and Broch [2006] Palmstrom, A., Broch, E.: Use and misuse of rock mass classification systems with particular reference to the q-system. Tunnelling and Underground Space Technology 21(6), 575–593 (2006) https://doi.org/10.1016/j.tust.2005.10.005
Palmstrom [2005] Palmstrom, A.: Measurements of and correlations between block size and rock quality designation (rqd). Tunnelling and Underground Space Technology 20, 362–377 (2005) https://doi.org/10.1016/j.tust.2005.01.005
Pells et al. [2017] Pells, P., Bieniawski, Z., Hencher, S., Pells, S.: Rock quality designation (rqd): time to rest in peace. Canadian Geotechnical Journal 54(6), 825–834 (2017)
Pells and Bertuzzi [2007] Pells, P., Bertuzzi, R.: Limitations of rock mass classification systems. Tunnels and Tunnelling International, 1–11 (2007)
Ranasooriya and Nikraz [2008] Ranasooriya, J., Nikraz, H.: An evaluation of rock mass classification methods used for tunnel support design. In: ISRM International Symposium - Asian Rock Mechanics Symposium, vol. All Days, pp. 5–2008098 (2008)
Morgenroth et al. [2019] Morgenroth, J., Khan, U.T., Perras, M.A.: An overview of opportunities for machine learning methods in underground rock engineering design. Geosciences (Switzerland) 9, 504 (2019) https://doi.org/10.3390/geosciences9120504
Dickmann et al. [2021] Dickmann, T., Hecht-Méndez, J., Krüger, D., Sapronova, A., Unterlaß, P.J., Marcher, T.: Towards the integration of smart techniques for tunnel seismic applications. Geomechanik und Tunnelbau 14, 609–615 (2021) https://doi.org/10.1002/geot.202100046
Dickmann and Hecht-Méndez [2022] Dickmann, T., Hecht-Méndez, J.: Correlating rock support and ground treatment means with in-tunnel seismic data. (2022). https://www.researchgate.net/publication/364316692
Sapronova et al. [2021] Sapronova, A., Unterlas, P.J., Hecht-Méndez, J., Dickmann, T., Marcher, T.: Sparse data transformation for unsupervised clustering for the exploration ahead of tunnel face 2021(1), 1–5 (2021) https://doi.org/10.3997/2214-4609.202120199
Sapronova et al. [2024] Sapronova, A., Hammoud, A., Klein, F., Marcher, T.: Correlational analysis of mwd data for rock mass characterization and risk assessment. In: Proceedings of the Fourth EAGE Digitalization Conference & Exhibition, vol. 2024, pp. 1–4. European Association of Geoscientists & Engineers (2024). https://doi.org/10.3997/2214-4609.202439009
Hansen et al. [2024] Hansen, T.F., Erharter, G.H., Liu, Z., Torresen, J.: A comparative study on machine learning approaches for rock mass classification using drilling data. Preprint on arXiv (2024) arXiv:2403.10404 [cs.LG]
Fernández et al. [2023] Fernández, A., Sanchidrián, J.A., Segarra, P., Gómez, S., Li, E., Navarro, R.: Rock mass structural recognition from drill monitoring technology in underground mining using discontinuity index and machine learning techniques. International Journal of Mining Science and Technology 33, 555–571 (2023) https://doi.org/10.1016/j.ijmst.2023.02.004
van Eldert et al. [2020] Eldert, J., Schunnesson, H., Johansson, D., Saiang, D.: Application of measurement while drilling technology to predict rock mass quality and rock support for tunnelling. Rock Mechanics and Rock Engineering 53, 1349–1358 (2020) https://doi.org/10.1007/s00603-019-01979-2
He et al. [2019] He, M., Zhang, Z., Ren, J., Huan, J., Li, G., Chen, Y., Li, N.: Deep convolutional neural network for fast determination of the rock strength parameters using drilling data. International Journal of Rock Mechanics and Mining Sciences 123, 104084 (2019) https://doi.org/10.1016/j.ijrmms.2019.104084
Galende-Hernández et al. [2018] Galende-Hernández, M., Menéndez, M., Fuente, M.J., Sainz-Palmero, G.I.: Monitor-while-drilling-based estimation of rock mass rating with computational intelligence: The case of tunnel excavation front. Automation in Construction 93, 325–338 (2018) https://doi.org/10.1016/j.autcon.2018.05.019
Eldert et al. [2017] Eldert, J.V., Schunnesson, H., Johansson, D.: The history and future of rock mass characterisation by drilling in drifting from sledgehammer to pc-tablet. (2017)
Hansen et al. [2024] Hansen, T.F., Liu, Z., Torresen, J.: Building and analysing a labelled measure while drilling dataset from 15 hard rock tunnels in norway. Preprint on SSRN (2024) https://doi.org/10.2139/ssrn.4729646
Pedregosa et al. [2011] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, 2825–2830 (2011)
Bishop [2006] Bishop, C.M.: Pattern Recognition and Machine Learning, 1st edn. Springer, New York, NY, USA (2006). https://www.springer.com/gp/book/9780387310732
McInnes et al. [2018] McInnes, L., Healy, J., Melville, J.: Umap: Uniform manifold approximation and projection for dimension reduction. Preprint on arXiv (2018) arXiv:1802.03426 [stat.ML]
Hastie et al. [2009] Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edn. Springer, New York, NY (2009). https://doi.org/10.1007/978-0-387-84858-7
Chari and Pachter [2023] Chari, T., Pachter, L.: The specious art of single-cell genomics. PLOS Computational Biology 19(8), 1–20 (2023) https://doi.org/10.1371/journal.pcbi.1011288
Schubert and Gertz [2017] Schubert, E., Gertz, M.: Intrinsic t-stochastic neighbor embedding for visualization and outlier detection. In: Beecks, C., Borutta, F., Kruger, P., Seidl, T. (eds.) Similarity Search and Applications, pp. 188–203. Springer, Cham (2017)
Maaten and Hinton [2008] Maaten, L.V.D., Hinton, G.: Visualizing data using t-sne. Journal of Machine Learning Research 9, 2579–2605 (2008)
Goodfellow et al. [2016] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning, 1st edn. MIT Press, Cambridge, MA, USA (2016). http://www.deeplearningbook.org
MacQueen [1967] MacQueen, J.: Some methods for classification and analysis of multivariate observations. Proceedings of the fifth Berkeley symposium on mathematical statistics and probability 1(14), 281–297 (1967)
Johnson [1967] Johnson, S.C.: Hierarchical clustering schemes. Psychometrika 32(3), 241–254 (1967)
Campello et al. [2013] Campello, R.J., Moulavi, D., Sander, J.: Density-based clustering based on hierarchical density estimates. Proceedings of the 17th Pacific-Asia conference on knowledge discovery and data mining, 160–172 (2013)
Rousseeuw [1987] Rousseeuw, P.J.: Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of computational and applied mathematics 20, 53–65 (1987)
Davies and Bouldin [1979] Davies, D.L., Bouldin, D.W.: A cluster separation measure. IEEE transactions on pattern analysis and machine intelligence (2), 224–227 (1979)
Calinski and Harabasz [1974] Calinski, T., Harabasz, J.: A dendrite method for cluster analysis. Communications in Statistics 3(1), 1–27 (1974)
Hubert and Arabie [1985] Hubert, L., Arabie, P.: Comparing partitions. Journal of classification 2(1), 193–218 (1985)
Vinh et al. [2010] Vinh, N.X., Epps, J., Bailey, J.: Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. Journal of Machine Learning Research 11(Oct), 2837–2854 (2010)
Plotly [2022] Plotly: Plotly. https://plotly.com. Accessed: 20.04.2024 (2022)
Chen et al. [2020] Chen, A., Chow, A., Davidson, A., DCunha, A., Ghodsi, A., Hong, S.A., Konwinski, A., Mewald, C., Murching, S., Nykodym, T., Ogilvie, P., Parkhe, M., Singh, A., Xie, F., Zaharia, M., Zang, R., Zheng, J., Zumar, C.: Developments in mlflow - a system to accelerate the machine learning lifecycle. In: Proceedings of the Fourth International Workshop on Data Management for End-to-End Machine Learning, pp. 1–4. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3399579.3399867
Yadan [2019] Yadan, O.: Hydra - A framework for elegantly configuring complex applications. Github. Accessed: 2024-04-21 (2019). https://github.com/facebookresearch/hydra
Pydantic Contributors [2024] Pydantic Contributors: Pydantic: Data Validation and Settings Management using Python Type Annotations. [Software]. Accessed: 2024-04-21 (2024). https://docs.pydantic.dev/latest/
GNU Project [2024] GNU Project: GNU Make. Accessed: 2024-04-21 (2024). https://www.gnu.org/distros/distros.html
Hansen et al. [2024] Hansen, T.F., Liu, Z., Torresen, J.: Predicting rock type from mwd tunnel data using a reproducible ml-modelling process. Preprint on SSRN (2024) https://doi.org/10.2139/ssrn.4729647
Akiba et al. [2019] Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
Ozaki et al. [2022] Ozaki, Y., Tanigaki, Y., Watanabe, S., Nomura, M., Onishi, M.: Multiobjective tree-structured parzen estimator. Journal of Artificial Intelligence Research 73, 1209–1250 (2022)
Deb [2011] Deb, K.: In: Wang, L., Ng, A.H.C., Deb, K. (eds.) Multi-objective Optimisation Using Evolutionary Algorithms: An Introduction, pp. 3–34. Springer, London (2011). https://doi.org/10.1007/978-0-85729-652-8_1
Schunnesson [1998] Schunnesson, H.: Rock characterisation using percussive drilling. International Journal of Rock Mechanics and Mining Sciences 35, 711–725 (1998) https://doi.org/10.1016/S0148-9062(97)00332-X
Navarro et al. [2018] Navarro, J., Sanchidrian, J.A., Segarra, P., Castedo, R., Paredes, C., Lopez, L.M.: On the mutual relations of drill monitoring variables and the drill control system in tunneling operations. Tunnelling and Underground Space Technology 72, 294–304 (2018) https://doi.org/10.1016/j.tust.2017.10.011


Appendix A Hyperparameters

Table 7: Best parameters from hyperparameter optimisation for three different pipelines, each combining UMAP with a clustering algorithm.

Algorithm	Parameter	Value
Experiment 0
HDBSCAN	min_cluster_size	22
	min_samples	13
	metric	chebyshev
	cluster_selection_epsilon	0.340
UMAP	n_neighbors	197
	min_dist	0.0
	n_components	12
	metric	euclidean
Experiment 3
Agglomerative Clustering	n_clusters	7
	metric	cosine
	linkage	average
	distance_threshold	null
UMAP	n_neighbors	46
	min_dist	0.0
	n_components	6
	metric	euclidean
Experiment 7
HDBSCAN	min_cluster_size	83
	min_samples	14
	metric	manhattan
	cluster_selection_epsilon	0.690
UMAP	n_neighbors	21
	min_dist	0.0
	n_components	3
	metric	euclidean
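To show how the Experiment 7 values in Table 7 might map onto library calls, the sketch below wires them into a scikit-learn pipeline. It assumes the `umap-learn` package and the `HDBSCAN` estimator from scikit-learn (version 1.3 or later); the study may have used a different HDBSCAN implementation, and the function name `build_pipeline` is hypothetical.

```python
# Best parameters for Experiment 7, copied from Table 7.
EXPERIMENT_7 = {
    "umap": {
        "n_neighbors": 21,
        "min_dist": 0.0,
        "n_components": 3,
        "metric": "euclidean",
    },
    "hdbscan": {
        "min_cluster_size": 83,
        "min_samples": 14,
        "metric": "manhattan",
        "cluster_selection_epsilon": 0.69,
    },
}


def build_pipeline(params):
    """Min-max scaling, then UMAP, then HDBSCAN, as one scikit-learn pipeline."""
    # Imported inside the function so the parameter table can be inspected
    # without umap-learn and scikit-learn being installed.
    import umap
    from sklearn.cluster import HDBSCAN
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import MinMaxScaler

    return make_pipeline(
        MinMaxScaler(),
        umap.UMAP(**params["umap"]),
        HDBSCAN(**params["hdbscan"]),
    )


# Usage, assuming X is an (n_samples, 48) array of MWD feature vectors:
# labels = build_pipeline(EXPERIMENT_7).fit_predict(X)  # -1 marks unclustered samples
```

Keeping the parameters in a plain dictionary makes it straightforward to swap in the values of another experiment or to let an optimiser propose them.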
