Article

Reducing Model Complexity in Neural Networks by Using Pyramid Training Approaches

1 Department of Computer Engineering, Ankara Yıldırım Beyazıt University, Ankara 06010, Türkiye
2 Department of Geomatics Engineering, Hacettepe University, Ankara 06230, Türkiye
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(13), 5898; https://doi.org/10.3390/app14135898
Submission received: 13 May 2024 / Revised: 28 June 2024 / Accepted: 29 June 2024 / Published: 5 July 2024
(This article belongs to the Special Issue Recent Advances in Automated Machine Learning: 2nd Edition)

Abstract

Throughout the evolution of machine learning, the size of models has steadily increased as researchers strive for higher accuracy by adding more layers. This escalation in model complexity necessitates enhanced hardware capabilities. Today, state-of-the-art machine learning models have become so large that effectively training them requires substantial hardware resources, which may be readily available to large companies but not to students or independent researchers. To make the research on machine learning models more accessible, this study introduces a size reduction technique that leverages stages in pyramid training and similarity comparison. We conducted experiments on classification, segmentation, and object detection tasks using various network configurations. Our results demonstrate that pyramid training can reduce model complexity by up to 70% while maintaining accuracy comparable to conventional full-sized models. These findings offer a scalable and resource-efficient solution for researchers and practitioners in hardware-constrained environments.

1. Introduction

In the rapidly evolving field of machine learning, convolutional neural networks (CNNs) are extensively used for visual information processing, playing pivotal roles in areas such as image classification [1,2], object detection [3,4], facial recognition [5], medical imaging [6,7], and autonomous driving [8]. As these tasks grow in complexity, so too do the models designed to tackle them, often resulting in increased model size and computational demands. Consequently, state-of-the-art models require significant hardware resources, which can be prohibitively expensive for individual researchers or small organizations. Notably, deep architectures such as very deep convolutional networks [9] and deep residual networks [10] have demonstrated substantial improvements in large-scale image recognition tasks by significantly increasing the network depth.
Addressing the challenge of CNN model size optimization has spurred a variety of strategies. Techniques such as quantization [11], pruning [12], knowledge distillation [13], weight sharing [14], factorization [15], low-rank approximation [16], and dynamic network surgery [17] have been developed to manage model complexity by reducing size either before or after training. However, these methods often require trade-offs between model performance and efficiency.
This study introduces a novel “pyramid training” methodology, an innovative approach designed to dynamically adjust and optimize CNNs during the training process itself. Pyramid training involves building smaller networks incrementally—starting with a simple model (A) and progressively integrating it with other networks (B, C, etc.) to form larger, more complex structures. This method not only conserves resources but also adapts to increasing accuracy demands without the exponential growth in computational load typically associated with larger models.
A distinctive feature of our methodology is the implementation of a feature size reduction strategy during network combination. By employing a similarity comparison between features using a cosine similarity matrix, our approach identifies and merges similar feature maps. This process not only helps in condensing the network’s complexity efficiently but also ensures that essential information is preserved, optimizing both the model size and its computational efficiency.
The contributions of this study are summarized as follows:
  • Introduction of the pyramid training paradigm for efficient optimization of CNNs.
  • Detailed methodology for iterative network construction and integration to progressively achieve desired accuracy.
  • Implementation of a feature size reduction strategy using a similarity matrix and averaging technique to enhance the network’s architectural efficiency.
The remainder of this paper is structured as follows. Section 2 reviews related work in model size optimization techniques. Section 3 presents the proposed methodology in detail. Section 4 describes the experimental setup and datasets used for evaluating our approach. Section 5 discusses the experimental results, highlighting the effectiveness of our methods. Finally, Section 6 concludes the paper with a summary of findings and suggestions for future research.

2. Related Work

Quantization has garnered significant attention in machine learning due to its potential to reduce model size without substantially compromising accuracy, a crucial factor in on-device applications where balance between accuracy and latency is vital [11,18]. It is also effective in large-scale multimedia retrieval, managing high-dimensional data efficiently [19]. This method has been extensively used to compress machine learning models for deployment on field-programmable gate arrays (FPGAs) [20]. Moreover, quantization plays a crucial role in optimizing machine learning models for deployment on microcontrollers, with studies comparing quantization and pruning strategies for efficient model deployment [21]. Recent advancements like SmoothQuant have introduced accurate and efficient post-training quantization for large language models, achieving significant efficiency improvements while maintaining high accuracy [22]. However, there is a loss of accuracy during the mapping process, which may degrade a model’s performance, especially in highly sensitive applications. Pruning techniques enhance neural network efficiency by removing redundant model components, thereby reducing computational overhead without affecting accuracy. Innovative approaches, such as the formulation by Molchanov et al., have made significant contributions to efficient inference [23,24]. Further, studies by Frankle et al. emphasize pruning’s utility in removing unnecessary network structures post-training, improving performance [25]. Another study proposes the use of model structural pruning techniques to reduce the parameter count of deep learning models, as discussed in [26]. Despite these benefits, pruning can sometimes lead to suboptimal architectures and may require iterative retraining, which can be time-consuming and computationally expensive.
Knowledge distillation involves training a smaller, more efficient “student” model to emulate a larger “teacher” model, thereby accelerating and compressing machine learning models without losing predictive power, proving especially useful in domains like computer vision and natural language processing [13,27]. A multiscale semantic graph mapping (SGM) loss function is proposed to enable more comprehensive knowledge transfer between teacher and student networks at multiple feature scales, as discussed in [28]. However, the effectiveness of knowledge distillation heavily depends on the quality and architecture of the teacher model. If the teacher model is not well-optimized, the student model may inherit its deficiencies. Weight sharing reduces the number of parameters in a neural network, enhancing generalization and reducing overfitting. It has shown particular promise in neural architecture search (NAS), where careful management of weight sharing can significantly impact model performance [14,29]. Nevertheless, weight sharing can sometimes limit the expressiveness of a model, potentially leading to underfitting if not managed properly. Additionally, it can complicate the training process and requires careful hyperparameter tuning.
Low-rank approximation techniques streamline the training of deep neural networks by leveraging the inherent structures within the data. Hsieh et al.’s method of low-rank matrix factorization exemplifies how these techniques facilitate efficient training processes by focusing on the most expressive features of the data [16]. Additionally, the adaptive quantization method introduced by Nakata integrates low-rank approximations to reduce computational burdens, which is crucial in resource-constrained settings [11]. Moreover, low-rank approximations have been explored in diverse fields such as optical coherence tomography image processing, where low tensor train and low multilinear rank approximations are utilized for compression and de-speckling of images [30]. Matrix factorization techniques have also been widely used in recommender systems to improve performance by reducing the dimensionality of data while retaining essential information [31,32]. However, low-rank approximations can sometimes oversimplify the data representation, leading to a loss in model performance.
Dynamic network surgery (DNS), an advanced form of model optimization, combines network pruning and growth strategies to dynamically adjust network architecture during training. This method, pioneered by Liu et al., allows for adaptive model refinement, enhancing performance while maintaining or even reducing computational requirements [17]. It has been effectively applied in medical image segmentation, demonstrating its versatility across different domains [6]. Despite its advantages, DNS can be complex to implement and requires careful tuning of parameters to balance pruning and regrowth effectively. Additionally, it can introduce instability during training if not managed properly.
The existing methods, while effective in various scenarios, often require trade-offs between model size, accuracy, and computational efficiency. Additionally, many techniques depend heavily on the initial model’s quality or involve complex retraining processes. Our study addresses these gaps by introducing a novel pyramid training methodology coupled with a feature size reduction strategy that leverages similarity comparison. This approach dynamically adjusts model complexity during the training process, reduces redundancy in feature maps, and maintains essential information, thereby achieving significant size reduction without substantial loss in accuracy. Furthermore, our method is complementary and can be combined with other techniques such as quantization and pruning to further enhance model efficiency. Our contributions are particularly relevant for environments with limited hardware resources, making advanced machine learning models more accessible and practical for broader applications.

3. Methodology

This section describes the proposed pyramid training technique and the complementary methods used alongside it in detail.

3.1. Pyramid Training

Pyramid training in this study refers to a novel approach that diverges from traditional multiresolution pyramids, commonly used in object detection and other tasks requiring scale variance [33,34]. Our methodology involves iteratively training and combining smaller, less computationally intensive networks to progressively build up a model capable of achieving the accuracy of much larger networks. By initially training a small network (A) and systematically integrating it with additional networks (B, C, etc.), we construct a larger and more capable model without the substantial resource requirements typically associated with large-scale neural networks. The network (A) is designed to capture fundamental patterns and features with minimal computational resources. Once network A has achieved a satisfactory level of performance, we then introduce an additional network (B), which is trained to complement and enhance the capabilities of network A. During this phase, both networks A and B may be trained jointly or sequentially, depending on the specific requirements of the task. As the training progresses, additional networks (C, D, etc.) are incrementally integrated into the system. Each new network is trained to refine and build upon the features learned by the previous networks, effectively creating a hierarchical structure that grows in complexity and accuracy. A significant feature of this approach is the ability to freeze the initial blocks as we add more layers to the pyramid. Once a block has been trained sufficiently, it can be frozen, meaning its parameters are no longer updated during subsequent training iterations. This not only saves computational resources and memory space but also speeds up the training process, as only the new blocks require intensive computation. This hierarchical approach not only optimizes computational resources but also allows for fine-tuning of individual network segments before they are integrated, enhancing overall model performance and efficiency.
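To make the staged procedure concrete, the following is a minimal PyTorch sketch of the training loop described above: a small block A is trained with a temporary classification head, frozen, and then extended with a block B whose parameters are the only ones updated in the next stage. The block definitions, epoch counts, and the 32 × 32 RGB input size are illustrative assumptions rather than the exact networks used in our experiments.

```python
import torch
import torch.nn as nn

# Illustrative blocks (not the exact Table 1/2 blocks): A extracts basic features,
# B refines them; each stage has its own temporary fully connected head.
block_a = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
head_a  = nn.Sequential(nn.Flatten(), nn.Linear(32 * 8 * 8, 10))   # assumes 32x32 inputs
block_b = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
head_b  = nn.Sequential(nn.Flatten(), nn.Linear(64 * 4 * 4, 10))

def train(model, loader, epochs=1, lr=1e-3):
    # Standard supervised loop; only parameters with requires_grad=True are optimized.
    opt = torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

def freeze(module):
    # A frozen block no longer receives gradient updates, saving compute and memory.
    for p in module.parameters():
        p.requires_grad = False

def pyramid_training(loader):
    # Stage 1: train the small network A with its temporary FC head.
    train(nn.Sequential(block_a, head_a), loader, epochs=5)
    # Stage 2: discard A's head, freeze A, stack block B on top, train only the new parts.
    freeze(block_a)
    combined = nn.Sequential(block_a, block_b, head_b)
    train(combined, loader, epochs=5)
    return combined
```

Here `loader` is assumed to yield (image, label) batches of 32 × 32 RGB images; in a sequential setup like this, only the newly added block and head incur gradient computation in the second stage.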

3.2. Similarity Comparison

Within the convolutional layers of a CNN, feature maps are generated that encapsulate the essential characteristics of the input data. These maps often exhibit redundancy, especially in deeper layers of the network, which can be computationally wasteful and detract from model efficiency. To address this, we employ a similarity comparison strategy using cosine similarity metrics to identify and consolidate redundant features [14].
Cosine similarity measures the cosine of the angle between two vectors (feature maps in this context), providing a value between −1 and 1 that indicates how similar the feature maps are in terms of their information content. The formula is given by:
\text{cosine similarity}(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|}
where A · B denotes the dot product of vectors A and B, and ‖A‖ and ‖B‖ represent their respective magnitudes. This measure allows us to quantify the degree of similarity between pairs of feature maps. Feature maps that exhibit a high degree of similarity (cosine similarity close to 1) are considered redundant. These are then combined using averaging techniques, reducing the overall number of feature maps and, consequently, the model’s complexity and computational load.
To illustrate this concept, an example feature map representation from the MNIST dataset is shown in Figure 1. This visualization demonstrates how similar feature maps, which emerge naturally during the training of convolutional layers, can be identified and merged. By highlighting these redundancies through the feature maps of the well-known digit images, we underline the potential for significant reductions in network complexity through our similarity comparison technique.
Let us assume we have a set of 16 features, represented as vectors F_1, F_2, …, F_16. To evaluate the similarity between these feature vectors, we calculate the cosine similarity for each pair, which measures the cosine of the angle between two vectors in a multi-dimensional space. The cosine similarity is particularly useful as it normalizes the feature scale, focusing solely on the information content.
        F_1    F_2    ...   F_16
F_1     1      0.85   ...   0.45
F_2     0.85   1      ...   0.50
...     ...    ...    ...   ...
F_16    0.45   0.50   ...   1
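As an illustration of how such a matrix can be obtained in practice, the PyTorch sketch below flattens each feature map, L2-normalizes it, and computes all pairwise cosine similarities in a single matrix product. The 0.9 threshold used to flag redundant pairs is an illustrative choice, not a fixed value from our experiments.

```python
import torch
import torch.nn.functional as F

def cosine_similarity_matrix(features):
    # features: tensor of shape (N, H, W) -- N feature maps from one convolutional layer.
    # Each map is flattened into a vector, normalized to unit length, and compared
    # with every other map via a matrix product of the normalized rows.
    flat = features.flatten(start_dim=1)      # (N, H*W)
    flat = F.normalize(flat, dim=1)           # unit-length rows
    return flat @ flat.t()                    # (N, N) matrix of cosine similarities

# Example: 16 random 8x8 feature maps, mirroring the 16-feature illustration above.
maps = torch.randn(16, 8, 8)
sim = cosine_similarity_matrix(maps)
redundant_pairs = [(i, j) for i in range(16) for j in range(i + 1, 16) if sim[i, j] > 0.9]
```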
As an alternative to cosine similarity, Euclidean distance can also be used to measure the similarity between features. Euclidean distance calculates the straight-line distance between two points in a multidimensional space, making it a geometric measure of similarity. It is defined by the following equation:
\text{Euclidean Distance}(A, B) = \sqrt{\sum_{i=1}^{n} (A_i - B_i)^2}
where A and B represent the feature vectors, n is the number of dimensions, and A_i and B_i are the i-th components of vectors A and B, respectively. Unlike cosine similarity, which normalizes the vectors and focuses solely on their orientation, Euclidean distance takes into account both the magnitude and direction of the vectors. This characteristic means that Euclidean distance is sensitive to the scale of the vector components, which can be both an advantage and a limitation depending on the application.
While Euclidean distance provides a direct measure of the physical ’distance’ between feature vectors, it may not be as effective in high-dimensional spaces where different features may vary widely in scale or where the direction of the vectors is more informative than their magnitude. In contrast, cosine similarity, by normalizing vector magnitude, offers resilience to variations in scale and is often preferred in applications such as text analytics and other forms of semantic similarity where the direction of the feature vector (i.e., the angle between vectors) is more critical than their length.
Therefore, the choice between Euclidean distance and cosine similarity should be guided by the specific characteristics of the dataset and the requirements of the application. Euclidean distance may be more suitable for datasets with uniform feature scales and where the magnitude of the data points carries significant meaning. On the other hand, cosine similarity may be the better option when dealing with features of varying magnitudes or when the orientation of the data points is of greater importance than their absolute values.
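For completeness, a corresponding sketch for the Euclidean alternative is given below, again assuming PyTorch tensors; note that with distances, candidate pairs for merging are those below a threshold rather than above one.

```python
import torch

def euclidean_distance_matrix(features):
    # features: (N, H, W); returns the (N, N) matrix of pairwise Euclidean distances
    # between flattened feature maps. Unlike cosine similarity, magnitude matters here.
    flat = features.flatten(start_dim=1)             # (N, D)
    diff = flat.unsqueeze(1) - flat.unsqueeze(0)     # (N, N, D) pairwise differences
    return diff.pow(2).sum(dim=-1).sqrt()            # (N, N) distances

# With distances, "similar" means a small value, so pairs below a threshold are merged.
dist = euclidean_distance_matrix(torch.randn(16, 8, 8))
```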

3.3. Size Reduction

Feature size reduction, also known as feature selection or dimensionality reduction, plays a vital role in machine learning by enhancing model efficiency and generalization capabilities. A primary advantage of this technique is its ability to mitigate the curse of dimensionality. High-dimensional feature spaces often lead to increased computational complexity and a higher risk of overfitting.
In our approach, a similarity matrix facilitates the identification of redundant features, guiding the selection of feature pairs that exceed a predefined similarity threshold. These pairs are deemed combinable. The merging of similar features is accomplished by replacing the original features with their mean or another suitable aggregation method, effectively reducing the number of features iteratively until no additional pairs meet the threshold. This results in a dataset with a minimized set of features, maintaining essential information while eliminating redundancies.
To manage the dimensionality reduction dynamically, we introduce a ’reduction scale’ parameter. This parameter adjusts according to the evolving feature set, starting low when feature numbers are small and incrementing as the feature count increases. This adaptive mechanism allows the reduction process to become more aggressive as the complexity of the dataset grows, thereby ensuring efficient handling of varying feature spaces and achieving more optimized reductions as necessary.
The mean method is commonly used in this reduction process:
\mu = \frac{1}{n} \sum_{i=1}^{n} F_i
where n is the number of similar features to be combined, and F_i represents the i-th feature. While this method is straightforward and maintains the central tendencies of the features, it may not account for variations in feature importance and is susceptible to outliers. An alternative is the weighted mean method, which provides a more nuanced combination:
\mu = \frac{\sum_{i=1}^{n} w_i \cdot F_i}{\sum_{i=1}^{n} w_i}
where w_i represents the weight assigned to the i-th feature, allowing for differential importance among features. This method enhances adaptability and allows for customization according to the specific characteristics of the dataset.
Both methods serve to consolidate multiple features into a single, composite feature. The choice between these methods should be dictated by the dataset’s characteristics and the specific goals of the analysis. A systematic approach involving experimentation and validation is essential to determine the most effective method for a given scenario.
Let us assume we have features of shape (B, N, H, W), where B represents the batch size, N represents the feature size, H represents the height, and W represents the width. The general formulation of feature size reduction for both mean and weighted mean methods can be seen below.
\text{Mean Reduction}(X)_{i,j} = \frac{1}{N} \sum_{f=1}^{N} X_{i,j,f}
\text{Weighted Mean Reduction}(X, W)_{i,j} = \frac{\sum_{f=1}^{N} W_f \cdot X_{i,j,f}}{\sum_{f=1}^{N} W_f}
where W represents the weight vector for each feature, and the second equation calculates the weighted mean reduction at each spatial location (i, j), considering the importance assigned by the weights. In both equations, X_{i,j,f} denotes the feature map value at position (i, j) for the f-th feature, and N is the total number of features in the tensor. The weighted mean reduction incorporates the weights, W_f, associated with each feature for a more nuanced reduction process.
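The sketch below ties the similarity matrix and the mean-based merge together for a tensor of shape (B, N, H, W). It uses a simple greedy grouping, and the fixed similarity threshold stands in for the adaptive reduction scale described above; a weighted mean could be substituted by replacing the channel-wise mean with a weighted average.

```python
import torch
import torch.nn.functional as F

def reduce_features(x, threshold=0.9):
    # x: feature tensor of shape (B, N, H, W).
    # Greedily merges channels whose batch-averaged pairwise cosine similarity
    # exceeds `threshold`, replacing each group with its channel-wise mean.
    B, N, H, W = x.shape
    flat = F.normalize(x.flatten(start_dim=2), dim=2)        # (B, N, H*W), unit rows
    sim = torch.einsum("bnd,bmd->nm", flat, flat) / B        # (N, N) batch-averaged similarity

    merged, used = [], set()
    for i in range(N):
        if i in used:
            continue
        group = [i] + [j for j in range(i + 1, N) if j not in used and sim[i, j] > threshold]
        used.update(group)
        merged.append(x[:, group].mean(dim=1))               # mean of similar maps -> (B, H, W)
    return torch.stack(merged, dim=1)                        # (B, N', H, W) with N' <= N

reduced = reduce_features(torch.randn(4, 32, 8, 8), threshold=0.9)
```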

3.4. Quantization

Quantization in machine learning is a critical technique employed to reduce model sizes, thereby enhancing memory efficiency and accelerating inference times. This process involves approximating numerical values with fewer bits, which decreases the precision of parameters but significantly boosts computational efficiency.

3.4.1. Fixed-Point Quantization

Fixed-point quantization simplifies the representation of weights by using a fixed number of bits for both integer and fractional parts. This method is characterized by the use of a scaling factor, 2^k, where k indicates the number of fractional bits. Weights are then quantized by rounding them to the nearest fixed-point value based on this scaling factor. The mathematical formulation is as follows:
Q(x) = \text{Round}(x \times 2^{k}) \times 2^{-k}
Fixed-point quantization is particularly beneficial for hardware implementations that support fixed-point arithmetic, leading to more efficient computations than floating-point operations.
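A minimal sketch of this mapping, assuming PyTorch tensors, is shown below; k = 4 fractional bits is an arbitrary example, and setting k = 0 yields the integer quantization described next.

```python
import torch

def fixed_point_quantize(x, k=4):
    # Q(x) = Round(x * 2^k) / 2^k : snap each value to the nearest number
    # representable with k fractional bits. k = 0 gives plain integer quantization.
    scale = 2 ** k
    return torch.round(x * scale) / scale

w = torch.tensor([0.1234, -0.5678, 1.4142])
print(fixed_point_quantize(w, k=4))   # values lie on a grid of spacing 1/16
```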

3.4.2. Integer Quantization

Integer quantization further simplifies the model by quantizing weights to integer values, which can be extremely beneficial for reducing memory footprint. This method converts weights by rounding them to the nearest integer, thus simplifying computations and storage requirements:
Q(x) = \text{Round}(x)
This technique is often used in scenarios where extreme memory constraints exist, such as in embedded systems or mobile devices where storage and processing power are limited.

3.4.3. Vector Quantization

Vector quantization groups weights into vectors and quantizes each vector to a centroid that represents the group. This method reduces the number of unique weight values by assigning multiple weights to a single centroid, thus compacting the model’s memory usage. The quantization process is defined by:
Q(x_i) = \arg\min_j \| x_i - c_j \|
Vector quantization is especially useful in neural networks where redundancy across weights can be leveraged to minimize the overall model size without substantial loss in accuracy. It is suitable for applications requiring models with a small footprint but where some loss of precision is acceptable.
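The assignment step can be sketched as follows, assuming the centroids (the codebook) have already been obtained, for example by k-means; the codebook size K = 8 and the random initialization are illustrative.

```python
import torch

def vector_quantize(weights, codebook):
    # weights: (M, D) weight vectors x_i; codebook: (K, D) centroids c_j.
    # Q(x_i) = argmin_j ||x_i - c_j||: each vector is replaced by its nearest centroid.
    dists = (weights.unsqueeze(1) - codebook.unsqueeze(0)).norm(dim=-1)  # (M, K) distances
    assignment = dists.argmin(dim=1)                                     # nearest centroid per vector
    return codebook[assignment], assignment

# Illustrative codebook: K = 8 centroids sampled from the weights themselves.
w = torch.randn(256, 4)
codebook = w[torch.randperm(w.size(0))[:8]]
quantized_w, idx = vector_quantize(w, codebook)
```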

3.5. Experimental Fields

In this section, we explore the application of feature size reduction across different machine learning fields, each presenting unique challenges and opportunities for efficiency improvements. We delve into classification, semantic segmentation, and object detection to evaluate the impact of our proposed methodologies.

3.5.1. Classification

Classification is a foundational task in machine learning, pivotal for decision-making processes in intelligent systems. It involves training models to accurately assign predefined labels to input data instances. We apply feature size reduction to a convolutional neural network (CNN) specifically designed for this purpose, which includes layers for hierarchical feature extraction, spatial downsampling, and high-level abstraction. Our primary objective is to evaluate the influence of feature size reduction on model performance and efficiency, as illustrated in Figure 2, which depicts the integration of pyramid training and feature size reduction within the network.

3.5.2. Semantic Segmentation

Semantic segmentation involves detailed pixel-wise classification, crucial for tasks such as medical image analysis. We employ the U-Net architecture [35], known for its efficacy in detailed segmentation tasks, to implement feature size reduction. The U-shaped architecture of U-Net, which includes skip connections that help preserve spatial hierarchies, poses unique challenges when integrating feature size reduction. These complexities necessitate adaptations in our pyramid training approach to ensure that crucial spatial information is not lost during feature reduction. The tailored approach and its architectural nuances are detailed in Figure 3.

3.5.3. Object Detection

Object detection requires identifying and localizing objects within images, a task we approach by adapting the Faster R-CNN framework [36]. This model uses a two-stage process involving region proposal networks (RPNs) and deep CNNs for refining and classifying proposals. We modify the Faster R-CNN’s backbone to incorporate feature size reduction within its convolutional layers, carefully balancing the trade-offs between detection accuracy and computational demands. The modifications aim to maintain detection performance while enhancing processing speed and reducing memory usage, as detailed in Figure 4.
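As an illustration of how a modified backbone can be plugged into this framework, the sketch below follows the standard torchvision pattern for building a Faster R-CNN around a custom feature extractor. The backbone shown is a simple stand-in rather than our CustomBackbone, a feature-size reduction module could be inserted between its convolutional stages, and num_classes = 2 (one object class plus background) is an illustrative choice.

```python
import torch
import torch.nn as nn
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
from torchvision.ops import MultiScaleRoIAlign

# Stand-in backbone: a small convolutional feature extractor.
backbone = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
)
backbone.out_channels = 64  # FasterRCNN needs the backbone's output channel count

anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))
roi_pooler = MultiScaleRoIAlign(featmap_names=["0"], output_size=7, sampling_ratio=2)

model = FasterRCNN(backbone, num_classes=2,
                   rpn_anchor_generator=anchor_generator,
                   box_roi_pool=roi_pooler)

model.eval()
with torch.no_grad():
    predictions = model([torch.rand(3, 300, 400)])  # list of dicts: boxes, labels, scores
```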

3.6. Individual Blocks

In the pyramid training framework, the term ‘block’ refers to the smallest trainable and combinable unit within the network. These blocks are fundamental components that serve as the building blocks of our modular network architecture. Blocks in pyramid training offer a modular and iterative approach to constructing neural networks. Each block can be independently trained and optimized before being combined with others to form a more complex and capable network structure. This modular nature not only facilitates fine-tuning at a granular level but also simplifies the management of computational resources. While the typical strategy involves training individual blocks separately and then assembling them, an alternative approach is to combine blocks first and then train the resultant larger structure. This method allows for the joint optimization of interactions between blocks, potentially leading to better integrated system performance. However, each approach has its trade-offs, with the separate training method providing more control over individual block optimization and the combined training method potentially achieving better overall system integration. Using a block-based design addresses several challenges in scaling neural networks. It allows for incremental improvements and updates without the need to retrain the entire network, which can be resource-intensive. Additionally, it provides a systematic way to expand network capacity and complexity by adding new blocks, thereby enabling the construction of a pyramid-like architecture that can achieve comparable accuracy with potentially lower computational overhead. Detailed specifications of two sample blocks used in our experiments are provided in Table 1 and Table 2 and their combination is provided in Table 3. For a comprehensive understanding of how these blocks are integrated into the general pyramid training architecture, refer to Section 3.7.
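For reference, one way to express Block 1 from Table 1 as a PyTorch module is sketched below; the ReLU activations and the 32 × 32 input resolution (and hence the 32 × 8 × 8 flattened size) are assumptions not fixed by the table.

```python
import torch.nn as nn

class Block1(nn.Module):
    # Layer layout follows Table 1: two Conv2d + MaxPool2d stages and two FC layers.
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        # The FC head is used while Block 1 trains alone and is discarded
        # before the block is combined with Block 2 (see Table 3).
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, 128), nn.ReLU(),   # assumes 32x32 inputs
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```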

3.7. Method Description

In the pyramid training framework, we introduce a novel approach for dynamically optimizing neural network models during the training process. This methodology involves training smaller network blocks independently and then integrating them to form a more complex and capable model. This section details the steps involved in our methodology and highlights the innovative aspects of our approach.
We start by training two foundational blocks, Block 1 and Block 2, which are designed to capture and process diverse features from the input data. Initially, Block 1 is trained to extract primary features and patterns essential for the initial stages of learning. Once Block 1 is sufficiently trained, its fully connected (FC) layers are discarded to preserve spatial information when integrating with Block 2. This step ensures that essential spatial relationships are maintained, enhancing the overall model performance.
To address potential dimensional mismatches and redundancies in feature maps, we employ a similarity comparison strategy. This involves constructing a similarity matrix using cosine similarity metrics to identify highly correlated feature maps. Redundant features, identified by high similarity scores, are combined using averaging techniques. This process effectively reduces the feature map count, ensuring compatibility between blocks and enhancing the network’s capacity to handle complex patterns.
The similarity comparison and feature size reduction steps are critical innovations in our approach. They allow us to dynamically adjust the model complexity during training, reducing redundancies and maintaining essential information. This strategy not only optimizes computational resources but also enhances the adaptability and efficiency of the model.
The combined architecture, consisting of Block 1 and Block 2, undergoes further training to refine and enhance the feature representations. This iterative process continues, progressively adding and integrating more blocks (Block 3, Block 4, etc.) until the network achieves the desired level of accuracy. Each cycle of this process enriches the network’s feature understanding and optimizes its overall performance.
A key feature of our pyramid training methodology is the ability to freeze initial blocks once they are sufficiently trained. Freezing these blocks saves computational resources and memory space, speeding up the training process, as only the new blocks require intensive computation. This hierarchical approach allows for fine-tuning individual network segments before they are integrated, ensuring a scalable and resource-efficient solution.
Our method allows for combination with quantization methods, which is why we have explained quantization in great detail in the following sections. By integrating these quantization techniques, such as fixed-point, integer, and vector quantization, we can significantly reduce model size and computational demands. This integration makes our approach versatile and efficient for various practical applications, particularly in resource-constrained environments.
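As an example of such a combination, the sketch below applies PyTorch's post-training dynamic quantization to the fully connected layers of an already trained network; this illustrates one readily available option rather than the exact quantization scheme used in our experiments.

```python
import torch
import torch.nn as nn

# Illustrative stand-in for a trained, reduced network.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(16 * 16 * 16, 128), nn.ReLU(), nn.Linear(128, 10),
)

# Post-training dynamic quantization: weights of the listed module types are
# stored in int8 and dequantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
```

Static post-training quantization and quantization-aware training follow the same drop-in spirit but additionally require calibration data or retraining.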
Figure 2 illustrates the structured progression of combining and refining blocks in our pyramid training model.
Through this methodical approach, the pyramid training model leverages both hierarchical learning and similarity-based feature size reduction to construct a robust and efficient network. This process not only enhances the network’s adaptability to new and complex datasets but also optimizes computational resources, making it an effective solution for scaling deep learning architectures.

4. Experiments

This section elucidates the outcomes of the experiments conducted to validate our proposed methods. It details the datasets used, the tasks performed, and the setup for the experiments.

4.1. Dataset and Task

The experiments span three principal areas: classification, segmentation, and object detection.
  • Classification: we utilized the CIFAR-10 dataset [37], a benchmark for image classification tasks, to evaluate the effectiveness of our pyramid training and feature reduction techniques.
  • Segmentation: for segmentation tasks, the KITTI dataset [38], which is broadly employed for object detection, tracking, and segmentation, was used.
  • Object detection: the Penn-Fudan Database for Pedestrian Detection [39] was chosen for object detection experiments. This dataset, comprising diverse urban scenes, focuses on pedestrian detection, offering a challenging testbed for our methodologies.

4.2. Evaluation Metric

Model size, measured in MegaBytes (MB) [10], is a crucial metric for assessing the feasibility of deploying machine learning models, particularly in environments with limited storage capacity, such as mobile and embedded devices. Reducing the model size directly impacts the ease of deployment and the ability to run models on devices with constrained memory. For many practical applications, especially those involving edge devices or mobile platforms, the size of the model is a critical factor determining whether a model can be used effectively. While model size reflects the storage requirements, it does not fully capture the computational complexity of the models.
The number of floating-point operations (FLOPS) [40] is another critical metric that captures the computational complexity of a model. FLOPS indicates how many floating-point operations a model performs during a single inference pass. This metric is essential for understanding the computational demands of a model, particularly in real-time applications where processing speed is crucial. FLOPS provides insight into the efficiency of a model in terms of computational resources, complementing the storage efficiency indicated by the model size.
By considering both model size and FLOPS, we aim to provide a comprehensive evaluation of the models’ practicality for deployment in resource-constrained environments. Model size is critical for determining the feasibility of storing a model on devices with limited memory, while FLOPS offers a measure of the computational effort required to run a model. Together, these metrics provide a balanced view of the trade-offs involved in optimizing models for both storage and computational efficiency.
The primary focus of this study is to explore the storage efficiency and the feasibility of deploying these models in real-world scenarios where memory constraints are a significant concern. However, we also recognize the importance of computational efficiency, particularly for applications on edge devices or mobile platforms. By reducing both the model size and FLOPS, we aim to enhance the overall deployability of machine learning models in various practical settings.
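Both metrics can be measured directly from a PyTorch model as sketched below; the FLOPS count here relies on the optional fvcore package (assumed to be available), and the 32 × 32 input shape is illustrative.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.Flatten(), nn.Linear(16 * 32 * 32, 10))

# Model size in MB: total number of parameter bytes.
size_mb = sum(p.numel() * p.element_size() for p in model.parameters()) / 1e6
print(f"model size: {size_mb:.2f} MB")

# Floating-point operations for one forward pass, counted with fvcore (optional dependency).
try:
    from fvcore.nn import FlopCountAnalysis
    flops = FlopCountAnalysis(model, torch.randn(1, 3, 32, 32)).total()
    print(f"FLOPS: {flops / 1e6:.2f}M")
except ImportError:
    pass  # fvcore not installed; FLOPS counting skipped
```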

4.3. Experimental Setup

Experiments were executed on two platforms:
  • A local GPU setup powered by an RTX-3060 with 6 GB of memory.
  • A cloud-based Google Colab session configured to utilize the T4 GPU runtime.
The following metrics were tracked across all experiments to assess performance:
  • Test set accuracy for classification tasks.
  • Training loss and validation loss for segmentation tasks.
  • Various loss metrics (loss box reg, loss classifier, loss RPN box reg, and total loss) for object detection tasks.
  • Training time and model size, measured in megabytes (MB) or gigabytes (GB).
  • Floating-point operation counts (FLOPS) to measure computational complexity.

4.4. Experimental Configurations

Different network configurations were tested to examine the impact of feature size reduction and quantization:
  • BaseNetwork: this configuration serves as a control, comprising Block 1 and Block 2 combined without any feature size reduction steps.
  • ReducedNetwork: incorporates a feature size reduction step between Conv2 and Conv3 layers. This network undergoes iterative pyramid training, progressively refining the model and reducing its complexity by minimizing redundant feature representations.
  • Quantization: both BaseNetwork and ReducedNetwork are subjected to quantization to compress model size and enhance computational efficiency.
  • Reduced+Quantization: this setup investigates the synergistic effect of combining feature size reduction with quantization, focusing on the rate of model size reduction and operational efficiency.
  • U-Net: employed for semantic segmentation, this network’s architecture is designed to efficiently segment images by progressively reducing and then expanding the resolution of intermediary representations.
  • U-Net-Reduced: a simplified variant of U-Net, where feature size reduction techniques are applied to decrease the complexity and computational demands of the network while aiming to maintain effective segmentation performance.
  • Faster R-CNN: utilizes a two-stage approach for object detection, combining region proposal networks with a CNN classifier. The backbones used in these experiments include CustomBackbone and VGG16, adapted for this framework.
  • Faster-RCNN-Reduced: a reduced-complexity version of Faster R-CNN, where feature size reduction techniques are implemented within the convolutional layers of the backbone to explore potential benefits in scenarios with limited computational resources.
These configurations are designed to systematically evaluate the effectiveness of our methods across different network architectures and tasks. The results of these experiments, detailed in the subsequent sections, validate the advantages of our approach in terms of efficiency and performance.

4.5. Results

The outcomes of our experiments are comprehensively documented in Table 4, Table 5 and Table 6, which summarize the performance across various configurations and tasks. To further illustrate the findings, Figure 5 provides a visual comparison of model performance across different metrics, including memory size (MB), FLOPS, training time (seconds), and test loss/accuracy fields. This figure highlights the impact of our proposed methods on the efficiency and performance of neural network models.
In the classification task in Table 4, BaseNetwork achieves a test set accuracy of 77.31% with a model size of 0.65 MB and 23.08M FLOPS. ReducedNetwork, designed to decrease complexity, has a slightly lower accuracy of 74.77% but significantly reduces the model size to 0.17 MB and FLOPS to 12.7M. When quantization is applied, the BaseNetwork’s model size is halved to 0.32 MB, maintaining similar FLOPS, while the Reduced+Quantization setup shows the smallest size at 0.10 MB and retains the reduced FLOPS of 12.7M.
For the segmentation task in Table 5, the U-Net model presents a train loss of 1.004, test loss of 1.062, and validation loss of 0.8484, with a model size of 31 MB and 48.4B FLOPS. The U-Net-Reduced model, implementing feature size reduction, achieves a lower train loss of 0.6522 and test loss of 1.027 but has a slightly higher validation loss of 0.8945. The model size is significantly reduced to 7 MB, and FLOPS is also reduced to 12.1B, demonstrating the efficiency of the reduction technique.
In the object detection task in Table 6, the CustomBackbone model shows an overall loss of 1.14, with the specific losses for box regression and classifier being 0.3894 and 0.4018, respectively. This model requires 15.84 GB of memory and has 8.4B FLOPS. The CustomBackbone-Reduced variant, though it incurs a slightly higher overall loss of 1.223, reduces memory allocation to 10.31 GB and FLOPS to 5.6B, indicating better efficiency. The VGG16 model, used for comparison, achieves lower losses but requires significantly more resources, with 214.2B FLOPS and 16.04 GB of memory. The VGG16-Reduced model, while showing an increase in losses, brings down the FLOPS to 208B and memory usage to 9.96 GB, reflecting the trade-offs involved in model size reduction.
The FLOPS and model size results across all tasks highlight the efficiency gains from applying the proposed feature size reduction techniques. These reductions in FLOPS and model sizes illustrate the computational savings, which are particularly beneficial in environments with limited resources. The balance between model accuracy, size, and computational efficiency demonstrates the practicality and effectiveness of the proposed methods in various machine learning tasks.

5. Discussion

The primary contribution of our study is the introduction of a novel pyramid training methodology combined with a feature size reduction strategy. Our goal is to propose a versatile technique that can enhance various existing models by making them more efficient in terms of computational resources and storage requirements. While our experiments demonstrate the effectiveness of our technique on specific tasks (classification, segmentation, and object detection), the purpose is not to claim new state-of-the-art results in these domains but to show how our method can be integrated with different models to improve their efficiency. While our experiments were conducted on relatively smaller datasets and networks, the proposed pyramid training methodology and feature size reduction strategy are designed to be inherently scalable. The incremental training approach allows for the gradual addition of complexity, which can be extended to larger networks. Similarly, the feature size reduction strategy, which leverages similarity comparison, can efficiently handle higher-dimensional data without a significant increase in computational overhead.

6. Conclusions

Our exploration of the pyramid training schema, along with the strategic computation of feature size reduction using similarity comparison, has yielded valuable insights into enhancing neural network architectures for classification tasks. The iterative pyramid training process, combined with feature size reduction, has proven to be an effective strategy in refining the model’s adaptability and computational efficiency. ReducedNetwork, which integrates these methodologies, demonstrated a commendable reduction in model size while maintaining test set accuracy comparable to BaseNetwork.
The application of quantization, a model compression technique, to both BaseNetwork and ReducedNetwork, further underscored the versatility of our approach. The combination with quantization in the Reduced+Quantization configuration achieved the smallest model size, with a sixfold reduction rate. Although this considerable reduction in size introduced a marginal decrease in test set accuracy, it illustrates the potential for substantial model optimization.
The incorporation of similarity comparison in feature size reduction provides several advantages:
  • Efficiency: it significantly reduces redundancy and enhances computational efficiency, crucial for environments with limited computational resources.
  • Information preservation: this method ensures that essential features are retained, which is vital for maintaining the effectiveness of the model.
  • Scalability: similarity comparison is flexible and can be applied across different network layers and architectures, making it widely applicable.
  • Energy conservation: particularly beneficial for mobile and embedded systems, reducing computational demands helps in conserving energy.
For segmentation tasks, U-Net and its reduced variant, U-Net-Reduced, were evaluated. U-Net-Reduced demonstrated improved efficiency with a significantly lower model size of 7 MB compared with the original U-Net’s 31 MB, showcasing the effectiveness of feature dimensionality reduction. However, this reduction in size was accompanied by a slight increase in validation loss, highlighting the typical trade-offs between model complexity and performance.
In object detection tasks, CustomBackbone and VGG16, along with their reduced versions, exhibited notable reductions in model size and memory allocation, demonstrating their potential for resource-efficient deployment. The reduced variants showed increased losses, which underscores the delicate balance required when scaling down model complexity to fit resource constraints.
Ultimately, the trade-offs between model size reduction and potential increases in loss scores are critical considerations for deploying complex networks in resource-constrained environments. The experimental results, especially with the reduced variants of U-Net and CustomBackbone, demonstrate that feature dimensionality reduction techniques can significantly decrease model size, enabling the operation of sophisticated networks in settings with limited memory and computational resources. This trade-off suggests that sacrificing marginal accuracy for resource efficiency could be a viable strategy, particularly when the computational demands of larger models are prohibitive in real-world applications.
Through this study, the effectiveness of similarity comparison in feature size reduction has been highlighted as not only enhancing efficiency but also ensuring the practical deployment of advanced neural architectures in various computational environments. Looking ahead, refining feature size reduction algorithms to enhance efficiency without sacrificing accuracy could yield more robust models. Additionally, future efforts could also focus on optimizing the training process to reduce the considerable increase in training times associated with our method.

Author Contributions

Conceptualization, methodology, writing paper, Ş.G.K.; formal analysis, supervision, B.Ş.; providing research ideas, supervision, F.N.; review, supervision, A.Ö.O. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data generated or analyzed during this study are included in this article and are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar] [CrossRef]
  2. Sladojevic, S.; Arsenovic, M.; Anderla, A.; Culibrk, D.; Stefanovic, D. Deep Neural Networks Based Recognition of Plant Diseases by Leaf Image Classification. Comput. Intell. Neurosci. 2016, 2016, 3289801. [Google Scholar] [CrossRef] [PubMed]
  3. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar] [CrossRef]
  4. Zhao, Z.; Zheng, P.; Xu, S.-T.; Wu, X. Object Detection with Deep Learning: A Review. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 3212–3232. [Google Scholar] [CrossRef]
  5. Khan, A.; Sohail, A.; Zahoora, U.; Qureshi, A. A Survey of the Recent Architectures of Deep Convolutional Neural Networks. Artif. Intell. Rev. 2020, 53, 5455–5516. [Google Scholar] [CrossRef]
  6. Litjens, G.; Kooi, T.; Bejnordi, B.E.; Setio, A.A.; Ciompi, F.; Ghafoorian, M.; van der Laak, J.A.W.; van Ginneken, B.; Sanchez, C.I. A survey on deep learning in medical image analysis. Med. Image Anal. 2017, 42, 60–88. [Google Scholar] [CrossRef] [PubMed]
  7. Singh, S.; Wang, L.; Gupta, S.; Goli, H.; Padmanabhan, P.; Gulyas, B. 3D Deep Learning on Medical Images: A Review. Sensors 2020, 20, 5097. [Google Scholar] [CrossRef] [PubMed]
  8. Junaid, M.; Szalay, Z.; Török, Á. Evaluation of Non-Classical Decision-Making Methods in Self-Driving Cars: Pedestrian Detection Testing on a Cluster of Images with Different Luminance Conditions. Energies 2021, 14, 7172. [Google Scholar] [CrossRef]
  9. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  10. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  11. Nakata, K.; Miyashita, D.; Deguchi, J.; Fujimoto, R. Adaptive Quantization Method for CNN with Computational-Complexity-Aware Regularization. In Proceedings of the 2021 IEEE International Symposium on Circuits and Systems (ISCAS), Daegu, Republic of Korea, 22–28 May 2021; pp. 1–5. [Google Scholar] [CrossRef]
  12. Cai, Y.; Hua, W.; Chen, H.; Wei, G.Y.; Zhang, X.; Brooks, D.; Wu, C.J. Structured Pruning Is All You Need for Pruning CNNs at Initialization. arXiv 2022, arXiv:2203.02549. [Google Scholar]
  13. Hinton, G.; Vinyals, O.; Dean, J. Distilling the Knowledge in a Neural Network. arXiv 2015, arXiv:1503.02531. [Google Scholar]
  14. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-Based Learning Applied to Document Recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
  15. Howard, A.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
  16. Sainath, T.N.; Kingsbury, B.; Sindhwani, V.; Arisoy, E.; Ramabhadran, B. Low-rank matrix factorization for Deep Neural Network training with high-dimensional output targets. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vancouver, BC, Canada, 26–31 May 2013; pp. 6655–6659. [Google Scholar] [CrossRef]
  17. Guo, Y.; Yao, A.; Chen, Y. Dynamic Network Surgery for Efficient DNNs. Adv. Neural Inf. Process. Syst. 2016, 29, 1379–1387. [Google Scholar]
  18. Jacob, B.; Kligys, S.; Chen, B.; Zhu, M.; Tang, M.; Howard, A.; Adam, H.; Kalenichenko, D. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1–9. [Google Scholar] [CrossRef]
  19. Chen, J.; Cheung, W.K. Similarity Preserving Deep Asymmetric Quantization for Image Retrieval. Proc. Aaai Conf. Artif. Intell. 2019, 33, 8183–8190. [Google Scholar] [CrossRef]
  20. Liu, S.; Liu, L.; Yi, Y. Quantized Reservoir Computing for Spectrum Sensing with Knowledge Distillation. IEEE Trans. Cogn. Dev. Syst. 2023, 15, 88–99. [Google Scholar] [CrossRef]
  21. Loureiro, R.B.; Sá, P.H.M.; Lisboa, F.V.N.; Peixoto, R.M.; Nascimento, L.F.S.; Bonfim, Y.d.S.; Cruz, G.O.R.; Ramos, T.d.O.; Montes, C.H.R.L.; Pagano, T.P.; et al. Efficient Deployment of Machine Learning Models on Microcontrollers: A Comparative Study of Quantization and Pruning Strategies. Blucher Eng. Proc. 2023, 1–15. [Google Scholar] [CrossRef]
  22. Xiao, G.; Lin, J.; Seznec, M.; Wu, H.; Demouth, J.; Han, S. SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models. In Proceedings of the 40th International Conference on Machine Learning (ICML), Honolulu, HI, USA, 23–29 July 2023; pp. 1–15. [Google Scholar]
  23. Molchanov, P.; Tyree, S.; Karras, T.; Aila, T.; Kautz, J. Pruning Convolutional Neural Networks for Resource Efficient Inference. In Proceedings of the International Conference on Learning Representations (ICLR), San Juan, Puerto Rico, 2–4 May 2016; pp. 1–13. [Google Scholar]
  24. Li, H.; Kadav, A.; Durdanovic, I.; Samet, H.; Graf, H.P. Pruning Filters for Efficient ConvNets. arXiv 2016, arXiv:1608.08710. [Google Scholar]
  25. Frankle, J.; Dziugaite, G.K.; Roy, D.; Carbin, M. Stabilizing the Lottery Ticket Hypothesis. arXiv 2019, arXiv:1903.01611. [Google Scholar]
  26. Xie, G.; Hou, G.; Pei, Q.; Huang, H. Lightweight Privacy Protection via Adversarial Sample. Electronics 2024, 13, 1230. [Google Scholar] [CrossRef]
  27. Gou, J.; Yu, B.; Maybank, S.J.; Tao, D. Knowledge Distillation: A Survey. Int. J. Comput. Vis. 2021, 129, 1789–1819. [Google Scholar] [CrossRef]
  28. Yuan, Z.; Yang, Z.; Ning, H.; Tang, X. Multiscale Knowledge Distillation with Attention Based Fusion for Robust Human Activity Recognition. Sci. Rep. 2024, 14, 12411. [Google Scholar] [CrossRef]
  29. Ding, X.; Zhang, X.; Han, J.; Ding, G. Diverse Branch Block: Building a Convolution as an Inception-like Unit. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 10886–10895. [Google Scholar]
  30. Kopriva, I.; Shi, F.; Lai, M.M.; Štanfel, M.; Chen, H.; Chen, X. Low Tensor Train and Low Multilinear Rank Approximations of 3D Tensors for Compression and De-Speckling of Optical Coherence Tomography Images. Phys. Med. Biol. 2023, 68, 125002. [Google Scholar] [CrossRef] [PubMed]
  31. Koren, Y.; Bell, R.M.; Volinsky, C. Matrix factorization techniques for recommender systems. Computer 2009, 42, 30–37. [Google Scholar] [CrossRef]
  32. Rendle, S. Factorization machines. In Proceedings of the 2010 IEEE International Conference on Data Mining, Sydney, Australia, 13–17 December 2010; pp. 995–1000. [Google Scholar] [CrossRef]
  33. Gibson, J.D.; Oh, H. Mutual Information Loss in Pyramidal Image Processing. Information 2020, 11, 322. [Google Scholar] [CrossRef]
  34. Mao, J.; Niu, M.; Bai, H.; Liang, X.; Xu, H.; Xu, C. Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection. arXiv 2021, arXiv:2109.02499. [Google Scholar]
  35. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention (MICCAI), Munich, Germany, 5–9 October 2015; pp. 1–9. [Google Scholar]
  36. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Montreal, QC, Canada, 7–12 December 2015; pp. 1–10. [Google Scholar]
  37. Krizhevsky, A.; Nair, V.; Hinton, G. CIFAR-10. In Proceedings of the Technical Report; Canadian Institute for Advanced Research: Toronto, ON, Canada, 2009; pp. 1–4. [Google Scholar]
  38. Geiger, A.; Lenz, P.; Urtasun, R. The KITTI Vision Benchmark Suite. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, 16–21 June 2012; pp. 1–8. [Google Scholar]
  39. Wang, L.; Sun, J.; Song, G.; Liu, Z.; Ma, K.; Hu, F. Object Detection Combining Recognition and Segmentation. In Proceedings of the 8th Asian Conference on Computer Vision (ACCV), Tokyo, Japan, 18–22 November 2007; pp. 189–199. [Google Scholar]
  40. Jouppi, N.P.; Young, C.; Patil, N.; Patterson, D.; Agrawal, G.; Bajwa, R.; Bates, S.; Bhatia, S.; Boden, N.; Borchers, A.; et al. In-Datacenter Performance Analysis of a Tensor Processing Unit. In Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA), Toronto, ON, Canada, 24–28 June 2017; pp. 1–12. [Google Scholar]
Figure 1. Visual representation of feature maps extracted from a convolutional layer of a neural network trained on MNIST dataset [14]. These maps highlight the potential for similarity-based feature reduction, as several maps exhibit similar activation patterns.
Figure 2. The general architecture of pyramid training used in classification task.
Figure 3. The application of feature dimensionality reduction to U-Net architecture.
Figure 4. The application of feature dimensionality reduction to the Faster R-CNN architecture.
Figure 5. Model performance comparison charts.
Table 1. Block 1 model details.

Layer | Type      | Input Channels | Output Channels | Kernel Size | Stride | Padding
Conv1 | Conv2d    | 3              | 16              | 3 × 3       | 1      | 1
Pool1 | MaxPool2d | -              | -               | 2 × 2       | 2      | 0
Conv2 | Conv2d    | 16             | 32              | 3 × 3       | 1      | 1
Pool2 | MaxPool2d | -              | -               | 2 × 2       | 2      | 0
FC1   | Linear    | -              | 128             | -           | -      | -
FC2   | Linear    | -              | 10              | -           | -      | -
Table 2. Block 2 model details.

Layer | Type      | Input Channels | Output Channels | Kernel Size | Stride | Padding
Conv3 | Conv2d    | 16             | 32              | 3 × 3       | 1      | 1
Pool1 | MaxPool2d | -              | -               | 2 × 2       | 2      | 0
Conv4 | Conv2d    | 32             | 128             | 3 × 3       | 1      | 1
Pool2 | MaxPool2d | -              | -               | 2 × 2       | 2      | 0
FC3   | Linear    | -              | 128             | -           | -      | -
FC4   | Linear    | -              | 10              | -           | -      | -
Table 3. Block 1 combined with Block 2 model details.

Layer | Type      | Input Channels | Output Channels | Kernel Size | Stride | Padding
Conv1 | Conv2d    | 3              | 16              | 3 × 3       | 1      | 1
Pool1 | MaxPool2d | -              | -               | 2 × 2       | 2      | 0
Conv2 | Conv2d    | 16             | 32              | 3 × 3       | 1      | 1
Pool2 | MaxPool2d | -              | -               | 2 × 2       | 2      | 0
Conv3 | Conv2d    | 16             | 32              | 3 × 3       | 1      | 1
Pool1 | MaxPool2d | -              | -               | 2 × 2       | 2      | 0
Conv4 | Conv2d    | 32             | 128             | 3 × 3       | 1      | 1
Pool2 | MaxPool2d | -              | -               | 2 × 2       | 2      | 0
FC3   | Linear    | -              | 128             | -           | -      | -
FC4   | Linear    | -              | 10              | -           | -      | -
Table 4. Experimental results for classification task.

Network                  | Training Time (s) | Test Set Accuracy (%) | Size (MB) | FLOPS
BaseNetwork              | 366.51            | 77.31                 | 0.65      | 23.08M
ReducedNetwork           | 613.78            | 74.77                 | 0.17      | 12.7M
Quantization+BaseNetwork | 343.25            | 76.51                 | 0.32      | 23.08M
Reduced+Quantization     | 601.47            | 73.53                 | 0.10      | 12.7M
Table 5. Experimental results for segmentation task.

Networks      | Train Loss | Test Loss | Validation Loss | Training Time (s) | Size (MB) | FLOPS
U-Net         | 1.004      | 1.062     | 0.8484          | 1683              | 31        | 48.4B
U-Net-Reduced | 0.6522     | 1.027     | 0.8945          | 2562              | 7         | 12.1B
Table 6. Experimental results for object detection task.

Networks               | Loss Box Reg | Loss Classifier | Loss RPN Box Reg | Loss   | Training Time (s) | Memory Allocation (GB) | FLOPS
CustomBackbone a       | 0.3894       | 0.4018          | 0.3894           | 1.14   | 460               | 15.84                  | 8.4B
CustomBackbone-Reduced | 0.4402       | 0.4343          | 0.4402           | 1.223  | 823               | 10.31                  | 5.6B
VGG16 b                | 0.1043       | 0.0701          | 0.1043           | 0.216  | 653               | 16.04                  | 214.2B
VGG16-Reduced          | 0.1894       | 0.1394          | 0.1894           | 0.4377 | 1100              | 9.96                   | 208B
a Custom CNN model. b VGG model.