Neural Architecture Search (NAS) has recently attracted remarkable attention by relieving human experts from the laborious effort of designing neural networks. Early NAS methods mainly utilized evolutionary-based [9, 36, 37, 38] or reinforcement-learning-based [39, 40, 41] approaches. Although the architectures designed by these methods are efficient, the search itself requires tremendous computing resources; for example, the method proposed in [39] evaluates 20,000 candidate networks on 500 NVIDIA P100 GPUs over four days. One-shot architecture search methods [42, 43, 44] have been proposed to identify optimal neural architectures within a few GPU days (\(>\)1 GPU day [45]). In particular, Differentiable Architecture Search (DARTS) [27, 28, 29] is a variation of one-shot NAS that relaxes the search space to be continuous and differentiable. A detailed description of DARTS can be found in Section 3.1.
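As a brief recap (the full treatment appears in Section 3.1, and the notation here follows the original DARTS formulation rather than the symbols used later in this paper), DARTS replaces the discrete choice of an operation on each edge \((i, j)\) of a cell with a softmax-weighted mixture over the candidate set \(\mathcal{O}\):

\[
\bar{o}^{(i,j)}(x) = \sum_{o \in \mathcal{O}} \frac{\exp\big(\alpha_o^{(i,j)}\big)}{\sum_{o' \in \mathcal{O}} \exp\big(\alpha_{o'}^{(i,j)}\big)}\, o(x),
\]

so that the architecture parameters \(\alpha\) can be optimized jointly with the network weights by gradient descent, and a discrete architecture is recovered by keeping the operations with the largest \(\alpha\) values.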
Despite the broad successes of DARTS in advancing the applicability of NAS, achieving optimal results on real-world problems remains challenging. Many subsequent works address these challenges by (i) increasing search speed [46, 47], (ii) improving generalization performance [35, 48], (iii) addressing robustness issues [49, 50, 51], (iv) reducing quantization error [14, 16], and (v) designing hardware-aware architectures [52, 53, 54]. On the other hand, only a few works attempt to prune the search space by removing inferior network operations [55, 56, 57, 58, 59, 60]. These works use a pruning mechanism to progressively remove operations from the search space. Unlike them, our method extends the search space to improve the performance of sparse networks by searching for the operations that best fit sparse weight structures. Technically, it augments the search space with parametric sparse versions of the convolution and linear operations in order to find the best sparse architecture. In summary, there is a lack of research on sparse weight parameters when designing neural architectures. Our proposed method (DASS) searches for the operations that are most effective for sparse weight parameters in order to achieve higher generalization performance.
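To make this search-space extension concrete, the sketch below shows one way a mask-based sparse convolution could sit alongside its dense counterpart in a DARTS-style candidate set. It is a minimal illustrative PyTorch sketch under simplifying assumptions: the names SparseConv and candidate_ops are hypothetical, and the random binary mask merely stands in for a mask produced by an actual pruning criterion. It is not the exact implementation of DASS.

```python
# Illustrative sketch only: a mask-based sparse convolution as one extra
# candidate operation in a DARTS-style search space. The names and the
# random mask are hypothetical placeholders, not the DASS implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseConv(nn.Module):
    """3x3 convolution whose weights are element-wise masked (pruned)."""

    def __init__(self, channels, sparsity=0.9):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(channels, channels, 3, 3) * 0.01)
        # In practice the binary mask would come from a pruning criterion
        # (e.g., weight magnitude); a random mask is used here for brevity.
        self.register_buffer("mask", (torch.rand_like(self.weight) > sparsity).float())

    def forward(self, x):
        # Only the unmasked (retained) weights contribute to the output.
        return F.conv2d(x, self.weight * self.mask, padding=1)


def candidate_ops(channels):
    """Candidate set mixing a dense operation with its sparse variant."""
    return nn.ModuleList([
        nn.Conv2d(channels, channels, 3, padding=1, bias=False),  # dense conv
        SparseConv(channels, sparsity=0.9),                       # sparse conv
    ])


if __name__ == "__main__":
    ops = candidate_ops(16)
    x = torch.randn(2, 16, 32, 32)
    print([op(x).shape for op in ops])  # both preserve the input shape
```

In a DARTS-style search, such sparse candidates would be weighted by architecture parameters in the same way as the dense operations, so the search itself decides where sparse weight structures are beneficial.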