Active expansion sampling for learning feasible domains in an unbounded input space

Research paper, published in Structural and Multidisciplinary Optimization

Abstract

Many engineering problems require identifying feasible domains under implicit constraints. One example is finding acceptable car body styling designs based on constraints like aesthetics and functionality. Current active-learning based methods learn feasible domains for bounded input spaces. However, we usually lack prior knowledge about how to set those input variable bounds. Bounds that are too small will fail to cover all feasible domains, while bounds that are too large will waste query budget. To avoid this problem, we introduce Active Expansion Sampling (AES), a method that identifies (possibly disconnected) feasible domains over an unbounded input space. AES progressively expands our knowledge of the input space, and uses successive exploitation and exploration stages to switch between learning the decision boundary and searching for new feasible domains. We show that AES has a misclassification loss guarantee within the explored region, independent of the number of iterations or labeled samples. Thus it can be used for real-time prediction of samples’ feasibility within the explored region. We evaluate AES on three test examples and compare it with two adaptive sampling methods that operate over fixed input variable bounds: the Neighborhood-Voronoi (NV) algorithm and the straddle heuristic.


Notes

  1. Note that in this paper the terms “active learning” and “adaptive sampling” are interchangeable.

  2. A point infinitely far away from previous queries has \(\bar {f}(\boldsymbol {x})\) close to 0 and the maximum \(V(\boldsymbol{x})\), and thus the highest \(p_{\epsilon}(\boldsymbol{x})\).

  3. In Section 3, we assume that the queried point is the exact solution to the query strategy. However, since we approximate the exact solution by using a pool-based sampling setting, the query may deviate slightly from the exact solution.

  4. Sampling methods like random sampling or Poisson-disc sampling (Bridson 2007) can be used to generate the pool. We use random sampling here for simplicity. The specific choice of the sampling method within the local pool is not central to the overall method.

  5. The optimal query means the exact solution to the AES query strategy shown in (6), (7), or (9).

  6. We can set 𝜖 and τ such that the accuracy bound is as required. Details about how to set hyperparameters are in Section 5.3.

  7. Technically, due to sampling error introduced when generating the pool, the exploitation stage will be influenced by 𝜖 (since \(\bar {f}(\boldsymbol {x}^{*})\) is only ≈ 0). But this effect is negligible compared to 𝜖’s influence on the exploration stage.

  8. For the NV algorithm, the pool size refers to the number of test samples generated for the Monte Carlo simulation.

  9. This difference is because NV’s explored region covers more area than AES at the beginning.
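Note 2 can be illustrated numerically. The sketch below assumes the form \(p_{\epsilon}(\boldsymbol{x}) = {\Phi}(-(|\bar{f}(\boldsymbol{x})|+\epsilon)/\sqrt{V(\boldsymbol{x})})\) used in Appendix A and a maximum predictive variance of 1 (as implied by (A3)). The function names and the numerical values are hypothetical and chosen only for illustration.

```python
import math

def std_normal_cdf(z):
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_eps(f_bar, variance, eps):
    """p_eps(x) = Phi(-(|f_bar(x)| + eps) / sqrt(V(x)))."""
    return std_normal_cdf(-(abs(f_bar) + eps) / math.sqrt(variance))

eps = 0.3  # illustrative value, not taken from the paper

# Far from all previous queries: f_bar ~ 0 and V at its maximum (assumed 1).
far_away = p_eps(f_bar=0.0, variance=1.0, eps=eps)

# Close to labeled data: larger |f_bar| and smaller variance (made-up values).
near_data = p_eps(f_bar=0.8, variance=0.2, eps=eps)

print(far_away > near_data)
```

Under these assumptions the far-away point attains the upper limit \({\Phi}(-\epsilon)\), since increasing \(|\bar{f}|\) or decreasing \(V\) can only lower \(p_{\epsilon}\).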

References

  • Agarwal A (2013) Selective sampling algorithms for cost-sensitive multiclass prediction. ICML (3) 28:1220–1228


  • Alabdulmohsin I, Gao X, Zhang X (2015) Efficient active learning of halfspaces via query synthesis. In: Proceedings of the Twenty-Ninth AAAI conference on artificial intelligence. AAAI Press, pp 2483–2489

  • Angluin D (2004) Queries revisited. Theor Comput Sci 313(2):175–194


  • Argamon-Engelson S, Dagan I (1999) Committee-based sample selection for probabilistic classifiers. J Artif Intell Res (JAIR) 11:335–360

  • Awasthi P, Feldman V, Kanade V (2013) Learning using local membership queries. In: Shalev-Shwartz S, Steinwart I (eds), vol 30, Proceedings of Machine Learning Research, Princeton

  • Baram Y, Yaniv RE, Luz K (2004) Online choice of active learning algorithms. J Mach Learn Res 5:255–291


  • Basudhar A, Missoum S (2008) Adaptive explicit decision functions for probabilistic design and optimization using support vector machines. Comput Struct 86(19):1904–1917


  • Basudhar A, Missoum S (2010) An improved adaptive sampling scheme for the construction of explicit boundaries. Struct Multidiscip Optim 42(4):517–529


  • Bellman R (1957) Dynamic programming. Princeton University Press, Princeton


  • Bouneffouf D (2016) Exponentiated gradient exploration for active learning. Computers 5(1):1


  • Bridson R (2007) Fast poisson disk sampling in arbitrary dimensions. In: ACM SIGGRAPH 2007 sketches SIGGRAPH ’07. ACM, New York, https://doi.org/10.1145/1278780.1278807, (to appear in print)

  • Bryan B, Nichol RC, Genovese CR, Schneider J, Miller CJ, Wasserman L (2006) Active learning for identifying function threshold boundaries. In: Advances in neural information processing systems, pp 163–170

  • Campbell C, Cristianini N, Smola AJ (2000) Query learning with large margin classifiers. In: Proceedings of the seventeenth international conference on machine learning. Morgan Kaufmann Publishers Inc., pp 111–118

  • Cavallanti G, Cesa-Bianchi N, Gentile C (2009) Linear classification and selective sampling under low noise conditions. In: Advances in neural information processing systems, pp 249–256

  • Cesa-Bianchi N, Gentile C, Orabona F (2009) Robust bounds for classification via selective sampling. In: Proceedings of the 26th annual international conference on machine learning. ACM, pp 121–128

  • Chen W, Fuge M (2017) Beyond the known: detecting novel feasible domains over an unbounded design space. J Mech Des 139(11):111,405


  • Chen Z, Qiu H, Gao L, Li X, Li P (2014) A local adaptive sampling method for reliability-based design optimization using kriging model. Struct Multidiscip Optim 49(3):401–416


  • Chen Z, Peng S, Li X, Qiu H, Xiong H, Gao L, Li P (2015) An important boundary sampling method for reliability-based design optimization using kriging model. Struct Multidiscip Optim 52(1):55–70


  • Chen L, Hassani H, Karbasi A (2016) Near-optimal active learning of halfspaces via query synthesis in the noisy setting. arXiv:1603.03515

  • Chen W, Fuge M, Chazan J (2017) Design manifolds capture the intrinsic complexity and dimension of design spaces. J Mech Des 139(5):051,102. https://doi.org/10.1115/1.4036134


  • Cohn D, Atlas L, Ladner R (1994) Improving generalization with active learning. Mach Learn 15(2):201–221

  • Dagan I, Engelson SP (1995) Committee-based sampling for training probabilistic classifiers. In: Proceedings of the twelfth international conference on machine learning

  • Dasgupta S, Kalai AT, Monteleoni C (2009) Analysis of perceptron-based active learning. J Mach Learn Res 10:281–299


  • Dekel O, Gentile C, Sridharan K (2012) Selective sampling and active learning from single and multiple teachers. J Mach Learn Res 13(Sep):2655–2697


  • Devanathan S, Ramani K (2010) Creating polytope representations of design spaces for visual exploration using consistency techniques. J Mech Des 132(8):081,011


  • Freund Y, Seung HS, Shamir E, Tishby N (1997) Selective sampling using the query by committee algorithm. Mach Learn 28(2):133–168


  • Gotovos A, Casati N, Hitz G, Krause A (2013) Active learning for level set estimation. In: Proceedings of the twenty-third international joint conference on artificial intelligence. AAAI Press, pp 1344–1350

  • Hoang TN, Low BKH, Jaillet P, Kankanhalli M (2014) Nonmyopic 𝜖-bayes-optimal active learning of gaussian processes. In: Xing E P, Jebara T (eds) Proceedings of the 31st international conference on machine learning, PMLR, vol 32. Proceedings of Machine Learning Research, Beijing, pp 739–747

  • Hoi SC, Jin R, Zhu J, Lyu MR (2009) Semisupervised svm batch mode active learning with applications to image retrieval. ACM Trans Inf Syst (TOIS) 27(3):16


  • Hsu WN, Lin HT (2015) Active learning by learning. In: Twenty-Ninth AAAI conference on artificial intelligence

  • Huang YC, Chan KY (2010) A modified efficient global optimization algorithm for maximal reliability in a probabilistic constrained space. J Mech Des 132(6):061,002


  • Huang SJ, Jin R, Zhou ZH (2010) Active learning by querying informative and representative examples. In: Advances in neural information processing systems, pp 892–900

  • Jackson JC (1997) An efficient membership-query algorithm for learning dnf with respect to the uniform distribution. J Comput Syst Sci 55(3):414–440


  • Kandasamy K, Schneider J, Póczos B (2017) Query efficient posterior estimation in scientific experiments via bayesian active learning. Artif Intell 243:45–56


  • Kapoor A, Grauman K, Urtasun R, Darrell T (2010) Gaussian processes for object categorization. Int J Comput Vis 88(2):169–188


  • King RD, Whelan KE, Jones FM, Reiser PG, Bryant CH, Muggleton SH, Kell DB, Oliver SG (2004) Functional genomic hypothesis generation and experimentation by a robot scientist. Nature 427(6971):247–252


  • Krause A, Guestrin C (2007) Nonmyopic active learning of gaussian processes: an exploration-exploitation approach. In: Proceedings of the 24th international conference on machine learning. ACM, pp 449–456

  • Krempl G, Kottke D, Lemaire V (2015) Optimised probabilistic active learning (opal). Mach Learn 100(2–3):449–476

  • Larson BJ, Mattson CA (2012) Design space exploration for quantifying a system model’s feasible domain. J Mech Des 134(4):041,010


  • Lee TH, Jung JJ (2008) A sampling technique enhancing accuracy and efficiency of metamodel-based rbdo: constraint boundary sampling. Comput Struct 86(13):1463–1476


  • Lewis DD, Catlett J (1994) Heterogeneous uncertainty sampling for supervised learning. In: Proceedings of the eleventh international conference on machine learning, pp 148–156

  • Lewis DD, Gale WA (1994) A sequential algorithm for training text classifiers. In: Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval. Springer-Verlag New York Inc., New York, pp 3–12


  • Ma Y, Garnett R, Schneider J (2014) Active area search via bayesian quadrature. In: Artificial intelligence and statistics, pp 595–603

  • Mac Aodha O, Campbell ND, Kautz J, Brostow GJ (2014) Hierarchical subquery evaluation for active learning on a graph. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 564–571

  • McCallum A, Nigam K et al (1998) Employing em and pool-based active learning for text classification. In: ICML, vol 98, pp 359–367

  • Nguyen HT, Smeulders A (2004) Active learning using pre-clustering. In: Proceedings of the twenty-first international conference on machine learning ICML ’04. ACM, New York, p 79, https://doi.org/10.1145/1015330.1015349, (to appear in print)

  • Nowacki H (1980) Modelling of design decisions for cad. In: Computer aided design modelling, systems engineering, CAD-Systems. Springer, pp 177–223

  • Orabona F, Cesa-Bianchi N (2011) Better algorithms for selective sampling. In: Proceedings of the 28th international conference on machine learning (ICML-11), pp 433–440

  • Osugi T, Kim D, Scott S (2005) Balancing exploration and exploitation: a new algorithm for active machine learning. In: Fifth IEEE international conference on data mining. IEEE

  • Rasmussen C, Williams C (2006) Gaussian processes for machine learning. The MIT Press

  • Ren Y, Papalambros PY (2011) A design preference elicitation query as an optimization process. J Mech Des 133(11):111,004


  • Schohn G, Cohn D (2000) Less is more: active learning with support vector machines. In: ICML, pp 839–846

  • Settles B (2010) Active learning literature survey. Univ Wiscons Madison 52(55–66):11


  • Settles B, Craven M (2008) An analysis of active learning strategies for sequence labeling tasks. In: Proceedings of the conference on empirical methods in natural language processing. Association for Computational Linguistics, pp 1070–1079

  • Singh P, Van Der Herten J, Deschrijver D, Couckuyt I, Dhaene T (2017) A sequential sampling strategy for adaptive classification of computationally expensive data. Struct Multidiscip Optim 55(4):1425–1438


  • Tong S, Koller D (2001) Support vector machine active learning with applications to text classification. J Mach Learn Res 2:45–66


  • Yang X, Liu Y, Gao Y, Zhang Y, Gao Z (2015a) An active learning kriging model for hybrid reliability analysis with both random and interval variables. Struct Multidiscip Optim 51(5):1003–1016


  • Yang Y, Ma Z, Nie F, Chang X, Hauptmann AG (2015b) Multi-class active learning by uncertainty sampling with diversity maximization. Int J Comput Vis 113(2):113–127


  • Yannou B, Moreno F, Thevenot HJ, Simpson TW (2005) Faster generation of feasible design points. In: ASME 2005 international design engineering technical conferences and computers and information in engineering conference. American Society of Mechanical Engineers, pp 355–363

  • Zhu X, Lafferty J, Ghahramani Z (2003) Combining active learning and semi-supervised learning using gaussian fields and harmonic functions. In: ICML 2003 workshop on the continuum from labeled to unlabeled data in machine learning and data mining, vol 3

  • Zhuang X, Pan R (2012) A sequential sampling strategy to improve reliability-based design optimization with implicit constraint functions. J Mech Des 134(2):021,002



Acknowledgments

The authors thank the anonymous reviewers whose efforts improved the manuscript. This work was funded through a University of Maryland Minta Martin Grant.

Author information

Corresponding author

Correspondence to Wei Chen.

Appendices

Appendix A: Theorem proofs

A1 Proof of Theorem 3

According to (2), given an optimal query \(\boldsymbol{x}^{*}\), we have

$$\begin{array}{lllllllllll} |\bar{f}(\boldsymbol{x}^{*})| &= |\boldsymbol{k}(\boldsymbol{x}^{*})^{T} \nabla \log p(\boldsymbol{y}|\hat{\boldsymbol{f}})| \\ &= \left|\boldsymbol{k}(\boldsymbol{x}^{*})^{T} \nabla \log \left[\begin{array}{llllllllll} {\Phi}(y_{1}f_{1})\\ \vdots\\ {\Phi}(y_{t-1}f_{t-1}) \end{array}\right]\right| \\ &= \left|\boldsymbol{k}(\boldsymbol{x}^{*})^{T} \left[\begin{array}{llllllllll} y_{1}\mathcal{N}(f_{1})/{\Phi}(y_{1}f_{1})\\ \vdots\\ y_{t-1}\mathcal{N}(f_{t-1})/{\Phi}(y_{t-1}f_{t-1}) \end{array}\right]\right| \\ &= \left| {\sum}_{i = 1}^{t-1} k(\boldsymbol{x}^{*},\boldsymbol{x}^{(i)})y_{i}\frac{\mathcal{N}(f_{i})}{\Phi(y_{i}f_{i})} \right| \\ &\leq {\sum}_{i = 1}^{t-1} \left| k(\boldsymbol{x}^{*},\boldsymbol{x}^{(i)})y_{i}\frac{\mathcal{N}(f_{i})}{\Phi(y_{i}f_{i})} \right| \\ &< {\sum}_{i = 1}^{t-1} k_{m} \text{sign}(y_{i})y_{i}\frac{\mathcal{N}(f_{i})}{\Phi(y_{i}f_{i})} \\ &= k_{m} \text{sign}(\boldsymbol{y})^{T} \left[\begin{array}{llllllllll} y_{1}\mathcal{N}(f_{1})/{\Phi}(y_{1}f_{1})\\ \vdots\\ y_{t-1}\mathcal{N}(f_{t-1})/{\Phi}(y_{t-1}f_{t-1}) \end{array}\right] \\ &= k_{m} \mu \end{array} $$

where

$$ \begin{array}{llllllllll} k_{m} &= \max_{\boldsymbol{x}^{(i)}\in X_{L}} k(\boldsymbol{x}^{*},\boldsymbol{x}^{(i)}) \\ &= \exp\left( -\frac{\min_{\boldsymbol{x}^{(i)}\in X_{L}} \|\boldsymbol{x}^{*}-\boldsymbol{x}^{(i)}\|^{2}}{2l^{2}}\right) \\ &= e^{-\delta^{2}/(2l^{2})} \end{array} $$
(A1)

and

$$ \mu = \text{sign}(\boldsymbol{y})^{T} \nabla \log p(\boldsymbol{y}|\hat{\boldsymbol{f}}) $$
(A2)

Similarly,

$$ \begin{array}{llllllll} V(\boldsymbol{x}^{*}) &= 1-\boldsymbol{k}(\boldsymbol{x}^{*})^{T}(K+W^{-1})^{-1}\boldsymbol{k}(\boldsymbol{x}^{*}) \\ &> 1-(k_{m}\boldsymbol{1})^{T} (K+W^{-1})^{-1} (k_{m}\boldsymbol{1}) \\ &= 1-{k_{m}^{2}}\boldsymbol{1}^{T}(K+W^{-1})^{-1}\boldsymbol{1} \\ &= 1-{k_{m}^{2}} \nu \end{array} $$
(A3)

where

$$ \nu = \boldsymbol{1}^{T}(K+W^{-1})^{-1}\boldsymbol{1} $$
(A4)

Therefore, for the optimal query \(\boldsymbol{x}^{*}\) we have

$$p_{\epsilon}(\boldsymbol{x}^{*}) = {\Phi}\left( -\frac{|\bar{f}(\boldsymbol{x}^{*})|+\epsilon}{\sqrt{V(\boldsymbol{x}^{*})}}\right) > {\Phi}\left( -\frac{k_{m} \mu+\epsilon}{\sqrt{1-{k_{m}^{2}} \nu}}\right) $$

Theorems 1 and 2 both state that \(p_{\epsilon}(\boldsymbol{x}^{*}) = \tau\); thus

$${\Phi}\left( -\frac{k_{m} \mu+\epsilon}{\sqrt{1-{k_{m}^{2}} \nu}}\right) < \tau $$

When τ = Φ(−η𝜖), we have

$$ \frac{k_{m} \mu+\epsilon}{\sqrt{1-{k_{m}^{2}} \nu}} > \eta\epsilon $$
(A5)

Plugging (A1) into (A5) and solving for the distance δ, we get

$$\delta < \beta l $$

where

$$ \beta=\sqrt{2\log\frac{\mu^{2}+\eta^{2}\epsilon^{2}\nu}{\eta\epsilon\sqrt{\mu^{2}+(\eta^{2}-1)\epsilon^{2}\nu}-\epsilon\mu}} $$
(A6)
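The closed form (A6) can be checked numerically. The sketch below squares (A5), takes the positive root of the resulting quadratic in \(k_{m}\), and converts that threshold to \(\beta\) through (A1). The function names and all numerical values are hypothetical, and the sketch assumes \(\mu \geq 0\), \(\nu > 0\), \(\eta > 1\), and a threshold that falls inside (0, 1).

```python
import math

def k_m_threshold(mu, nu, eps, eta):
    """Positive root of the quadratic in k_m obtained by squaring (A5)."""
    denom = mu**2 + eta**2 * eps**2 * nu
    numer = eta * eps * math.sqrt(mu**2 + (eta**2 - 1.0) * eps**2 * nu) - eps * mu
    return numer / denom

def beta(mu, nu, eps, eta):
    """beta from (A6): k_m = exp(-delta^2 / (2 l^2)) > threshold  <=>  delta < beta * l."""
    return math.sqrt(2.0 * math.log(1.0 / k_m_threshold(mu, nu, eps, eta)))

def lhs_a5(k_m, mu, nu, eps):
    """Left-hand side of (A5)."""
    return (k_m * mu + eps) / math.sqrt(1.0 - k_m**2 * nu)

# Illustrative values only (not taken from the paper).
mu, nu, eps, eta = 1.0, 0.9, 0.3, 1.3

k_star = k_m_threshold(mu, nu, eps, eta)
b = beta(mu, nu, eps, eta)

# At the threshold, (A5) holds with equality; above it, strictly.
print(abs(lhs_a5(k_star, mu, nu, eps) - eta * eps) < 1e-9)
print(lhs_a5(1.1 * k_star, mu, nu, eps) > eta * eps)
```

For a given length scale \(l\), the bound on the distance to the nearest labeled sample is then `b * l`.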

A2 Proof of Theorem 5

Theorem 1 states that the optimal query in the exploitation stage lies at the intersection of \(\bar {f}(\boldsymbol {x})= 0\) and \(p_{\epsilon}(\boldsymbol{x}) = \tau\). By substituting Φ(−η𝜖) for τ, we have

$$ V(\boldsymbol{x}^{*}) = \frac{1}{\eta^{2}} $$
(A7)

According to (A3), we have \(V(\boldsymbol {x}^{*})>1-k_m^2\nu \). Combining (A1), (A4), and (A7), we get

$$\delta < \delta_{exploit} = \gamma l $$

where

$$ \gamma = \sqrt{\log\frac{\eta^{2}\nu}{\eta^{2}-1}} $$
(A8)
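The step from (A7) to (A8) can be written out explicitly. The following is a sketch of that algebra, assuming \(\eta > 1\) and \(\nu > 0\) (assumptions made here so that the divisions and the logarithm are well defined) and using \(k_{m}^{2} = e^{-\delta^{2}/l^{2}}\) from (A1):

$$\begin{array}{rl} \frac{1}{\eta^{2}} = V(\boldsymbol{x}^{*}) &> 1-k_{m}^{2}\nu \\ \Rightarrow\quad k_{m}^{2} &> \frac{\eta^{2}-1}{\eta^{2}\nu} \\ \Rightarrow\quad e^{-\delta^{2}/l^{2}} &> \frac{\eta^{2}-1}{\eta^{2}\nu} \\ \Rightarrow\quad \delta^{2} &< l^{2}\log\frac{\eta^{2}\nu}{\eta^{2}-1} \end{array} $$

Taking square roots on both sides gives \(\delta < \gamma l\) with \(\gamma\) as in (A8).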

A3 Proof of Theorem 6

According to (A7), the predictive variance of an optimal query \(\boldsymbol{x}_{exploit}\) in the exploitation stage is

$$V(\boldsymbol{x}_{exploit}) = \frac{1}{\eta^{2}} $$

In the exploration stage, we have \(p_{\epsilon}(\boldsymbol{x}_{explore}) = \tau\) at the optimal query \(\boldsymbol{x}_{explore}\) (Theorem 2). Applying (4) and setting τ = Φ(−η𝜖), we have

$$V(\boldsymbol{x}_{explore}) = \frac{1}{\eta^{2}}\left( 1+\frac{|\bar{f}(\boldsymbol{x}_{explore})|}{\epsilon}\right)^{2} $$
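The two variance expressions can be sanity-checked against the condition \(p_{\epsilon} = \tau\). The sketch below assumes \(p_{\epsilon}(\boldsymbol{x}) = {\Phi}(-(|\bar{f}(\boldsymbol{x})|+\epsilon)/\sqrt{V(\boldsymbol{x})})\), as used in Appendix A1. The function names and numerical values are hypothetical.

```python
import math

def std_normal_cdf(z):
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_eps(f_bar, variance, eps):
    """p_eps(x) = Phi(-(|f_bar(x)| + eps) / sqrt(V(x)))."""
    return std_normal_cdf(-(abs(f_bar) + eps) / math.sqrt(variance))

def variance_exploit(eta):
    """Predictive variance at an optimal exploitation query, from (A7)."""
    return 1.0 / eta**2

def variance_explore(f_bar, eps, eta):
    """Predictive variance at an optimal exploration query."""
    return (1.0 + abs(f_bar) / eps) ** 2 / eta**2

# Illustrative values only (not taken from the paper).
eps, eta = 0.3, 1.3
tau = std_normal_cdf(-eta * eps)
f_bar_explore = 0.05

v_exploit = variance_exploit(eta)
v_explore = variance_explore(f_bar_explore, eps, eta)

# Both queries sit on the level set p_eps = tau.
print(abs(p_eps(0.0, v_exploit, eps) - tau) < 1e-12)
print(abs(p_eps(f_bar_explore, v_explore, eps) - tau) < 1e-12)
```

Because \(|\bar{f}(\boldsymbol{x}_{explore})| \geq 0\), the exploration variance computed this way is never smaller than the exploitation variance.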

Appendix B: Additional experimental results

B1 Hosaki example

We use the Hosaki example as an additional 2-dimensional example to demonstrate the performance of our proposed method. Unlike the Branin example, the Hosaki example has feasible domains of different scales: two isolated feasible regions, a large “island” and a small one (Fig. 15a). The Hosaki function is

$$g(\boldsymbol{x})=\left( 1-8x_{1}+ 7{x_{1}^{2}}-\frac{7}{3}{x_{1}^{3}}+\frac{1}{4}{x_{1}^{4}}\right){x_{2}^{2}}e^{-x_{2}} $$

We define the label \(y = 1\) if \(\boldsymbol{x} \in \{\boldsymbol{x} \mid g(\boldsymbol{x}) \leq -1,\ 0 < x_{1}, x_{2} < 5\}\), and \(y = -1\) otherwise.
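The Hosaki function and the labeling rule translate directly into code. The sketch below is one possible transcription; the function names are hypothetical, and it reads the bound condition as requiring both \(x_{1}\) and \(x_{2}\) to lie strictly between 0 and 5.

```python
import math

def hosaki(x1, x2):
    """Hosaki function g(x) as given in Appendix B1."""
    poly = 1.0 - 8.0 * x1 + 7.0 * x1**2 - (7.0 / 3.0) * x1**3 + 0.25 * x1**4
    return poly * x2**2 * math.exp(-x2)

def label(x1, x2):
    """Return +1 if the point is feasible (g <= -1 inside the open box), else -1."""
    in_box = 0.0 < x1 < 5.0 and 0.0 < x2 < 5.0
    return 1 if (in_box and hosaki(x1, x2) <= -1.0) else -1

# A few probe points: two near the function's two minima, one outside the box.
print(label(4.0, 2.0), label(1.0, 2.0), label(6.0, 2.0))
```

The first two probe points fall in separate feasible regions, which matches the description of two isolated "islands".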

Fig. 15 AES with different 𝜖 and η on the Hosaki example

For AES, we set the initial point \(\boldsymbol{x}^{(0)} = (3, 3)\). We use a Gaussian kernel with a length scale \(l = 0.4\). The test set used to compute F1 scores is generated on a 100 × 100 grid over the region \(x_{1} \in [-3, 9]\), \(x_{2} \in [-3.5, 8.5]\). For NV and straddle, the input space bounds are shown in Table 3.
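The test grid and the F1 computation described above might be set up as follows. This is a minimal sketch rather than the authors' evaluation code: the helper names are hypothetical, and it assumes that the feasible label (+1) is treated as the positive class and that the grid includes both endpoints of each interval.

```python
def linspace(lo, hi, n):
    """n evenly spaced values from lo to hi, endpoints included."""
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]

def make_test_grid(n=100):
    """n x n grid over x1 in [-3, 9] and x2 in [-3.5, 8.5]."""
    x1_values = linspace(-3.0, 9.0, n)
    x2_values = linspace(-3.5, 8.5, n)
    return [(x1, x2) for x1 in x1_values for x2 in x2_values]

def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN), with `positive` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0

grid = make_test_grid()
print(len(grid))
```

In use, `y_true` would hold the ground-truth labels of the grid points and `y_pred` the classifier's predictions for the same points.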

Table 3 Input space bounds for the NV algorithm and the straddle heuristic (Hosaki example)

Table 4 shows the final F1 scores and running time of AES, NV, and the straddle heuristic. Fig. 15 shows the F1 scores and queries under different 𝜖 and η. Fig. 16 compares the performance of AES and NV with different boundary sizes. Fig. 17 shows the performance of AES and NV under Bernoulli and Gaussian noise.

Table 4 Final F1 scores and running time (Hosaki example)
Fig. 16 AES and NV (with different input variable bounds) on the Hosaki example

Fig. 17 AES and NV on the Hosaki example using noisy labels

B2 Results of straddle heuristic

In this section we list experimental results related to the straddle heuristic. Specifically, Fig. 18 shows straddle’s F1 scores and queries using different sizes of input variable bounds, and the comparison with AES. Fig. 19 shows the comparison of AES and straddle under noisy labels.

Fig. 18 AES and straddle (with different input variable bounds)

Fig. 19 AES and straddle under noisy labels

About this article


Cite this article

Chen, W., Fuge, M. Active expansion sampling for learning feasible domains in an unbounded input space. Struct Multidisc Optim 57, 925–945 (2018). https://doi.org/10.1007/s00158-017-1894-y

  • DOI: https://doi.org/10.1007/s00158-017-1894-y
