Abstract
Many engineering problems require identifying feasible domains under implicit constraints. One example is finding acceptable car body styling designs based on constraints like aesthetics and functionality. Current active-learning-based methods learn feasible domains for bounded input spaces. However, we usually lack prior knowledge about how to set those input variable bounds. Bounds that are too small may fail to cover all feasible domains, while bounds that are too large waste the query budget. To avoid this problem, we introduce Active Expansion Sampling (AES), a method that identifies (possibly disconnected) feasible domains over an unbounded input space. AES progressively expands our knowledge of the input space, and uses successive exploitation and exploration stages to switch between learning the decision boundary and searching for new feasible domains. We show that AES has a misclassification loss guarantee within the explored region, independent of the number of iterations or labeled samples. Thus it can be used for real-time prediction of samples' feasibility within the explored region. We evaluate AES on three test examples and compare it with two adaptive sampling methods, the Neighborhood-Voronoi algorithm and the straddle heuristic, which operate over fixed input variable bounds.
Notes
Note that in this paper the terms “active learning” and “adaptive sampling” are interchangeable.
A point infinitely far away from previous queries has \(\bar{f}(\boldsymbol{x})\) close to 0 and the maximum \(V(\boldsymbol{x})\), and thus the highest \(p_{\epsilon}(\boldsymbol{x})\).
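As a small numerical illustration of this note, the sketch below (Python; a zero-mean Gaussian process with a unit-variance Gaussian kernel, and an assumed form of \(p_{\epsilon}\) based only on the posterior mean and variance, with illustrative values for \(\epsilon\) and the length scale) shows that as a candidate moves away from all labeled samples, \(\bar{f}(\boldsymbol{x})\) decays toward 0 and \(V(\boldsymbol{x})\) approaches the prior variance, so the assumed \(p_{\epsilon}\) attains its largest value:

```python
import numpy as np
from scipy.stats import norm

def gp_posterior(X_train, y_train, x, length_scale=1.0, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP with a unit-variance Gaussian kernel."""
    sqdist = lambda a, b: np.sum((a - b) ** 2, axis=-1)
    K = np.exp(-sqdist(X_train[:, None, :], X_train[None, :, :]) / (2 * length_scale**2))
    K += noise * np.eye(len(X_train))
    k_star = np.exp(-sqdist(X_train, x) / (2 * length_scale**2))
    mean = k_star @ np.linalg.solve(K, y_train)          # f_bar(x)
    var = 1.0 - k_star @ np.linalg.solve(K, k_star)      # V(x); prior variance is 1
    return mean, var

# A few labeled samples near the origin, labels in {-1, +1}.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, -1.0, 1.0])

eps = 0.3  # illustrative margin
for d in [1.0, 5.0, 50.0]:  # candidate moves farther from the labeled samples
    mean, var = gp_posterior(X, y, np.array([d, d]))
    # Assumed form: probability mass lying more than eps beyond the predicted sign.
    p_eps = norm.cdf(-(eps + abs(mean)) / np.sqrt(var))
    print(f"d = {d:5.1f}: f_bar = {mean:+.4f}, V = {var:.4f}, p_eps = {p_eps:.4f}")
```

With this assumed form, the far point approaches \(p_{\epsilon} \approx \Phi(-\epsilon)\), the largest value printed, consistent with the note.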
In Section 3, we assume that the queried point is the exact solution to the query strategy. However, since we approximate the exact solution using a pool-based sampling setting, the query may deviate slightly from the exact solution.
Sampling methods like random sampling or Poisson-disc sampling (Bridson 2007) can be used to generate the pool. We use random sampling here for simplicity. The specific choice of sampling method within the local pool is not central to the overall method.
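As an illustration of this note, here is a minimal sketch of generating such a local candidate pool by uniform random sampling inside a hypersphere around a center point (the helper name `sample_local_pool`, the radius, and the pool size are illustrative assumptions, not the settings used in the paper):

```python
import numpy as np

def sample_local_pool(center, radius, n_samples, seed=None):
    """Draw candidate points uniformly from the d-ball of given radius around `center`.

    Directions come from an isotropic Gaussian and are normalized to unit length;
    radii are scaled by u**(1/d) so the points are uniform in volume rather than
    clustered near the center.
    """
    rng = np.random.default_rng(seed)
    center = np.asarray(center, dtype=float)
    d = center.size
    directions = rng.normal(size=(n_samples, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = radius * rng.random(n_samples) ** (1.0 / d)
    return center + radii[:, None] * directions

# Example: a pool of 500 candidates within radius 1 of the current query center.
pool = sample_local_pool(center=[3.0, 3.0], radius=1.0, n_samples=500, seed=0)
```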
We can set 𝜖 and τ such that the accuracy bound is as required. Details about how to set hyperparameters are in Section 5.3.
Technically, due to sampling error introduced when generating the pool, the exploitation stage will be influenced by 𝜖 (since \(\bar{f}(\boldsymbol{x}^{*})\) is only approximately 0). However, this effect is negligible compared to 𝜖's influence on the exploration stage.
For the NV algorithm, the pool size refers to the test samples generated for the Monte Carlo simulation.
This difference arises because NV's explored region covers more area than AES's at the beginning.
References
Agarwal A (2013) Selective sampling algorithms for cost-sensitive multiclass prediction. ICML (3) 28:1220–1228
Alabdulmohsin I, Gao X, Zhang X (2015) Efficient active learning of halfspaces via query synthesis. In: Proceedings of the Twenty-Ninth AAAI conference on artificial intelligence. AAAI Press, pp 2483–2489
Angluin D (2004) Queries revisited. Theor Comput Sci 313(2):175–194
Argamon-Engelson S, Dagan I (1999) Committee-based sample selection for probabilistic classifiers. J Artif Intell Res (JAIR) 11:335–360
Awasthi P, Feldman V, Kanade V (2013) Learning using local membership queries. In: Shalev-Shwartz S, Steinwart I (eds) Proceedings of Machine Learning Research, vol 30, Princeton
Baram Y, Yaniv RE, Luz K (2004) Online choice of active learning algorithms. J Mach Learn Res 5:255–291
Basudhar A, Missoum S (2008) Adaptive explicit decision functions for probabilistic design and optimization using support vector machines. Comput Struct 86(19):1904–1917
Basudhar A, Missoum S (2010) An improved adaptive sampling scheme for the construction of explicit boundaries. Struct Multidiscip Optim 42(4):517–529
Bellman R (1957) Dynamic programming. Princeton University Press, Princeton
Bouneffouf D (2016) Exponentiated gradient exploration for active learning. Computers 5(1):1
Bridson R (2007) Fast Poisson disk sampling in arbitrary dimensions. In: ACM SIGGRAPH 2007 sketches, SIGGRAPH ’07. ACM, New York. https://doi.org/10.1145/1278780.1278807
Bryan B, Nichol RC, Genovese CR, Schneider J, Miller CJ, Wasserman L (2006) Active learning for identifying function threshold boundaries. In: Advances in neural information processing systems, pp 163–170
Campbell C, Cristianini N, Smola AJ (2000) Query learning with large margin classifiers. In: Proceedings of the seventeenth international conference on machine learning. Morgan Kaufmann Publishers Inc., pp 111–118
Cavallanti G, Cesa-Bianchi N, Gentile C (2009) Linear classification and selective sampling under low noise conditions. In: Advances in neural information processing systems, pp 249–256
Cesa-Bianchi N, Gentile C, Orabona F (2009) Robust bounds for classification via selective sampling. In: Proceedings of the 26th annual international conference on machine learning. ACM, pp 121–128
Chen W, Fuge M (2017) Beyond the known: detecting novel feasible domains over an unbounded design space. J Mech Des 139(11):111405
Chen Z, Qiu H, Gao L, Li X, Li P (2014) A local adaptive sampling method for reliability-based design optimization using kriging model. Struct Multidiscip Optim 49(3):401–416
Chen Z, Peng S, Li X, Qiu H, Xiong H, Gao L, Li P (2015) An important boundary sampling method for reliability-based design optimization using kriging model. Struct Multidiscip Optim 52(1):55–70
Chen L, Hassani H, Karbasi A (2016) Near-optimal active learning of halfspaces via query synthesis in the noisy setting. arXiv:1603.03515
Chen W, Fuge M, Chazan J (2017) Design manifolds capture the intrinsic complexity and dimension of design spaces. J Mech Des 139(5):051102. https://doi.org/10.1115/1.4036134
Cohn D, Atlas L, Ladner R (1994) Improving generalization with active learning. Mach Learn 15(2):201–221
Dagan I, Engelson SP (1995) Committee-based sampling for training probabilistic classifiers. In: Proceedings of the twelfth international conference on machine learning
Dasgupta S, Kalai AT, Monteleoni C (2009) Analysis of perceptron-based active learning. J Mach Learn Res 10:281–299
Dekel O, Gentile C, Sridharan K (2012) Selective sampling and active learning from single and multiple teachers. J Mach Learn Res 13(Sep):2655–2697
Devanathan S, Ramani K (2010) Creating polytope representations of design spaces for visual exploration using consistency techniques. J Mech Des 132(8):081011
Freund Y, Seung HS, Shamir E, Tishby N (1997) Selective sampling using the query by committee algorithm. Mach Learn 28(2):133–168
Gotovos A, Casati N, Hitz G, Krause A (2013) Active learning for level set estimation. In: Proceedings of the twenty-third international joint conference on artificial intelligence. AAAI Press, pp 1344–1350
Hoang TN, Low BKH, Jaillet P, Kankanhalli M (2014) Nonmyopic 𝜖-bayes-optimal active learning of gaussian processes. In: Xing EP, Jebara T (eds) Proceedings of the 31st international conference on machine learning, PMLR, vol 32. Proceedings of Machine Learning Research, Beijing, pp 739–747
Hoi SC, Jin R, Zhu J, Lyu MR (2009) Semisupervised svm batch mode active learning with applications to image retrieval. ACM Trans Inf Syst (TOIS) 27(3):16
Hsu WN, Lin HT (2015) Active learning by learning. In: Twenty-Ninth AAAI conference on artificial intelligence
Huang YC, Chan KY (2010) A modified efficient global optimization algorithm for maximal reliability in a probabilistic constrained space. J Mech Des 132(6):061002
Huang SJ, Jin R, Zhou ZH (2010) Active learning by querying informative and representative examples. In: Advances in neural information processing systems, pp 892–900
Jackson JC (1997) An efficient membership-query algorithm for learning dnf with respect to the uniform distribution. J Comput Syst Sci 55(3):414–440
Kandasamy K, Schneider J, Póczos B (2017) Query efficient posterior estimation in scientific experiments via bayesian active learning. Artif Intell 243:45–56
Kapoor A, Grauman K, Urtasun R, Darrell T (2010) Gaussian processes for object categorization. Int J Comput Vis 88(2):169–188
King RD, Whelan KE, Jones FM, Reiser PG, Bryant CH, Muggleton SH, Kell DB, Oliver SG (2004) Functional genomic hypothesis generation and experimentation by a robot scientist. Nature 427(6971):247–252
Krause A, Guestrin C (2007) Nonmyopic active learning of gaussian processes: an exploration-exploitation approach. In: Proceedings of the 24th international conference on machine learning. ACM, pp 449–456
Krempl G, Kottke D, Lemaire V (2015) Optimised probabilistic active learning (opal). Mach Learn 100(2–3):449–476
Larson BJ, Mattson CA (2012) Design space exploration for quantifying a system model’s feasible domain. J Mech Des 134(4):041010
Lee TH, Jung JJ (2008) A sampling technique enhancing accuracy and efficiency of metamodel-based rbdo: constraint boundary sampling. Comput Struct 86(13):1463–1476
Lewis DD, Catlett J (1994) Heterogeneous uncertainty sampling for supervised learning. In: Proceedings of the eleventh international conference on machine learning, pp 148–156
Lewis DD, Gale WA (1994) A sequential algorithm for training text classifiers. In: Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval. Springer-Verlag New York Inc., New York, pp 3–12
Ma Y, Garnett R, Schneider J (2014) Active area search via bayesian quadrature. In: Artificial intelligence and statistics, pp 595–603
Mac Aodha O, Campbell ND, Kautz J, Brostow GJ (2014) Hierarchical subquery evaluation for active learning on a graph. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 564–571
McCallum A, Nigam K et al (1998) Employing em and pool-based active learning for text classification. In: ICML, vol 98, pp 359–367
Nguyen HT, Smeulders A (2004) Active learning using pre-clustering. In: Proceedings of the twenty-first international conference on machine learning, ICML ’04. ACM, New York, p 79. https://doi.org/10.1145/1015330.1015349
Nowacki H (1980) Modelling of design decisions for cad. In: Computer aided design modelling, systems engineering, CAD-Systems. Springer, pp 177–223
Orabona F, Cesa-Bianchi N (2011) Better algorithms for selective sampling. In: Proceedings of the 28th international conference on machine learning (ICML-11), pp 433–440
Osugi T, Kim D, Scott S (2005) Balancing exploration and exploitation: a new algorithm for active machine learning. In: Fifth IEEE international conference on data mining. IEEE
Rasmussen C, Williams C (2006) Gaussian processes for machine learning. The MIT Press
Ren Y, Papalambros PY (2011) A design preference elicitation query as an optimization process. J Mech Des 133(11):111004
Schohn G, Cohn D (2000) Less is more: active learning with support vector machines. In: ICML, pp 839–846
Settles B (2010) Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin–Madison
Settles B, Craven M (2008) An analysis of active learning strategies for sequence labeling tasks. In: Proceedings of the conference on empirical methods in natural language processing. Association for Computational Linguistics, pp 1070–1079
Singh P, Van Der Herten J, Deschrijver D, Couckuyt I, Dhaene T (2017) A sequential sampling strategy for adaptive classification of computationally expensive data. Struct Multidiscip Optim 55(4):1425–1438
Tong S, Koller D (2001) Support vector machine active learning with applications to text classification. J Mach Learn Res 2:45–66
Yang X, Liu Y, Gao Y, Zhang Y, Gao Z (2015a) An active learning kriging model for hybrid reliability analysis with both random and interval variables. Struct Multidiscip Optim 51(5):1003–1016
Yang Y, Ma Z, Nie F, Chang X, Hauptmann AG (2015b) Multi-class active learning by uncertainty sampling with diversity maximization. Int J Comput Vis 113(2):113–127
Yannou B, Moreno F, Thevenot HJ, Simpson TW (2005) Faster generation of feasible design points. In: ASME 2005 international design engineering technical conferences and computers and information in engineering conference. American Society of Mechanical Engineers, pp 355–363
Zhu X, Lafferty J, Ghahramani Z (2003) Combining active learning and semi-supervised learning using gaussian fields and harmonic functions. In: ICML 2003 workshop on the continuum from labeled to unlabeled data in machine learning and data mining, vol 3
Zhuang X, Pan R (2012) A sequential sampling strategy to improve reliability-based design optimization with implicit constraint functions. J Mech Des 134(2):021002
Acknowledgments
The authors thank the anonymous reviewers whose efforts improved the manuscript. This work was funded through a University of Maryland Minta Martin Grant.
Appendices
Appendix A: Theorem proofs
A1 Proof of Theorem 3
According to (2), given an optimal query x∗, we have
where
and
Similarly,
where
Therefore for the optimal query x∗ we have
Both Theorems 1 and 2 state that \(p_{\epsilon}(\boldsymbol{x}^{*}) = \tau\), thus
When \(\tau = \Phi(-\eta\epsilon)\), we have
Plugging (A1) into (A5) and solving for the distance δ, we get
where
A2 Proof of Theorem 5
Theorem 1 states that the optimal query in the exploitation stage lies at the intersection of \(\bar{f}(\boldsymbol{x}) = 0\) and \(p_{\epsilon}(\boldsymbol{x}) = \tau\). By substituting \(\Phi(-\eta\epsilon)\) for \(\tau\), we have
According to (A3), we have \(V(\boldsymbol {x}^{*})>1-k_m^2\nu \). Combining (A1), (A4), and (A7), we get
where
A3 Proof of Theorem 6
According to (A7), the predictive variance of an optimal query \(\boldsymbol{x}_{\mathrm{exploit}}\) in the exploitation stage is
In the exploration stage, we have \(p_{\epsilon}(\boldsymbol{x}_{\mathrm{explore}}) = \tau\) at the optimal query \(\boldsymbol{x}_{\mathrm{explore}}\) (Theorem 2). By applying (4) and setting \(\tau = \Phi(-\eta\epsilon)\), we have
Appendix B: Additional experimental results
B1 Hosaki example
We use the Hosaki example as an additional 2-dimensional example to demonstrate the performance of our proposed method. Unlike the Branin example, the Hosaki example has feasible domains of different scales: its feasible set consists of two isolated regions, a large “island” and a small one (Fig. 15a). The Hosaki function is \(g(\boldsymbol{x}) = \left(1 - 8x_1 + 7x_1^2 - \frac{7}{3}x_1^3 + \frac{1}{4}x_1^4\right) x_2^2 e^{-x_2}\).
We define the label \(y = 1\) if \(\boldsymbol{x} \in \{\boldsymbol{x} \mid g(\boldsymbol{x}) \le -1,\ 0 < x_1, x_2 < 5\}\), and \(y = -1\) otherwise.
For AES, we set the initial point \(\boldsymbol{x}^{(0)} = (3, 3)\). We use a Gaussian kernel with a length scale \(l = 0.4\). The test set used to compute F1 scores is generated along a 100 × 100 grid over the region where \(x_1 \in [-3, 9]\) and \(x_2 \in [-3.5, 8.5]\). For NV and straddle, the input space bounds are shown in Table 3.
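For concreteness, the sketch below reproduces this evaluation setup: it defines the Hosaki function and the ground-truth labels from above, builds the 100 × 100 test grid, and computes an F1 score with scikit-learn. The `predict` function is a placeholder for a trained classifier's sign predictions, not the AES implementation itself:

```python
import numpy as np
from sklearn.metrics import f1_score

def hosaki(x1, x2):
    """Hosaki benchmark function g(x)."""
    return (1 - 8*x1 + 7*x1**2 - (7/3)*x1**3 + 0.25*x1**4) * x2**2 * np.exp(-x2)

def true_label(X):
    """Ground truth: y = 1 if g(x) <= -1 and 0 < x1, x2 < 5; y = -1 otherwise."""
    x1, x2 = X[:, 0], X[:, 1]
    feasible = (hosaki(x1, x2) <= -1) & (x1 > 0) & (x1 < 5) & (x2 > 0) & (x2 < 5)
    return np.where(feasible, 1, -1)

# 100 x 100 test grid over x1 in [-3, 9], x2 in [-3.5, 8.5].
g1, g2 = np.meshgrid(np.linspace(-3, 9, 100), np.linspace(-3.5, 8.5, 100))
X_test = np.column_stack([g1.ravel(), g2.ravel()])
y_true = true_label(X_test)

def predict(X):
    """Placeholder classifier; substitute the sign of the trained GP's posterior mean."""
    return true_label(X)  # with perfect predictions the F1 score is 1.0

print("F1 =", f1_score(y_true, predict(X_test), pos_label=1))
```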
Table 4 shows the final F1 scores and running time of AES, NV, and the straddle heuristic. Fig. 15 shows the F1 scores and queries under different 𝜖 and η. Fig. 16 compares the performance of AES and NV with different boundary sizes. Fig. 17 shows the performance of AES and NV under Bernoulli and Gaussian noise.
B2 Results of straddle heuristic
In this section we list experimental results related to the straddle heuristic. Specifically, Fig. 18 shows straddle’s F1 scores and queries using different sizes of input variable bounds, and the comparison with AES. Fig. 19 shows the comparison of AES and straddle under noisy labels.