Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Next Article in Journal
Two-Machine Job-Shop Scheduling Problem to Minimize the Makespan with Uncertain Job Durations
Next Article in Special Issue
A Survey on Approximation in Parameterized Complexity: Hardness and Algorithms
Previous Article in Journal
Journey Planning Algorithms for Massive Delay-Prone Transit Networks
Previous Article in Special Issue
Parameterized Algorithms in Bioinformatics: An Overview
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Parameterized Optimization in Uncertain Graphs—A Survey and Some Results

by
N. S. Narayanaswamy
* and
R. Vijayaragunathan
*
Department of Computer Science and Engineering, Indian Institute of Technology Madras (IIT Madras), Chennai 600036, India
*
Authors to whom correspondence should be addressed.
Algorithms 2020, 13(1), 3; https://doi.org/10.3390/a13010003
Submission received: 11 November 2019 / Revised: 12 December 2019 / Accepted: 12 December 2019 / Published: 19 December 2019
(This article belongs to the Special Issue New Frontiers in Parameterized Complexity and Algorithms)

Abstract

:
We present a detailed survey of results and two new results on graphical models of uncertainty and associated optimization problems. We focus on two well-studied models, namely, the Random Failure (RF) model and the Linear Reliability Ordering (LRO) model. We present an FPT algorithm parameterized by the product of treewidth and max-degree for maximizing expected coverage in an uncertain graph under the RF model. We then consider the problem of finding the maximal core in a graph, which is known to be polynomial time solvable. We show that the Probabilistic-Core problem is polynomial time solvable in uncertain graphs under the LRO model. On the other hand, under the RF model, we show that the Probabilistic-Core problem is W[1]-hard for the parameter d, where d is the minimum degree of the core. We then design an FPT algorithm for the parameter treewidth.

1. Introduction

Network data analytics has come to play a key role in many scientific fields. A large body of such real-world networks have an associated uncertainty and optimization problems are required to be solved taking into account the uncertainty. Some of the uncertainty are due to the data collection process, machine-learning methods employed at preprocessing, privacy-preserving reasons and due to unknown causes during the operation of the network. Throughout this work, we study the case where the uncertainty is associated with the availability or the nature of relationship between the vertices of the network. The vertices themselves are assumed to be always available, in other words, the vertices are assumed to be certain. The concepts can be naturally modified to model uncertainty by associating uncertainty with the vertices. Road networks [1,2] are a natural source of optimization problems where the uncertainty is due to the traffic. Indeed, in uncertain traffic networks [2], the travel-time on a road is inherently uncertain. One way of modeling this uncertainty is by modeling the travel-time as a random variable. Indeed, the random variable is quite complex since the probability that it takes a certain value is dependent on other parameters like the day of the week and time of the day. However, our focus is only on the fact that uncertainty is modeled by an appropriately defined random variable. The natural optimization problem on an uncertain traffic network is to compute the expected minimum-time s-t path. In biological networks [3], the protein-protein interaction (PPI) network is an uncertain network. In a PPI network, proteins are represented by vertices and interaction between proteins are represented by edges. The interaction between proteins are derived through noisy and error-prone experiments which cause uncertainty. Sometimes the interactions are predicted by the nature of proteins instead of experiments. In this example, the uncertainty can be of two types: it can be on the presence or the absence of a protein-protein interaction or on the strength of an interaction between two proteins. Similarly, social networks are another example of uncertain networks where the members of the network are known, and the uncertainty is on the link between two members in the network. The interaction between members of an uncertain network associated with a social network are obtained using link prediction and by evaluating peer influence [4]. In all these three examples of networks with uncertainty, the uncertainty on the edges can be modeled as random variables. A random variable can be used to indicate the presence or the absence of an edge. In this case, the random variable takes values from the set { 0 , 1 } with each value having an associated probability. A random variable can also be used to model the strength of the interaction between two entities in the network, in which case for an edge, the corresponding random variable takes values from a set of values and each value has an associated probability.
In this work, we survey the different models of uncertainty and the associated optimization problems. We then present our results on optimization problems on uncertain networks when the networks have bounded treewidth. The fundamental motivation for this direction of study is that a typical optimization problem on an uncertain graph is an expectation computation over many graphs implicitly represented by the uncertain graph. A natural question is there are problems which have efficient algorithms on a graph in which the edges have no uncertainty but become hard on uncertain graphs. The typical optimization problems considered on uncertain graphs are shortest path [5], reliability [6], minimum spanning trees [7,8], maxflows [9,10], maximum coverage [11,12,13], influence maximization [14] and densest subgraph [15,16]. This article is structured partly as a survey and partly as an original research article. In Section 1.1, we formally present the concepts in uncertain graphs and then present subsequent details on uncertain graphs in Section 2. In Section 1.2, we survey the different algorithmic results on uncertain graphs and in Section 1.3 we outline our results.

1.1. Uncertain Graphs-Definition and Semantics

We consider the graphs with uncertain edges and certain vertices. An uncertain graph, denoted by G , is a triple that consists of a vertex set V, an edge set E and a set of outcomes { A e e E } . For each edge e E , the outcome A e is a set of values and an associated probability distribution on A e . The outcome A e is considered to be an interval or a finite set. For each e E , if the outcome A e is an interval, the associated probability distribution is called a continuous distribution, and if it is a finite set, the associated probability distribution is called a discrete distribution. In either case, the natural distribution is the uniform distribution. For the case in which for each edge e E , A e is a closed interval [ L e , U e ] , the probability that e gets specific value in L e < U e is 1 U e L e . In case for each e E , A e is a finite set under the uniform distribution, the probability that e gets a specific value from A e is 1 | A e | . Such uncertain graphs were the focus of the earliest results [5,7,9,10] and we survey these in Section 1.2. However, in general, the uncertainty could be modeled by any probability distribution A e . Therefore, the uncertain graph is a succinct representation of the set of all edge weighted graphs such that for each edge e its weight is a value from A e . This set is an uncountable set when A e is an interval and it is e E | A e | in case each e E , A e is a finite set. In this paper, we present our results for the case when for each edge e, the outcome A e is the set of values { 0 , 1 } . Naturally, 0 models the absence of an edge and 1 models the presence of an edge. An uncertain graph under this condition is a succinct representation of the set of all edge subgraphs of G = ( V , E ) . In this case, the uncertain graph G is represented as a triple ( V , E , p ) where p : E [ 0 , 1 ] is a function defined on E and p ( e ) is said to be the survival probability of the edge e. The failure probability of an edge e is 1 p ( e ) . The set of all graphs represented by an uncertain graph is well-known as the possible world semantics (PWS) [17,18] of the uncertain graph. For each E E , H = ( V , E ) is called a possible world of G and this is denoted by the notation H G . For an uncertain graph G = ( V , E , p ) , there are 2 | E | possible worlds. An uncertain graph and two of its possible worlds are illustrated in Figure 1.
The probability associated with a possible world depends on the probability distributions on A e and, most importantly, the dependence among the edge samples. A distribution model is a specification of the dependence among edge samples. Further, a distribution model uniquely determines a probability distribution on the possible worlds. For example, if the edge samples are all independent, the distribution model is called the Random Failure (RF) model. Under the RF model, the probability of a possible world H is given by P ( H ) = e E ( H ) p ( e ) e E \ E ( H ) ( 1 p ( e ) ) . Based on the dependence among the edge samples, the literature is rich in different distributions on the possible worlds. The distributions of interest in this paper are the Random Failure (RF) model, Independent Cascade (IC) model, Set-based Dependency (SBD) model and the Linear Reliability Ordering (LRO) model. These distribution models are all well-motivated by practical considerations on uncertain graphs in influence maximization, facility location, network reliability, to name a few. The detailed description of these distribution models and the corresponding distributions on the possible worlds are discussed in Section 2. In short, an uncertain graph along with a distribution model is a succinct description of an unique probability distribution on the PWS of the uncertain graph. Therefore, an uncertain graph along with a distribution model is equivalent to a sampling procedure to obtain a random sample from the corresponding probability distribution on the PWS of the uncertain graph. In Section 2, we describe the different distribution models by describing the corresponding sampling procedures of the edges in E.
The typical computational problem is posed for a fixed nature of the dependence among the edge samples from an input uncertain graph. The distributions on the edges and the dependence among the edge samples uniquely define the probability distribution on the possible worlds. Therefore, for a fixed dependence among the edge samples, the generic computational problem is to compute a solution that optimizes the expected value of a function over the distribution on the possible worlds. The problems that have been extensively studied are facility location to maximize coverage [19] and a selection of influentials on a social network to maximize influence [14]. Clearly, both these problems are coverage problems, studied two decades apart motivated by different considerations. In this paper, we are also motivated by understanding the parameterized complexity of problems on uncertain graphs. Historically, the earliest results considered different graph problems in which the distributions are on the set of values of the edge weights. We next present a survey of those results.

1.2. Survey of Optimization Problems in Uncertain Graphs

Graphs have been used to represent the relationships between entities. A graph is denoted by G = ( V , E ) where V is set of vertices and E V × V is set of edges representing relationship between pairs of vertices. PPI networks in bioinformatics, road networks and social networks are graphs with an uncertainty among the edges and are considered as uncertain graphs. The earliest ideas in graphs with uncertain edge weights were introduced by Frank and Hakimi [9] in 1965. Frank and Hakimi studied the maximum flow in a directed graph G = ( V , E ) with uncertain capacities. On an input, consisting of an uncertain digraph G = ( V , E ) and a continuous random variable with the uniform probability distribution on the capacity of each edge e, the probabilistic flow problem is to find the maximum flow probability and the joint distribution of the cut set values. Later in 1969, under the same setting Frank [5] studied the probabilistic shortest path problem on an undirected graph G = ( V , E ) . The probabilistic shortest path problem is to compute for each , the probability that the shortest path is at most . In 1976, J. R. Evans [10] studied the probabilistic maximum flow in a directed acyclic graph (DAG) with a discrete probability distribution on the edge capacities. A relatively recent result in 2008 is on the minimum spanning tree problem on uncertain graphs, by Erlebach et al. [7], with uncertain edge weights from a continuous distribution. Here the goal is to optimize the number of edges whose weight is sampled to obtain a spanning tree which achieves the expected MST weight over the possible worlds. This problem is different from the optimization problems of interest to us in this paper- given an uncertain graph as input, our goal is to optimize the expected value of a structural parameter over the possible worlds of the uncertain graph.
Optimization problems on uncertain graphs related to connectedness are among the most fundamental problems. Apart from their practical significance, they pose significant algorithmic challenges in different computational models. Further, the computational complexity of the problems increases significantly in the presence of uncertainty. In 1979, Valiant [6] studied the network reliability problem on uncertain graphs. The network reliability problem is a well-studied #P-hard problem [6,17,20,21,22,23]. The reliability problems have numerous applications in communication networks [24,25], biological networks [3] and social networks [26,27]. For a given network, reliability is defined as the ability of the network to remain operational after the failure of some of its links. The input consists of an uncertain graph G = ( V , E , p ) and a subset S V and the aim is to compute the probability that S is connected. Clearly, the reliability problem on uncertain graphs generalizes the graph connectivity problem which is polynomial time solvable. The Canadian Traveler Problem (CTP), formulated by Papadimitriou and Yannakakis in 1991 [28], is an online problem on uncertain graphs. Given an uncertain graph G = ( V , E , p ) , a source s and a destination t, a traveler must find a walk from s to t, where an edge e is known to have survived with probability p ( e ) only when the walk reaches one of its end points, after which it does not fail, conditioned on it having survived. The objective is to minimize the expected path length over the distribution on the possible worlds and the walker’s choices.
Coverage in Uncertain Graphs. Apart from the themes of network flows and connectedness, coverage problems are very practical when the uncertainty is on the survival of the edges. In this framework, the uncertainty is on whether an edge will survive a disaster and the goal is to place facilities in the network such that the expected coverage over the possible worlds is maximized. Each possible world can be thought of as the set of edges which survive a disaster. Formally, an uncertain graph G = ( V , E , p ) is a succinct description of the set of possible worlds that can arise due to a disaster in which an edge e is known to survive with probability p ( e ) . The nature of dependence among the edge samples is used to model the nature of edge failures in the event of a disaster. As mentioned earlier, a fixed nature of dependence among the edge samples uniquely defines a probability distribution on the possible worlds. From this point, in this paper, we consider uncertain graphs where the uncertainty is on the survival of an edge (recall, the other possibility is on the uncertain edge weight). Further, all the results we present are for a fixed dependence among edge samples and such a fixed dependence among the edge samples is called a distribution model. Given the motivation of disasters, the distribution model is specified based on the dependence among the edge failures (recall, the failure probability of an edge e is 1 p ( e ) ). In this framework, for a fixed distribution model, the function to be optimized is the coverage function.
Definition 1
(Coverage within distance r). The input consists of an uncertain graph G = ( V , E , p ) and an integer k. The goal is to compute a k-sized vertex set S which maximizes the expected total weight of the vertices which are at distance at most r from S. The expectation is over the possible worlds represented by the uncertain graph.
Naturally, we refer to k as the budget, r as the radius of coverage (the number of hops in the network) and S is the set of vertices where facilities have to be located. For certain graphs and r = 1 , the facility location with unreliable edges problem is the well-studied budgeted dominating set problem which is known to be NP-hard [29]. The other case of natural interest is for r = . In this case, the coverage problem is to find a set S of k vertices so as to maximize the expected number of vertices connected to S. In the case of certain graphs, this is polynomial time solvable as the problem is to find k connected components whose total vertex weight is the maximum.
Coverage in Social Networks. Coverage problems also have a natural interpretation in social networks. The dynamics of a social network based on the word-of-mouth effect have been of significant interest in marketing and consumer research [30], where a social network is referred to as an interpersonal network. The work by Brown and Reingen [30] state the different hypotheses for estimating the amount of uncertainty in a relationship between two persons in an interpersonal network. With the advent of social networks on the internet, the works of Domingos and Richardson [4,31] formalized the questions relating to the effective marketing of a product based on interpersonal relationships in a social network. In 1969, the work of Bass [32] had modeled the adoption of products as a diffusion process as a global phenomenon, independent of the interpersonal relationships between people in a society. Kempe, Kleinberg and Tardos (KKT) [14] brought together the earlier works on adoption of products in an interpersonal network and posed the question of selecting the most influential nodes in a social network with an aim to influence the maximum number of people to adopt a certain product or opinion. They considered the uncertainty in the social network to be the influence exerted by one individual on another individual and this is naturally modeled as an uncertain graph G = ( V , E , p ) . The propagation of influence is modeled as a diffusion process which is a function of time as in Bass [32]. The influence maximization problems in KKT are considered under distribution models, which are described as diffusion phenomena, called the Independent Cascade model and the Linear Threshold Model. Among these two models, the Independent Cascade model is defined as a sampling procedure whose outcome is an edge subgraph of a given uncertain graph. Thus, this model is more relevant for our study of uncertain graphs and the IC model is defined in Section 2. In the influence maximization problems considered in KKT [14], the aim is to select k influential people S such that the expected number of people connected to S is maximized. The expected number of vertices, over the distribution model, connected to a set S is called the influence of a set S or the expected coverage of S and is denoted by σ ( S ) . Thus, the influence maximization problem is to find the set arg max S V , | S | = k σ ( S ) . In Section 2, we discuss the computational complexity of σ ( S ) for different distribution models. Thus, the influence maximization problem in Reference [14] is essentially a facility location problem where the distribution model is specified by a diffusion phenomenon, r = and each input instance consists of an uncertain network and a budget k.
Coverage and Facility Location. The survey due to Snyder [33] in 2006 collects the vast body of results on the facility location problem in uncertain graphs into a single research article. The earliest result known to us, due to Daskin [19] who formulated the maximum expected coverage problem where vertices are uncertain, is different from our case where the uncertainty is on the edges and the vertices are known. To the best of our knowledge, it was Daskin’s work [19] that considered the general setting of dependence among vertex failures. In this case, the probability distribution would have been defined uniquely on the possible worlds which would have been the set of induced subgraphs. Subsequently, many variants of the facility location problem for uncertain graphs have been studied for different distribution models where the uncertainty is on the survival of edges.
Eiselt et al. [34] considered the problem with single edge failure and for r = . In this case, exactly one edge is assumed to have failed after a disaster and the objective is to place k facilities such that the expected weight of vertices not connected to any facility is minimized. In this case, each subgraph consisting of all the edges except one is a possible world. This problem is known to be polynomial time solvable for any k 1 . When the distribution model is the Random Failure model and r = , the most reliable source (MRS) is well-studied [35,36,37,38]. The input is an uncertain graph and k = 1 . The goal is to select one vertex v such that the expected number of vertices connected to v is maximized. Melachrinoudis and Helander [38] gave a polynomial time algorithm for the MRS problem on trees, followed by linear time algorithm on uncertain trees by Ding and Xue [37]. Colbourn and Xue [35] gave an O ( n 2 ) -time algorithm for the MRS problem on uncertain series-parallel graphs. Wei Ding [36] gave an O ( n 2 ) -time algorithm for the MRS problem on uncertain ring graphs.
Apart from the RF model, the Linear Reliability Ordering ( LRO) distribution model has been well-studied recently. Under this distribution model, for each integer r 1 , the facility location problem is studied as Max-Exp-Cover-r-LRO problem [11,12,13]. For the case when r = , the problem is known as the Max-Exp-Cover-LRO problem. Hassin et al. [11] presented an algorithm to solve the Max-Exp-Cover-LRO problem via a reduction to the Max-Weighted-k-Leaf-Induced-Subtree problem. They then showed that the Max-Weighted-k-Leaf-Induced-Subtree problem can be solved in polynomial time on trees using a greedy algorithm. Consequently, they showed that the Max-Exp-Cover-LRO problem can be solved in polynomial time. For r = 1 , the Max-Exp-Cover-1-LRO problem is NP-complete on planar graphs and the hardness follows from the hardness of budgeted dominating set due to Khuller et al. [29]. The Max-Exp-Cover-1-LRO admits a ( 1 1 e ) -approximation algorithm [12,13]. Similarly, Kempe et al. [14] shows that the influence maximization under the IC distribution model and the LT distribution model has a ( 1 1 e ) approximation algorithm. Both these results naturally follow due to the submodularity of the expected neighborhood size function, the monotonicity (as r increases) and submodularity of the expected coverage function and the result of Nemhauser et al. [39] on greedy maximization of submodular functions. In the setting of parameterized algorithms, the Max-Exp-Cover-1-LRO problem is W[1]-complete for solution size as the parameter and this follows from the hardness of budgeted dominating set problem. An FPT algorithm for the Max-Exp-Cover-1-LRO problem with treewidth as the parameter is presented in Reference [13]. Formally stated, given an instance G = ( V , E , p ) , k of the Max-Exp-Cover-1-LRO problem, an optimal solution can be computed in time 4 O ( t ) n O ( 1 ) where t is the treewidth of the graph G = ( V , E ) which is presented in the input as a tree decomposition.
Finding Communities in Uncertain Graphs. Finding communities is a significant problem in social network and in bioinformatics. A natural graph theoretic model for a community is a dense subgraph. A well-known dense subgraph is the core. Given an integer d, a graph G = ( V , E ) is said to be d-core if degree of every vertex v V is at least d. The way of obtaining the unique maximal induced subgraph of a graph G which is a d-core is to repeatedly discard vertices of degree less than d. If the procedure terminates with a non-empty graph, then the graph is a d-core of the graph G. A vertex v is said to be in a d-core if there is a d-core which contains v. This idea is generalized to the uncertain graph framework as follows: Given an uncertain graph G = ( V , E , p ) and the distribution model is the RF model, the d-core probability of a vertex v V , denoted by q d ( v ) , is the probability that v is in the d-core of a possible world. In other words, q d ( v ) = H G P ( H ) I ( H , d , v ) , where I ( H , d , v ) is an indicator function that takes value one if and only if there is a d-core of H that contains v, and P ( H ) is probability of the possible world H. In this setting, we consider the ( d , θ ) -core problem defined by Peng et al. [16]. We refer to this problem as the Individual Core problem and it is defined as follows.
Definition 2
(Individual-Core). The input consists of an uncertain graph G = ( V , E , p ) , an integer d and a probability threshold θ [ 0 , 1 ] . The objective is to compute an S V such that for each v S , q d ( v ) θ .
Peng et al. [16] shows that the Individual-Core problem is NP-complete. Prior to the results of Peng et al. [16], Bonchi et al. [15] introduced the study of d-core problem in uncertain graphs. We refer to the d-core problem on uncertain graphs as the Probabilistic-Core problem defined the follows.
Definition 3
(Probabilistic-Core). Given an uncertain graph G = ( V , E , p ) , an integer d and a probability threshold θ ( 0 , 1 ] , then the aim of the Probabilistic-Core problem is to find a set K V such that the Pr ( K is a d core in G ) is at least θ.
The problem of deciding on the existence of such a set K can be shown to be NP-hard using the hardness result given in Reference [16]. On the other hand, if p ( e ) = 1 for all e E , then the d-problem is polynomial time solvable as described at the beginning of this discussion.
A chronological listing of different optimization problems on uncertain graphs is presented in Table 1. The tabulation shows that there are many distribution models under which different optimization problems could be considered. The goal would be to understand the complexity of computing that expectation when the input is presented as an uncertain graph. Indeed, any NP- Complete problem on certain graphs remains NP-Complete for any distribution model when the inputs are uncertain graphs. Therefore, our natural focus is on Exact and Parameterized Computation of the expectation on uncertain graphs. The goal of the area of exact and parameterized computation is to classify problems based on their computational complexity as a function of input parameters other than the input size. The desired solution size, the treewidth of an input graph, and the input size are natural well-studied parameters. An algorithm with running time f ( k ) n O ( 1 ) is said to be a Fixed Parameter Tractable algorithm with respect to the parameter k. Interestingly, there are many problems that do not have FPT algorithms with respect to some parameters, while they have FPT algorithms with respect to others. A rich complexity theory has evolved over nearly four decades of research with the W-hierarchy being the central classification. In this hierarchy problem classes are ordered in increasing order of computational complexity. In this hierarchy the problems which have FPT algorithms are considered the simplest in terms of computational complexity. The complete history of this line of research can be found in the most recent textbook [40].

1.3. Our Questions and Results

Given our focus on exact and parameterized algorithms, we think that it should be possible to classify problems based on the hardness of efficiently computing the expectation over different distribution models. One of the contributions of this paper is to collect many of the known algorithmic results on uncertain graphs and the different distribution models for which results have been obtained. Our focus is on uncertain graph optimization problems on the LRO and RF distribution models, and these models have been the focus of many previous results in the literature. Our results add to the knowledge about these two models and are among the first parameterized algorithms on uncertain graphs. They also give an increased understanding of how treewidth of uncertain graphs can be used along with the structure of the distribution models to compute the expectation in time parameterized by the treewidth. Finally, the motivation for the choice of these two models is that the number of possible worlds with non-zero probability under the LRO model is equal to the number of edges m, while the number of possible worlds under the RF model is an exponential in m, which is the maximum number of possible worlds. We define the different distribution models listed in Section 1.1 in detail and some of their properties are identified from relationships between the models in Section 2.
Our first case study on parameterized algorithms under the RF model is on maximizing expected coverage in uncertain graphs. The starting point of our work is the reduction of the Max-Exp- Cover-LRO problem to the Max-Weighted-k-Leaf-Induced-Subtree problem, which can be solved in polynomial time [11,12]. The reduction was from maximizing the expectation coverage to maximizing the total weight of a combinatorial parameter. However, the reduction does not work for the Max-Exp-Cover-1-LRO, as the problem is at least as hard as budgeted dominating set. In the case when the graph has bounded treewidth, we were able to show in a previous work [13] that Max-Exp-Cover-1-LRO has an FPT algorithm parameterized by treewidth. The dynamic programming algorithm depends on properties specific to the LRO distribution model. On the other hand, the status of the question is unclear when the distribution model is the RF model. We address this by presenting a DP algorithm for the Max-Exp-Cover-1-RF problem in Section 4, which is an FPT algorithm parameterized by the product of the treewidth and max-degree of the input graph.
The second case study on parameterized algorithms under the RF model is on finding a d-core in uncertain graphs. In Section 5, we consider the Probabilistic-Core problem, and design a polynomial time algorithm for the LRO distribution model. For the RF distribution model, we observe that the Probabilistic-Core problem is W[1]-hard for the parameter d, where d is the minimum degree of the core. Then we design a DP algorithm, which is an FPT algorithm for Probabilistic-Core with respect to treewidth as the parameter.
Essentially in both the case studies, given an uncertain graph under the RF model, the DP uses the tree decomposition to efficiently compute the expected value, which is expressed as a weighted summation over the exponentially many possible worlds.

2. Distribution Models for Uncertain Graphs

As mentioned in Section 1.1, a distribution model along with an uncertain graph G = ( V , E , p ) uniquely describes the probability distribution on the possible worlds. We describe the distribution model by describing a corresponding sampling procedure on the edges of G = ( V , E , p ) . This formalism standardizes the nomenclature of optimization problems on uncertain graphs. For example, for the coverage problems on uncertain graphs, the problem names Max-Exp-Cover-1-LRO and Max-Exp-Cover-1-RF clearly state the distribution model and the radius of coverage relevant for the problem. An instance of each of these problems is an uncertain graph G = ( V , E , p ) and an integer k 1 . In this section, we present an edge sampling procedure from a given uncertain graph G = ( V , E , p ) corresponding to a distribution model on G . The outcome of a sampling procedure is a possible world, which is an edge subgraph H of the graph G = ( V , E ) . An edge e E ( H ) is called a survived edge and an edge e E \ E ( H ) is called a failed edge. We also present some new observations regarding the different distribution models.

2.1. Random Failure Model

Random Failure (RF) model is the most natural distribution model. In this sampling procedure, an edge e E is selected with probability p ( e ) independent of the outcome of every other edge in E. Thus, each edge subgraph of G is a possible world and for an edge subgraph H, the probability that H is the outcome of the sampling procedure is denoted by P ( H ) which is given by the equation P ( H ) = e E H p ( e ) e E \ E H ( 1 p ( e ) ) .

2.2. Independent Cascade Model

The sampling procedure to define the Independent Cascade (IC) model is very naturally described by a diffusion process which proceeds in rounds. The process is as defined in Reference [14]. The diffusion process starts after round 0. In round 0, A 0 is a non-empty set of vertices and these are called active vertices. The process maintains a set of edges A t after round t 0 . The vertices in A t are said to be active. The sampling process is as follows. If a vertex v is in A t 1 , then it remains active in round t also, that is v A t . The diffusion process in round t + 1 is as follows: If v is in A t \ A t 1 , that is v first becomes active after round t 1 , then consider each edge e = u v such that u is not in A t . Each such edge e is sampled with probability p ( e ) independent of the other outcomes. If e = u v is selected (that is, it survives), then u is added to A t + 1 . We reiterate that the edges incident on v will not be sampled in subsequent rounds. The sampling procedure terminates in not more than | E | rounds. The outcome of this sampling procedure is the edge subgraph formed by a set of those edges which were selected when the first end point of the edge becomes active. Kempe et al. [14] showed that for any edge subgraph H of G = ( V , E ) , the probability that H is the outcome is given by P ( H ) = e E ( H ) p ( e ) e E \ E ( H ) ( 1 p ( e ) ) .
Observation 1.
For an uncertain graph G = ( V , E , p ) the RF Model and the IC Model are identical.
The next distribution model is a generalization of the RF model and was introduced by Gunnec and Salman [43].

2.3. Set-Based Dependency (SBD) Model

The uncertain graph G = ( V , E , p ) satisfies the additional properties that E is partitioned into { E 1 , E 2 , E t } of E, for some t 1 . Further, p satisfies the property that for any two edges e 1 and e 2 that belong to the same part in the partition, p ( e 1 ) p ( e 2 ) . Typically, this partition is a fixed partition of E coming from the domain where the edge failures occur according to the SBD model. The sampling procedure definition of the SBD model is as follows: the edge sets are considered in order from E 1 to E t , and for 1 i < j t , the edges in the set E i are considered before the edges in the set E j . For each 1 i t , the edges in E i are considered in decreasing order of the value p. For each 1 i t , when an edge e E i is considered, it is sampled with probability p ( e ) independent of the outcome of the previous samples. If the outcome selects the edge e (that is, if e survives), then the next edge in E i is considered. Otherwise, all the remaining edges of E i that are to be considered after e, are considered to have failed, and the set E i + 1 is considered. The procedure terminates after considering E 1 , , E t . The edge subgraph consisting of the selected edges (survived edges) is the outcome of this sampling procedure. The number of possible worlds that have a non-zero probability of being an outcome of the sampling procedure is i = 1 t ( | E i | + 1 ) . To summarize, under the SBD model, for each 1 i t , the survival of an edge e in a set E i implies that every edge e E i with a greater survival probability than that of e would have survived. Further, the edge samples of edges in two different sets are mutually independent. As a consequence of this, we have the following observation.
Observation 2.
The RF model on an uncertain graph G = ( V , E , p ) is identical to the SBD model on G and E is partitioned into m sets each containing an edge of E.

2.4. Linear Reliable Ordering (LRO) Model

The case when the partition of E consists of only one set, the SBD model is called the Linear Reliability Ordering (LRO) model introduced by Hassin et al. [11]. Let p ( e m ) < p ( e m 1 ) < < p ( e 2 ) < p ( e 1 ) . Then, under the LRO model it follows that for each j > i , Pr ( e j fails e i fails ) = 1 . Further, the possible worlds is the set of m + 1 graphs { G 0 , , G m } where E ( G 0 ) = , E ( G m ) = E , and for each 0 i m 1 , E ( G i + 1 ) = E ( G i ) { e i + 1 } . The following lemma shown by Hassin et al. [11] regarding the probability distribution on the possible worlds uniquely defined by the LRO model.
Lemma 1
(Hassin et al. [11]). For 0 i m , the probability of the possible world G i is given by
P ( G i ) = 1 p ( e 1 ) i f i = 0 p ( e i ) p ( e i + 1 ) i f 1 i < m p ( e m ) i f i = m .
Coming back to the SBD model, on the uncertain graph G = ( V , E , p ) and E partitioned into { E 1 , E 2 , E } for some 1 , the probability of an outcome H under the SBD model naturally follows from Lemma 1. The idea is to consider an outcome of the SBD model as independent samples from the LRO model on the uncertain graphs ( V , E 1 , p 1 ) , ( V , E 2 , p 2 ) , , ( V , E , p ) , where for each 1 i , p i is the function p restricted to E i .
This completes our description of the distribution models known in the literature. We next present two dynamic programming algorithms on the input uncertain graph G = ( V , E , p ) , when the graph G = ( V , E ) is presented as a nice tree decomposition. The algorithms compute an expectation under the RF model and have worse running times than that of corresponding algorithms for computing the expectation under the LRO model.

3. Definitions Related to Graphs

Every graph we consider in this work are simple undirected graphs unless. A graph G = ( V , E ) is an undirected graph with vertex set V and edge set E. We denote the number of vertices and edges by n and m, respectively. For a vertex v V , N ( v ) denotes the set of neighbors of v and N [ v ] = N ( v ) { v } is the closed neighborhood of v. For each vertex v V , d e g G ( v ) denote the degree of the vertex v in G. When G is clear in the context d e g ( v ) is used. The maximum degree of the graph G, denoted by Δ ( G ) , and the minimum degree of the graph G, denoted by δ ( G ) , are the maximum and minimum degree of its vertices. When G is clear in the context Δ and δ is used. Other than this, we follow the standard graph theoretic terminologies from Reference [44]. We define some special notations for the uncertain graphs as follows. Given a vertex v V , let E ( v ) E denote the set of all edges incident on v. Given a subset of vertices C V , let E ( C ) = v C E ( v ) denote the edge set of the vertex-induced uncertain subgraph G [ C ] . Similarly, given an edge set F E , let V ( F ) = e = u v F { u , v } denote the vertex set of the edge-induced uncertain graph G [ F ] .
We study the parameterized complexity of the coverage problems and k-core problem on uncertain graphs. We follow the standard parameterized complexity terminologies from Reference [40]. We define the parameter treewidth that will be relevant to our discussion.
Definition 4
(Tree Decomposition [45,46]). A tree decomposition of a graph G is a pair ( X , H ) such that H is a tree and X = { X i V : i H } . For each node i H , X i is referred to as bag of i. The following three conditions hold for a tree decomposition ( X , H ) of the graph G.
(a) 
For each vertex v V , there is a node i H such that v X i .
(b) 
For each edge u v E , there is a node i H such that u , v X i .
(c) 
For each vertex v V , the induced subtree of the nodes in H that contains v is connected.
The width of a tree decomposition is the max i H ( | X i | 1 ) . The treewidth of a graph, denoted by t w , is the minimum width over all possible tree decompositions of G. An example of a tree decomposition is illustrated in Figure 2. For our algorithm, we require a special kind of decomposition, called the nice tree decomposition which we define below.
Definition 5
(Nice tree decomposition [46]). A nice tree decomposition is a tree decomposition, rooted by a node r with X r = and each node in the tree decomposition is one of the following four type of nodes.
1. 
Leaf node.A node i H with no child and X i = .
2. 
Introduce node.A node i H with one child j such that X i = X j { v } for some v X j .
3. 
Forget node.A node i H with one child j such that X i = X j \ { v } for some v X j .
4. 
Join node.A node i H with two children j and g such that X i = X j = X g .
An example of four types of nodes is illustrated in Figure 3. Given a tree decomposition ( X , H ) of a graph G with width k, a nice tree decomposition ( X , H ) with same width and O ( n k ) nodes can be computed in time O ( n k 2 ) [46]. Hereafter, we will assume that the tree decompositions considered are nice. For each node i H , let H i denote the subtree rooted at i. Let n i = | X i | . Let X i + be the set of all vertices in the bag of nodes in the subtree H i .
X i + = X i if i is a leaf node X i j c h ( i ) X j + if i is a non - leaf node ,
where c h ( i ) is the set of all children of i in H. We use j and k to denote the two children of a join node i in H. Further, j denotes the left child, and g denotes the right child. In case i has only one child, as in the case of introduce and forget nodes, only the left child j is well-defined and g does not exist. In this case, X g + is taken to be the empty set. We refer to Reference [40] for a thorough introduction to treewidth and its algorithmic properties.

4. Max-Exp-Cover-1-RF Problem is FPT by ( Treewidth · Δ )

An instance of Max-Exp-Cover-1-RF consists of a tuple G , w , k where G = ( V , E , p ) is an uncertain graph, w is a function that assigns weight w ( v ) to each vertex v, and k is the budget. The goal is to find a set S V such that | S | k and the expected total weight of the vertices dominated by S is maximized. The expectation is computed over the probability distribution uniquely defined on the possible worlds by the RF model. We introduce the function C which we refer to as the coverage function. The two arguments of the coverage function are subsets T and S of the vertex set V, and the value C ( T , S ) is the expected coverage of T by S, where the expectation is computed over the possible worlds. For any subsets S , T V , the Coverage of T by S, denoted by C ( T , S ) , is v T C ( v , S ) . For a vertex v V and S V , the expected coverage of v by the set S, denoted by C ( v , S ) , is given by
C ( v , S ) = w ( v ) if v S w ( v ) 1 u S N ( v ) ( 1 p u v ) otherwise .
Note that if a set is a singleton set, then we abuse notation a little and write the element instead of the set.
Our algorithm for the Max-Exp-Cover-1-RF follows the classical bottom-up approach for dynamic programming based on the nice tree decomposition. We compute a certain number of candidate solutions for the subproblem at each node i H , using only the candidate solution values maintained in the children of i. At each node i H , we have ( 5 · 2 Δ ) n i candidate solutions which are stored in a table denoted by T i . The final optimal solution is obtained from the solutions stored in T r at the root node r of H. At any node i in H, the expected coverage of a vertex u X i by a set S in the subproblem at node i in H is computed by decomposing N ( u ) S carefully to avoid over-counting. Towards this end, we introduce notation for the coverage conditioned on the event that some edges have failed. We denote this event by a function f and call this the surviving neighbors (SN) function. The SN-function f : V 2 V has the property that for each u V , f ( u ) N ( u ) . In other words, we are interested in the expected coverage conditioned on the event that for each u V , all the edges to vertices in N ( u ) \ f ( u ) have failed.
Definition 6.
Let u be a vertex and S V be a set and f be an SN-function such that S f ( u ) = S N ( u ) . The conditional coverage of u by S restricted by f is defined to be
C f ( u , S ) = v N ( u ) \ f ( u ) ( 1 p ( u v ) ) C ( u , S f ( u ) )
Extending this definition to sets, for any two vertex sets T , S , we define C f ( T , S ) = u T C f ( u , S ) .
For each u V , C f ( u , S ) is the expected coverage of u by S f ( u ) conditioned on the event that u is not covered by any vertex from the set N ( u ) \ f ( u ) . Further, C f ( u , S ) is the product of w ( u ) and the probability of sampling a subgraph G G in which u has a neighbor in S N ( u ) and no neighbor from N ( u ) \ f ( u ) . We refer to C f ( u , S ) as the conditional coverage of u by S (we leave out the phrase restricted by f).
Lemma 2.
Let u V and X , Y V such that X Y = , and consider an SN-function f such that f ( u ) = N ( u ) \ X . Then C ( u , X Y ) = C ( u , X ) + C f ( u , Y ) .
Proof. 
C ( u , X Y ) = w ( u ) 1 v ( X Y ) N ( u ) ( 1 p ( u v ) ) = w ( u ) 1 v X N ( u ) ( 1 p ( u v ) ) · v Y N ( u ) ( 1 p ( u v ) ) = w ( u ) ( 1 v X N ( u ) ( 1 p ( u v ) ) + v X N ( u ) ( 1 p ( u v ) ) 1 v Y N ( u ) ( 1 p ( u v ) ) = C ( u , X ) + v X N ( u ) ( 1 p ( u v ) ) C ( u , Y ) = C ( u , X ) + C f ( u , Y )
Hence the proof.  □

4.1. Recursive Formulation of the Value of a Solution

Let S V be a set of size k for the Max-Exp-Cover-1-RF problem in G . We now present a recursive formulation to compute the expected coverage of V by S. Throughout this section S denotes this set of size k. Let i be a node in H. Let j and g be the children of i. In case i is either introduce node or forget node, then g is considered to be a null node with X g = X g + = . In node i, let A = S X i , S i = S X i + and Z i = S i \ A . Let Z ^ = S \ S i . For the solution S, the set X i can be partitioned into five sets ( A , C , L , R , B ) as follows.
A = { u X i S } C = { u X i \ A u N ( A ) \ N ( Z i ) } , in other words , C satisfies C ( C , Z i ) = 0 L = { u X i \ A u N ( Z j ) \ N ( Z g ) } , in other words , L satisfies C ( L , Z g ) = 0 R = { u X i \ A u N ( Z g \ N ( Z j ) } , in other words , R satisfies C ( R , Z j ) = 0 B = { u X i \ A u N ( Z j ) N ( Z g ) } .
Further, for the solution S, define the SN-function f : X i + 2 V as follows:
f ( u ) = N ( u ) \ { v N ( u ) v Z ^ } , if u X i = N ( u ) , if u X i + \ X i
Note:f is dependent on S and i and wherever f is used, it must be used as per the definition at the corresponding node in the tree decomposition.
Let S V be a subset of size k, i be a node in H in the nice tree decomposition, f be the SN-function for X i + defined using S, P = ( A , C , L , R , B ) be a partition of X i , and S i , Z i and Z ^ be as defined above. The following two lemmas are useful in setting up a recursive definition of C ( V , S ) and a bottom-up evaluation of the recurrence.
Lemma 3.
C ( V , S ) = C ( V \ X i + , Z ^ A ) + C ( X i \ A , Z ^ ) + C f ( X i + , S i ) .
Proof. 
The expected coverage of V by the set S is given by C ( V , S ) = C ( V \ X i + , S ) + C ( X i , S ) + C ( X i + \ X i , S ) . Since V is partitioned into three disjoint sets X i , X i + \ X i , and V \ X i + , we get
C ( V , S ) = C ( V \ X i + , Z ^ A ) + C ( X i \ A , Z ^ S i ) + C ( A , S ) + C ( X i + \ X i , S i ) .
By applying Lemma 2 to C ( X i \ A , Z ^ S i ) , it follows that
C ( V , S ) = C ( V \ X i + , Z ^ A ) + C ( X i \ A , Z ^ ) + C f ( X i \ A , S i ) + C ( A , S ) + C ( X i + \ X i , S i ) .
Further, C ( A , S i ) = C ( A , S ) , since C ( A , S i ) = C ( A , S ) = v A w ( v ) , it follows that
C ( V , S ) = C ( V \ X i + , Z ^ A ) + C ( X i \ A , Z ^ ) + C f ( X i \ A , S i ) + C ( A , S i ) + C ( X i + \ X i , S i ) .
By the definition of f and C f , C f ( X i \ A , S i ) + C ( A , S i ) + C ( X i + \ X i , S i ) = C f ( X i + , S i ) . Therefore we get C ( V , S ) = C ( V \ X i + , Z ^ A ) + C ( X i , Z ^ ) + C f ( X i + , S i ) . Hence the lemma.  □
As a corollary it follows that C ( V , S ) = C f ( X r + , S r ) . Recall that X r = , S r = S and f is such that f ( u ) = N ( u ) for all u X i + . We now show that for each node i H , C f ( X i + , S i ) can be written in terms of the appropriate sub-problems in children j and g of the node i.
Lemma 4.
C f ( X i + , S i ) = w ( A ) + C f ( C , A ) + C f ( L , A Z j ) + C f ( R , A Z g ) + C f ( B , S i ) + C ( X i + \ X i , S i ) .
Proof. 
By definition, the expected coverage of X i + by S i restricted by f is given by the following equations.
C f ( X i + , S i ) = C f ( X i , S i ) + C f ( X i + \ X i , S i ) = w ( A ) + C f ( C , S i ) + C f ( L , S i ) + C f ( R , S i ) + C f ( B , S i ) + C ( X i + \ X i , S i ) = w ( A ) + C f ( C , A ) + C f ( L , A Z j ) + C f ( R , A Z g ) + C f ( B , S i ) + C ( X i + \ X i , S i )
The second equation follows from the first due to the definition of the partition P = ( A , C , L , R , B ) , and the fact that for all u X i + \ X i , f ( u ) = N ( u ) . The third equation follows from the second due to the fact that C f ( L , S i ) = C f ( L , A Z j ) and C f ( R , S i ) = C f ( R , A Z g ) , since L and R are sets for which C ( L , Z g ) = C ( R , Z j ) = 0 .  □
Lemmas 3 and 4 show that the expected coverage of a set S of size k can be computed in a bottom-up manner over the nice tree decomposition. Further, at a node i H the expected coverage of X i + by a S is C ( X i , Z ^ ) + C f ( X i + , S i ) , where f is uniquely determined by S, i and the tree decomposition. Also, C f ( X i + , S i ) is decomposed into six terms based on the partition P which is uniquely determined by S, i and the tree decomposition. Of these six terms, five of them are computed at the node i H , and one term comes from j and g, which are the children of i. Therefore, the search for the optimum S of size k can be performed in a bottom-up manner by enumerating all possible choices of the 5-way partition of X i and all possible choices of the SN-function f at X i . For each candidate partition of X i and the SN-function f, the optimum S i is computed by considering the compatible solutions at X j and X g . This completes the recursive formulation of the expected coverage of a set S of size k. We next present the bottom-up evaluation of the recurrence to compute the optimum set which is arg max | S | k C ( V , S ) .

4.2. Bottom-Up Computation of an Optimal Set

For each node i H , we associate a table T i . Each row in the table T i is a triple which consists of an integer b, a five way partitioning P = ( A , C , L , R , B ) of X i and an SN-function f defined on X i + . Throughout this section we assume that for vertices u X i + \ X i , f ( u ) = N ( u ) . The columns corresponding to a row ( b , P , f ) is a vertex set T i [ b , P , f ] . Solution and a value T i [ b , P , f ] . Value . Let S denote T i [ b , P , f ] . Solution . To define the value T i [ b , P , f ] . Value associated with S, consider Z = S \ A , and let Z j = Z ( X j + \ X j ) and Z g = Z ( X g + \ X g ) . Then, T i [ b , P , f ] . Value = w ( A ) + C f ( C , A ) + C f ( L , A Z j ) + C f ( R , A Z g ) + C f ( B , A Z ) + C ( X i + \ X i , S ) . The set T i [ b , P , f ] . Solution is a set S of size b such that S X i = A and the associated value T i [ b , P , f ] . Value is maximized.
Leaf node. Let i H be a leaf node with bag X i = . The only possible five-way partition of an empty set is the set with five empty sets and the budget is b = 0 . The only valid SN-function is f : . Therefore, the value of the corresponding row in the table is
T i [ b , P , f ] = { Solution = , Value = 0 } for b = 0 , P = ( , , , , ) and f : .
It is clear that the empty set is the set that achieves the maximum for the Max-Exp-Cover-1-RF problem on the empty graph with budget b = 0 . Therefore, at the leaf nodes in H, the table T i maintains the optimum solution for each row. The time to update an entry is O ( 1 ) .
Introduce node. Let i be an introduce node with child j such that X i = X j { v } for some v X j . Let 0 b k be an integer, P = ( A , C , L , R , B ) be a five-way partition of X i and f be the SN-function defined on X i + . The computation of the table entry is split into two cases, depending on whether the vertex v belongs to the set A in the partition P or not.
  • Case v A . Define C v = { u X i \ A v N ( u ) f ( u ) } . Let P j denote the partition of X j obtained by removing vertex v from the set A of the partition P . Let f j : X j 2 V be the SN-function defined as follows:
    f j ( u ) = f ( u ) \ { v } if u C v f ( u ) otherwise .
    Then,
    T i [ b , P , f ] . Solution = T j [ b 1 , P j , f j ] . Solution { v } T i [ b , P , f ] . Value = T j [ b 1 , P j , f j ] . Value + w ( v ) + C f ( C v , v )
  • Case v A . Since v is in X i but not in X j it follows that N ( v ) X i + X i . Therefore, the coverage of the vertex v by the vertices that occur only in X j + \ X j is zero. Let P j be the partition of X j obtained by removing the vertex v from the appropriate set in the partition P . The SN-function f j is defined as follows on the set X j + : For u X i \ { v } , f j ( u ) = f ( u ) .
    T i [ b , P , f ] . Solution = T j [ b , P j , f j ] . Solution T i [ b , P , f ] . Value = T j [ b , P j , f j ] . Value + C f ( v , A )
Forget node. Let i be a forget node with child j such that X i = X j \ { v } for some v X j . Let 0 b k be a budget, P = ( A , C , L , R , B ) be a five-way partition of X i and f be an SN-function. We consider the following five-way partitions of X j .
  • P 1 = ( A { v } , C , L , R , B )
  • P 2 = ( A , C { v } , L , R , B )
  • P 3 = ( A , C , L { v } , R , B )
  • P 4 = ( A , C , L , R { v } , B )
  • P 5 = ( A , C , L , R , B { v } )
Let f j be the SN-function defined as follows: for u X i , f j ( u ) = f ( u ) and f j ( v ) = N ( v ) . Let P j = arg max P { P i } i = 1 5 T j [ b , P , f j ] . Value . T i [ b , P , f ] is defined as follows:
T i [ b , P , f ] . Solution = T j [ b , P j , f j ] . Solution T i [ b , P , f ] . Value = T j [ b , P j , f j ] . Value
Join node. Let i be a join node with children j and g such that X i = X j = X g . Let 0 b k be a budget, P = ( A , C , L , R , B ) be a five-way partition of X i and f be an SN-function. We consider the sets U j and U g of all SN-functions defined on X j + and X g + , respectively, satisfying the following conditions:
  • For each v L , f j ( v ) = f ( v ) and f g ( v ) = .
  • For each v R , f j ( v ) = and f g ( v ) = f ( v ) .
  • For each v B , f j ( v ) and f g ( v ) are defined as follows. For each partition f 1 ( v ) f 2 ( v ) = f ( v ) X i + , f j ( v ) = f 1 ( v ) ( f ( v ) \ X i + ) and f g ( v ) = f 2 ( v ) ( f ( v ) \ X i + ) .
We then consider all possible candidates for L j , R j , B j such that L j R j B j = L B and L g , R g , B g such that L g R g B g = R B . Each such candidate defines a P j = ( A , C R , L j , R j , B j ) of X j and a P g = ( A , C L , L g , R g , B g ) of X g . Let R j and R g denote the set of all such candidate partitions of X j and X g , respectively.
Let P j R j , P g R g , f j U j , f g U g and 0 b b | A | be the values at which the maximum value for Equation (2) is achieved.
max 0 b j b | A | P j R j , P g R g f j U j , f g U g T j [ b j + | A | , P j , f j ] . Value + T g [ b b j , P g , f g ] . Value .
The value of the row corresponding to ( b , P , f ) in T i is given as follows:
T i [ b , P , f ] . Solution = T j [ b + | A | , P j , f j ] . Solution T g [ b b , P g , f g ] . Solution T i [ b , P , f ] . Value = T j [ b + | A | , P j , f j ] . Value + T g [ b b , P g , f g ] . Value C f ( A C , A ) .
We now prove that the update steps presented above are correct.
Lemma 5.
For each node i H , for each row ( b , P , f ) in T i , the pair T i [ b , P , f ] . Solution , T i [ b , P , f ] . Value is such that C ( X i + , T i [ b , P , f ] . Solution ) = T i [ b , P , f ] . Value , and this is the maximum possible value.
Proof. 
The proof is by induction on the height of a node in H. The height of a node i in the rooted tree H is the distance to the farthest leaf in the subtree rooted at i. The base case is when i is a leaf node in H, and clearly its height is 0. For a leaf node i with X_i = ∅, the row with b = 0, P = (∅, ∅, ∅, ∅, ∅) and f : ∅ → ∅ is the only valid row entry, and its value is 0. This completes the proof of the base case. Let us assume that the claim is true for all nodes of height at most h − 1 ≥ 0. We prove that if the claim is true at all nodes of height at most h − 1, then it is true for a node of height h. Let b be a budget, P = (A, C, L, R, B) be a partition of X_i and f be an SN-function on X_i^+. Let T_i[b, P, f].Solution = S = A ∪ Z. For any optimal S′ = A ∪ Z′ with |S′| = b, we prove that
T_i[b, P, f].Value ≥ w(A) + C_f(C, A) + C_f(L, A ∪ Z′_j) + C_f(R, A ∪ Z′_g) + C_f(B, S′) + C(X_i^+ \ X_i, S′).
We proceed by considering the type of the node i, so that the induction hypothesis can be applied at nodes of height at most h − 1.
Case when i is an introduce node. Let j be the child of i and X_i = X_j ∪ {v} for some v ∉ X_j. We now consider two cases depending on whether v belongs to A or not.
  • Case v ∈ A. Let P_j be the partition of X_j obtained from P by removing v from A. Let f_j be the SN-function on X_j^+ such that f_j(u) = f(u) \ {v} if u ∈ C_v, and f_j(u) = f(u) otherwise. We know from the claimed optimality of S′, the fact that S ∩ X_i = S′ ∩ X_i = A, and the value of T_i[b, P, f], that
    C_f(X_i^+, S′) = C_{f_j}(X_j^+, S′ \ {v}) + C_f(C_v, v) > C_f(X_i^+, S) = C_{f_j}(X_j^+, S \ {v}) + C_f(C_v, v).
    Therefore, it follows that C_{f_j}(X_j^+, S′ \ {v}) > C_{f_j}(X_j^+, S \ {v}) = T_j[b − 1, P_j, f_j].Value. In other words, we have concluded that the value of the row (b − 1, P_j, f_j) in T_j is not the optimum value. This contradicts our premise at the node j, which is of height at most h − 1 and for which, by the induction hypothesis, the table maintains optimal values. Therefore, our assumption that T_i[b, P, f] is not optimum is wrong.
  • Case v ∉ A. Let P_j be the partition of X_j obtained by removing v from the appropriate set of the partition P. Let f_j be the SN-function on X_j^+ such that f_j(u) = f(u) for each u ∈ X_j^+. We know from the claimed optimality of S′, the fact that S ∩ X_i = S′ ∩ X_i = A, and the value of T_i[b, P, f], that C_f(X_i^+, S′) = C_{f_j}(X_j^+, S′) + C_f(v, A) > C_f(X_i^+, S) = C_{f_j}(X_j^+, S) + C_f(v, A). Therefore, it follows that C_{f_j}(X_j^+, S′) > C_{f_j}(X_j^+, S). In other words, we have concluded that the value of the row (b, P_j, f_j) in T_j is not the optimum value. This contradicts our premise at the node j, which is of height at most h − 1 and for which, by the induction hypothesis, the table maintains optimal values. Therefore, our assumption that T_i[b, P, f] is not optimum is wrong.
Forget node. We know that X_i^+ = X_j^+ and that v is in X_j but not in X_i; it follows that N(v) ⊆ X_i^+ = X_j^+. Define f_j to be the SN-function on X_j^+ such that f_j(u) = f(u) for each u ∈ X_i and f_j(v) = N(v). We have assumed that T_i[b, P, f].Value < C_f(X_i^+, S′). Further, since X_i^+ = X_j^+ and by the definition of f_j, C_f(X_i^+, S′) = C_{f_j}(X_j^+, S′). Since T_i[b, P, f] is computed identically from some row of T_j, say (b, P_j, f_j), it follows that T_j[b, P_j, f_j].Value < C_{f_j}(X_j^+, S′). This contradicts the premise that the table T_i is at the lowest height in the tree decomposition at which an entry is sub-optimal. Therefore, our premise is wrong.
Join node. We assume that S′ is indeed a better solution than S for the table entry (b, P, f) of T_i. Let S′_j = S′ ∩ (X_j^+ \ X_j) and S′_g = S′ ∩ (X_g^+ \ X_g). Let b′_j = |S′_j|. Let P′_j = (A, C ∪ R, L′_j, R′_j, B′_j) and P′_g = (A, C ∪ L, L′_g, R′_g, B′_g) be the partitions of X_j and X_g defined using S′_j and S′_g, respectively. Note that L′_j ∪ R′_j ∪ B′_j = L ∪ B and L′_g ∪ R′_g ∪ B′_g = R ∪ B. Let f′_j be the SN-function on X_j^+ such that f′_j(u) = f(u) for u ∈ A ∪ C ∪ L ∪ B and f′_j(u) = ∅ for u ∈ R. Let f′_g be the SN-function on X_g^+ such that f′_g(u) = f(u) for u ∈ A ∪ C ∪ R, f′_g(u) = ∅ for u ∈ L, and f′_g(u) = f(u) \ S′_j for u ∈ B. The coverage C_f(X_i^+, S′) can be written as C_f(X_i^+, S′) = C_{f′_j}(X_j^+, S′_j) + C_{f′_g}(X_g^+, S′_g) − w(A) − C_f(C, A), where the coverage of X_j^+ by S′_j and of X_g^+ by S′_g are restricted to the partitions P′_j and P′_g.
We know that X_i = X_j = X_g in the case of a join node. The table entry (b, P, f) is updated using the entries (b_j + |A|, P_j, f_j) and (b − b_j, P_g, f_g) of the tables T_j and T_g, respectively. In other words, the values of the variables b_j, P_j, P_g, f_j and f_g are those obtained from Equation (2) in the recursive definition of the join node. The values b′_j, P′_j, P′_g, f′_j and f′_g are also feasible for the range given in Equation (2). Then, we have T_j[b′_j + |A|, P′_j, f′_j].Value + T_g[b − b′_j, P′_g, f′_g].Value ≤ T_j[b_j + |A|, P_j, f_j].Value + T_g[b − b_j, P_g, f_g].Value. Since S′ is better than S for the entry (b, P, f) of T_i, we have T_j[b_j + |A|, P_j, f_j].Value + T_g[b − b_j, P_g, f_g].Value < C_{f′_j}(X_j^+, S′_j) + C_{f′_g}(X_g^+, S′_g). The two inequalities together show that at least one of the table entries T_j[b′_j + |A|, P′_j, f′_j] or T_g[b − b′_j, P′_g, f′_g] is not optimal. This again contradicts the premise that i is the node at the least height at which some table entry is sub-optimal.
This completes the case analysis and our proof. Hence the lemma.  □
Running Time. There are O(n · tw)-many nodes in the nice tree decomposition H. Each node i ∈ H has at most (k + 1) · (5 · 2^Δ)^{tw} entries. The factor (2^Δ)^{tw} comes from the fact that, for each vertex in a bag, we enumerate all subsets of its neighbors to come up with the SN-functions. It is clear from the description that at the leaf, introduce and forget nodes, the update time is O(tw). At a join node, the time taken to compute an entry depends on three basic operations. The optimal partitions P_j and P_g are computed in 3^{|L| + |R| + 2|B|} time, and the budget distribution can be done in O(k) time. The costliest operation is to enumerate the different SN-functions f_j and f_g for the given SN-function f. Since we need to consider all 2^Δ possible ways of distributing f(v) for each vertex v, the distribution takes O((2^Δ)^{|B|}) time. Therefore, computing an entry (b, P, f) at a join node takes O(k · 3^{|L| + |R| + 2|B|} · (2^Δ)^{|B|}) time, and this is 2^{O(Δ · tw)}. This analysis of the running time and Lemma 5 complete the proof of the following theorem, which is our main result.
Theorem 1.
The Max-Exp-Cover-1-RF problem can be solved in time 2^{O(Δ · tw)} · n^{O(1)}.

5. Parameterized Complexity of Probabilistic-Core Problem

Let G = (V, E, p) be an uncertain graph and let d be an integer. Given a set K ⊆ V, we define the probability that K is a d-core in G, denoted by ρ_d(G, K), to be ∑_{H ⊑ G} P(H) · I(H, K, d), where I is an indicator variable that takes value 1 if and only if the set K forms a d-core in the graph H. The decision version of the Probabilistic-Core problem is formally stated as follows: given an uncertain graph G = (V, E, p), an integer d and a probability θ ∈ [0, 1], decide whether there exists a set K ⊆ V such that ρ_d(G, K) ≥ θ.
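As a baseline for intuition, the following Python sketch evaluates ρ_d(G, K) under the RF model directly from this definition by enumerating all 2^|E| possible worlds; it is exponential and only meant to make the quantity concrete. The input representation (an edge list plus a probability dictionary) and the reading of "K forms a d-core in H" as "every vertex of K has at least d neighbors inside K in H" are our assumptions.

```python
from itertools import combinations

def rho_d(edges, p, K, d):
    """Evaluate rho_d(G, K) under the RF model by enumerating every possible world
    (exponential; for illustration only)."""
    K = set(K)
    m = len(edges)
    total = 0.0
    for r in range(m + 1):
        for chosen in combinations(range(m), r):
            present = set(chosen)
            prob = 1.0
            for idx, e in enumerate(edges):
                prob *= p[e] if idx in present else (1.0 - p[e])
            deg = {v: 0 for v in K}   # degree of each vertex of K inside K in this world
            for idx in present:
                u, w = edges[idx]
                if u in K and w in K:
                    deg[u] += 1
                    deg[w] += 1
            if all(deg[v] >= d for v in K):
                total += prob
    return total
```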
We study the Probabilistic-Core problem under the LRO and RF models. First we show that the Probabilistic-Core-LRO problem and the Individual-Core problem are polynomial time solvable, due to the fact that an uncertain graph under the LRO model has only a polynomial number of possible worlds. Then we show that the Probabilistic-Core-RF problem is W[1]-hard for the parameter d and admits an FPT algorithm for the parameter treewidth.

5.1. An Exact Algorithm for the Probabilistic-Core-LRO Problem

We present a polynomial time algorithm (Algorithm 1) for the Probabilistic-Core-LRO problem. Let ⟨G = (V, E, p), d⟩ be an instance of the Probabilistic-Core-LRO problem. As per the definition of the LRO model, let {G_0, …, G_m} be the possible worlds of G, where G_0 = (V, ∅) and, for 0 ≤ i ≤ m − 1, E(G_{i+1}) = E(G_i) ∪ {e_{i+1}}. For each 0 ≤ i ≤ m − 1, we consider the linear order in which G_i precedes G_{i+1}.
Algorithm 1:Probabilistic-Core-LRO
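The published pseudocode of Algorithm 1 is rendered as an image; the following Python sketch reconstructs the procedure from the proof of Lemma 6 that follows: add the edges in the linear order, compute the maximal d-core of each world by repeatedly peeling low-degree vertices, and stop at the first world whose core is non-empty. The input representation is our assumption.

```python
def maximal_d_core(adj, d):
    """Maximal d-core of a deterministic graph (adjacency dict), computed by the
    while-loop of Algorithm 1: repeatedly peel vertices of degree below d."""
    core = set(adj)
    changed = True
    while changed:
        changed = False
        for v in list(core):
            if sum(1 for u in adj[v] if u in core) < d:
                core.remove(v)
                changed = True
    return core

def probabilistic_core_lro(vertices, ordered_edges, p, d):
    """Visit the worlds G_1, ..., G_m by adding e_1, ..., e_m in the linear order and
    stop at the first world whose maximal d-core C is non-empty; the answer is
    (C, p(e_i)).  If even G_m has no d-core, return (set(), 0.0)."""
    adj = {v: set() for v in vertices}
    for (u, w) in ordered_edges:
        adj[u].add(w)
        adj[w].add(u)
        C = maximal_d_core(adj, d)
        if C:
            return C, p[(u, w)]
    return set(), 0.0
```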
Lemma 6.
Algorithm 1 solves the Probabilistic-Core-LRO problem in polynomial time.
Proof. 
By the definition of the possible worlds, G_m = (V, E). If δ(G_m) < d, then G_m does not contain a d-core. Moreover, for 0 ≤ i < m, since G_i is an edge subgraph of G_m, no graph in the possible worlds has a d-core. This is indicated by the returned pair consisting of ∅ and probability 0.0. On the other hand, if δ(G_m) ≥ d, let G_i be the first graph in the linear ordering of the possible worlds for which a non-empty set C is computed by Algorithm 1 on exit from the while-loop. Clearly, C induces a d-core in G_i. Since G_i is the first graph which contains a d-core, for each 0 ≤ j ≤ i − 1, G_j does not contain a d-core. Further, for every j > i, C is a d-core in G_j, since G_i is an edge subgraph of G_j. Then,
ρ_d(G, C) = ∑_{j=0}^{m} P(G_j) · I(G_j, C, d) = ∑_{j=i}^{m} P(G_j) = p(e_i).
For any set K ⊆ V with ρ_d(G, K) > 0, the set K can form a d-core only in the possible worlds {G_i, G_{i+1}, …, G_m}. Thus, ρ_d(G, K) ≤ ρ_d(G, C). This completes the proof.  □
The following observation again uses the fact that there are only a polynomial number of possible worlds under the LRO model.
Observation 3.
The Individual-Core problem is polynomial time solvable on uncertain graphs under the LRO model.
Proof. 
Clearly, for each 0 ≤ i ≤ m, the while-loop of Algorithm 1 computes the maximal d-core of G_i. Therefore, for each vertex v, q_d(v) = p(e_i), where i is the smallest index such that the non-empty maximal d-core of G_i contains v. Since the number of possible worlds is m + 1, for each vertex v ∈ V the probability q_d(v) can be computed in polynomial time. Thus, for a given uncertain graph G = (V, E, p), an integer d and a probability threshold θ, the set {v : q_d(v) ≥ θ} is the optimum solution to the Individual-Core problem.  □
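A sketch of the resulting procedure for the Individual-Core problem under the LRO model, reusing the same peeling idea as the Algorithm 1 sketch above; the input representation and the threshold direction q_d(v) ≥ θ are our assumptions.

```python
def individual_core_lro(vertices, ordered_edges, p, d, theta):
    """Sketch for Observation 3: q_d(v) = p(e_i), where G_i is the first possible
    world whose maximal d-core contains v; the answer is {v : q_d(v) >= theta}."""
    adj = {v: set() for v in vertices}
    q = {v: 0.0 for v in vertices}
    for (u, w) in ordered_edges:
        adj[u].add(w)
        adj[w].add(u)
        core = set(adj)
        while True:                      # peel vertices of low degree inside the core
            low = [x for x in core if sum(1 for y in adj[x] if y in core) < d]
            if not low:
                break
            core.difference_update(low)
        for x in core:
            if q[x] == 0.0:              # first world in which x enters the maximal d-core
                q[x] = p[(u, w)]
    return {v for v in vertices if q[v] >= theta}
```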

5.2. Parameterized Complexity of the Probabilistic-Core-RF Problem

We show that the Probabilistic-Core-RF problem is W[1]-hard. The reduction we use is similar to the one in the hardness proof for the Individual-Core problem by Peng et al. [16] (who call it the (d, θ)-core problem). The reduction is from the Clique problem: given a graph G = (V, E) and an integer k, decide whether G has a clique of size at least k. The Clique problem is known to be W[1]-hard [47].
Theorem 2.
The Probabilistic-Core-RF problem is W[1]-hard for the parameter d.
Proof. 
Let ⟨G = (V, E), k⟩ be an instance of the Clique problem. The output of the reduction, denoted by ⟨G′ = (V, E, p), d, θ⟩, is an instance of the Probabilistic-Core-RF problem. The vertex set and edge set of G′ are the same as V and E, respectively, d = k − 1 and θ = 2^{−k(k−1)/2}. Further, for each edge e ∈ E, define p(e) = 1/2 in G′. Now we show that the Clique problem on the instance ⟨G, k⟩ and the Probabilistic-Core-RF problem on the instance ⟨G′, d, θ⟩ are equivalent.
We prove the forward direction first. Let K ⊆ V be a k-clique in G. The set K is indeed a d-core, since every vertex v ∈ K has k − 1 neighbors in K. The probability that K is a d-core in a random sample from the possible worlds of G′ is 2^{−k(k−1)/2}, since all k(k−1)/2 edges inside K must be present and each survives independently with probability 1/2. Thus, the set K is a feasible solution for the instance ⟨G′, d, θ⟩ of the Probabilistic-Core-RF problem.
Now we prove the reverse direction. We claim that any feasible solution K ⊆ V contains exactly d + 1 vertices. If K has fewer than d + 1 vertices, then K cannot form a d-core in any possible world. Therefore, consider the case in which K has more than d + 1 vertices and each vertex of K has degree at least d. Consequently, the number of edges in any possible world in which K is a d-core is at least d(d + 2)/2 = d(d + 1)/2 + d/2. Then, the probability that the set K is a d-core in G′ is at most 2^{−(d(d+1)/2 + d/2)} < 2^{−d(d+1)/2} = θ, and this contradicts the hypothesis that K is a feasible solution. It follows that any feasible solution K ⊆ V contains exactly d + 1 vertices, each of degree d; thus K is a k-clique. Hence the theorem.  □
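For concreteness, the reduction of Theorem 2 can be written in a few lines; the tuple-based encoding of the output instance is our own choice.

```python
def clique_to_probabilistic_core_rf(V, E, k):
    """The reduction used in Theorem 2: keep the same vertices and edges, give every
    edge survival probability 1/2, and set d = k - 1 and theta = 2^(-k(k-1)/2)."""
    p = {e: 0.5 for e in E}
    d = k - 1
    theta = 2.0 ** (-k * (k - 1) / 2)
    return (V, E, p), d, theta
```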

5.3. The Probabilistic-Core-RF Problem is FPT by Treewidth

We show that the Probabilistic-Core-RF problem admits an FPT algorithm with treewidth as the parameter. The input consists of an instance ⟨G = (V, E, p), d⟩ of the Probabilistic-Core-RF problem. A nice tree decomposition (X, T) of the graph G = (V, E) is also given as part of the input. Let i be any node in T and let α : X_i → {0, 1, …, d} and β : X_i → {0, 1, …, d} be a pair of functions on X_i. Given a set K ⊆ V, the set K is said to be an (α, β)-constrained d-core of G if (a checker sketch is given after this definition)
  • for each v ∈ X_i ∩ K, |N(v) ∩ K| = α(v) + β(v) and |N(v) ∩ K ∩ X_i| = α(v), and
  • for each v ∈ K \ X_i, |N(v) ∩ K| = d.
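A direct checker for this condition in a single possible world H may be sketched as follows; the adjacency-dictionary representation of H and of the constraint functions is our assumption.

```python
def is_constrained_core(H_adj, K, X_i, alpha, beta, d):
    """Check the (alpha, beta)-constrained d-core condition for K in one possible
    world H, given as an adjacency dict; alpha and beta are dicts on the bag X_i."""
    K = set(K)
    for v in K:
        deg_in_K = sum(1 for u in H_adj[v] if u in K)
        if v in X_i:
            deg_in_bag = sum(1 for u in H_adj[v] if u in K and u in X_i)
            if deg_in_K != alpha[v] + beta[v] or deg_in_bag != alpha[v]:
                return False
        elif deg_in_K != d:        # the condition stated for vertices outside the bag
            return False
    return True
```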
Let G_i^{α,β} denote the uncertain graph G[X_i^+ \ (α^{−1}(0) ∩ β^{−1}(0))]. We define a constrained version of the Probabilistic-Core-RF problem as follows. Given a set K, the probability that the set K is an (α, β)-constrained d-core in G_i^{α,β} is given by
ρ_d(α, β, G_i^{α,β}, K) = ∑_{H ⊑ G_i^{α,β}} P(H) · I(H, α, β, K, d),
where I(H, α, β, K, d) is an indicator function that takes value 1 if and only if K is an (α, β)-constrained d-core of H. The optimization version of the (α, β)-constrained Probabilistic-Core-RF problem seeks a set K for which ρ_d(α, β, G_i^{α,β}, K) is maximized. The solutions of the (α, β)-constrained Probabilistic-Core-RF problem for different values of i, α and β on G are partial solutions which are used to come up with a recursive specification of the optimum value. The dynamic programming (DP) formulation on the nice tree decomposition (X, T) results in an FPT algorithm with treewidth as the parameter.
The dynamic programming formulation maintains a table T_i at every node i of T. Each row of the table T_i is indexed by a pair of functions α and β with α, β : X_i → {0, 1, …, d}. The entry of the row (α, β) is a pair (Solution, Value) consisting of a set S_{α,β} and a probability value. Further, the set S_{α,β} is an optimal solution for the (α, β)-constrained Probabilistic-Core-RF problem on the instance ⟨G_i^{α,β}, d⟩.
Intuitively, the functions α and β defined on X_i stand for the "in-bag-degree" and "out-bag-degree" constraints, respectively, of each vertex v ∈ X_i. We maintain candidate solutions in the table for different values of the functions α and β. We allow candidate solutions that are infeasible at the current stage; the vertices whose degree-d requirement is not yet satisfied will obtain their remaining core neighbors from nodes at a higher level in the tree decomposition. The optimal solution for the instance ⟨G, d⟩ of the Probabilistic-Core-RF problem can be obtained from the table T_r, where r is the root of the tree decomposition T.
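The following sketch shows one possible layout for the table T_i and the read-off at the root; the encoding of a row index (α, β) as a pair of tuples over a fixed ordering of the bag is our choice, the paper only fixes the index set.

```python
from collections import namedtuple
from itertools import product

Entry = namedtuple("Entry", ["solution", "value"])   # the (Solution, Value) pair of a row

def make_table(bag, d):
    """One possible layout for the table T_i: a dict with one row per pair
    (alpha, beta), each encoded as a tuple of constraint values over sorted(bag).
    Rows start with the empty solution and probability 0.0, as in the leaf case."""
    rows = {}
    for alpha in product(range(d + 1), repeat=len(bag)):
        for beta in product(range(d + 1), repeat=len(bag)):
            rows[(alpha, beta)] = Entry(frozenset(), 0.0)
    return rows

# Assuming the root r of the nice tree decomposition has an empty bag (as in the
# running-time analysis of Section 5.3.2), T_r has the single row ((), ()) and the
# optimal probabilistic d-core is read off as  T_r[((), ())].solution
```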

5.3.1. Dynamic Programming

We now present the dynamic programming formulation for the different types of nodes in T. Let i ∈ T be a node with bag X_i. For a pair of functions α, β : X_i → {0, 1, …, d}, we show how to compute the table entry T_i[α, β] at each type of node as follows.
Leaf node. Let i be a leaf node with bag X_i = ∅. There is a single row in the table T_i. Let α, β : ∅ → {0} be the pair of functions corresponding to this row; the value of the table entry is given as
T_i[α, β] = (∅, 0.0).
Introduce node. Let i be an introduce node with a child j, and let X_i = X_j ∪ {v} for some v ∉ X_j. The row T_i[α, β] is computed based on the values of α(v) and β(v). If β(v) > 0, then the row T_i[α, β] is infeasible, since N(v) ∩ X_i^+ ⊆ X_i. That is,
T_i[α, β] = (∅, 0.0).
In the remaining cases we have β(v) = 0. We define β′ : X_j → {0, 1, …, d} to be the function with β′(u) = β(u) for all u ∈ X_j. When α(v) = 0, the vertex v is not in the solution S_{α,β}. Let α′ : X_j → {0, 1, …, d} be the function with α′(u) = α(u) for all u ∈ X_j. The value of the row T_i[α, β] is the same as that of T_j[α′, β′], since v is excluded. When α(v) > 0, the vertex v is part of S_{α,β}. Let U = (N(v) ∩ X_i) \ α^{−1}(0). We must have |U| ≥ α(v); otherwise, no feasible solution exists for the row T_i[α, β], that is, the degree constraint of v cannot be met by any feasible solution. Assume |U| ≥ α(v). For a subset Y ⊆ U of size α(v), let q(v, Y) = ∏_{u ∈ Y} p(uv). The vertices in Y are the neighbors of v that contribute degree α(v) to v. Each vertex u ∈ Y then loses one unit of its degree constraint α(u) at the node j. Define α_Y : X_j → {0, 1, …, d} by
α_Y(u) = α(u) if u ∉ Y, and α_Y(u) = α(u) − 1 if u ∈ Y.
Let
W = arg max_{Y ⊆ U, |Y| = α(v)} q(v, Y) · T_j[α_Y, β′].Value
be the set of α(v) neighbors of v for which the value obtained from the table T_j is maximized. Then, the recursive definition of the row T_i[α, β] is given as
T_i[α, β] = (T_j[α_W, β′].Solution ∪ {v}, q(v, W) · T_j[α_W, β′].Value).
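A sketch of this introduce-node update for a single row is given below. The dictionary-based table keys, the comparable (e.g. integer) vertex labels used for sorting the bag, and the callable p(u, v) returning edge probabilities are our assumptions.

```python
from itertools import combinations

def introduce_node_row(Tj, bag_j, v, nbrs_v_in_bag, p, alpha, beta):
    """Sketch of the introduce-node recurrence for one row (alpha, beta) of T_i,
    where X_i = X_j + {v}.  alpha and beta are dicts on X_i; Tj is keyed by pairs of
    tuples over sorted(bag_j); p(u, v) returns the survival probability of edge uv."""
    def key(a, b):
        order = sorted(bag_j)
        return (tuple(a[u] for u in order), tuple(b[u] for u in order))

    if beta[v] > 0:                       # v cannot have neighbors below this bag
        return frozenset(), 0.0
    alpha_j = {u: alpha[u] for u in bag_j}
    beta_j = {u: beta[u] for u in bag_j}
    if alpha[v] == 0:                     # v is not placed in the solution
        return Tj[key(alpha_j, beta_j)]
    U = [u for u in nbrs_v_in_bag if alpha[u] > 0]
    if len(U) < alpha[v]:                 # the degree demand of v cannot be met
        return frozenset(), 0.0
    best_sol, best_val = frozenset(), 0.0
    for Y in combinations(U, alpha[v]):
        q = 1.0
        for u in Y:
            q *= p(u, v)                  # all edges from v to Y must survive
        a_Y = dict(alpha_j)
        for u in Y:
            a_Y[u] -= 1                   # u's in-bag demand is now served by the edge uv
        sol, val = Tj[key(a_Y, beta_j)]
        if q * val > best_val:
            best_sol, best_val = sol | {v}, q * val
    return best_sol, best_val
```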
Forget node. Let i be a forget node with a child j, and let X_i = X_j \ {v} for some v ∈ X_j. From the definition of the tree decomposition, it follows that N(v) ⊆ X_j^+. Since N(v) ⊆ X_j^+, either v is part of the solution, with the constraint α(v) + β(v) = d at the child node, or v is not part of the solution. Let U = (N(v) ∩ X_j) \ α^{−1}(0). For each 0 ≤ a ≤ min(d, |U|) and each Y ⊆ U of size a, we define α_{a,Y}, β_{a,Y} : X_j → {0, 1, …, d} such that
α_{a,Y}(u) = α(u) if u ≠ v and u ∉ Y; α(u) + 1 if u ≠ v and u ∈ Y; a if u = v,
and
β_{a,Y}(u) = β(u) if u ≠ v and u ∉ Y; β(u) − 1 if u ≠ v and u ∈ Y; d − a if u = v.
For 0 ≤ a ≤ min(d, |U|), let
(α_a, β_a) = arg max_{(α_{a,Y}, β_{a,Y}) : Y ⊆ U, |Y| = a} T_j[α_{a,Y}, β_{a,Y}].Value.
Let t = arg max_{0 ≤ a ≤ min(d, |U|)} T_j[α_a, β_a].Value. Then the value of the row T_i[α, β] is equal to T_j[α_t, β_t].
Join node. Let i be a join node with children j and g such that X_i = X_j = X_g. For the function α defined on X_i, we use α itself at the child node j; for the other child node g, we use the function α″ : X_i → {0, 1, …, d} with α″(u) = 0 for all u ∈ X_i. Since β counts the neighbors outside X_i, each vertex v ∈ X_i with β(v) > 0 obtains its out-of-bag core neighbors from the set X_i^+ \ X_i. Since X_i^+ \ X_i = (X_j^+ \ X_j) ∪ (X_g^+ \ X_g) and the two sets are disjoint, we divide β(v) into two parts, and for each vertex v ∈ X_i we try all possible ways of doing so. For x : X_i → {0, 1, …, d} such that 0 ≤ x(v) ≤ β(v) for each v ∈ X_i, we define β_x : X_j → {0, 1, …, d} and β′_x : X_g → {0, 1, …, d} by β_x(v) = x(v) and β′_x(v) = β(v) − x(v). Let
y = arg max_{x : X_i → {0, 1, …, d}, 0 ≤ x(v) ≤ β(v) for all v ∈ X_i} T_j[α, β_x].Value · T_g[α″, β′_x].Value.
Then, the recursive definition of the row T_i[α, β] is given as
T_i[α, β] = (T_j[α, β_y].Solution ∪ T_g[α″, β′_y].Solution, T_j[α, β_y].Value · T_g[α″, β′_y].Value).
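A sketch of the join-node update for a single row, with the same (assumed) table conventions as in the introduce-node sketch above.

```python
from itertools import product

def join_node_row(Tj, Tg, bag, alpha, beta):
    """Sketch of the join-node recurrence for one row (alpha, beta) of T_i, where
    X_i = X_j = X_g.  The in-bag demands alpha are charged to the j-side, the g-side
    gets the all-zero in-bag demand, and every out-of-bag demand beta(v) is split as
    x(v) for the j-side and beta(v) - x(v) for the g-side."""
    order = sorted(bag)
    alpha_key = tuple(alpha[v] for v in order)
    zero_key = tuple(0 for _ in order)
    best_sol, best_val = frozenset(), 0.0
    # Enumerate every split x with 0 <= x(v) <= beta(v) for each bag vertex v.
    for x in product(*[range(beta[v] + 1) for v in order]):
        beta_j_key = x
        beta_g_key = tuple(beta[v] - x[idx] for idx, v in enumerate(order))
        sol_j, val_j = Tj[(alpha_key, beta_j_key)]
        sol_g, val_g = Tg[(zero_key, beta_g_key)]
        if val_j * val_g > best_val:
            best_sol, best_val = sol_j | sol_g, val_j * val_g
    return best_sol, best_val
```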

5.3.2. Correctness and Running Time

Lemma 7.
Let i be a node in T. For every pair of functions α, β : X_i → {0, 1, …, d}, the row T_i[α, β] is computed optimally.
Proof. 
Let i be a node in T. We claim that, for each pair of functions α, β : X_i → {0, 1, …, d}, the row T_i[α, β] stores an optimal solution for the instance G_i^{α,β} of the (α, β)-constrained Probabilistic-Core problem. That is, the set S_{α,β} = T_i[α, β].Solution is an optimal solution for this instance. Let A = X_i \ (α^{−1}(0) ∩ β^{−1}(0)). We show that, for any S = A ∪ Z with Z ⊆ X_i^+ \ X_i,
ρ_d(α, β, G_i^{α,β}, S) ≤ ρ_d(α, β, G_i^{α,β}, S_{α,β}).
The proof is by induction on the height of a node in T. The height of a node i in the rooted tree T is the distance to the farthest leaf in the subtree rooted at i. The base case is when i is a leaf node in T, whose height is 0. For a leaf node i with X_i = ∅, the only row corresponds to the empty pair of functions α, β : ∅ → {0, 1, …, d}, and the claim holds trivially. This completes the proof of the base case. Let us assume that the claim is true for all nodes of height at most h − 1 ≥ 0. We now prove that if the claim is true at all nodes of height at most h − 1, then it is true for a node of height h. Let i be a node of height h. Clearly, i is not a leaf node.
When i is an introduce node. Let j be the child of i, and X_i = X_j ∪ {v} for some v ∉ X_j. If β(v) > 0, then no feasible solution exists, since N(v) ∩ (X_i^+ \ X_i) = ∅. This is captured in our dynamic programming. In the further cases, we consider β(v) = 0. Let L = N(v) ∩ X_i. Consider the case when α(v) = 0; that is, the vertex v is not part of the solution. The recursive definition of the dynamic programming gives
S_{α,β} = T_i[α, β].Solution = T_j[α′, β′].Solution,
where α′ and β′ are as defined in the dynamic programming. A feasible solution for the row T_i[α, β] must also be feasible for the row T_j[α′, β′]; otherwise, the degree constraints are not met by the solution. For the solution S = A ∪ Z,
ρ_d(α′, β′, G_j^{α′,β′}, S) ≤ ρ_d(α′, β′, G_j^{α′,β′}, S_{α,β}),
since j is a node of height h − 1 and by our induction hypothesis. Also,
ρ_d(α′, β′, G_j^{α′,β′}, S) = ρ_d(α, β, G_i^{α,β}, S),
since v ∉ S. Then, we have
ρ_d(α, β, G_i^{α,β}, S) = ρ_d(α′, β′, G_j^{α′,β′}, S) ≤ ρ_d(α′, β′, G_j^{α′,β′}, S_{α,β}) = ρ_d(α, β, G_i^{α,β}, S_{α,β}).
This completes the case when β(v) = 0 and α(v) = 0.
Now we consider the case where α(v) > 0. In the solution S = A ∪ Z, we need α(v) neighbors of v from the set U to be in the d-core, where U = (N(v) ∩ X_i) \ α^{−1}(0), as in the dynamic programming. Let S_j = S \ {v}. There exists a Y ⊆ U of size α(v) such that the probability ρ_d(α, β, G_i^{α,β}, S) can be written as follows:
ρ_d(α, β, G_i^{α,β}, S) = ρ_d(α_Y, β′, G_j^{α_Y,β′}, S_j) · ∏_{u ∈ Y} p(uv).
The solution S_j is compatible with the row T_j[α_Y, β′], where the vertices in Y are the neighbors of v whose degree constraint is decreased by 1 in α_Y, so that it is satisfied by the edge to v. Since W in the dynamic programming is optimal over all α(v)-sized subsets of U, we have
ρ_d(α, β, G_i^{α,β}, S) = ρ_d(α_Y, β′, G_j^{α_Y,β′}, S_j) · ∏_{u ∈ Y} p(uv) ≤ ρ_d(α_W, β′, G_j^{α_W,β′}, T_j[α_W, β′].Solution) · ∏_{u ∈ W} p(uv) = T_j[α_W, β′].Value · ∏_{u ∈ W} p(uv) = T_i[α, β].Value.
This completes the argument for the case when i is an introduce node.
When i is a forget node. Let j be the child of i, and X_i = X_j \ {v} for some v ∈ X_j. We consider two cases depending on whether v belongs to S. We first consider the case when v ∉ S. Consider the functions α_{0,∅} and β_{0,∅} as defined in the recursive computation at the forget node. Since v ∉ S, v gets a zero degree constraint in both functions, and the other vertices keep the same constraints as in α and β. Then, the probability ρ_d(α, β, G_i^{α,β}, S) can be written as follows:
ρ_d(α, β, G_i^{α,β}, S) = ρ_d(α_{0,∅}, β_{0,∅}, G_j^{α_{0,∅},β_{0,∅}}, S).
Since j is a node at height h − 1, we have
ρ_d(α, β, G_i^{α,β}, S) = ρ_d(α_{0,∅}, β_{0,∅}, G_j^{α_{0,∅},β_{0,∅}}, S) ≤ T_j[α_{0,∅}, β_{0,∅}].Value ≤ T_j[α_t, β_t].Value = T_i[α, β].Value.
Secondly, we consider the case when v ∈ S. Let U = (N(v) ∩ X_j) \ α^{−1}(0). Since v is in X_j but not in its parent bag X_i, v must have degree exactly d in the (α, β)-constrained Probabilistic-Core problem. Then the constraints of v at the child node are a and d − a for some 0 ≤ a ≤ d. Let Y ⊆ U of size a be the set of vertices that have an edge to v accounting for the degree constraint a at v. Then, each vertex u ∈ Y gains one unit of degree constraint in α and loses one unit in β. Using the integer a and the set Y, the probability ρ_d(α, β, G_i^{α,β}, S) can be written as follows:
ρ_d(α, β, G_i^{α,β}, S) = ρ_d(α_{a,Y}, β_{a,Y}, G_j^{α_{a,Y},β_{a,Y}}, S).
Since j is a node at height h − 1, we know that the row T_j[α_{a,Y}, β_{a,Y}] is computed optimally. That is,
ρ_d(α_{a,Y}, β_{a,Y}, G_j^{α_{a,Y},β_{a,Y}}, S) ≤ T_j[α_{a,Y}, β_{a,Y}].Value ≤ T_j[α_t, β_t].Value = T_i[α, β].Value.
This completes the argument for the case when i is a forget node.
When i is a join node. Let j and g be the children of i, and X_i = X_j = X_g. The set S \ X_i can be partitioned into the sets S_j and S_g, where S_j = (S \ X_i) ∩ X_j^+ and S_g = (S \ X_i) ∩ X_g^+. Let U = {u ∈ X_i : α(u) > 0} and S_i = X_i \ α^{−1}(0). For each vertex u ∈ U, the degree constraint α(u) must be satisfied by edges from U to u. Since X_i = X_j = X_g, the degree constraint α(u) is satisfied either at the node j or at the node g, and not at both. Without loss of generality, we assume that the degree constraint α is satisfied at the node j, and the node g receives the all-zero in-bag constraint α″ as defined in the dynamic programming. For each vertex v ∈ X_i with β(v) > 0, the degree constraint β(v) can be satisfied by the sets S_j and S_g together: there exists an integer x(v) such that x(v) core neighbors are obtained from the set S_j and β(v) − x(v) core neighbors are obtained from the set S_g. Then, there exists a function x : X_i → {0, 1, …, d} such that 0 ≤ x(v) ≤ β(v) for each vertex v ∈ X_i. Using the function x, the probability ρ_d(α, β, G_i^{α,β}, S) can be given as follows:
ρ_d(α, β, G_i^{α,β}, S) = ρ_d(α, β_x, G_j^{α,β_x}, S_i ∪ S_j) · ρ_d(α″, β′_x, G_g^{α″,β′_x}, S_i ∪ S_g).
The functions β_x and β′_x for the given function x are defined in the description of the dynamic programming. Since both j and g are nodes of height at most h − 1, by the induction hypothesis the rows T_j[α, β_x] and T_g[α″, β′_x] are computed optimally. Therefore, we have
ρ_d(α, β, G_i^{α,β}, S) = ρ_d(α, β_x, G_j^{α,β_x}, S_i ∪ S_j) · ρ_d(α″, β′_x, G_g^{α″,β′_x}, S_i ∪ S_g) ≤ T_j[α, β_x].Value · T_g[α″, β′_x].Value ≤ T_j[α, β_y].Value · T_g[α″, β′_y].Value = T_i[α, β].Value,
where y is the optimal distribution of the degrees that results in T_i[α, β].Value, as defined in the recursive formulation at a join node. This completes the argument for the case when i is a join node. Hence the lemma.  □
For a possible world H ⊑ G, the degree of a vertex v ∈ V(H) is at most the degree of v in G = (V, E). A vertex with degree less than d in G cannot be an element of a d-core in any possible world. Thus, such vertices can be excluded throughout the algorithm. This pruning results either in the pruned graph being empty or in its minimum degree being at least d. We state the following lemma from Koster et al. [48].
Lemma 8
(Koster et al. [48]). For a graph G = (V, E) with |V| ≥ 2, δ(G) ≤ tw(G).
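The pruning described before Lemma 8 is ordinary degree peeling on the deterministic graph G; a minimal sketch follows (the adjacency-dictionary input is our assumption).

```python
def prune_low_degree(adj, d):
    """Repeatedly delete vertices whose degree in the deterministic graph G is
    below d: such vertices cannot belong to a d-core in any possible world.
    The result is either empty or has minimum degree at least d."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}   # work on a copy
    changed = True
    while changed:
        changed = False
        for v in [u for u in adj if len(adj[u]) < d]:
            for u in adj[v]:
                adj[u].discard(v)
            del adj[v]
            changed = True
    return adj
```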
In the following lemma, we analyze the running time of the dynamic programming algorithm.
Lemma 9.
The Probabilistic-Core-RF problem can be solved optimally in time 2^{O(tw log tw)} · n^{O(1)}.
Proof. 
For each node i in T, our dynamic programming generates a table T_i with (d + 1)^{2 · tw} = O(d^{2 · tw}) rows. Each row T_i[α, β], for α, β : X_i → {0, 1, …, d}, is computed based on the type of the node i. When i is a leaf node, a single row exists in T_i and it is computed in O(1) time. When i is an introduce node, we consider the two cases α(v) = 0 and α(v) > 0. If α(v) = 0, then the functions α′ and β′ can be computed in time O(tw). If α(v) > 0, the set W is found by enumerating all α(v)-sized subsets of U. This takes O(|U|^{α(v)}) time and, using the upper bounds on these values, O(tw^d) time. It follows that if i is an introduce node, then T_i[α, β] can be computed in time O(tw^d). When i is a forget node, for each value 0 ≤ a ≤ min(d, |U|), we enumerate all a-sized subsets of U. This requires O(tw^d · d) time. When i is a join node, we need to compute the optimal distribution of β(v) for each vertex v ∈ X_i. This requires O(d^{tw}) time. From Lemma 8, we know that d ≤ tw, so we upper bound the value d by tw. Overall, a row T_i[α, β] can be computed in time O(tw^{tw}). The entire table T_i can be computed in time O(tw^{3 · tw}) = 2^{O(tw log tw)}. The nice tree decomposition (X, T) has O(n · tw) many nodes, and the table at each node can be computed in time 2^{O(tw log tw)} · n^{O(1)}. An optimal solution to the Probabilistic-Core-RF problem on the input uncertain graph G is obtained from the table of the root node r; that is, T_r[α : ∅ → {0}, β : ∅ → {0}].Solution is the optimal solution.  □
The preceding lemmas result in the following theorem.
Theorem 3.
The Probabilistic-Core problem admits an FPT algorithm for the parameter tw.

6. Discussion

There are many natural questions related to the parameterized complexity of algorithms on uncertain graphs under different distribution models. The following are some open questions.
  • Are there efficient reductions between distribution models so that we can classify problems based on the efficiency of algorithms under different distribution models? This question is also of practical significance because the distribution models are specified as sampling algorithms. Consequently, the complexity of expectation computation on uncertain graphs under different distribution models is an interesting new parameterization. Further, one concrete question is whether the LRO model is easier than the RF model for other optimization problems on uncertain graphs. For the two case studies considered in this paper, that is the case.
  • While our results do support the natural intuition that a tree decomposition is helpful in expectation computation, it is unclear to us how traditional techniques in parameterized algorithms can be carried over to this setting. In particular, it is unclear to us whether, for any distribution model, a kernelization-based algorithm can give an FPT algorithm on uncertain graphs.
  • We have considered the coverage and the core problems on uncertain graphs under the LRO and RF models. However, we have not been able to obtain an FPT algorithm with the parameter treewidth for Max-Exp-Cover-1-RF. Indeed, any approach that avoids the exponential dependence on Δ · tw would be very interesting and would give significant insight into other approaches to evaluate the expected coverage.
  • Even though the Individual-Core-RF problem and the Probabilistic-Core-RF problem are similar, we have not been able to obtain an FPT algorithm for the Individual-Core-RF problem with treewidth as the parameter. Even for other structural parameters, such as the vertex cover number and the feedback vertex set number, FPT results would give significant insight into the Individual-Core problem.

Author Contributions

Investigation, N.S.N. and R.V.; Writing–original draft, N.S.N. and R.V.; All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Añez, J.; Barra, T.D.L.; Pérez, B. Dual graph representation of transport networks. Trans. Res. Part B Methodol. 1996, 30, 209–216.
  2. Hua, M.; Pei, J. Probabilistic Path Queries in Road Networks: Traffic Uncertainty Aware Path Selection. In Proceedings of the 13th International Conference on Extending Database Technology (EDBT ’10), Lausanne, Switzerland, 22–26 March 2010; pp. 347–358.
  3. Asthana, S.; King, O.D.; Gibbons, F.D.; Roth, F.P. Predicting protein complex membership using probabilistic network reliability. Genome Res. 2004, 14, 1170–1175.
  4. Domingos, P.; Richardson, M. Mining the Network Value of Customers. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’01), San Francisco, CA, USA, 26–29 August 2001; pp. 57–66.
  5. Frank, H. Shortest Paths in Probabilistic Graphs. Oper. Res. 1969, 17, 583–599.
  6. Valiant, L.G. The Complexity of Enumeration and Reliability Problems. SIAM J. Comput. 1979, 8, 410–421.
  7. Hoffmann, M.; Erlebach, T.; Krizanc, D.; Mihalák, M.; Raman, R. Computing Minimum Spanning Trees with Uncertainty. In Proceedings of the 25th Annual Symposium on Theoretical Aspects of Computer Science, Bordeaux, France, 21–23 February 2008; pp. 277–288.
  8. Focke, J.; Megow, N.; Meißner, J. Minimum Spanning Tree under Explorable Uncertainty in Theory and Experiments. In Proceedings of the 16th International Symposium on Experimental Algorithms (SEA 2017), London, UK, 21–23 June 2017; pp. 22:1–22:14.
  9. Frank, H.; Hakimi, S. Probabilistic Flows Through a Communication Network. IEEE Trans. Circuit Theory 1965, 12, 413–414.
  10. Evans, J.R. Maximum flow in probabilistic graphs-the discrete case. Networks 1976, 6, 161–183.
  11. Hassin, R.; Ravi, R.; Salman, F.S. Tractable Cases of Facility Location on a Network with a Linear Reliability Order of Links. In Algorithms-ESA 2009, Proceedings of the 17th Annual European Symposium, Copenhagen, Denmark, 7–9 September 2009; Springer: Berlin, Germany, 2009; pp. 275–276.
  12. Hassin, R.; Ravi, R.; Salman, F.S. Multiple facility location on a network with linear reliability order of edges. J. Comb. Optim. 2017, 34, 1–25.
  13. Narayanaswamy, N.S.; Nasre, M.; Vijayaragunathan, R. Facility Location on Planar Graphs with Unreliable Links. In Proceedings of the Computer Science-Theory and Applications-13th International Computer Science Symposium in Russia, CSR 2018, Moscow, Russia, 6–10 June 2018; pp. 269–281.
  14. Kempe, D.; Kleinberg, J.M.; Tardos, É. Maximizing the spread of influence through a social network. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 24–27 August 2003; pp. 137–146.
  15. Bonchi, F.; Gullo, F.; Kaltenbrunner, A.; Volkovich, Y. Core decomposition of uncertain graphs. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’14), New York, NY, USA, 24–27 August 2014; pp. 1316–1325.
  16. Peng, Y.; Zhang, Y.; Zhang, W.; Lin, X.; Qin, L. Efficient Probabilistic K-Core Computation on Uncertain Graphs. In Proceedings of the 34th IEEE International Conference on Data Engineering (ICDE), Paris, France, 16–19 April 2018; pp. 1192–1203.
  17. Ball, M.O.; Provan, J.S. Calculating bounds on reachability and connectedness in stochastic networks. Networks 1983, 13, 253–278.
  18. Zou, Z.; Li, J. Structural-Context Similarities for Uncertain Graphs. In Proceedings of the 2013 IEEE 13th International Conference on Data Mining, Dallas, TX, USA, 7–10 December 2013; pp. 1325–1330.
  19. Daskin, M.S. A Maximum Expected Covering Location Model: Formulation, Properties and Heuristic Solution. Transp. Sci. 1983, 17, 48–70.
  20. Ball, M.O. Complexity of network reliability computations. Networks 1980, 10, 153–165.
  21. Karp, R.M.; Luby, M. Monte-Carlo algorithms for the planar multiterminal network reliability problem. J. Complex. 1985, 1, 45–64.
  22. Provan, J.S.; Ball, M.O. The Complexity of Counting Cuts and of Computing the Probability that a Graph is Connected. SIAM J. Comput. 1983, 12, 777–788.
  23. Guo, H.; Jerrum, M. A Polynomial-Time Approximation Algorithm for All-Terminal Network Reliability. SIAM J. Comput. 2019, 48, 964–978.
  24. Ghosh, J.; Ngo, H.Q.; Yoon, S.; Qiao, C. On a Routing Problem Within Probabilistic Graphs and its Application to Intermittently Connected Networks. In Proceedings of the 26th IEEE International Conference on Computer Communications, Joint Conference of the IEEE Computer and Communications Societies, INFOCOM, Anchorage, AK, USA, 6–12 May 2007; pp. 1721–1729.
  25. Rubino, G. Network Performance Modeling and Simulation; chapter Network Reliability Evaluation; Gordon and Breach Science Publishers, Inc.: Newark, NJ, USA, 1999; pp. 275–302.
  26. Swamynathan, G.; Wilson, C.; Boe, B.; Almeroth, K.C.; Zhao, B.Y. Do social networks improve e-commerce?: a study on social marketplaces. In Proceedings of the First Workshop on Online Social Networks (WOSN 2008), Seattle, WA, USA, 17–22 August 2008; pp. 1–6.
  27. White, D.R.; Harary, F. The Cohesiveness of Blocks In Social Networks: Node Connectivity and Conditional Density. Soc. Methodol. 2001, 31, 305–359.
  28. Papadimitriou, C.H.; Yannakakis, M. Shortest paths without a map. Theor. Comput. Sci. 1991, 84, 127–150.
  29. Khuller, S.; Moss, A.; Naor, J. The Budgeted Maximum Coverage Problem. Inf. Process. Lett. 1999, 70, 39–45.
  30. Brown, J.J.; Reingen, P.H. Social Ties and Word-of-Mouth Referral Behavior. J. Consum. Res. 1987, 14, 350–362.
  31. Richardson, M.; Domingos, P.M. Mining knowledge-sharing sites for viral marketing. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, AB, Canada, 23–26 July 2002; pp. 61–70.
  32. Bass, F.M. A New Product Growth for Model Consumer Durables. Manag. Sci. 1969, 15, 215–227.
  33. Snyder, L.V. Facility location under uncertainty: A review. IIE Trans. 2006, 38, 547–564.
  34. Eiselt, H.A.; Gendreau, M.; Laporte, G. Location of facilities on a network subject to a single-edge failure. Networks 1992, 22, 231–246.
  35. Colbourn, C.J.; Xue, G. A linear time algorithm for computing the most reliable source on a series-parallel graph with unreliable edges. Theor. Comput. Sci. 1998, 209, 331–345.
  36. Ding, W. Computing the Most Reliable Source on Stochastic Ring Networks. In Proceedings of the 2009 WRI World Congress on Software Engineering, Xiamen, China, 19–21 May 2009; Volume 1, pp. 345–347.
  37. Ding, W.; Xue, G. A linear time algorithm for computing a most reliable source on a tree network with faulty nodes. Theor. Comput. Sci. 2011, 412, 225–232.
  38. Melachrinoudis, E.; Helander, M.E. A single facility location problem on a tree with unreliable edges. Networks 1996, 27, 219–237.
  39. Nemhauser, G.L.; Wolsey, L.A.; Fisher, M.L. An analysis of approximations for maximizing submodular set functions—I. Math. Program. 1978, 14, 265–294.
  40. Cygan, M.; Fomin, F.V.; Kowalik, L.; Lokshtanov, D.; Marx, D.; Pilipczuk, M.; Pilipczuk, M.; Saurabh, S. Parameterized Algorithms; Springer: Berlin, Germany, 2015.
  41. Sigal, C.E.; Pritsker, A.A.B.; Solberg, J.J. The Stochastic Shortest Route Problem. Oper. Res. 1980, 28, 1122–1129.
  42. Guerin, R.A.; Orda, A. QoS routing in networks with inaccurate information: Theory and algorithms. IEEE/ACM Trans. Netw. 1999, 7, 350–364.
  43. Günneç, D.; Salman, F.S. Assessing the reliability and the expected performance of a network under disaster risk. In Proceedings of the International Network Optimization Conference (INOC), Spa, Belgium, 22–25 April 2007.
  44. Diestel, R. Graph Theory, 4th ed.; Graduate Texts in Mathematics; Springer: Berlin, Germany, 2012; Volume 173.
  45. Bodlaender, H.L. A Tourist Guide through Treewidth. Acta Cybern. 1993, 11, 1–21.
  46. Kloks, T. Treewidth, Computations and Approximations; Lecture Notes in Computer Science; Springer: Berlin, Germany, 1994; Volume 842.
  47. Downey, R.G.; Fellows, M.R. Fixed-parameter intractability. In Proceedings of the Seventh Annual Structure in Complexity Theory Conference, Boston, MA, USA, 22–25 June 1992.
  48. Koster, A.M.C.A.; Wolle, T.; Bodlaender, H.L. Degree-Based Treewidth Lower Bounds. In Proceedings of the 4th International Workshop, WEA 2005 Experimental and Efficient Algorithms, Santorini Island, Greece, 10–13 May 2005; pp. 101–112.
Figure 1. (a) A probabilistic graph G = (V, E, p); (b) A possible world H_1 of G with P(H_1) = 0.0072; (c) Another possible world H_2 of G with P(H_2) = 0.0588.
Figure 2. (a) A graph with 9 vertices; (b) An optimal tree decomposition with treewidth 3.
Figure 3. An example of leaf (a), introduce (b), forget (c) and join (d) nodes. Directed edges denote child to parent link.
Table 1. A chronology of studies on uncertain graphs.
| Work | Optimization Problem | Uncertainty Model |
| --- | --- | --- |
| Frank and Hakimi, 1965 [9] | Probabilistic maximum flow | Capacities on the edges are drawn from an independent continuous distribution. |
| Frank, 1969 [5] | Probabilistic shortest path | Lengths of the edges are drawn from a continuous distribution. |
| Evans, 1976 [10] | Probabilistic maximum flow | Capacities on the edges are obtained from an arbitrary but unknown discrete probability distribution. |
| Valiant, 1979 [6] | Network reliability | The probability p(e) is the same for each edge and the failure of every edge is independent. |
| Sigal, Pritsker and Solberg, 1980 [41] | Stochastic shortest path | Edge weights are drawn from a known cumulative distribution function. |
| Daskin, 1983 [19] | Expected coverage | Failure probability is the same for each vertex. |
| Papadimitriou and Yannakakis, 1991 [28] | Canadian Traveler Problem | Each edge has a survival probability, edge failures are independent, and the algorithm learns of a failure during execution. |
| Guerin and Orda, 1999 [42] | Most reliable path and flows with bandwidth selection | Each edge e has a survival probability p_e(x) for the availability of bandwidth x. |
| Hassin, Salman and Ravi, 2009 (2017) [11,12] | Expected coverage | Each edge e has a survival probability p(e) and edge failures follow the LRO model. |
| Bonchi, Gullo, Kaltenbrunner and Volkovich, 2014 [15] | Probabilistic-Core | Each edge e has a survival probability p(e) and edge failures follow the RF model. |
