A Probabilistic Bound on the Basic Role Mining
Problem and its Applications
Alessandro Colantonio, Roberto Di Pietro, Alberto Ocello, Nino Vincenzo Verde
Abstract In this paper we describe a new probabilistic approach to the role engineering
process for RBAC. In particular, we address the issue of minimizing the number of roles,
a problem known in the literature as the Basic Role Mining Problem (basicRMP). We leverage
the equivalence of the above issue with the vertex coloring problem. Our main result is the
proof that the minimum number of roles is sharply concentrated around its expected value.
A further contribution is to show how this result can be applied as a stop condition when
striving to find out an approximation for the basicRMP. We also show that the proposal
can be used to decide whether it is advisable to undertake the efforts to renew an RBAC
state. Note that both these applications can result in a substantial saving of resources. A
thorough analysis using advanced probabilistic tools supports our results. Finally, further
relevant research directions are also highlighted.
1 Introduction
An access control model is an abstract representation of security technology, providing
a high-level logical view to describe all peculiarities and behaviors of an access control
system. The Role-Based Access Control (RBAC, [1]) is certainly the most widespread
access control model proposed in the literature for medium to large-size organizations.
Alessandro Colantonio
Engiweb Security, Roma, Italy, e-mail: alessandro.colantonio@eng.it
Università di Roma Tre, Roma, Italy, e-mail: colanton@mat.uniroma3.it
Roberto Di Pietro
Università di Roma Tre, Roma, Italy,
UNESCO Chair in Data Privacy, Tarragona, Spain,
e-mail: dipietro@{mat.uniroma3.it,urv.cat}
Alberto Ocello
Engiweb Security, Roma, Italy, e-mail: alberto.ocello@eng.it
Nino Vincenzo Verde
Università di Roma Tre, Roma, Italy, e-mail: nverde@mat.uniroma3.it
The simplicity of this model is one of the main reasons for its adoption: a role is just a
collection of privileges, while users are assigned to roles based on duties to fulfil [10].
The migration to RBAC introduces different benefits, such as simplified system administration, enhanced organizational productivity, reduction in new employee downtime,
enhanced system security and integrity, simplified regulatory compliance, and enhanced
security policy enforcement [6]. To maximize all the advantages offered by adopting the
role-based approach, the model must be customized to describe the organizational roles
and functions of the company [3]. However, this migration process often has a high economic impact. To optimize the model customization, the role engineering discipline has
been introduced. It can be defined as the set of methodologies and tools to define roles
and to assign permissions to roles according to the actual needs of the company [5].
To date, various role engineering approaches have been proposed in order to address
this problem. They are usually classified in the literature as top-down or bottom-up. The
former carefully decomposes business processes into elementary components, identifying
which system features are necessary to carry out specific tasks. This approach is mainly
manual, as it requires a high level analysis of the business. The bottom-up class searches
legacy access control systems to find de facto roles embedded in existing permissions.
This process can be automated resorting to data mining techniques, thus leading to what
is usually referred to as role mining.
Since the bottom-up approach can be automated, it has attracted a lot of interest
from researchers who proposed new data mining techniques particularly designed for
role engineering purposes. Various role mining approaches can be found in the literature [3, 7, 12, 16–20, 22]. A problem partially addressed in these works is the “interestingness” of roles. Indeed, the importance of role completeness and role management efficiency resulting from the role engineering process has been evident from the earliest
papers on the subject. However, only recently researchers have started to formalize the
role-set optimality concept. One possible optimization approach is minimizing the total
number of roles [7, 12, 18]. Yet, the identification of the role-set that describes the access
control configuration with the minimum number of roles is an NP-complete problem [18].
Thus, all of the aforementioned papers just offer an approximation of the optimal solution
in order to address the complexity of the problem. However, since none of them quantify
the introduced approximations, it is not possible to estimate the quality of the proposed
role mining algorithm outcomes.
Contributions. In this paper we provide a probabilistic method to optimize the number of
roles needed to cover all the existing user-permission assignments. The method leverages
a known reduction of the role number minimization problem to the chromatic number
of a graph. The main contribution of this work is proving that the optimal role number
is sharply concentrated around its expected value. We further show how this result can
be used as a stop condition when striving to find an approximation of the optimum for
any role mining algorithm. The corresponding rationale is that if a result is close to the
optimum, and the effort required to discover a better result is high, it might be appropriate
to accept the current result.
Roadmap. This paper is organized as follows: Section 2 reports relevant related works.
Section 3 summarizes the main concepts used in the rest of the paper; namely, a formal description of the RBAC model, some probabilistic tools, and a brief review of graph theory.
In Section 4 the role minimization problem and the model used are formally described.
Section 5 provides the main theoretical result and discusses some practical applications
of this result. Finally, Section 6 presents some concluding remarks and further research
directions.
2 Related Work
Kuhlmann et al. [11] first introduced the term “role mining”, trying to apply existing
data mining techniques (i.e., clustering similar to k-means) to implement a bottom-up approach. The first algorithm explicitly designed for role engineering is described in [17],
applying hierarchical clustering on permissions. Another example of a role mining algorithm is provided by Vaidya et al. [20]; they applied subset enumeration techniques to
generate a set of candidate roles, computing all possible intersections among permissions
possessed by users.
The work of Colantonio et al. [3,4] represents the first attempt to discover roles with semantic meanings. The authors define a metric for evaluating good collections of roles that
can be used to minimize the number of candidate roles. Vaidya et al. [18, 19] also studied
the problem of finding the minimum number of roles covering all permissions possessed
by the users, calling it the basic Role Mining Problem (basicRMP). They also demonstrated that such a problem is NP-complete. Ene et al. [7] offer yet another alternative
model to minimize the number of candidate roles. In particular, they reduced the problem
to the well-known minimum clique partition problem or, equivalently, to the minimum
biclique covering. Indeed, not only is role number minimization equivalent to
clique covering, but it has also been reduced to many other NP problems, such as binary matrix
factorization [12] and database tiling [9], to cite a few. These reductions make it possible
to apply fast graph reduction algorithms to exactly identify the optimal solution for some
realistic data sets; however, the general problem is still NP-complete.
Recently, Frank et al. [8] proposed a probabilistic model for RBAC. They defined
a framework that expresses user-permission relationships in a general way, specifying
the related probability. Through this probability it is possible to elicit the role-user and
role-permission assignments which then make the corresponding direct user-permission
assignments more likely. The authors also presented a sampling algorithm that can be used
to infer their model parameters. The algorithm converges asymptotically to the optimal
value; the approach described in this paper can be used to offer a stop condition for the
quest for the optimum.
3 Background
In this section we review all the notions used in the rest of the paper, namely the Role-Based
Access Control entities, some probabilistic tools, and some graph theory concepts.
3.1 Role-Based Access Control
The RBAC entities of interest are:
• PERMS , the set of access permissions, namely all grantable operations for each system
object;
• USERS , the set of all system users;
• ROLES ⊆ 2^PERMS , the set of all roles, namely permission combinations;
• UA ⊆ USERS × ROLES , the set of user-role assignments; given a role, the function
ass_users : ROLES → 2^USERS identifies all the assigned users;
• PA ⊆ PERMS × ROLES , the set of permission-role assignments; given a role, the
function ass_perms : ROLES → 2^PERMS identifies all the assigned permissions.
The RBAC model also allows establishing a partial order among roles, namely a hierarchy
of roles based on the permission set inclusion, identified by the set RH ⊆ ROLES ×
ROLES . Although the hierarchy is very useful in certain applications, we are able to achieve
our results without resorting to it; this greatly simplifies the analysis.
In addition to the RBAC standard entities, the set UP ⊆ USERS × PERMS identifies
permission to user assignments. In an access control system it is represented by entities
describing access rights (e.g., access control lists).
3.2 Martingales and Azuma-Hoeffding Inequality
In this section we introduce some definitions and theorems that provide the mathematical basis for the remainder of the paper. In particular, we introduce martingales, Doob martingales, and the Azuma-Hoeffding inequality. These are
well known tools for the analysis of randomized algorithms [15, 21].
Definition 1 (Martingale). A sequence of random variables Z0 , Z1 , . . . is a martingale
with respect to the sequence X0 , X1 , . . . if, for all n ≥ 0, the following conditions hold:
• Zn is a function of X0 , X1 , . . . , Xn ,
• 𝔼[|Zn |] < ∞,
• 𝔼[Zn+1 | X0 , . . . , Xn ] = Zn ,
where the operator 𝔼[·] indicates the expected value of a random variable. A sequence of
random variables Z0 , Z1 , . . . is called a martingale when it is a martingale with respect to
itself, that is, 𝔼[|Zn |] < ∞ and 𝔼[Zn+1 | Z0 , . . . , Zn ] = Zn .
Definition 2 (Doob Martingale). A Doob martingale refers to a martingale constructed
using the following general approach. Let X0 , X1 , . . . , Xn be a sequence of random variables, and let Y be a random variable with 𝔼[|Y |] < ∞. (Generally, Y will depend on
X0 , X1 , . . . , Xn .) Then
Zi = 𝔼[Y | X0 , . . . , Xi ], i = 0, 1, . . . , n,
gives a martingale with respect to X0 , X1 , . . . , Xn .
The previous construction assures that the resulting sequence Z0 , Z1 , . . . , Zn is always a
martingale.
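As a small illustration of the construction, the following sketch builds a Doob martingale for a deliberately simple case that is not taken from the paper: Y is the number of heads in n fair coin flips X1 , . . . , Xn, so that Zi equals the heads seen so far plus half of the remaining flips. The function names are hypothetical, and exact fractions are used so that the martingale property can be checked without rounding.

```python
from fractions import Fraction
from itertools import product


def doob_estimate(prefix, n):
    """Z_i = E[Y | X_1..X_i] for Y = number of heads in n fair coin flips."""
    remaining = n - len(prefix)
    return sum(prefix) + Fraction(remaining, 2)


def is_martingale(n):
    """Check E[Z_{i+1} | X_1..X_i] = Z_i for every prefix of every length i < n."""
    for i in range(n):
        for prefix in product((0, 1), repeat=i):
            # Average Z_{i+1} over the two equally likely values of X_{i+1}.
            avg_next = sum(doob_estimate(prefix + (x,), n) for x in (0, 1)) / 2
            if avg_next != doob_estimate(prefix, n):
                return False
    return True
```

With no flips revealed the estimate is the plain expectation n/2, and once all flips are revealed it coincides with Y itself.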
Martingales are especially useful to predict the value of a random variable Y that
is a function of the random variables X1 , . . . , Xn . In this case, we can use a Doob martingale where Z0 , Z1 , . . . , Zn represents a sequence of refined estimates of the value of Y
that gradually incorporates more and more information on the values of the random variables
X1 , X2 , . . . , Xn . Z0 is just the expectation of Y , whereas Zi is the expected value of Y when
the values of X1 , . . . , Xi are known. This way, if Y is fully determined by X1 , . . . , Xn , then
Zn = Y . A useful property of martingales that we will use in this paper is the Azuma-Hoeffding inequality [15]:
Theorem 1 (Azuma-Hoeffding inequality). Let X0 , . . . , Xn be a martingale such that
$$B_k \le X_k - X_{k-1} \le B_k + d_k$$
for some constants dk and for some random variables Bk that may be functions of
X0 , X1 , . . . , Xk−1 . Then, for all t ≥ 0 and any λ > 0,
$$\Pr(|X_t - X_0| \ge \lambda) \le 2\exp\left(\frac{-2\lambda^2}{\sum_{k=1}^{t} d_k^2}\right) \tag{1}$$
The Azuma-Hoeffding inequality applied to the Doob martingale gives the so called
Method of Bounded Differences (MOBD) [14].
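To make the right-hand side of Equation 1 concrete, the following minimal sketch evaluates it for a given λ and a list of constants dk. The function name and argument layout are illustrative assumptions, not part of the original formulation.

```python
import math


def azuma_hoeffding_bound(lam, d):
    """Right-hand side of Equation 1: 2 * exp(-2 * lam^2 / sum(d_k^2))."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    # Sum of squared range widths d_k over the first t steps of the martingale.
    total = sum(dk ** 2 for dk in d)
    return 2.0 * math.exp(-2.0 * lam ** 2 / total)
```

Note that for small λ the returned value can exceed 1, in which case the bound is vacuous.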
3.3 Graphs Modeling
We shall now review some graph related concepts that will be used to generate our RBAC
model. A graph G is an ordered pair G = ⟨V, E⟩, where V is the set of vertices, and E
is a set of unordered pairs of vertices. We say that v, w ∈ V are endpoints of the edge
⟨v, w⟩ ∈ E. Given a subset S of the vertices V (G), the subgraph induced by S is the
graph where the set of vertices is S, and the edges are the members of E(G) such that the
corresponding endpoints are both in S. We denote with G[S] the subgraph induced by S.
A bipartite graph is a graph where the set of vertices can be partitioned into two subsets V1
and V2 such that for every edge ⟨v1 , v2 ⟩ ∈ E(G), v1 ∈ V1 and v2 ∈ V2 .
A clique is a subset S of vertices in G, such that the subgraph induced by S is a complete
graph, namely for every two vertices in S there exists an edge connecting the two. A
biclique in a bipartite graph, also called bipartite clique, is a set of vertices B1 ⊆ V1 and
B2 ⊆ V2 such that ⟨b1 , b2 ⟩ ∈ E for all b1 ∈ B1 and b2 ∈ B2 . In other words, if G is a bipartite
graph, a set S of vertices V (G) is a biclique if and only if the subgraph induced by S is a
complete bipartite graph. In this case we will say that the vertices of S induce a biclique
in G. A maximal clique or biclique is a set of vertices that induces a complete subgraph,
and that is not a subset of the vertices of any larger complete subgraph.
A clique cover of G is a collection of cliques C1 , . . . ,Ck , such that for each edge ⟨u, v⟩ ∈
E there is some Ci that contains both u and v. A minimum clique partition (MCP) of a
graph is a smallest (by cardinality) collection of cliques such that each vertex is a member
of exactly one of the cliques; it is a partition of the vertices into cliques. Similar to the
clique cover, a biclique cover of G is a collection of bicliques B1 , . . . , Bk such that for each
edge ⟨u, v⟩ ∈ E there is some Bi that contains both u and v. We say that Bi covers ⟨u, v⟩
if Bi contains both u and v. Thus, in a biclique cover, each edge of G is covered by at least
one biclique. A minimum biclique cover (MBC) is the smallest collection of bicliques
that covers the edges of a given bipartite graph, or in other words, is a biclique cover of
minimum cardinality.
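The definition of a biclique cover can be checked mechanically. The sketch below assumes a bipartite graph given as a set of (left, right) vertex pairs and a candidate cover given as a list of (left-set, right-set) pairs; both representations and the function names are assumptions made for illustration.

```python
def is_biclique(edges, left, right):
    """True if every vertex in `left` is adjacent to every vertex in `right`."""
    return all((u, v) in edges for u in left for v in right)


def is_biclique_cover(edges, bicliques):
    """True if every pair is a biclique and together they cover all edges."""
    if not all(is_biclique(edges, left, right) for left, right in bicliques):
        return False
    # Since each element is a biclique, `covered` is a subset of `edges`;
    # equality therefore means that no edge is left uncovered.
    covered = {(u, v) for left, right in bicliques for u in left for v in right}
    return covered == edges
```

The check says nothing about minimality: finding a cover of minimum cardinality is the hard part of the problem.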
4 Problem Modelling
In this section we show how to model our role engineering approach.
4.1 Definitions
The following definitions are required to formally describe the role engineering problem:
Definition 3 (System Configuration). Given an access control system, we refer to its
configuration as the tuple ϕ = ⟨USERS , PERMS , UP ⟩, that is the set of all existing users,
permissions, and the corresponding relationships between them within the system.
A system configuration represents the user authorization state before migrating to
RBAC, or the authorizations derivable from the current RBAC implementation—in this
case, the user-permission relationships may be derived as
$$UP = \{\langle u, p\rangle \mid \exists r \in ROLES : u \in ass\_users(r) \,\wedge\, p \in ass\_perms(r)\}$$
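The derivation of UP from an existing RBAC state can be sketched as follows. This is a minimal illustration in which ass_users and ass_perms are modelled as dictionaries from role identifiers to sets; the role, user, and permission names are invented toy data.

```python
def derive_up(roles, ass_users, ass_perms):
    """Return UP = {(u, p) | some role r has u in ass_users(r) and p in ass_perms(r)}."""
    return {(u, p) for r in roles for u in ass_users[r] for p in ass_perms[r]}


# Hypothetical toy state: role, user, and permission names are illustrative only.
roles = {"clerk", "manager"}
ass_users = {"clerk": {"alice", "bob"}, "manager": {"bob"}}
ass_perms = {"clerk": {"read", "write"}, "manager": {"read", "approve"}}

up = derive_up(roles, ass_users, ass_perms)
```

A pair granted through several roles (here bob and read) appears only once, since UP is a set.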
Definition 4 (RBAC State). An RBAC state is a tuple ψ = ⟨ROLES , UA , PA ⟩, namely
an instance of all the sets characterizing the RBAC model.
An RBAC state is used to obtain a system configuration. Indeed, the role engineering
goal is to find the “best” state that correctly describes a given configuration. In particular,
we are interested in finding the following kind of states:
Definition 5 (Candidate Role-Set). Given an access control system configuration ϕ, a
candidate role-set is the RBAC state ψ that “covers” all possible combinations of permissions possessed by users according to ϕ, namely a set of roles such that the union of
related permissions exactly matches the permissions possessed by each user. Formally,
$$\forall u \in USERS,\ \exists R \subseteq ROLES : \bigcup_{r \in R} ass\_perms(r) = \{p \in PERMS \mid \langle u, p\rangle \in UP\}.$$
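The condition of Definition 5 can be tested directly. The sketch below assumes that each role is represented by the frozenset of its permissions; it relies on the observation that a suitable subset R exists for a user exactly when the roles contained in that user's permission set already union to it. Function names are hypothetical.

```python
def user_perms(up, user):
    """Permissions possessed by `user` according to UP."""
    return {p for (u, p) in up if u == user}


def is_candidate_role_set(users, up, roles):
    """Check that each user's permission set is exactly a union of roles."""
    for u in users:
        target = user_perms(up, u)
        # Only roles fully contained in the user's permissions can be assigned
        # without granting something the user does not possess.
        usable = [r for r in roles if r <= target]
        if set().union(*usable) != target:
            return False
    return True
```

A user without permissions is trivially satisfied by the empty subset of roles.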
Definition 6 (Cost Function). Let Φ,Ψ be respectively the set of all possible system
configurations and RBAC states. We refer to the cost function cost as
$$cost : \Phi \times \Psi \to \mathbb{R}^{+}$$
where ℝ⁺ indicates positive real numbers including 0; it represents an administration cost
estimate for the state ψ used to obtain the configuration ϕ.
The administration cost concept was first introduced in [3]. Leveraging the cost metric makes it possible to find candidate role-sets with the lowest possible effort to administer the
resulting RBAC state.
Definition 7 (Optimal Candidate Role-Set). Given a configuration ϕ, an optimal candidate role-set is the corresponding RBAC state ψ that simultaneously represents a candidate role-set for ϕ and minimizes the cost function cost(ϕ, ψ).
The main goal related to mining roles is to find optimal candidate role-sets. In the next
section we focus on optimizing a particular cost function. Let cost indicate the number of
needed roles. The role mining objective then becomes to find a candidate role-set that has
the minimum number of roles for a given system configuration. This is exactly the Basic
Role Mining Problem. We will show that this problem is equivalent to that of finding the
chromatic number of a given graph. Using this problem equivalence, we will identify a
useful property on the concentration of the optimal candidate role-sets. This allows us
to provide a stop condition for any iterative role mining algorithm that approximates the
minimum number of roles.
4.2 The proposed model
Given the configuration ϕ = ⟨USERS , PERMS , UP ⟩ we can build a bipartite graph G =
⟨V, E⟩, where the vertex set V is partitioned into the two disjoint subsets USERS and
PERMS , and where E is a set of pairs ⟨u, p⟩ such that u ∈ USERS and p ∈ PERMS . Two
vertices u and p are connected if and only if ⟨u, p⟩ ∈ UP .
A biclique coverage of the graph G identifies a unique candidate role-set for the configuration ϕ [7], that is ψ = ⟨ROLES , UA , PA ⟩. Indeed, every biclique identifies a role,
and the vertices of the biclique identify the users and the permissions assigned to this role.
Let the function cost return the number of roles, that is:
cost(ϕ, ψ) = |ROLES |
(2)
In this case, minimizing the cost function is equivalent to finding a candidate role-set that
minimizes the number of roles. This corresponds to basicRMP. Let B be a biclique coverage
of a graph G; we define the function cost′ as:
cost′(B) = cost(ϕ, ψ)
where ψ is the state ⟨ROLES , UA , PA ⟩ that can be deduced from the biclique coverage B
of G, and G is the bipartite graph built from the configuration ϕ that is uniquely identified
by ⟨USERS , PERMS , UP ⟩. In this model, the problem of finding an optimal candidate
role-set can be equivalently expressed as finding a biclique coverage for a given bipartite
graph G that minimizes the number of required bicliques. This is exactly the minimum
biclique coverage (MBC) problem. In the following we first recall both the reduction of
the MBC problem to the minimum clique partition (MCP) problem [7] and the reduction
of MCP to the chromatic number problem.
From the graph G, it is possible to construct a new undirected unipartite graph G′
where the edges of G become the vertices of G′ : two vertices in G′ are connected by an
edge if and only if the endpoints of the corresponding edges of G induce a biclique in G.
Formally:
G′ = ⟨E, {⟨e1 , e2 ⟩ | e1 , e2 induce a biclique in G}⟩
The vertices of a (maximal) clique in G′ correspond to a set of edges of G, where the
endpoints induce a (maximal) biclique in G. The edges covered by a (maximal) biclique
of G induce a (maximal) clique in G′ . Thus, every biclique edge cover of G corresponds to
a collection of cliques of G′ such that their union contains all of the vertices of G′ . From
such a collection, a clique partition of G′ can be obtained by removing any redundantly
covered vertex from all but one of the cliques to which it belongs. Similarly, any clique
partition of G′ corresponds to a biclique cover of G. Thus, the size of a minimum biclique
coverage of a bipartite graph G is equal to the size of a minimum clique partition of G′ .
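The construction of G′ can be sketched in a few lines. The code below assumes that UP is given as a set of (user, permission) pairs and uses the fact that two edges ⟨u1, p1⟩ and ⟨u2, p2⟩ induce a biclique in G exactly when the "crossed" pairs ⟨u1, p2⟩ and ⟨u2, p1⟩ are also in UP. The function names are illustrative, and the quadratic enumeration of pairs is only meant for small examples.

```python
from itertools import combinations


def induce_biclique(up, e1, e2):
    """True if the endpoints of edges e1 and e2 induce a biclique in G."""
    (u1, p1), (u2, p2) = e1, e2
    return (u1, p2) in up and (u2, p1) in up


def build_g_prime(up):
    """Return (vertices, edges) of G': one vertex per user-permission pair."""
    vertices = sorted(up)
    edges = {
        frozenset((e1, e2))
        for e1, e2 in combinations(vertices, 2)
        if induce_biclique(up, e1, e2)
    }
    return vertices, edges
```

Two assignments that share a user, or share a permission, are always adjacent in G′, because the crossed pairs are then the assignments themselves.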
Finding a clique partition of a graph G = ⟨V, E⟩ is equivalent to finding a coloring of
its complement Ḡ = ⟨V, (V × V ) \ E⟩. This implies that the biclique cover number of a
bipartite graph G corresponds to the chromatic number of the complement of G′ [7].
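To illustrate how a coloring translates back into roles, the following sketch greedily colors the complement of G′ without materializing it: two user-permission pairs may share a color only if they induce a biclique in G. Greedy coloring is a simple heuristic chosen here for illustration, not the method of the paper; it yields an upper bound on the minimum number of roles, not the optimum.

```python
def greedy_role_upper_bound(up):
    """Greedily color the complement of G' and return the color classes.

    Each class is a list of (user, permission) pairs that pairwise induce a
    biclique in G, so its users and permissions together form one role.
    """
    classes = []
    for (u, p) in sorted(up):
        for cls in classes:
            # (u, p) fits if it induces a biclique with every member (v, q).
            if all((u, q) in up and (v, p) in up for (v, q) in cls):
                cls.append((u, p))
                break
        else:
            classes.append([(u, p)])
    return classes
```

The number of returned classes depends on the visiting order, which is one reason why iterative or randomized role mining algorithms can benefit from a principled stop condition.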
5 A Concentration Result for Optimal Candidate Role-Sets
Using the model described in the previous section, we will prove that the cost of an optimal candidate role-set ψ for a given system configuration ϕ is tightly concentrated around
its expected value. We will use the concept of martingales and the Azuma-Hoeffding inequality to obtain a concentration result for the chromatic number of a graph G [14, 15].
Since finding the chromatic number is equivalent to both MCP and MBC, we can conclude
that the minimum number of roles required to cover the user-permission relationships in
a given configuration is tightly concentrated around its expected value.
Let us denote with G an undirected unipartite graph, and with χ(G) the chromatic
number of G.
Theorem 2. Given a graph G with n vertices, the following inequality holds:
$$\Pr(|\chi(G) - \mathbb{E}[\chi(G)]| \ge \lambda) \le 2\exp\left(\frac{-2\lambda^2}{n}\right) \tag{3}$$
Proof. We fix an arbitrary numbering of the vertices from 1 to n. Let Gi be the subgraph of
G induced by the set of vertices 1, . . . , i. Let Z0 = 𝔼[χ(G)] and Zi = 𝔼[χ(G) | G1 , . . . , Gi ].
Since adding a new vertex to the graph requires no more than one new color, the gap
between Zi and Zi−1 is at most 1. This allows us to apply the Azuma-Hoeffding inequality,
that is Equation 1 where dk = 1.
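Spelling out the substitution used in the proof, as a worked step under the proof's own assumption that every dk equals 1 and taking t = n in Equation 1:

```latex
\[
\sum_{k=1}^{n} d_k^2 = \sum_{k=1}^{n} 1 = n
\quad\Longrightarrow\quad
\Pr\bigl(|Z_n - Z_0| \ge \lambda\bigr) \le 2\exp\!\left(\frac{-2\lambda^2}{n}\right),
\]
```

where Z0 = 𝔼[χ(G)] and Zn = χ(G), since χ(G) is fully determined once all n vertices have been exposed.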
Note that this result holds even without knowing 𝔼[χ(G)]. Informally, Theorem 2
states that the chromatic number of a graph G is sharply concentrated around its expected
value. Since finding the chromatic number of a graph is equivalent to MCP, and MCP is
equivalent to MBC, this result also holds for MBC. Translating these concepts in terms
of RBAC entities, this means that the cost of an optimal candidate role-set of any configuration ϕ with |UP | = n is sharply concentrated around its expected value according to
Equation 3, where χ(G) is equal to the minimum number of required roles. It is important
to note that n represents the number of vertices in the coloring problem but, according to
the proposed model, it is also the number of edges in MBC; that is, the number of user-permission
assignments of the system configuration.
Fig. 1 Relationship between the parameters λ , n and the resulting probability 2 exp(−2λ²/n), for n from 0 to 500,000 and λ from 0 to 1,400: (a) plot of Equation 3; (b) highlight of some λ values for Figure 1(a), at probability levels 0.5, 0.3, and 0.1
Figure 1(a) shows the plot of Equation 3 for n varying between 1 and 500,000, and
λ less than 1,500. It is possible to see that for n = 500,000 it is sufficient to choose λ = 900
to assure that Pr(|χ(G) − 𝔼[χ(G)]| ≥ λ ) ≤ 0.1. In the same way, choosing λ = 600,
Pr(|χ(G) − 𝔼[χ(G)]| ≥ λ ) is less than 0.5. Figure 1(b) shows the values of λ and n for
which the event on the left-hand side of Equation 3 occurs with probability at most 0.5,
0.3, and 0.1 respectively.
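The values read off Figure 1 can be reproduced numerically. The sketch below evaluates the right-hand side of Equation 3 and inverts it to obtain the smallest λ that guarantees a target probability δ; the inversion λ = sqrt(n ln(2/δ) / 2) follows from solving 2 exp(−2λ²/n) ≤ δ for λ. Function and parameter names are illustrative assumptions.

```python
import math


def concentration_bound(lam, n):
    """Right-hand side of Equation 3: 2 * exp(-2 * lam^2 / n)."""
    return 2.0 * math.exp(-2.0 * lam ** 2 / n)


def min_lambda(n, delta):
    """Smallest lam with concentration_bound(lam, n) <= delta, for 0 < delta < 2."""
    if not 0.0 < delta < 2.0:
        raise ValueError("delta must lie strictly between 0 and 2")
    return math.sqrt(n * math.log(2.0 / delta) / 2.0)
```

For n = 500,000 the thresholds for δ = 0.1 and δ = 0.5 come out at roughly 865 and 589, consistent with the choices λ = 900 and λ = 600 discussed in the text.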
Setting λ = √(n log n), Equation 3 can be expressed as:
$$\Pr\left(|\chi(G) - \mathbb{E}[\chi(G)]| \ge \sqrt{n \log n}\right) \le \frac{2}{n^2} \tag{4}$$
That is, the probability that the optimum differs from its expected value by more than √(n log n)
is at most 2/n². This probability quickly becomes negligible as n increases. To support
the viability of the result, note that in a large organization there are usually thousands of
user-permission assignments.
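The step from Equation 3 to Equation 4 is a direct substitution; written out, and assuming that log denotes the natural logarithm:

```latex
\[
2\exp\!\left(\frac{-2\,(\sqrt{n\log n})^{2}}{n}\right)
= 2\exp(-2\log n)
= \frac{2}{n^{2}}
\]
```

For instance, with n = 1,000 assignments the right-hand side is already 2 · 10⁻⁶.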
5.1 Applications of the Bound
Assuming that we can estimate an approximation Ẽ[χ(G)] for 𝔼[χ(G)] such that |Ẽ[χ(G)] −
𝔼[χ(G)]| ≤ ε for some ε > 0, Theorem 2 can be used as a stop condition when striving
to find an approximation of the optimum for any role mining algorithm.
Indeed, suppose that we have a probabilistic algorithm that provides an approximation of χ(G), and suppose that its output is χ̃(G). Since we know Ẽ[χ(G)], we can use
this value to evaluate whether the output is acceptable and therefore decide to stop the
iteration procedure. Indeed, we have that:
$$\Pr(|\chi(G) - \tilde{\mathbb{E}}[\chi(G)]| \ge \lambda + \varepsilon) \le 2\exp\left(\frac{-2\lambda^2}{n}\right)$$
This is because
$$\Pr(|\chi(G) - \tilde{\mathbb{E}}[\chi(G)]| \ge \lambda + \varepsilon) \le \Pr(|\chi(G) - \mathbb{E}[\chi(G)]| \ge \lambda)$$
and, because of Theorem 2, this probability is less than or equal to 2 exp(−2λ²/n). Thus,
if |χ̃(G) − Ẽ[χ(G)]| ≤ λ + ε holds, then we can stop the iteration; otherwise we have to
reiterate the algorithm until it outputs an acceptable value.
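The stop condition can be wrapped around any iterative or randomized role mining procedure. In the sketch below, run_once stands for one run of such a procedure returning a number of roles; it, the iteration cap max_iters, and the fallback of returning the best value seen are all assumptions introduced for illustration.

```python
def should_stop(chi_tilde, e_tilde, lam, eps):
    """Stop when the output lies within lam + eps of the estimated expectation."""
    return abs(chi_tilde - e_tilde) <= lam + eps


def mine_until_acceptable(run_once, e_tilde, lam, eps, max_iters=100):
    """Repeat a (hypothetical) role mining run until the stop condition holds.

    If no run is acceptable within max_iters, the smallest value seen is
    returned so that the caller still gets the best available result.
    """
    best = None
    for _ in range(max_iters):
        chi_tilde = run_once()
        if best is None or chi_tilde < best:
            best = chi_tilde
        if should_stop(chi_tilde, e_tilde, lam, eps):
            return chi_tilde
    return best
```

The choice of λ trades confidence against tolerance: a larger λ makes the condition easier to satisfy, while Equation 3 makes the probability that the optimum lies outside the accepted band smaller.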
For a direct application of this result, we can consider a system configuration with
|UP | = x. If λ = y, the probability that |χ(G) − 𝔼[χ(G)]| < y is at least 1 − 2 exp(−2y²/x).
We do not know 𝔼[χ(G)], but since |Ẽ[χ(G)] − 𝔼[χ(G)]| ≤ ε we can conclude that
|χ(G) − Ẽ[χ(G)]| < y + ε with probability at least 1 − 2 exp(−2y²/x). For instance, we have
considered the real case of a large company, with 500,000 user-permission assignments.
With λ = 1200, and considering ε = 100, the probability that |χ(G) − Ẽ[χ(G)]| <
λ + ε is at least 99.36%. This means that, if Ẽ[χ(G)] = 24,000, with the above probability
the optimum is between 22,700 and 25,300. If a probabilistic role mining algorithm outputs a value χ̃(G) that lies far outside this range, then it is appropriate to reiterate
the process in order to find a better result. Conversely, let us assume that the algorithm
outputs a value within the given range. We know that the identified solution differs from
the optimum by at most 2(λ + ε), with probability at least 99.36%. Thus, one can assess
whether it is appropriate to continue investing resources in the effort to find a better solution, or to simply accept the provided solution. This choice can depend on many factors,
such as the computational cost of the algorithm, the economic cost due to a new analysis,
and the error that we are willing to accept, to name a few.
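The figures of the example above can be recomputed with a few lines. The sketch takes the number of assignments n, the tolerance λ, the estimation error ε, and the estimate of 𝔼[χ(G)] as inputs, and returns the confidence 1 − 2 exp(−2λ²/n) together with the resulting range for the optimum; the function name is a hypothetical choice.

```python
import math


def confidence_range(n, lam, eps, e_tilde):
    """Return (confidence, low, high) for the optimum around the estimate e_tilde."""
    # Probability that the optimum deviates from its expectation by less than lam.
    confidence = 1.0 - 2.0 * math.exp(-2.0 * lam ** 2 / n)
    # The estimate itself may be off by eps, which widens the band accordingly.
    half_width = lam + eps
    return confidence, e_tilde - half_width, e_tilde + half_width


# Values from the example in the text.
confidence, low, high = confidence_range(500_000, 1200, 100, 24_000)
```

With these inputs the confidence evaluates to just under 0.9937 and the range to 22,700 through 25,300, matching the text.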
There is also another possible application for this bound. Assume that a company is
assessing whether to renew its RBAC state, just because it is several years old [19]. By
means of the proposed bound, the company can establish whether it is worthwhile to invest
money and resources in this process. Indeed, if the cost of the RBAC state in use is between
Ẽ[χ(G)] − λ − ε and Ẽ[χ(G)] + λ + ε, the best option would be not to renew it
because the possible improvement is likely to be marginal. Moreover, changing the RBAC
state requires a huge effort from the administrators, since they need to get used to the new
configuration. With our proposal it is quite easy to assess whether a renewal is appropriate.
This indication can lead to important time and money savings.
Note that in our hypothesis, we have assumed that Ẽ[χ(G)] is known. Currently, not many
researchers have addressed this specific issue in reference to a generic graph, whereas
plenty of results have been provided for random graphs. In particular, it has been proven
[2, 13] that for G ∈ Gn,p :
$$\mathbb{E}[\chi(G)] \sim \frac{n}{2 \log_{\frac{1}{1-p}} n}$$
We are presently striving to apply a slight modification of the same probabilistic techniques used in this paper, to derive a similar bound for the class of graphs used in our
model.
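For reference, the asymptotic estimate for random graphs can be evaluated as follows. This is only an illustration of the formula for G ∈ Gn,p: as noted above, the graphs arising from the role mining model are not known to follow this distribution, so the result should not be read as an estimate for real configurations. The function name is hypothetical.

```python
import math


def expected_chromatic_number_gnp(n, p):
    """Asymptotic estimate n / (2 * log_b(n)) with b = 1 / (1 - p), for G in G(n, p)."""
    if n < 2 or not 0.0 < p < 1.0:
        raise ValueError("requires n >= 2 and 0 < p < 1")
    base = 1.0 / (1.0 - p)
    return n / (2.0 * math.log(n, base))
```

For p = 1/2 the base of the logarithm is 2, so a graph with 1,024 vertices gives 1024 / (2 · 10) = 51.2.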
6 Conclusions and Future Works
In this paper we proved that the optimal administration cost for RBAC, when striving to
minimize the number of roles, is sharply concentrated around its expected value. The result has been achieved by adopting a model reduction and advanced probabilistic tools.
Further, we have shown how to apply this result to deal with practical issues in administering RBAC; that is, how it can be used as a stop condition in the quest for the optimum.
This paper also highlights a few research directions. First, a challenge that we are
currently addressing is to derive an estimate of the expected optimal number of roles
(𝔼[χ(G)]) from a generic system configuration. Another research path is applying both
the exposed reduction and the probabilistic tools to obtain similar bounds while simultaneously minimizing more parameters.
References
1. American National Standards Institute (ANSI) and InterNational Committee for Information Technology Standards (INCITS): ANSI/INCITS 359-2004, Information Technology – Role Based Access
Control (2004)
2. Bollobás, B.: The chromatic number of random graphs. Combinatorica 8(1), 49–55 (1988)
3. Colantonio, A., Di Pietro, R., Ocello, A.: A cost-driven approach to role engineering. In: Proceedings
of the 23rd ACM Symposium on Applied Computing, SAC ’08, vol. 3, pp. 2129–2136. Fortaleza,
Ceará, Brazil (2008)
4. Colantonio, A., Di Pietro, R., Ocello, A.: Leveraging lattices to improve role mining. In: Proceedings
of the IFIP TC 11 23rd International Information Security Conference, SEC ’08, IFIP International
Federation for Information Processing, vol. 278, pp. 333–347. Springer (2008)
5. Coyne, E.J.: Role engineering. In: RBAC ’95: Proceedings of the first ACM Workshop on Role-based
access control, p. 4. ACM, New York, NY, USA (1996)
6. Coyne, E.J., Davis, J.M.: Role Engineering for Enterprise Security Management. Artech House
(2007)
7. Ene, A., Horne, W., Milosavljevic, N., Rao, P., Schreiber, R., Tarjan, R.E.: Fast exact and heuristic
methods for role minimization problems. In: Proceedings of the 13th ACM Symposium on Access
Control Models and Technologies, SACMAT ’08, pp. 1–10 (2008)
8. Frank, M., Basin, D., Buhmann, J.M.: A class of probabilistic models for role engineering. In:
Proceedings of the 15th ACM Conference on Computer and Communications Security, CCS ’08 (to
appear) (2008)
9. Geerts, F., Goethals, B., Mielikäinen, T.: Tiling databases. In: Discovery Science, Lecture Notes in
Computer Science, vol. 3245, pp. 278–289. Springer (2004)
10. Jajodia, S., Samarati, P., Subrahmanian, V.S.: A logical language for expressing authorizations. In:
SP ’97: Proceedings of the 1997 IEEE Symposium on Security and Privacy, p. 31. IEEE Computer
Society, Washington, DC, USA (1997)
11. Kuhlmann, M., Shohat, D., Schimpf, G.: Role mining – revealing business roles for security administration using data mining technology. In: Proceedings of the 8th ACM Symposium on Access Control
Models and Technologies, SACMAT ’03, pp. 179–186 (2003)
12. Lu, H., Vaidya, J., Atluri, V.: Optimal boolean matrix decomposition: Application to role engineering.
In: Proceedings of the 24th IEEE International Conference on Data Engineering, ICDE ’08, pp. 297–
306 (2008)
13. Łuczak, T.: The chromatic number of random graphs. Combinatorica 11(1), 45–54 (1991)
14. McDiarmid, C.J.H.: On the method of bounded differences. In: J. Siemons (ed.) Surveys in Combinatorics: Invited Papers at the 12th British Combinatorial Conference, 141 in London Mathematical
Society Lecture Notes Series, pp. 148–188. Cambridge University Press (1989)
15. Mitzenmacher, M., Upfal, E.: Probability and Computing: Randomized Algorithms and Probabilistic
Analysis. Cambridge University Press, New York, NY, USA (2005)
16. Rymon, R.: Method and apparatus for role grouping by shared resource utilization (2003). United
States Patent Application 20030172161
17. Schlegelmilch, J., Steffens, U.: Role mining with ORCA. In: Proceedings of the 10th ACM Symposium on Access Control Models and Technologies, SACMAT ’05, pp. 168–176 (2005)
18. Vaidya, J., Atluri, V., Guo, Q.: The role mining problem: finding a minimal descriptive set of roles. In:
Proceedings of the 12th ACM Symposium on Access Control Models and Technologies, SACMAT
’07, pp. 175–184 (2007)
19. Vaidya, J., Atluri, V., Guo, Q., Adam, N.: Migrating to optimal RBAC with minimal perturbation. In:
Proceedings of the 13th ACM Symposium on Access Control Models and Technologies, SACMAT
’08, pp. 11–20 (2008)
20. Vaidya, J., Atluri, V., Warner, J.: RoleMiner: mining roles using subset enumeration. In: Proceedings
of the 13th ACM Conference on Computer and Communications Security, pp. 144–153 (2006)
21. Williams, D.: Probability with Martingales. Cambridge University Press (1991)
22. Zhang, D., Ramamohanarao, K., Ebringer, T.: Role engineering using graph optimisation. In: Proceedings of the 12th ACM Symposium on Access Control Models and Technologies, SACMAT ’07,
pp. 139–144 (2007)