
WO2013176784A1 - Optimal strategies in security games - Google Patents


Info

Publication number
WO2013176784A1
PCT/US2013/034854
Authority
WO
WIPO (PCT)
Prior art keywords
defender
strategy
computer
attacker
objective
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2013/034854
Other languages
French (fr)
Inventor
Milind Tambe
Fernando ORDÓÑEZ
Rong Yang
Zhengyu YIN
Matthew Brown
Bo AN
Christopher KIEKINTVELD
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Southern California USC
Original Assignee
University of Southern California USC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US13/479,884 external-priority patent/US8364511B2/en
Priority claimed from US13/838,566 external-priority patent/US8702817B2/en
Application filed by University of Southern California USC filed Critical University of Southern California USC
Publication of WO2013176784A1 publication Critical patent/WO2013176784A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0637 Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/11 Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/067 Enterprise or organisation modelling
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/002 Countermeasures against attacks on cryptographic mechanisms

Definitions

  • Game theory is an increasingly important paradigm for modeling security domains which feature complex resource allocation.
  • Security games, a special class of attacker-defender Stackelberg games, are at the heart of several major deployed decision-support applications.
  • the defender is typically trying to maximize a single objective.
  • the defender has to consider multiple objectives simultaneously.
  • LASD Los Angeles Sheriff's Department
  • each one of these attacker types poses a unique threat (lost revenue, property theft, and loss of life).
  • selecting a security strategy is a significant challenge, as no single strategy can minimize the threat for all attacker types.
  • tradeoffs must be made and protecting more against one threat may increase the vulnerability to another threat.
  • LASD should weigh these threats when determining the security strategy to use.
  • this process can become convoluted when attempting to compare abstract notions such as safety and security with concrete concepts such as ticket revenue.
  • Bayesian security games have been used to model domains where the defender is facing multiple attacker types. The threats posed by the different attacker types are weighted according to the relative likelihood of encountering that attacker type. There are three potential factors limiting the use of Bayesian security games: (1) the defender may not have information on the probability distribution over attacker types, (2) it may be impossible or undesirable to directly compare and combine the defender rewards of different security games, and (3) only one solution is given, hiding the tradeoffs between the objectives from the end user.
  • Stackelberg game has been used to model the uncertainty over players' preferences by allowing multiple discrete follower types, as well as, by use of sampling-based algorithms, continuous payoff uncertainty.
  • the present disclosure provides different solution methodologies for addressing the issues of protecting and/or patrolling security domains, e.g., identified infrastructures or resources, with limited resources.
  • the solution methodologies can provide optimal solutions to attacker-defender Stackelberg security games that are modeled on a real-world application of interest. These optimal solutions can be used for directing patrolling strategies and/or resource allocation for particular security domains.
  • GOSAQ can compute the globally optimal defender strategy against a QR model of attackers when there are no resource constraints and gives an efficient heuristic otherwise
  • PASAQ in turn provides an efficient approximation of the optimal defender strategy with or without resource constraints.
  • Additional contributions of Exhibit 1 include proofs of approximation bounds, detailed experimental results showing the advantages of GOSAQ and PASAQ in solution quality over the benchmark algorithm (BRQR) and the efficiency of PASAQ. Given these results, PASAQ is at the heart of the PROTECT system, which is deployed for the US Coast Guard in the Port of Boston, and is now headed to other ports.
  • a further aspect is directed to a unified method for handling discrete and continuous uncertainty in Bayesian Stackelberg games, which scales up Bayesian Stackelberg games, providing a novel unified approach to handling uncertainty not only over discrete follower types but also other key continuously distributed real-world uncertainty, due to the leader's execution error, the follower's observation error, and continuous payoff uncertainty.
  • the aspect provides new algorithms.
  • An algorithm for Bayesian Stackelberg games, called HUNTER, is presented to scale up the number of types.
  • HUNTER combines the following five key features: i) efficient pruning via a best-first search of the leader's strategy space; ii) a novel linear program for computing tight upper bounds for this search; iii) using Bender's decomposition for solving the upper bound linear program efficiently; iv) efficient inheritance of Bender's cuts from parent to child; and v) an efficient heuristic branching rule.
  • HUNTER provides order-of-magnitude speedups over the best existing methods to handle discrete follower types.
  • HUNTER's efficiency for Bayesian Stackelberg games can be exploited to also handle the continuous uncertainty using sample average approximation
  • a further aspect provides a multi-objective optimization for security games, which provides a solution to the challenges of different security domains.
  • the aspect includes treatment of multi-objective security games (MOSG), which combines security games and multi-objective optimization.
  • instead of a single optimal solution, MOSGs have a set of Pareto optimal (non-dominated) solutions referred to as the Pareto frontier.
  • the Pareto frontier can be generated by solving a sequence of constrained single-objective optimization problems (CSOP), where one objective is selected to be maximized while lower bounds are specified for the other objectives.
  • CSOP constrained single-objective optimization problem
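As a toy illustration of this CSOP idea, the sketch below enumerates a finite candidate set of objective vectors, maximizes the first objective subject to a lower bound on the second, and sweeps the bound to trace a Pareto frontier. The function names, the candidate-set representation, and the bound grid are illustrative assumptions, not part of the disclosure:

```python
def solve_csop(solutions, bounds):
    """Constrained single-objective optimization (CSOP): maximize the
    first objective subject to lower bounds on the remaining ones.
    `solutions` is a finite candidate set of objective vectors, a toy
    stand-in for the security-game solver."""
    feasible = [s for s in solutions
                if all(s[i + 1] >= b for i, b in enumerate(bounds))]
    return max(feasible, key=lambda s: s[0]) if feasible else None

def pareto_frontier(solutions, grid):
    """Sweep the lower bound on the second objective over `grid` to
    collect non-dominated points (a loose, bi-objective sketch of the
    iterative epsilon-constraints idea)."""
    frontier = set()
    for b in grid:
        s = solve_csop(solutions, [b])
        if s is not None:
            frontier.add(s)
    # Discard any dominated points picked up along the way
    return sorted(p for p in frontier
                  if not any(q != p and q[0] >= p[0] and q[1] >= p[1]
                             for q in frontier))
```

Tightening the grid of bounds refines the frontier, which matches the role of epsilon in the disclosure's Iterative-ε-Constraints discussion.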
  • The drawings are of illustrative embodiments. They do not illustrate all embodiments. Other embodiments may be used in addition or instead. Details that may be apparent or unnecessary may be omitted to save space or for more effective illustration. Some embodiments may be practiced with additional components or steps and/or without all of the components or steps that are illustrated. When the same numeral appears in different drawings, it refers to the same or like components or steps.
  • FIG. 1 depicts a table with notations used for an exemplary quantal response embodiment according to the present disclosure.
  • FIG. 2 depicts an algorithm used for an exemplary quantal response embodiment of the present disclosure.
  • FIG. 3 includes FIGS. 3(a) and 3(b), which display two examples of approximations of nonlinear objective functions over partitioned domains and after variable substitution.
  • FIG. 4 includes FIGS. 4(a)-4(f), which depict a solution quality and runtime comparison, without assignment constraints, for examples of the three algorithms, GOSAQ, PASAQ, and BRQR.
  • FIG. 5 includes FIGS. 5(a)-5(f), which depict a solution quality and runtime comparison, with assignment constraints, for examples of the three algorithms, GOSAQ, PASAQ, and BRQR.
  • FIG. 6 depicts an example search tree of solving Bayesian games.
  • FIG. 7 is a diagram showing steps of creating internal search nodes according to an example of the HUNTER algorithm.
  • FIG. 8 is a diagram depicting an example of internal nodes according to an example of the HUNTER algorithm.
  • FIG. 9 depicts an example of the HUNTER algorithm.
  • FIG. 10 depicts an example of the convex hull of H, cl conv H.
  • FIG. 11 depicts an experimental analysis of HUNTER in runtime comparison with the HBGS and DOBSS algorithms.
  • FIG. 12 is an example of a Pareto frontier plotted for a Bi-Objective MOSG.
  • FIG. 13 depicts an example of the Iterative-ε-Constraints algorithm.
  • FIG. 14 depicts an example of an algorithm for recreating a sequence of CSOP problems generated by the Iterative-ε-Constraints algorithm that ensures b ≤ v throughout.
  • FIG. 15 depicts a table with notations used for an exemplary MOSG algorithm according to the present disclosure.
  • FIG. 16 depicts an example of the ORIGAMI-M algorithm.
  • FIG. 17 depicts an example of the MINI-COV algorithm.
  • FIG. 18 depicts an example of the ORIGAMI-A algorithm.
  • FIG. 19 depicts an example of scaling up targets, according to the present disclosure.
  • FIG. 20 depicts a further example of scaling up targets, according to the present disclosure.
  • FIG. 21 depicts an example of scaling up objectives, according to the present disclosure.
  • FIG. 22 depicts an example of scaling down epsilon, according to the present disclosure.
  • FIG. 23 shows results using ORIGAMI-A under specified conditions.
  • FIG. 24 shows epsilon solution quality for MILP-PM and ORIGAMI-A.
  • FIG. 25 depicts a comparison of maximum objective loss for different epsilon values against uniformly weighted Bayesian security games.
  • the present disclosure provides different solution methodologies for addressing the issues of protecting and/or patrolling security domains, e.g., identified infrastructures or resources, with limited resources.
  • the solution methodologies can provide optimal solutions to attacker-defender Stackelberg security games that are modeled on a real-world application of interest. These optimal solutions can be used for directing patrolling strategies and/or resource allocation for particular security domains.
  • Three aspects of the present disclosure are described in the following Sections 1-3. Formulas presented in the sections are numbered separately for each section, while tables and figures are numbered together.
  • an aspect of the present disclosure is directed to computing an optimal strategy of one or more defenders against one or more attackers having a quantal response in security games.
  • An aspect of the present disclosure provides two new algorithms to address these difficulties:
  • the global optimal strategy against quantal response (GOSAQ) algorithm can compute the globally optimal defender strategy against a QR model of attackers when there are no resource constraints and gives an efficient heuristic otherwise;
  • the piecewise linear approximation of optimal strategy against quantal response (PASAQ) algorithm in turn provides an efficient approximation of the optimal defender strategy with or without resource constraints.
  • QR assumes errors in human decision making and suggests that instead of strictly maximizing utility, individuals respond stochastically in games: the chance of selecting a non-optimal strategy increases as the associated cost decreases.
  • the QR model has received widespread support in the literature in terms of its superior ability to model human behavior in games, including in recent multi-agent systems literature.
  • An even more relevant study in the context of security games showed that defender security allocations assuming a quantal response model of adversary behavior outperformed several competing models in experiments with human subjects.
  • QR is among the best-performing current models and one that allows tuning of the 'adversary rationality level' as explained later. Hence this model is one that can be practically used by security agencies desiring to not be locked into adversary models of perfect rationality.
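For illustration, the logit form of quantal response assigns each target an attack probability proportional to exp(λ · U_i^a(x)), where U_i^a(x) is the attacker's expected utility for target i under defender coverage x. A minimal sketch (the function name and argument layout are assumptions, not the disclosure's notation):

```python
import math

def attack_probabilities(coverage, rewards, penalties, lam):
    """Logit quantal response: probability the adversary attacks each target.

    coverage[i]  - probability target i is protected (defender marginal x_i)
    rewards[i]   - attacker reward R_i^a if the attack succeeds
    penalties[i] - attacker penalty P_i^a if the attack is caught
    lam          - rationality parameter lambda (0 = uniformly random,
                   large = near-perfect rationality)
    """
    # Attacker's expected utility for each target under coverage x
    utils = [x * p + (1 - x) * r
             for x, r, p in zip(coverage, rewards, penalties)]
    # Softmax over lambda-scaled utilities
    weights = [math.exp(lam * u) for u in utils]
    total = sum(weights)
    return [w / total for w in weights]
```

With lam = 0 every target is equally likely; as lam grows, probability mass concentrates on the attacker's utility-maximizing targets, which is the tunable 'adversary rationality level' mentioned above.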
  • BRQR has been used to solve a Stackelberg security game with a QR-adversary.
  • BRQR however is not guaranteed to converge to the optimal solution, as it used a nonlinear solver with multi-starts to obtain an efficient solution to a non-convex optimization problem.
  • GOSAQ and PASAQ are compared herein to the performance of BRQR, since it is the benchmark algorithm.
  • Another existing algorithm that efficiently computes the Quantal Response Equilibrium only applies to cases where all the players have the same level of errors in their quantal response, a condition not satisfied in security games.
  • the defender's strategy is based on a computer-aided decision-making tool, and therefore it is a best response.
  • Adversaries, on the other hand, are human beings who may have biases and preferences in their decision making, so they are modeled with a quantal response. Therefore, new algorithms have been developed, as presented herein, to compute the optimal defender strategy when facing a QR-adversary in real-world security problems.
  • GOSAQ an algorithm called GOSAQ is provided to compute the defender optimal strategy against a QR-adversary.
  • GOSAQ uses a binary search method to iteratively estimate the global optimal solution rather than searching for it directly, which would require solving a nonlinear and non-convex fractional problem. It also uses a nonlinear variable transformation to convert the problem into a convex problem. GOSAQ leads to an ε-optimal solution, where ε can be arbitrarily small.
  • PASAQ another algorithm
  • PASAQ is provided to approximate the optimal defender strategy. PASAQ is also based on binary search. It then converts the problem into a Mixed-Integer Linear Programming problem by using a piecewise linear approximation.
  • PASAQ can potentially be applied to most of the real-world deployments of the Stackelberg Security Game, including ARMOR and IRIS, that are based on a perfect rationality model of the adversary. This may improve the performance of such systems when dealing with human adversaries.
  • SSG Stackelberg Security Game
  • the defender and attacker may represent organizations and need not be single individuals.
  • Table 1 is shown in FIG. 1.
  • the outcomes of the SSG depend only on whether or not the attack is successful. So given a target i, the defender receives reward R_i^d if the adversary attacks a target that is covered by the defender; otherwise, the defender receives penalty P_i^d.
  • the attacker receives penalty P_i^a in the former case, and reward R_i^a in the latter case.
  • a key property of SSG is that while the games may be non-zero-sum, R_i^d > P_i^d and R_i^a > P_i^a, ∀i [9]. In other words, adding resources to cover a target helps the defender and hurts the attacker.
  • the j-th individual defender strategy can be denoted as A_j, which is an assignment of all the security resources.
  • A_j can be written as ⟨A_ij⟩, where A_ij indicates whether or not target i is covered by assignment j.
  • let A = {A_j} be the set of feasible assignments of resources and let a_j be the probability of selecting strategy j. Given this probability of selecting defender strategies, the likelihood of protecting any specific target i can be computed as the marginal x_i = Σ_j a_j A_ij.
  • the goal is to compute a mixed strategy for the leader to commit to based on her knowledge of the adversary's response. More specifically, given that the defender has limited resources (e.g., she may need to protect 8 targets with 3 guards), she must design her strategy to optimize against the adversary's response to maximize effectiveness.
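The marginal coverage computation and the SSG payoff structure described above can be sketched as follows (a toy illustration; the helper names are assumptions, not the disclosure's notation):

```python
def marginal_coverage(assignments, probs):
    """x_i = sum_j a_j * A_ij: probability target i is protected,
    given pure assignments A_j played with probabilities a_j."""
    n = len(assignments[0])
    return [sum(a_j * A[i] for A, a_j in zip(assignments, probs))
            for i in range(n)]

def expected_utilities(x, i, Rd, Pd, Ra, Pa):
    """SSG payoffs when target i is attacked under marginal coverage x:
    the defender gets R^d_i if the target is covered, P^d_i otherwise;
    the attacker gets P^a_i if covered, R^a_i otherwise."""
    defender = x[i] * Rd[i] + (1 - x[i]) * Pd[i]
    attacker = x[i] * Pa[i] + (1 - x[i]) * Ra[i]
    return defender, attacker
```

Note that raising x[i] always raises the defender's utility and lowers the attacker's, which is exactly the R^d > P^d, R^a > P^a property stated above.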
  • λ > 0 is the parameter of the quantal response model, which represents the error level in the adversary's quantal response.
  • Problem P1 has a polyhedral feasible region and a non-convex fractional objective function.
  • a resource assignment constraint implies that the feasible assignment set A is restricted; not all combinatorial assignments of resources to targets are allowed. Hence, the marginals on targets, x, are also restricted.
  • ⟨a_j⟩ is the mixed strategy over all the feasible assignments of the resources.
  • solving P2 is needed.
  • the constraints in P1 are modified to enforce feasibility of the marginal coverage.
  • a key idea of the binary search method is to iteratively estimate the global optimal value (p*) of the fractional objective function of P1, instead of searching for it directly.
  • let the feasible region of P1 (or P2) be denoted X_f.
  • for a given value r, whether r ≤ p* can be determined by checking
  • Algorithm 1 describes the basic structure of the binary search method. Given the payoff matrix (PM) and the total number of security resources (numRes), Algorithm 1 first initializes the upper bound (U_0) and lower bound (L_0) of the defender expected utility on Line 2. Then in each iteration, r is set to be the mean of U and L. Line 6 checks whether the current r satisfies Equation (2). If so, p* ≥ r, and the lower bound of the binary search needs to be increased; in this case, it also returns a valid strategy V. Otherwise, p* < r, and the upper bound of the binary search should be decreased.
  • let δ* be the optimal objective value of the above optimization problem. If δ* ≤ 0, Equation (2) must be true. Therefore, by solving the new optimization problem and checking whether δ* ≤ 0, it can be determined whether a given r is larger or smaller than the global maximum.
  • the objective function in CF-OPT is still non-convex; therefore, solving it directly is still a hard problem. Two methods to address this are introduced in the next two sections.
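The binary search skeleton of Algorithm 1 can be sketched generically, with a feasibility oracle standing in for the CF-OPT subproblem that GOSAQ and PASAQ solve differently (the oracle's name and signature are assumptions for illustration):

```python
def binary_search_optimum(check, lower, upper, eps):
    """Skeleton of the Algorithm 1 idea: iteratively bracket the
    optimal value p* rather than optimizing the fractional objective
    directly.

    check(r) should return (True, strategy) if a defender strategy
    achieving expected utility >= r exists (i.e. r <= p*), and
    (False, None) otherwise.  The oracle itself is where GOSAQ and
    PASAQ differ.
    """
    best = None
    while upper - lower > eps:
        r = (lower + upper) / 2.0
        feasible, strategy = check(r)
        if feasible:
            lower, best = r, strategy   # p* >= r: raise the lower bound
        else:
            upper = r                   # p* < r: lower the upper bound
    return lower, best
```

Each iteration halves the bracket, so the loop terminates after O(log((U_0 - L_0)/eps)) oracle calls, matching the epsilon-optimality guarantee discussed above.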
  • GOSAQ ALGORITHM 1 + VARIABLE SUBSTITUTION
  • GOSAQ uses Algorithm 1, but with a rewritten CF-OPT, given the above variable substitution.
  • GOSAQ-CP is a convex optimization problem with a unique optimal solution.
  • GOSAQ is a polynomial time algorithm.
  • Equation (8) is a nonlinear equality constraint that makes this optimization problem non-convex.
  • KKT Karush-Kuhn-Tucker
  • PASAQ is an algorithm to compute the approximate optimal defender strategy.
  • PASAQ has the same structure as Algorithm 1.
  • the key idea in PASAQ is to use a piecewise linear function to approximate the nonlinear objective function in CF-OPT, and thus convert it into a Mixed-Integer Linear Programming (MILP) problem.
  • MILP Mixed-Integer Linear Programming
  • FIG. 3, which includes Figures 3(a) and 3(b), displays two examples of such a partition.
  • the two nonlinear functions can be represented as piecewise linear functions using {x_ik}.
  • let 0 = x_i0 < x_i1 < … < x_iK = 1 be the K + 1 cut-points of the piecewise linear approximation.
  • PASAQ consists of Algorithm 1, but with CF-OPT rewritten as a MILP (PASAQ-MILP).
  • the feasible region of PASAQ-MILP is equivalent to that of P1.
  • Lemma 5 shows that the solution provided by PASAQ is in the feasible region of P1.
  • PASAQ approximates the minimum value of CF-OPT by using PASAQ-MILP, and furthermore solves P1 approximately using binary search.
  • an error bound needs to be shown on the solution quality of PASAQ.
  • Lemmas 6, 7 and 8 are first shown on the way to building the proof for the error bound. Due to space constraints, many proofs are abbreviated; full proofs have been derived and made available in an on-line appendix.
  • Algorithm 1 indicates that L* ≤ p* ≤ U*, hence p* − L* ≤ ε.
  • Lemmas 7, 8 and 9 provide an upper bound on
  • PASAQ-MILP can be modified as the following, referred to as PASAQ-MILP-C.
  • PASAQ-MILP-C is a MILP, so it can be solved optimally with any MILP solver (e.g., CPLEX). A proof similar to that of Lemma 5 shows that the above MILP formulation has the same feasible region as P2; hence, it leads to a feasible solution of P2. Furthermore, the error bound of PASAQ relies on the approximation accuracy of the objective function by the piecewise linear function and on the fact that the subproblem PASAQ-MILP-C can be solved optimally. Neither condition changes from the cases without assignment constraints to the cases with assignment constraints. Hence, the error bound is the same as that shown in Theorem 2.
  • EXPERIMENTAL RESULTS
  • rewards R_i^d and R_i^a are chosen uniformly at random from 1 to 10, while penalties P_i^d and P_i^a are chosen uniformly at random from −10 to −1; feasible assignments are generated by randomly setting each element A_ij to 0 or 1.
  • the quantal response parameter λ is set to 0.76.
  • FIG. 4 includes Figures 4(a)-4(f), which show solution quality and runtime comparisons, without assignment constraints, for GOSAQ and PASAQ compared with the previous benchmark algorithm BRQR.
  • Figures 4(a), 4(c) and 4(e) show the solution quality results of different algorithms under different conditions.
  • the average defender expected utility is displayed on the y-axis.
  • Figure 4(a) changes the number of targets;
  • Figure 4(c) changes the ratio of resources to targets, fixing the number of targets and λ as shown;
  • Figure 4(e) changes the value of the binary search threshold ε.
  • BRQR has the worst solution quality
  • GOSAQ has the best solution quality
  • PASAQ achieves almost the same solution quality as GOSAQ when it uses more than 10 pieces.
  • Runtime The runtime results are presented in Figures 4(b), 4(d) and 4(f). In all three figures, the y-axis displays the runtime, and the x-axis displays the variables which were varied in order to measure their impact on the runtime of the algorithms. For BRQR, the runtime is the sum of the runtime across all its iterations.
  • Figure 4(b) shows the change in runtime as the number of targets increases.
  • the number of resources and the value of λ are shown in the caption.
  • BRQR with 100 iterations is seen to run significantly slower than GOSAQ and PASAQ.
  • Figure 4(d) shows the impact of the ratio of resources to targets on the runtime. The figure indicates that the runtime of the three algorithms is independent of the change in the number of resources.
  • Figure 4(f) shows how the runtime of GOSAQ and PASAQ is affected by the value of ε. On the x-axis, the value of ε decreases from left to right. The runtime increases linearly as ε decreases exponentially. In both Figures 4(d) and 4(f), the number of targets and resources are displayed in the caption.
  • FIG. 5 includes Figures 5(a)-5(f), and shows solution quality and runtime comparison, with assignment constraint(s), for GOSAQ and PASAQ compared with the previous benchmark algorithm BRQR.
  • Figures 5(a) and 5(b) display the solution quality of the three algorithms with varying number of targets and varying number of feasible assignments
  • the average defender utility is displayed on the y-axis.
  • the number of targets is displayed on the x-axis, and the ratio of
  • BRQR is seen to have very poor performance.
  • GOSAQ provided the best solution; PASAQ achieves almost identical solution quality when the number of pieces is sufficiently large (> 10).
  • Figure 5(b) shows how solution quality is impacted by the number of feasible assignments, which is displayed on the x-axis.
  • the x-axis shows the number of assignment constraints, A, to be 20 times, 60 times and 100 times the number of targets. The number of targets is set to 60.
  • BRQR has significantly lower solution quality, and it drops as the number of assignments increases; PASAQ again achieves almost the same solution quality as GOSAQ when the number of pieces is larger than 10.
  • Runtime The runtime results in Figures 5(c), 5(e), 5(d) and 5(f) are presented. In all experiments, 80 minutes was set as the cutoff.
  • Figure 5(c) displays the runtime on the y-axis and the number of targets on the x-axis. It is clear that GOSAQ runs significantly slower than both PASAQ and BRQR, and slows down exponentially as the number of targets increases.
  • Figure 5(e) shows extended runtime results of BRQR and PASAQ as the number of targets increases. PASAQ runs in less than 4 minutes with 200 targets and 12000 feasible assignments. BRQR runs significantly slower with a higher number of iterations.
  • PASAQ not only has a solution quality that approaches that of GOSAQ when the number of pieces is sufficiently large (> 10), PASAQ is significantly faster than GOSAQ (which suffers exponential slowdown with scale-up in the domain).
  • the algorithms described above can provide a number of advantages in security games.
  • GOSAQ can be used to find or guarantee the globally optimal solution in computing the defender strategy against an adversary's quantal response.
  • the efficient approximation algorithm, PASAQ, can provide a more efficient computation of the defender strategy with nearly-optimal solution quality (compared to GOSAQ).
  • Section 2 A Unified Method for Handling Discrete and Continuous Uncertainty in Bayesian Stackelberg Games (HUNTER algorithm)
  • Another aspect of the present disclosure is directed to a unified method of handling discrete and continuous uncertainty in Bayesian Stackelberg games, i.e., the HUNTER algorithm. Given their existing and potential real-world security applications, Bayesian Stackelberg games have received significant research interest. In these games, the defender acts as a leader, and the many different follower types model the uncertainty over discrete attacker types. Unfortunately, since solving such games is an NP-hard problem, scale-up has remained a difficult challenge.
  • Bayesian Stackelberg refers to a Stackelberg game in which the defender acts as a leader, and there are many different follower types which model the uncertainty over discrete attacker types.
  • the algorithms described herein provide for a unified approach to handling uncertainty not only over discrete follower types but also other key continuously distributed real-world uncertainty, due to the leader's execution error, the follower's observation error, and continuous payoff uncertainty.
  • HUNTER a new algorithm is presented for Bayesian Stackelberg games, called HUNTER, which can scale up the number of types used.
  • HUNTER combines one or more of the following five key features: i) efficient pruning via a best-first search of the leader's strategy space; ii) a novel linear program for computing tight upper bounds for this search; iii) using Bender's decomposition for solving the upper bound linear program efficiently; iv) efficient inheritance of Bender's cuts from parent to child; v) an efficient heuristic branching rule.
  • HUNTER provides orders-of-magnitude speedups over the best existing methods to handle discrete follower types.
  • HUNTER's efficiency for Bayesian Stackelberg games can be exploited to also handle the continuous uncertainty using sample average approximation, as is described below in further detail.
  • HUNTER-based approaches are experimentally shown to also outperform the latest robust solution methods under continuously distributed uncertainty.
  • HUNTER a novel algorithm for solving Bayesian Stackelberg games, called HUNTER, is described, preferably combining the following five key features.
  • the HUNTER algorithm conducts a best-first search in the follower's best-response assignment space, which only expands a small number of nodes (within an exponentially large assignment space).
  • HUNTER computes tight upper bounds to speed up this search using a novel linear program.
  • HUNTER solves this linear program efficiently using Bender's decomposition.
  • the Bender's cuts generated in a parent node are shown to be valid cuts for its children, providing further speedups.
  • HUNTER deploys a heuristic branching rule to further improve efficiency.
  • HUNTER can dramatically improve the scalability in the number of types over other existing approaches.
  • HUNTER for Bayesian Stackelberg games can be used in handling continuously distributed uncertainty such as the leader's execution error, the follower's observation noise, and both players' payoff uncertainty.
  • HUNTER is again shown to provide significantly better performance than BRASS and RECON.
  • a final set of experiments, described herein, also illustrates HUNTER's ability to handle both discrete and continuous uncertainty within a single problem.
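The best-first search with upper-bound pruning (feature i above) can be sketched as a generic branch-and-bound loop; the three callbacks stand in for HUNTER's game-specific machinery (the LP-based bounds, node expansion over follower best-response assignments, and leaf evaluation) and are assumptions for illustration:

```python
import heapq

def best_first_search(root, upper_bound, expand, evaluate):
    """Generic best-first branch-and-bound in the spirit of HUNTER's
    search over follower best-response assignments.

      upper_bound(node) - admissible bound on any leaf below `node`
      expand(node)      - children of `node`, or [] if it is a leaf
      evaluate(node)    - exact value of a leaf
    """
    best_val, best_leaf = float("-inf"), None
    # Max-heap keyed on the upper bound (negated for Python's min-heap);
    # the counter breaks ties without comparing node objects.
    frontier = [(-upper_bound(root), 0, root)]
    counter = 1
    while frontier:
        neg_ub, _, node = heapq.heappop(frontier)
        if -neg_ub <= best_val:
            break  # no remaining node can beat the incumbent: prune all
        children = expand(node)
        if not children:
            val = evaluate(node)  # leaf: evaluate exactly
            if val > best_val:
                best_val, best_leaf = val, node
        else:
            for c in children:
                heapq.heappush(frontier, (-upper_bound(c), counter, c))
                counter += 1
    return best_leaf, best_val
```

Because nodes are popped in decreasing order of their upper bounds, the search can stop as soon as the best remaining bound falls below the incumbent, which is why only a small fraction of the exponentially large assignment space is ever expanded.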
  • Stackelberg game is a multi-party (e.g. , two-person) game played by a leader and a follower.
  • the leader commits to a mixed strategy first
  • the follower observes the leader's strategy and responds with a pure strategy, maxi mizing his uti lity
  • This set-up can be generalized by extending the definition of the leader's strategy space and the leader and follower uti lities in two ways beyond what has previously been considered and by allowing for compact representation of constraints.
  • Assu ming the leader's mixed strategy is an A/-di mensional real
  • a Bayesian extension to the Stackelberg game allows multiple types of players, each with its own payoff matrix.
  • a Bayesian Stackelberg game with S follower types can be represented by a set of utility matrix pairs (U^1, V^1), . . . , (U^S, V^S), each corresponding to a type.
  • a type s has a prior probability p^s representing the likelihood of its occurrence.
  • the leader commits to a mixed strategy without knowing the type of the follower she faces.
  • the follower knows his own type s, and plays the best response j ∈ {1, . . . , J} according to his utility matrix V^s.
  • the solution concept of interest is a Strong Stackelberg Equilibrium (SSE), where the leader maximizes her expected utility assuming the follower chooses the best response and breaks ties in favor of the leader.
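The SSE tie-breaking rule above can be illustrated with a short sketch (Python is used here for illustration only; the function and variable names are assumptions, not part of the disclosure):

```python
def sse_best_response(x, U, V):
    """Follower best response under SSE tie-breaking.

    x: leader mixed strategy (probabilities over leader pure strategies).
    U: leader payoff matrix, U[i][j] = leader utility for leader action i,
       follower action j.  V: follower payoff matrix, same indexing.
    Returns the follower pure strategy maximizing the follower's expected
    utility, breaking ties in favor of the leader."""
    J = len(V[0])

    def expected(M, j):
        return sum(x[i] * M[i][j] for i in range(len(x)))

    # Rank by follower utility first, then by leader utility to break ties.
    return max(range(J), key=lambda j: (expected(V, j), expected(U, j)))
```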
  • Table 4: Payoff matrices of a Bayesian Stackelberg game.
  • a Bayesian Stackelberg game is considered with two follower types, where type 1 appears with probability 0.84 and type 2 appears with probability 0.16.
  • the leader chooses a probability distribution of allocating one resource to protect the two targets whereas the follower (attacker) chooses the best target to attack.
  • the payoff matrices in Table 4 are shown, where the leader is the row player and the follower is the column player.
  • the utilities of the two types are identical except that a follower of type 2 gets a utility of 1 for attacking Target2 successfully, whereas one of type 1 gets 0.
  • the leader's strategy is a column vector x = (x_1, x_2)^T representing the probabilities of protecting the two targets.
  • the payoffs in Table 4 can be represented by the following utility matrices,
  • Bayesian Stackelberg games have typically been solved via tree search, where one follower type is assigned to a pure strategy at each tree level.
  • FIG. 6 shows a search tree of the example game in Table 4.
  • Four linear programs are solved, one for each leaf node.
  • the linear program provides an optimal leader strategy such that the follower's best response for every follower type is the chosen target at that leaf node; e.g., at the leftmost leaf node, the linear program finds the optimal leader strategy such that both type 1 and type 2 have a best response of attacking Target1. Comparing across leaf nodes, the overall optimal leader strategy can be obtained. In this case, the leaf node where type 1 is assigned to Target1 and type 2 to Target2 provides the overall optimal strategy.
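The leaf-node enumeration described above can be sketched with a brute-force stand-in: instead of solving an exact LP per leaf as in the disclosure, a discretized grid of leader strategies for a two-target, one-resource game is searched. The grid search and all names are illustrative assumptions only:

```python
import itertools

def enumerate_leaves(U, V, p, steps=100):
    """Brute-force stand-in for the 'Multiple LPs' tree search: enumerate
    every assignment of follower types to pure strategies (one leaf per
    assignment) and, per leaf, grid-search the leader strategies under which
    that assignment is a best response.  U[s], V[s]: leader/follower payoff
    matrices for type s; p[s]: prior probability of type s."""
    S, J = len(U), len(V[0][0])
    best_value, best_x = float('-inf'), None
    grid = [(k / steps, 1 - k / steps) for k in range(steps + 1)]
    for assign in itertools.product(range(J), repeat=S):  # one leaf per tuple
        for x in grid:
            eu = lambda M, j: sum(x[i] * M[i][j] for i in range(len(x)))
            # Feasibility: each type's assigned target must be a best response.
            if all(eu(V[s], assign[s]) >= eu(V[s], j)
                   for s in range(S) for j in range(J)):
                value = sum(p[s] * eu(U[s], assign[s]) for s in range(S))
                if value > best_value:
                    best_value, best_x = value, x
    return best_value, best_x
```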
  • HBGS computes upper bounds by heuristically utilizing the solutions of smaller restricted games.
  • the preprocessing involved in solving many small games can be expensive and the bounds computed using heuristics can again be loose.
  • the HUNTER (handling uncertainty efficiently using relaxation) algorithm, based on the five key ideas described previously in this section, can provide a unified method for handling uncertainty in Bayesian Stackelberg games and can facilitate real-world solutions to security domain problems.
  • HUNTER would conduct a best-first search in the search tree that results from assigning follower types to pure strategies, such as the search tree in FIG. 6.
  • HUNTER aims to search this space much more efficiently than HBGS.
  • efficiency gains are sought by obtaining tight upper bounds and lower bounds at internal nodes in the search tree (which correspond to partial assignments in which a subset of follower types is fixed).
  • an upper bound LP is used within an internal search node.
  • the LP returns an upper bound UB and a feasible solution x*, which is then evaluated by computing the follower's best response, providing a lower bound LB.
  • the solution returned by the upper bound LP is also utilized in choosing a new type s* to create branches. To avoid having this upper bound LP itself become a bottleneck, it can be solved efficiently using, e.g., Bender's decomposition, as described below.
  • FIG. 7 depicts steps of creating internal search nodes for an embodiment of HUNTER.
  • FIG. 8 illustrates HUNTER'S search tree in solving the example game from Table 4 above.
  • the upper bound LP is solved with the initial strategy space x_1 + x_2 ≤ 1, x_1, x_2 ≥ 0 (Node 1).
  • type 2 is then chosen to create branches by assigning it to Target1 and Target2, respectively.
  • a child node (Node 2) is considered in which type 2 is assigned to Target1, i.e., type 2's best response is to attack Target1.
  • the follower's expected utility of choosing Target1 must be higher than that of choosing Target2, i.e., -x_1 + x_2 ≥ x_1 - x_2, simplified as x_1 - x_2 ≤ 0.
  • an additional constraint x_1 - x_2 ≤ 0 is thus imposed on the strategy space, and an upper bound of 0.5 is obtained. Since this upper bound is lower than the current lower bound of 0.506, this branch can be pruned.
  • the other child node (Node 3) is considered, in which type 2 is assigned to Target2. This time the constraint -x_1 + x_2 ≤ 0 is added instead, and an upper bound of 0.506 is obtained.
  • HUNTER's behavior is now discussed line by line (see Algorithm 2 in FIG. 9).
  • the best-first search is initialized by creating the root node of the search tree with no assignment of types to targets and with the computation of the node's upper bound (Lines 2 and 3).
  • the initial lower bound is obtained by evaluating the solution returned by the upper bound LP (Line 4).
  • the root node is added to a priority queue of open nodes, which is internally sorted in decreasing order of upper bound (Line 5).
  • each node contains information on the partial assignment, the feasible region of x, the upper bound, and the Bender's cuts generated by the upper bound LP.
  • at each iteration, the node with the highest upper bound is retrieved (Line 8), a type s* is selected for assignment to pure strategies (Line 9), the upper bounds of the node's child nodes are computed (Lines 12 and 14), the lower bound is updated using the new solutions (Line 15), and child nodes with an upper bound higher than the current lower bound are enqueued (Line 16).
  • Bender's cuts at a parent node can be inherited by its children, speeding up the computation (Line 12).
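The best-first loop described above can be sketched as a generic branch-and-bound skeleton; the callback names and the toy interface are assumptions for illustration and do not reproduce the disclosed Algorithm 2:

```python
import heapq
import itertools

def best_first_bnb(root, bound_and_solution, expand, tol=1e-9):
    """Best-first branch-and-bound skeleton in the spirit of HUNTER's search:
    'bound_and_solution(node)' returns (upper_bound, lower_bound) for a node,
    the lower bound coming from evaluating the relaxation's feasible solution;
    'expand(node)' yields child nodes.  Nodes sit in a priority queue sorted by
    decreasing upper bound, and branches whose upper bound cannot beat the
    incumbent lower bound are pruned."""
    counter = itertools.count()              # tie-breaker for the heap
    ub, lb = bound_and_solution(root)
    best = lb
    heap = [(-ub, next(counter), root)]
    while heap:
        neg_ub, _, node = heapq.heappop(heap)
        if -neg_ub <= best + tol:            # queue is sorted: nothing better left
            break
        for child in expand(node):
            c_ub, c_lb = bound_and_solution(child)
            best = max(best, c_lb)           # update incumbent lower bound
            if c_ub > best + tol:            # keep only promising children
                heapq.heappush(heap, (-c_ub, next(counter), child))
    return best
```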
  • a tractable linear relaxation of Bayesian Stackelberg games can be derived to provide an upper bound efficiently at each of HUNTER's internal nodes. Applying results from disjunctive programming provides a derivation of the convex hull for a single type. As shown below, intersecting the convex hulls of all types provides a tractable, polynomial-size relaxation of a Bayesian Stackelberg game.
  • the leader's optimal strategy x* is the best among the optimal solutions of J LPs, where each restricts the follower's best response to one pure strategy.
  • the optimization problem can be represented as the following disjunctive program (i.e., a disjunction of "Multiple LPs"),
  • the feasible set of (1), denoted by H, is a union of J convex sets, each corresponding to a disjunctive term.
  • the closure of the convex hull of H, cl conv H, can be represented as shown in FIG. 10.
  • H* will in general have points not belonging to H, and thus the relaxation can lead to an overestimation.
  • the upper bound LP (4) has O(NJS) variables and constraints, and can be written as the following two-stage problem by explicitly representing cl conv H^s:
  • formulations (5) and (6) are displayed in order to reveal the block structure exploited for further speedup, as explained below.
  • Bender's decomposition can be used to partition the problem into multiple smaller problems, which can then be solved in sequence.
  • the technique is now briefly described.
  • constraints of type (8) for the master problem are called optimality cuts (infeasibility cuts, another type of constraint, are not believed to be relevant for this problem).
  • the upper bound LP computation can be further sped up at internal nodes of HUNTER's search tree by not creating all of the Bender's cuts from scratch; instead, Bender's cuts from the parent node can be reused in its children.
  • suppose u^s ≤ (α^s)^T x + β^s is a Bender's cut in the parent node (the cut coefficients are written here as α^s and β^s). This means u^s cannot be greater than (α^s)^T x + β^s for any x in the feasible region of the parent node.
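The master/subproblem alternation of Bender's decomposition can be illustrated on a toy problem where the second stage is a piecewise-linear concave function; the problem form and names are assumptions for illustration, not the disclosure's LP (5)-(6):

```python
def benders_max_piecewise(pieces, tol=1e-9):
    """Toy Benders loop for max over x in [0, 1] of Q(x), where
    Q(x) = min_j (a_j * x + b_j) over the 'pieces' list of (a, b) pairs.
    The master keeps a growing set of optimality cuts theta <= a*x + b;
    the 'subproblem' evaluates Q at the master's proposal and returns the
    active piece as a new cut."""
    cuts = [pieces[0]]                       # start with one arbitrary cut
    while True:
        # Master: maximize min over cuts (a*x+b) on [0, 1]; the optimum lies
        # at an endpoint or at an intersection of two cuts, so enumerate those.
        candidates = {0.0, 1.0}
        for (a1, b1) in cuts:
            for (a2, b2) in cuts:
                if abs(a1 - a2) > tol:
                    x = (b2 - b1) / (a1 - a2)
                    if 0.0 <= x <= 1.0:
                        candidates.add(x)
        x_star = max(candidates, key=lambda x: min(a * x + b for a, b in cuts))
        ub = min(a * x_star + b for a, b in cuts)   # master value: upper bound
        # Subproblem: true value at x_star and the tight piece (the new cut).
        a_new, b_new = min(pieces, key=lambda p: p[0] * x_star + p[1])
        lb = a_new * x_star + b_new                 # lower bound
        if ub - lb <= tol:
            return x_star, lb
        cuts.append((a_new, b_new))                 # add the optimality cut
```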
  • the type to branch on next must be decided upon, i.e., the type for which J child nodes will be created at the next lower level of the tree.
  • the type selected to branch on has a significant effect on efficiency.
  • HUNTER chooses the type whose convex-combination variable returned by (6) violates the integrality constraint the most; recall that this variable is used to generate convex combinations.
  • HUNTER can be used or modified to handle continuous uncertainty via the sample average approximation technique.
  • an uncertain Stackelberg game model is introduced with continuously distributed uncertainty in the leader's execution, the follower's observation, and both players' utilities. It is then shown that the uncertain Stackelberg game model can be written as a two-stage mixed-integer stochastic program, to which existing convergence results of the sample average approximation technique apply. Finally, it is shown that the sampled problems are equivalent to Bayesian Stackelberg games, and consequently can also be solved by HUNTER.
  • F and G are matrices allowing execution and observation noise to be modeled as linearly dependent on x.
  • G and g can be dependent on F and f.
  • U, V, F, f, G, and g are random variables, following some known continuous distributions.
  • the uncertain Stackelberg game, as described in further detail below, can be written as a two-stage mixed-integer stochastic program. Let Q(x, ξ) be the leader's utility for a strategy x and a realization ξ of the uncertainty, assuming the follower chooses the best response. The first stage maximizes the expectation of the leader's utility with respect to the joint probability distribution of the random variables, i.e.,
  • in the second stage, the follower's best response is the pure strategy maximizing his expected utility under the observed (noise-perturbed) leader strategy G^T x + g.
  • sample average approximation is a popular solution technique for stochastic programs with continuously distributed uncertainty. It can be applied to solving uncertain Stackelberg games as follows. First, a sample of S realizations of the random vector is generated. The expected value can then be approximated by the sample average. The sampled problem is given by,
  • the sampled problem provides a tighter and tighter statistical upper bound of the true problem as the number of samples increases; the number of samples required to solve the true problem to a certain accuracy grows linearly in the dimension of x.
  • each sample corresponds to a tuple (U, V, F, f, G, g).
  • HUNTER can be used to solve Stackelberg games with continuous payoff, execution, and observation uncertainty.
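The sampling step can be sketched as follows; `sample_game` is a hypothetical generator of payoff-matrix realizations, not an interface from the disclosure, and each sampled pair plays the role of one follower type (with weight 1/S) in the equivalent Bayesian Stackelberg game:

```python
import random

def sample_average_leader_value(x, sample_game, S, seed=0):
    """Sample average approximation of the leader's expected utility
    E[Q(x, xi)]: draw S realizations of the uncertain game, compute the
    leader's utility against the follower's best response in each (with SSE
    tie-breaking), and average."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(S):
        U, V = sample_game(rng)        # one realization of the payoff matrices
        J = len(V[0])
        eu = lambda M, j: sum(x[i] * M[i][j] for i in range(len(x)))
        # Follower best response, ties broken in favor of the leader.
        j_star = max(range(J), key=lambda j: (eu(V, j), eu(U, j)))
        total += eu(U, j_star)
    return total / S
```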
  • FIGS. 11(a)-11(d) show experimental analysis of HUNTER and runtime comparisons with HBGS and DOBSS. For discrete uncertainty, the runtime of HUNTER was compared with DOBSS and HBGS (specifically, HBGS-F, the most efficient variant), the two best-known algorithms for general Bayesian Stackelberg games. These algorithms were compared while varying the number of types and the number of pure strategies per player. The tests used a cutoff time of one hour for all three algorithms.
  • FIG. 11(a) shows the performance of the three algorithms when the number of types increases.
  • the games tested in this set have 5 pure strategies for each player.
  • the x-axis shows the number of types, while the y-axis shows the runtime in seconds.
  • HUNTER provides significant speedups, of orders of magnitude over both HBGS and DOBSS (the line depicting HUNTER is almost touching the x-axis in FIG. 11(a)). For example, it was found that HUNTER can solve a Bayesian Stackelberg game with 50 types in 17.7 seconds on average, whereas neither HBGS nor DOBSS could solve an instance within an hour.
  • FIG. 11(b) shows the performance of the three algorithms when the number of pure strategies for each player increases. The games tested in this set have 10 types. The x-axis shows the number of pure strategies for each player, while the y-axis shows the runtime in seconds. HUNTER again was shown to provide significant speedups over both HBGS and DOBSS. For example, HUNTER on average was able to solve a game with 13 pure strategies in 108.3 seconds, whereas HBGS and DOBSS took more than 30 minutes.
  • the x-axis represents the number of types while the y-axis represents the runtime in seconds.
  • each individual heuristic helps speed up the algorithm significantly, showing their usefulness. For example, it took 14.0 seconds to solve an instance with 50 types when both heuristics were enabled (Variant-III), compared to 51.5 seconds when neither was enabled (Variant-I).
  • HUNTER is allowed to terminate once the difference between the upper bound and the lower bound decreases to ε, a given error bound.
  • the solution returned is therefore an approximate solution provably within ε of the optimal solution.
  • 30 games were tested with 5 pure strategies for each player and 50, 100, and 150 types, with the error bound ε varying from 0 to 10.
  • as shown in FIG. 11(d), HUNTER can effectively trade off solution quality for further speedup, indicating the effectiveness of its upper bound and lower bound heuristics. For example, for games with 100 types, HUNTER returned within 30 seconds a suboptimal solution at most 5 away from the optimal solution (the average optimal solution quality is 60.2).
  • HUNTER is able to achieve a six-fold speedup by allowing a quality loss of at most 5.
  • HUNTER was run with 20 samples and 100 samples to solve the problem above via sample average approximation as described previously. For each setting, HUNTER was repeated 20 times with different sets of samples and the best solution found was reported (as shown below, HUNTER's competitors were likewise run over 20 settings with the best solutions selected).
  • HUNTER, as shown with 20 and 100 samples, outperformed both BRASS and RECON on average.
  • RECON on average returned a solution with a quality of -5.1.
  • the average gain HUNTER achieved over RECON is 0.6.
  • Table 6: Quality gain of HUNTER against BRASS and RECON under continuous execution and observation uncertainty.
  • HUNTER can efficiently handle both types of uncertainty simultaneously. For example, HUNTER spends less than 4 minutes on average to solve a problem with 5 follower types and 20 samples per type.
  • Table 7: Runtime results (in seconds) of HUNTER for handling both discrete and continuous uncertainty.
  • a novel unified algorithm, called HUNTER, has been presented herein, which handles discrete and continuous uncertainty by scaling up Bayesian Stackelberg games.
  • the HUNTER algorithm is able to provide speedups of orders of magnitude over existing algorithms.
  • HUNTER can handle continuously distributed uncertainty.
  • an aspect of the present disclosure is directed to multi-objective optimization for security games.
  • the aspect includes a treatment or description of multi-objective security games (MOSGs), combining security games and multi-objective optimization.
  • MOSGs have a set of Pareto optimal (non-dominated) solutions, referred to herein as the Pareto frontier.
  • the Pareto frontier can be generated by solving a sequence of constrained single-objective optimization problems (CSOPs), where one objective is selected to be maximized while lower bounds are specified for the other objectives.
  • Techniques or algorithms as described herein for providing multi-objective optimization for security games can include the following features: (i) an algorithm, Iterative ε-Constraints, for generating the sequence of CSOPs; (ii) an exact approach for solving an MILP formulation of a CSOP (which also applies to multi-objective optimization in more general Stackelberg games); (iii) heuristics that exploit the structure of security games to speed up solving CSOPs; and (iv) an approximate approach for solving CSOPs.
  • LASD should weigh these threats when determining the security strategy to use.
  • this process can become convoluted when attempting to compare abstract notions such as safety and security with concrete concepts such as ticket revenue.
  • Bayesian security games have been used to model domains where the defender is facing multiple attacker types. The threats posed by the different attacker types are weighted according to the relative likelihood of encountering that attacker type. There are three potential factors limiting the use of Bayesian security games: (1) the defender may not have information on the probability distribution over attacker types, (2) it may be impossible or undesirable to directly compare and combine the defender rewards of different security games, and (3) only one solution is given, hiding the tradeoffs between the objectives from the end user.
  • multi-objective security games can be utilized advantageously, combining game theory and multi-objective optimization.
  • the threats posed by the attacker types are treated as different objective functions which are not aggregated, thus eliminating the need for a probability distribution over attacker types.
  • MOSG solutions provide a set of algorithms for computing Pareto optimal solutions for MOSGs.
  • key features of such solutions include one or more of the following: (i) Iterative ε-Constraints, an algorithm for generating the Pareto frontier for MOSGs by producing and solving a sequence of constrained single-objective optimization problems (CSOPs); (ii) an exact approach for solving a mixed-integer linear program (MILP) formulation of a CSOP (which can also apply to multi-objective optimization in more general Stackelberg games); (iii) heuristics that exploit the structure of security games to speed up solving CSOPs; and (iv) an approximate approach for solving CSOPs, which greatly increases the scalability of the MOSG approach while maintaining quality guarantees.
  • LASD must protect the Los Angeles metro system from ticketless travelers, criminals, and terrorists.
  • Each type of perpetrator is distinct and presents a unique set of challenges.
  • LASD may have different payoffs for preventing the various perpetrators.
  • Targeting ticketless travelers will increase the revenue generated by the metro system as it will encourage passengers to purchase tickets.
  • Pursuing cri minals will reduce the amount of vandalism and property thefts, increasing the overall sense of passenger safety. Focusing on terrorists could help to prevent or mitigate the effect of a future terrorist attack, potentially saving lives.
  • LASD has finite resources with which to protect all of the stations in the city.
  • the defender's strategy can be represented as a coverage vector c ∈ C, where c_t is the amount of coverage placed on target t and represents the probability of the defender successfully preventing any attack on t.
  • C = {⟨c_t⟩ : 0 ≤ c_t ≤ 1, Σ_t c_t ≤ m} is the set of feasible coverage vectors for m defender resources.
  • attacker i's mixed strategy a_i = ⟨a_i^t⟩ is a vector where a_i^t is the probability of attacker i attacking target t.
  • U defines the payoff structure for an MOSG, with U_i defining the payoffs for the security game played between the defender and attacker i.
  • U_i^{c,d}(t) is the defender's utility if t is chosen by attacker i and is fully covered by a defender resource. If t is not covered, the defender's penalty is U_i^{u,d}(t).
  • the attacker's utility is denoted similarly by U_i^{c,a}(t) and U_i^{u,a}(t).
  • the standard solution concept for a two-player Stackelberg game is Strong Stackelberg Equilibrium (SSE), in which the defender selects an optimal strategy based on the assumption that the attacker will choose an optimal response, breaking ties in favor of the defender. U_i^d(c) and U_i^a(c) denote the payoff received by the defender and attacker i, respectively, when the defender uses the coverage vector c and attacker i attacks the best target while breaking ties in favor of the defender.
  • An MOSG defines a multi-objective optimization problem: max_c (U_1^d(c), . . . , U_n^d(c)).
  • a coverage vector c ∈ C is Pareto optimal if there is no other c′ ∈ C that dominates c.
  • the present disclosure provides algorithms to find Pareto optimal solutions in MOSGs. If there are a finite number of Pareto optimal solutions, it is preferable to generate all of them for the end user. If there are an infinite number of Pareto optimal solutions, it is impossible to generate them all. In this case, a subset of Pareto optimal solutions can be generated which approximates the true Pareto frontier with quality guarantees.
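The dominance relation just defined, and the filtering of a finite solution set down to its Pareto frontier, can be sketched directly (maximization of every objective is assumed; the function names are illustrative):

```python
def dominates(u, v):
    """True if objective vector u Pareto-dominates v: u is at least as good
    in every objective and strictly better in at least one (maximization)."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def pareto_frontier(points):
    """Filter a finite set of objective vectors to the non-dominated ones."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```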
  • MOSGs build on security games and multi-objective optimization.
  • the relationship of MOSGs to previous work in security games, and in particular Bayesian security games, has already been reviewed above. In this section, research on multi-objective optimization is primarily reviewed.
  • another approach is multi-objective evolutionary algorithms (MOEAs). Evolutionary approaches such as NSGA-II are capable of generating multiple approximate solutions in each iteration. However, due to their stochastic nature, both weighted summation and MOEAs cannot bound the level of approximation of the generated Pareto frontier. This lack of solution quality guarantees may be unacceptable for security domains.
  • the third approach is the ε-constraint method, in which the Pareto frontier is generated by solving a sequence of CSOPs.
  • one objective is selected as the primary objective to be maximized, while lower bound constraints are added for the secondary objectives.
  • the original ε-constraint method discretizes the objective space and solves a CSOP for each grid point. This approach is computationally expensive, since it exhaustively searches the objective space of the secondary objectives.
  • the ε-constraint method formulates a CSOP for a given set of constraints b, producing a single Pareto optimal solution.
  • the Pareto frontier is then generated by solving multiple CSOPs produced by modifying the constraints in b.
  • a presentation of the Iterative ε-Constraints algorithm is given, which is an algorithm for systematically generating a sequence of CSOPs for an MOSG. These CSOPs can then be passed to a solver Φ to return solutions to the MOSG.
  • Iterative ε-Constraints uses one or more (preferably all) of the following four key features: 1) The Pareto frontier for an MOSG can be found by solving a sequence of CSOPs. For each CSOP, U_1^d(c) is selected as the primary objective, which will be maximized. Lower bound constraints b are then added for the secondary objectives U_2^d(c), . . . , U_n^d(c). 2) The sequence of CSOPs is iteratively generated by exploiting previous Pareto optimal solutions and applying Pareto dominance. 3) It is possible for a CSOP to have multiple coverage vectors c that maximize U_1^d(c) and satisfy b. Lexicographic maximization is used to ensure that the CSOP solver Φ only returns Pareto optimal solutions. 4) It may be impractical (even impossible) to generate all Pareto optimal points if the frontier contains a large number of points, e.g., if the frontier is continuous. Therefore, a parameter ε is used to discretize the objective space, trading off solution efficiency against the degree of approximation in the generated Pareto frontier.
  • FIG. 12 depicts an example of a Pareto frontier for a bi-objective MOSG.
  • FIG. 12 shows the objective space for the problem as well as several points representing the objective vectors for different defender coverage vectors.
  • U_1^d will be maximized while b_2 constrains U_2^d.
  • the algorithm returns a Pareto frontier of A, C, E, F, and G.
  • Algorithm 3, shown in FIG. 13, systematically updates a set of lower bound constraints b to generate the sequence of CSOPs.
  • each time a CSOP is solved, a portion of the (n-1)-dimensional space formed by the secondary objectives is marked as searched, with the rest divided into n-1 subregions (by updating b for each secondary objective). These n-1 subregions are then recursively searched by solving CSOPs with updated bounds.
  • this systematic search forms a branch-and-bound search tree with a branching factor of n-1. As the depth of the tree increases, the CSOPs become more constrained, eventually becoming infeasible. If a CSOP is found to be infeasible, no child CSOPs are generated, because they are guaranteed to be infeasible as well.
  • the algorithm terminates when the entire secondary objective space has been searched.
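The recursive generation of CSOPs described above can be sketched as follows; `solver` is a hypothetical CSOP solver returning an objective vector (primary objective first) for a Pareto optimal solution, or None when the CSOP is infeasible:

```python
def iterative_eps_constraints(solver, n, eps, b=None, seen=None, frontier=None):
    """Recursive sketch of Iterative epsilon-Constraints: solve the CSOP for
    lower-bound vector b over the n-1 secondary objectives, then recurse with
    each secondary bound raised to the value just achieved plus eps."""
    b = b if b is not None else tuple(float('-inf') for _ in range(n - 1))
    seen = seen if seen is not None else set()
    frontier = frontier if frontier is not None else []
    if b in seen:                      # prune repeated lower-bound vectors
        return frontier
    seen.add(b)
    v = solver(b)                      # solve the CSOP defined by b
    if v is None:                      # infeasible: no children generated
        return frontier
    frontier.append(v)
    for i in range(n - 1):             # one subregion per secondary objective
        b2 = list(b)
        b2[i] = v[i + 1] + eps         # v[0] is the primary objective's value
        iterative_eps_constraints(solver, n, eps, tuple(b2), seen, frontier)
    return frontier
```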
  • This approximation measure can be used to compute the maximum distance between any point v on the true frontier that is missing from the generated set Q_ε and its "closest" point v′ ∈ Q_ε computed by the algorithm.
  • the distance between two points is the maximum difference across the objectives.
  • lexicographic maximization is required to sequentially maximize all the objective functions.
  • n MILPs must be solved in the worst case, where each MILP is used to maximize one objective.
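Over a finite candidate set, the lexicographic pass can be sketched as follows (the disclosure solves an MILP per stage; this enumeration only illustrates the staging, and the names are assumptions):

```python
def lexicographic_max(candidates, objectives):
    """Sequentially maximize each objective, restricting at every stage to
    the candidates achieving the previous stages' optima; this mirrors the
    sequence of n MILPs, where previously optimized values are frozen as
    constraints before the next objective is maximized."""
    feasible = list(candidates)
    for f in objectives:
        best = max(f(c) for c in feasible)
        feasible = [c for c in feasible if f(c) == best]  # keep stage optima
    return feasible[0]
```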
  • the objective is to maximize the variable representing the defender's payoff for the security game being optimized. This MILP is shown in FIG. 14.
  • Equation (1) is the objective function, which maximizes the defender's payoff for that objective.
  • Equation (2) defines the defender's payoff.
  • Equation (3) defines the optimal response for attacker j.
  • Equation (4) constrains the feasible region to solutions that maintain the values of objectives maximized in previous iterations of lexicographic maximization.
  • Equation (5) guarantees that the lower bound constraints in b will be satisfied for all objectives which have yet to be optimized.
  • Equations (6) and (7) constrain attackers to pure strategies that attack a single target. Equations (8) and (9) specify the feasible defender strategy space.
  • the MILP can be solved using an optimization software package such as CPLEX. It is possible to increase the efficiency of the MILP formulation by using heuristics to constrain the decision variables.
  • a simple example of a general heuristic which can be used to achieve speedup is placing an upper bound on the defender's payoff for the primary objective. Assume d_1 is the defender's payoff for the primary objective in the parent CSOP and d_1′ is the defender's payoff for the primary objective in the child CSOP.
  • since each CSOP is a maximization problem, it must hold that d_1 ≥ d_1′, because the child CSOP is more constrained than the parent CSOP. Thus, the value of d_1 can be passed to the child CSOP to be used as an upper bound on the objective function.
  • FIG. 15 shows MILP formulation definitions for an embodiment.
  • the MILP is a variation of the optimization problem formulated previously for security games. The same variations can be made to more generic Stackelberg games, such as those used for DOBSS, giving a formulation for multi-objective Stackelberg games in general.
  • this minimum coverage is then added to the MILP in FIG. 14 as constraints on the variable c, reducing the feasible region and leading to significant speedup, as verified in experiments.
  • ORIGAMI-M is a modified version of the ORIGAMI algorithm and borrows many of its key concepts. At a high level, ORIGAMI-M starts off with an empty defender coverage vector c, a set of lower bound constraints b, and m defender resources. An attempt is made to compute a coverage c which uses the minimum defender resources to satisfy the constraints b. If a constraint b_i is violated, i.e., U_i^d(c) < b_i, ORIGAMI-M updates c by computing the minimum additional coverage necessary to satisfy b_i.
  • the constraints in b should be checked repeatedly until quiescence (no changes are made to c for any b_i). If all m resources are exhausted before b is satisfied, then the CSOP is infeasible.
  • the algorithm MIN-COV, shown as Algorithm 6 in FIG. 17, is used to compute, for each target t′, the amount of coverage needed to induce an attack on t′ which yields a defender payoff of b_i.
  • MIN-COV generates a defender coverage vector c′, which is initialized to the current coverage c. Coverage c′ is updated such that the defender payoff for t′ is b_i, yielding an attacker payoff U_i^a(c′, t′) (Line 6).
  • the coverage for every other target t ∈ T \ {t′} is updated, if needed, to ensure that t′ remains in the attack set, i.e., U_i^a(c′, t′) ≥ U_i^a(c′, t) (Line 9).
  • MIN-COV returns the c′ which uses the least amount of defender resources. If, while computing the additional coverage to be added, either the attack set contains all targets or all m security resources are exhausted, then both b_i and the CSOP are infeasible.
  • ORIGAMI-M will return the minimum coverage vector c* that satisfies b. This coverage vector can be used to replace Equation (8) with c ≥ c*.
  • ORIGAMI-A
  • ORIGAMI-A, shown as Algorithm 7 in FIG. 18, is presented as an extension to ORIGAMI-M which eliminates the computational overhead of MILPs for solving CSOPs.
  • the key idea of ORIGAMI-A is to translate a CSOP into a feasibility problem which can be solved using ORIGAMI-M.
  • a series of these feasibility problems is then generated, using binary search, in order to approximate the optimal solution to the CSOP. As a result, this algorithmic approach is much more efficient.
  • ORIGAMI-M computes the minimum coverage vector necessary to satisfy a set of lower bound constraints b.
  • whereas the MILP approach is an optimization problem in which lower bounds are specified only for the secondary objectives, this optimization problem is converted into a feasibility problem by creating a new set of lower bound constraints b⁺, formed by adding a lower bound constraint for the primary objective to the constraints b.
  • ORIGAMI-M can be used to determine if there exists a coverage vector c such that b⁺ is satisfied.
  • ORIGAMI-A finds an approximately optimal coverage vector c by using ORIGAMI-M to solve a series of feasibility problems.
  • this series is generated by sequentially performing binary search on the objectives, starting with the initial lower bounds defined in b⁺.
  • the lower and upper bounds for the binary search on objective i are, respectively, b_i⁺ and max_{t∈T} U_i^{c,d}(t), the highest defender payoff for covering a target.
  • b_i⁺ is set to U_i^d(c) in case the binary search terminated on an infeasible problem.
  • after searching over each objective, ORIGAMI-A will return a coverage vector c such that U_1^d(c*) - U_1^d(c) ≤ α, where c* is the optimal coverage vector for the CSOP defined by b.
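The per-objective binary search can be sketched with a feasibility oracle standing in for ORIGAMI-M; the names and the tolerance handling are assumptions for illustration:

```python
def binary_search_max_bound(feasible, lo, hi, alpha):
    """Sketch of ORIGAMI-A's inner loop for one objective: binary-search for
    the largest lower bound b on the objective for which the feasibility
    oracle (playing the role of ORIGAMI-M) can still satisfy all constraints,
    stopping once the remaining interval is within alpha."""
    while hi - lo > alpha:
        mid = (lo + hi) / 2
        if feasible(mid):
            lo = mid          # bound mid is achievable; push higher
        else:
            hi = mid          # infeasible; back off
    return lo
```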
  • the amount of defender resources m was fixed at 20% of the number of targets.
  • a maximum cap on runtime for each sample was set at 1800 seconds. The MILP formulations were solved using CPLEX version 12.1. The results were averaged over 30 trials.
  • the baseline MILP formulation is MILP-B; the MILP formulation adding a bound on the defender's payoff for the primary objective is MILP-P.
  • MILP-M uses ORIGAMI-M to compute lower bounds on the defender coverage.
  • MILP-P can be combined with MILP-M to form MILP-PM.
  • the algorithmic approach using ORIGAMI-A will be referred to by name. For the number of targets, all five formulations for solving CSOPs were evaluated. ORIGAMI-A and the fastest MILP formulation, MILP-PM, were then selected to evaluate the remaining factors. Results are shown in FIGS. 19-22.
  • MILP-M used ORIGAMI-M to compute lower bounds for defender coverage, resulting in a reduction of 70% compared to MILP-B.
  • MILP-PM achieved an even greater reduction of 82%.
  • Removing the computational overhead of solving MILPs, ORIGAMI-A was the most efficient formulation, with a 97% reduction. For 100 targets, ORIGAMI-A required 4.53 seconds to generate the Pareto frontier, whereas MILP-B took 229.61 seconds, a speedup of more than 50 times, and ORIGAMI-A remained faster even than the fastest MILP formulation, MILP-PM.
  • FIG. 21 shows the effect of scaling up the number of objectives.
  • The x-axis represents the number of objectives, whereas the y-axis indicates the average time needed to generate the Pareto frontier.
  • For both MILP-PM and ORIGAMI-A, an exponential increase in runtime was observed as the number of objectives was scaled up.
  • The Pareto frontier generated by Iterative ε-Constraints was compared to the true Pareto frontier.
  • FIG. 24 shows the results for ε values of 0.1, 0.25, 0.5, and 1.0.
  • The x-axis represents the value of ε, and the y-axis represents the maximum objective loss when comparing the generated Pareto frontier to the true Pareto frontier.
  • Algorithm 3: Iterative-ε-Constraints
  • Line 1: This is a heuristic that checks to see if a solution has already been computed for the lower bound vector b. If it has, then this subproblem (CSOP) is pruned (not computed), helping to speed up the algorithm.
  • Line 2: If the CSOP is not pruned, then b is added to the list of lower bound vectors that have been computed, so any future CSOP with a lower bound vector identical to b will be pruned.
  • Line 3: The CSOP (defined by b) is passed to the CSOP solver, which returns the solution c.
  • Line 4: Checks to see if c is a feasible solution.
  • n is the number of objectives for the defender.
  • Line 7: The lower bound vector b is copied into a new vector b'.
  • Line 8: The lower bound for objective i in b' is updated to the payoff for objective i obtained by solution c. A discretization factor ε is added to allow for a tradeoff between runtime and granularity of the Pareto frontier.
  • Line 9: b' is compared against the list of infeasible lower bound vectors. If there exists a member s in that list for which the bounds for each objective in b' are greater than or equal to the corresponding bounds in s, then it is known that b' is also infeasible and should be pruned.
  • Line 10: Recursive call to Iterative-ε-Constraints on the updated lower bound vector b'.
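The control flow described in Lines 1-10 can be sketched in Python as follows. This is an illustrative simplification, not the patented implementation: the CSOP solver is abstracted as a caller-supplied function, and all names here (`solve_csop`, `previous_bounds`, etc.) are hypothetical.

```python
def iterative_epsilon_constraints(b, solve_csop, n, epsilon,
                                  previous_bounds, infeasible):
    """Recursively enumerate CSOPs over lower-bound vectors.

    b               -- current lower-bound vector (tuple of n values)
    solve_csop      -- returns an objective-payoff tuple, or None if infeasible
    n               -- number of defender objectives
    previous_bounds -- set of bound vectors already solved (for pruning)
    infeasible      -- list of bound vectors known to be infeasible
    Returns the set of solution payoff vectors found (the Pareto candidates).
    """
    if b in previous_bounds:            # Line 1: prune repeated subproblems
        return set()
    previous_bounds.add(b)              # Line 2: remember this bound vector
    v = solve_csop(b)                   # Line 3: solve the CSOP defined by b
    if v is None:                       # Line 4: infeasible CSOP
        infeasible.append(b)
        return set()
    solutions = {v}
    for i in range(1, n):               # secondary objectives only
        bp = list(b)                    # Line 7: copy b into b'
        bp[i] = v[i] + epsilon          # Line 8: raise bound for objective i
        bp = tuple(bp)
        # Line 9: prune b' if it is >= a known-infeasible bound vector
        if any(all(bp[j] >= s[j] for j in range(n)) for s in infeasible):
            continue
        solutions |= iterative_epsilon_constraints(
            bp, solve_csop, n, epsilon, previous_bounds, infeasible)  # Line 10
    return solutions
```

With a toy two-objective solver that maximizes the first objective subject to the bound on the second, this enumerates all the candidate tradeoff points.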
  • Line 1: The objective function maximizes the defender's payoff for objective λ.
  • Line 2: Specifies the defender's payoffs for each objective and each target, given the defender's and attackers' strategies.
  • Line 3: Specifies the attacker payoff for each attacker type and each target, given the defender's and attackers' strategies.
  • Line 4: Guarantees that the payoffs for objectives maximized in previous iterations of the lexicographic maximization will be maintained.
  • Line 5: Guarantees that the lower bound constraints in b will be satisfied for all objectives which have yet to be optimized.
  • Line 6: Limits the attackers to pure strategies (either they attack a target or they don't).
  • Line 7: Ensures that each attacker only attacks a single target.
  • Line 8: Specifies that the amount of coverage placed on each target is between 0 and 1, since these values represent marginal probabilities.
  • Line 9: Specifies that the total amount of coverage placed on all targets is less than or equal to the total number of defender resources, m.
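Lines 2-3 and 6-9 rest on the standard security-game payoff model: marginal coverage interpolates between a target's covered and uncovered payoffs, and each attacker type best-responds with a pure strategy. A minimal sketch (the function names are hypothetical, not from the disclosure):

```python
def defender_payoff(cov, t, d_cov, d_unc):
    """Expected defender payoff if target t is attacked, given marginal
    coverage cov[t]: a convex combination of the covered payoff d_cov[t]
    and the uncovered payoff d_unc[t] (cf. Line 2 of the MILP)."""
    return cov[t] * d_cov[t] + (1.0 - cov[t]) * d_unc[t]

def attacker_payoff(cov, t, a_cov, a_unc):
    """Expected attacker payoff for attacking target t (cf. Line 3)."""
    return cov[t] * a_cov[t] + (1.0 - cov[t]) * a_unc[t]

def best_response(cov, a_cov, a_unc):
    """A rational attacker attacks a single target (cf. Lines 6-7): the
    one maximizing his expected payoff under coverage cov."""
    targets = range(len(cov))
    return max(targets, key=lambda t: attacker_payoff(cov, t, a_cov, a_unc))
```

In the MILP these relations appear as linear constraints over the coverage variables rather than as direct function evaluations.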
  • Algorithm 5: ORIGAMI-M
  • Line 1: Initializes c (the solution to be returned by ORIGAMI-M) to an empty coverage vector (no coverage on any targets).
  • Line 2: A while-loop that repeats while the lower bound constraint for any objective in b is not satisfied by the defender payoffs produced by the current solution c.
  • Line 3: Sorts the list of targets in descending order according to attacker type i's payoff for attacking each target, given the current coverage c.
  • Line 4: Sets the variable left to the amount of remaining defender resources, and the variable next (which represents the index in the sorted list of the next target to be added to the attack set) to 2.
  • Line 5: A while-loop that repeats while there remain targets to be added to the attack set.
  • Line 6: A new coverage vector addedCov (which will eventually be added to c) is initialized.
  • Line 7: Checks to see if there is a target in the current attack set which, regardless of the amount of coverage placed on it, will prevent the next target from being added to the attack set.
  • Line 8: If Line 7 is true, sets the variable x equal to the fully covered payoff for attacker type i on the target preventing the next target from being added to the attack set.
  • Line 9: The variable noninducibleNextTarget is set to true to indicate later that Line 7 was true.
  • Line 11: If Line 7 is false, sets the variable x equal to the payoff for attacker type i for attacking the next target to be added, given the current coverage c.
  • Line 12: A for-loop over each target currently in the attack set.
  • Line 13: Calculates the amount of coverage that needs to be added such that each target in the attack set yields the same payoff to attacker type i as the next target to be added to the attack set, given c.
  • Line 14: Checks to see if the amount of additional coverage needed to add the next target to the attack set is greater than the amount of remaining defender resources.
  • Line 15: If Line 14 is true, resourcesExceeded is set to true.
  • Lines 16/17: If Line 14 is true, then addedCov is recomputed and each target in the attack set is assigned a ratio of the remaining coverage so as to maintain the attack set.
  • Line 18: Checks if combining the coverage vectors c and addedCov produces a coverage vector which yields a defender payoff for objective i that satisfies the lower bound on objective i, b_i.
  • Line 19: MIN-COV is called to see if it is possible to use even fewer resources than the combined coverage vector while still satisfying the lower bound constraint on objective i. The result is stored as c'.
  • Line 20: Checks to see if the solution returned by MIN-COV is feasible.
  • Line 22: If this line is reached, the algorithm breaks out of the while-loop.
  • Line 23: If Line 18 is false, a check is made to see if either the amount of defender resources has been exceeded or it is not possible to add the next target to the attack set.
  • Line 24: If Line 23 is true, the lower bound constraints in b cannot be satisfied and the CSOP is infeasible. Thus, ORIGAMI-M terminates.
  • Otherwise, addedCov is added to the current coverage vector c.
  • Line 27: If Lines 18 and 23 are false, the amount of resources used to add the next target to the attack set is subtracted from the amount of remaining defender resources.
  • Line 29: Once the while-loop (Line 5) has completed, a check is made to see if all targets have been added to the attack set.
  • Line 30: If Line 29 is true, a check is made to see if there are any remaining defender resources.
  • Line 31: If Line 30 is true, MIN-COV is called, which determines how best to allocate the remaining defender resources and returns that coverage vector as c.
  • Line 32: A check is then made to see if the coverage vector returned by MIN-COV is infeasible.
  • Line 33: If Line 32 is true, the lower bound constraints in b cannot be satisfied and the CSOP is infeasible. Thus, ORIGAMI-M terminates.
  • Line 35: If Line 30 is false, the lower bound constraints in b cannot be satisfied and the CSOP is infeasible. Thus, ORIGAMI-M terminates.
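The core attack-set expansion of Lines 3-17 can be illustrated for a single attacker type, ignoring the noninducible-target and lower-bound checks. This is an illustrative simplification that assumes each target's covered attacker payoff is strictly below its uncovered payoff; all names are hypothetical:

```python
def origami_coverage(a_unc, a_cov, m):
    """Simplified sketch of ORIGAMI-style attack-set expansion.

    a_unc[t] / a_cov[t] -- attacker payoff for target t when it is
    uncovered / fully covered (assumes a_cov[t] < a_unc[t]); m is the
    total amount of defender resources. Coverage is added so that every
    target already in the attack set gives the attacker the same expected
    payoff as the next target outside it, until resources run out."""
    n = len(a_unc)
    order = sorted(range(n), key=lambda t: a_unc[t], reverse=True)  # Line 3
    cov = [0.0] * n
    left = float(m)                                                 # Line 4
    for nxt in range(1, n):                                         # Line 5
        x = a_unc[order[nxt]]  # Line 11: payoff of the next target
        added = []                                                  # Line 6
        for t in order[:nxt]:                                       # Line 12
            # Line 13: extra coverage making target t's expected payoff x
            need = (x - a_unc[t]) / (a_cov[t] - a_unc[t]) - cov[t]
            added.append((t, need))
        total = sum(need for _, need in added)
        if total > left:               # Line 14: resources exceeded, stop
            break
        for t, need in added:          # grow coverage, keeping payoffs equal
            cov[t] += need
        left -= total
    return cov
```

The full ORIGAMI-M additionally handles noninducible targets (Lines 7-9), ratio reallocation when resources run out (Lines 16/17), and the per-objective lower-bound tests.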
  • Algorithm 6: MIN-COV
  • Line 2: Initializes c* (the solution to be returned by MIN-COV) to an empty coverage vector (no coverage on any targets).
  • Line 3: Initializes the variable minResources to m (the total number of defender resources).
  • Line 4: A for-loop over each target t' in the attack set for attacker type i induced by the coverage vector c.
  • Line 5: Initializes a new coverage vector c' with c.
  • Line 6: Computes the amount of coverage needed on target t' to give the defender a payoff of b_i for objective i if attacker type i attacks target t'.
  • Line 7: A for-loop over the set of targets minus target t'.
  • Line 8: Checks to see if the payoff for attacker type i is greater from attacking target t than target t', given the current amount of coverage placed on both targets.
  • Line 9: If it is, the coverage for target t is recomputed so that attacking target t yields the same payoff to attacker type i as target t'.
  • Line 10: Checks to see if c' satisfies the lower bound on defender payoff for objective i and uses less total coverage than minResources.
  • Line 12: If Line 10 is true, sets the minResources variable to the amount of resources used in c'.
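The coverage computations behind Lines 6 and 9 of MIN-COV follow from setting the relevant expected payoff equal to the desired value and solving for the marginal coverage. A sketch under those assumptions (the helper names are hypothetical):

```python
def coverage_for_defender_bound(b_i, d_unc, d_cov):
    """Cf. Line 6: marginal coverage needed on target t' so the defender's
    expected payoff is b_i if t' is attacked (assumes d_cov > d_unc).
    Solves c * d_cov + (1 - c) * d_unc = b_i for c."""
    return (b_i - d_unc) / (d_cov - d_unc)

def coverage_to_deter(x, a_unc, a_cov):
    """Cf. Line 9: minimum coverage on another target t so the attacker's
    expected payoff for t drops to x, the payoff at t' (assumes
    a_unc > x >= a_cov); with less coverage the attacker prefers t."""
    return (a_unc - x) / (a_unc - a_cov)
```

Both values are marginal probabilities, so a real implementation would also clamp them to [0, 1].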
  • Algorithm 7: ORIGAMI-A
  • Line 1: Initializes c (the solution to be returned by ORIGAMI-A) to an empty coverage vector (no coverage on any targets).
  • Line 2: Computes the lowest possible value for the primary objective (the defender payoff for the target with the lowest payoff when fully uncovered).
  • Line 3: Initializes a new lower bound vector b+ as the union of the bound on the primary objective (objective 1) computed in Line 2 and the lower bound vector b for a given CSOP.
  • Line 4: A for-loop specifying that binary search will be performed over the defender payoff for each of the n objectives.
  • Line 5: The variable lower is initialized to the lower bound for objective i in b+.
  • Line 6: The variable upper is initialized to the highest possible value for objective i (the defender payoff for the target with the highest payoff when fully covered).
  • Line 7: A while-loop specifying that the binary search over the payoff for objective i continues until the termination condition is reached (i.e., the difference between the upper and lower bounds of the binary search is less than α).
  • Line 8: Computes the new lower bound for objective i in b+ as the midpoint of the upper and lower bounds of the search (hence, binary search).
  • Line 9: ORIGAMI-M is called with the updated lower bound vector b+, and returns the solution c'.
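The per-objective search of Lines 5-9 can be sketched generically, with ORIGAMI-M abstracted as a feasibility oracle. This is a simplified illustration; `feasible` and the other names are hypothetical stand-ins:

```python
def binary_search_bound(lower, upper, feasible, alpha):
    """Sketch of the per-objective search in ORIGAMI-A: find, to within
    alpha, the largest lower bound on an objective for which the
    feasibility oracle still succeeds. feasible(x) stands in for the
    ORIGAMI-M call with this objective's bound raised to x."""
    while upper - lower > alpha:       # Line 7: termination condition
        mid = (lower + upper) / 2.0    # Line 8: bisect the interval
        if feasible(mid):              # Line 9: ORIGAMI-M feasibility check
            lower = mid                # bound can be pushed higher
        else:
            upper = mid                # infeasible: back off
    return lower
```

Each halving of the interval costs one ORIGAMI-M call, which is the source of ORIGAMI-A's speedup over solving MILPs directly.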
  • MOSG: multi-objective security game
  • Advantageous features include: 1) Iterative ε-Constraints, a high-level approach for transforming MOSGs into a sequence of CSOPs; 2) exact MILP formulations, both with and without heuristics, for solving CSOPs; and 3) ORIGAMI-A, an approximate approach for solving CSOPs. Bounds for both the complexity and the solution quality of the MOSG approaches were then provided. Additionally, a detailed experimental comparison of the different approaches was presented, and these results confirmed the efficacy of the MOSG approach under the tested conditions.
  • The present disclosure provides different solution methodologies for addressing the issues of protecting and/or patrolling security domains, e.g., identified infrastructures or resources, with limited resources.
  • The solution methodologies can provide optimal solutions to attacker-defender Stackelberg security games that are modeled on a real-world application of interest. These optimal solutions can be used for directing patrolling strategies and/or resource allocation for particular security domains. It will be understood that any of the algorithms in accordance with the present disclosure can be considered a means for solving a Stackelberg game modeling a particular security domain.
  • When these instructions are executed by one or more processors or other processing units, they cause the processing unit(s) to perform the actions indicated in the instructions.
  • Examples of computer readable media include, but are not li mited to, CD-ROMs, flash drives, RAM chips, hard drives, EPROMs, etc.
  • The computer readable media do not include carrier waves and electronic signals passing wirelessly or over wired connections.
  • the term "software” is meant to include firmware residing in read-only memory or applications stored in magnetic storage or flash storage, for example, a solid-state drive, which can be read into memory for processing by a processor. Also, in some i mplementations, multiple software technologies can be i mplemented as sub-parts of a larger program while remaining distinct software technologies. In some
  • multiple software technologies can also be i mplemented as separate programs.
  • Any combination of separate programs that together implement a software technology described here is within the scope of the subject technology.
  • The software programs, when installed to operate on one or more electronic systems, define one or more specific machine implementations that execute and perform the operations of the software programs.
  • A computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, object, or other unit suitable for use in a computing environment.
  • a computer program may, but need not, correspond to a file in a file system.
  • A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, subprograms, or portions of code).
  • a computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
  • Some implementations include electronic components, for example, microprocessors, storage, and memory that store computer program instructions in a machine-readable or computer-readable medium.
  • Such media are also referred to as computer-readable storage media, machine-readable media, or machine-readable storage media.
  • Some examples of such computer-readable media include RAM, ROM, read-only compact discs (CD-ROM), recordable compact discs (CD-R), rewritable compact discs (CD-RW), read-only digital versatile discs (e.g., DVD-ROM, dual-layer DVD-ROM), and a variety of recordable/rewritable DVDs.
  • The computer-readable media can store a computer program that is executable by at least one processing unit and includes sets of instructions for performing various operations. Examples of computer programs or computer code include machine code, such as is produced by a compiler, and files including higher-level code that are executed by a computer, an electronic component, or a microprocessor using an interpreter.
  • ASICs: application-specific integrated circuits
  • FPGAs: field-programmable gate arrays
  • the terms "computer”, “server”, “processor”, and “memory” all refer to electronic or other technological devices. These terms exclude people or groups of people.
  • display or displaying means displaying on an electronic device.
  • computer readable medium and “computer readable media” are entirely restricted to tangible, physical objects that store information in a form that is readable by a computer.
  • Implementations of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user, and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer.
  • a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user's client device in response to requests received from the web browser.
  • Implementations can include a middleware component, e.g., an application server, or a front end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back end, middleware, or front end components.
  • The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks).
  • the computing system can include clients and servers.
  • a client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
  • a server transmits data (e.g., an HTML page) to a client device (e.g., for purposes of displaying data to and receiving user input from a user interacting with the client device).
  • Data generated at the client device (e.g., a result of the user interaction) can be received from the client device at the server.
  • Any specific order or hierarchy of steps in the processes disclosed is an illustration of example approaches. Based upon design preferences, it is understood that the specific order or hierarchy of steps in the processes may be rearranged, or that all illustrated steps be performed. Some of the steps may be performed simultaneously. For example, in certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components illustrated above should not be understood as requiring such separation, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.


Description

OPTIMAL STRATEGIES IN SECURITY GAMES
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to U.S. Provisional Patent Application 61/651,799, entitled "Computing Optimal Strategy Against Quantal Response in Security Games," filed May 25, 2012, attorney docket no. 028080-0752, the entire content of which, including Exhibits 1-3, is incorporated herein by reference. This application is also a Continuation-In-Part of U.S. Patent Application No. 13/479,884, entitled "Agent Security Via Approximate Solvers," filed May 24, 2012, attorney docket no. 028080-0751, which is a continuation application of U.S. Patent Application 12/251,766, entitled "Agent Security Via Approximate Solvers," filed October 15, 2008, attorney docket no. 028080-0399; and a continuation application of U.S. Patent Application 12/253,695, entitled "Decomposed Optimal Bayesian Stackelberg Solver," filed October 17, 2008, attorney docket no. 028080-0367. Both applications claim priority to U.S. Provisional Patent Application 60/980,128, entitled "ASAP (Agent Security Via Approximate Policies) Algorithm in an Approximate Solver for Bayesian-Stackelberg Games," filed October 15, 2007, attorney docket no. 028080-0299, and U.S. Provisional Patent Application 60/980,739, entitled "DOBSS (Decomposed Optimal Bayesian Stackelberg Solver) is an Optimal Algorithm for Solving Stackelberg Games," filed October 17, 2007, attorney docket no. 028080-0300. The entire contents of all of these applications are incorporated herein by reference.
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH
[0002] This invention was made with government support under Grant Nos. W911NF-10-1-0185 and ICM/FIC P10-024-F, awarded by the Army Research Office. The invention was also made with government support from the Department of Homeland Security (DHS), under Grant No. 2010-ST-061-RE0001, awarded through the Center for Risk and Economic Analysis of Terrorism Events (CREATE), and with DHS support through the National Center for Border Security and Immigration (NCBSI). The government has certain rights in the invention.
BACKGROUND
[0003] Game theory is an increasingly important paradigm for modeling security domains which feature complex resource allocation. Security games, a special class of attacker-defender Stackelberg games, are at the heart of several major deployed decision-support applications.
[0004] In these applications, the defender is typically trying to maximize a single objective. However, there are domains where the defender has to consider multiple objectives simultaneously. For example, the Los Angeles Sheriff's Department (LASD) has stated that it needs to protect the city's metro system from ticketless travelers, common criminals, and terrorists. From the perspective of LASD, each one of these attacker types presents a unique threat (lost revenue, property theft, and loss of life). Given this diverse set of threats, selecting a security strategy is a significant challenge, as no single strategy can minimize the threat for all attacker types. Thus, tradeoffs must be made, and protecting more against one threat may increase the vulnerability to another threat. However, it is not clear how LASD should weigh these threats when determining the security strategy to use. One could attempt to establish methods for converting the different threats into a single metric. However, this process can become convoluted when attempting to compare abstract notions such as safety and security with concrete concepts such as ticket revenue.
[0005] Bayesian security games have been used to model domains where the defender is facing multiple attacker types. The threats posed by the different attacker types are weighted according to the relative likelihood of encountering that attacker type. There are three potential factors limiting the use of Bayesian security games: (1) the defender may not have information on the probability distribution over attacker types, (2) it may be impossible or undesirable to directly compare and combine the defender rewards of different security games, and (3) only one solution is given, hiding the tradeoffs between the objectives from the end user.
[0006] The recent real-world applications of attacker-defender Stackelberg security games, e.g., ARMOR, IRIS and GUARDS, provide software assistants that help security agencies optimize allocations of their limited security resources. These applications require efficient algorithms that derive mixed (randomized) strategies for the defender (security agencies), taking into account an attacker's surveillance and best response. The algorithms underlying these applications, and most others in the literature, have assumed perfect rationality of the human attacker, who strictly maximizes his expected utility. While this is a standard game-theoretic assumption and appropriate as an approximation in first-generation applications, it is a well-accepted limitation of classical game theory. Indeed, algorithmic solutions based on this assumption may not be robust to the boundedly rational decision making of a human adversary (leading to reduced expected defender reward), and may also be limited in exploiting human biases.
[0007] Due to their significance in real-world security, there has been a lot of recent research activity in leader-follower Stackelberg games, oriented towards producing deployed solutions: ARMOR at LAX, IRIS for the Federal Air Marshals Service, and GUARDS for the TSA. The Bayesian extension to the Stackelberg game has been used to model the uncertainty over players' preferences by allowing multiple discrete follower types, as well as, by use of sampling-based algorithms, continuous payoff uncertainty.
[0008] Scalability of discrete follower types is essential in domains such as road network security, where each follower type could represent a criminal attempting to follow a certain path. Scaling up the number of types is also necessary for the sampling-based algorithms to obtain high-quality solutions under continuous uncertainty. Unfortunately, such scale-up remains difficult, as finding the equilibrium of a Bayesian Stackelberg game is NP-hard. Indeed, despite recent algorithmic advancements, including Multiple-LPs, DOBSS, and HBGS, none of these techniques can handle games with more than ~50 types, even when the number of actions per player is as few as 5: inadequate both for scale-up in discrete follower types and for sampling-based approaches. This scale-up difficulty has led to an entirely new set of algorithms developed for handling continuous payoff uncertainty, and continuous observation and execution error; these algorithms do not handle discrete follower types, however.
SUMMARY
[0009] Illustrative embodiments are now discussed and illustrated. Other embodiments may be used in addition or instead. Details which may be apparent or unnecessary may be omitted to save space or for a more effective presentation. Conversely, some embodiments may be practiced without all of the details which are disclosed.
[0010] The present disclosure provides different solution methodologies for addressing the issues of protecting and/or patrolling security domains, e.g., identified infrastructures or resources, with limited resources. The solution methodologies can provide optimal solutions to attacker-defender Stackelberg security games that are modeled on a real-world application of interest. These optimal solutions can be used for directing patrolling strategies and/or resource allocation for particular security domains.
[0011] One aspect of the present disclosure provides for computing an optimal strategy against quantal response in security games. Two algorithms are presented, which address the difficulties in computing optimal defender strategies in real-world security games: GOSAQ can compute the globally optimal defender strategy against a QR model of attackers when there are no resource constraints and gives an efficient heuristic otherwise; PASAQ in turn provides an efficient approximation of the optimal defender strategy with or without resource constraints. These two algorithms, presented in Exhibit 1, are based upon three key ideas: i) use of a binary search method to solve the fractional optimization problem efficiently, ii) construction of a convex optimization problem through a non-linear transformation, and iii) building a piecewise linear approximation of the non-linear terms in the problem. Additional contributions of Exhibit 1 include proofs of approximation bounds and detailed experimental results showing the advantages of GOSAQ and PASAQ in solution quality over the benchmark algorithm (BRQR) and the efficiency of PASAQ. Given these results, PASAQ is at the heart of the PROTECT system, which is deployed for the US Coast Guard in the Port of Boston, and is now headed to other ports.
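Key idea i), binary search for a fractional objective, can be illustrated in miniature. The sketch below searches a finite candidate set, whereas GOSAQ and PASAQ operate on continuous coverage vectors with convexification and piecewise-linear approximation; all names here are hypothetical:

```python
def maximize_ratio(cands, N, D, tol=1e-9):
    """Binary search for max over x of N(x)/D(x), with D(x) > 0.

    The trick behind GOSAQ/PASAQ in miniature: a ratio value r is
    achievable if and only if max over x of [N(x) - r * D(x)] >= 0, so
    the best ratio can be bracketed by bisection on r, with each probe
    being a (much easier) non-fractional maximization."""
    lo = min(N(x) / D(x) for x in cands)
    hi = max(N(x) / D(x) for x in cands)   # cheap bracket for the sketch
    while hi - lo > tol:
        r = (lo + hi) / 2.0
        if max(N(x) - r * D(x) for x in cands) >= 0:
            lo = r                          # some x achieves ratio >= r
        else:
            hi = r
    return lo
```

In the actual algorithms, the inner probe is a convex (GOSAQ) or piecewise-linear (PASAQ) optimization over defender coverage rather than a scan of candidates.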
[0012] A further aspect is directed to a unified method for handling discrete and continuous uncertainty in Bayesian Stackelberg games, which scales up Bayesian Stackelberg games, providing a novel unified approach to handling uncertainty not only over discrete follower types but also over other key continuously distributed real-world uncertainty, due to the leader's execution error, the follower's observation error, and continuous payoff uncertainty. To that end, this aspect provides new algorithms. An algorithm for Bayesian Stackelberg games, called HUNTER, is presented to scale up the number of types. HUNTER combines the following five key features: i) efficient pruning via a best-first search of the leader's strategy space; ii) a novel linear program for computing tight upper bounds for this search; iii) using Bender's decomposition for solving the upper bound linear program efficiently; iv) efficient inheritance of Bender's cuts from parent to child; and v) an efficient heuristic branching rule. Experiments show that HUNTER provides order-of-magnitude speedups over the best existing methods to handle discrete follower types. In the second part of Exhibit 2, it is shown how HUNTER's efficiency for Bayesian Stackelberg games can be exploited to also handle the continuous uncertainty using sample average approximation. The HUNTER-based approach also outperforms the latest robust solution methods under continuously distributed uncertainty.
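The sample average approximation idea mentioned above can be sketched as follows. This is an illustrative fragment only; `sample_payoff` is a hypothetical stand-in for drawing one realization of the uncertain payoffs:

```python
import random

def sample_average_types(sample_payoff, k, rng):
    """Sketch of sample average approximation (SAA) as used with HUNTER:
    continuously distributed uncertainty is replaced by k sampled
    follower types, each weighted 1/k, yielding a finite Bayesian
    Stackelberg game that a discrete-type solver can handle."""
    types = [sample_payoff(rng) for _ in range(k)]
    weights = [1.0 / k] * k
    return types, weights
```

The resulting finite game grows with k, which is why scaling up the number of discrete types (HUNTER's contribution) is what makes SAA practical here.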
[0013] A further aspect provides a multi-objective optimization for security games, which provides a solution to the challenges of different security domains. The aspect includes treatment of multi-objective security games (MOSG), which combine security games and multi-objective optimization. Instead of a single optimal solution, MOSGs have a set of Pareto optimal (non-dominated) solutions referred to as the Pareto frontier. The Pareto frontier can be generated by solving a sequence of constrained single-objective optimization problems (CSOP), where one objective is selected to be maximized while lower bounds are specified for the other objectives. Features include: i) an algorithm, Iterative ε-Constraints, for generating the sequence of CSOPs; ii) an exact approach for solving an MILP formulation of a CSOP (which also applies to multi-objective optimization in more general Stackelberg games); iii) heuristics that achieve speedup by exploiting the structure of security games to further constrain a CSOP; and iv) an approximate approach for solving an algorithmic formulation of a CSOP, increasing the scalability of the approach described in Exhibit 3 with quality guarantees. Additional contributions of Exhibit 3 include proofs on the level of approximation and detailed experimental evaluation of the proposed approaches.
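The notion of a Pareto frontier used throughout this aspect can be made concrete with a small filter over candidate payoff vectors (illustrative only; all objectives are assumed to be maximized):

```python
def pareto_frontier(points):
    """Keep only the non-dominated payoff vectors. A point p is dominated
    if some other point q is >= p in every objective and differs from p,
    i.e., q is strictly better in at least one objective."""
    def dominated(p, q):
        return q != p and all(qi >= pi for pi, qi in zip(p, q))
    return [p for p in points
            if not any(dominated(p, q) for q in points)]
```

Iterative ε-Constraints generates such candidate points by sweeping lower bounds; a filter like this describes which of them constitute the frontier presented to the end user.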
[0014] These, as well as other components, steps, features, benefits, and advantages, will now become clear from a review of the following detailed description of illustrative embodiments, the accompanying drawings, and the claims.
BRIEF DESCRIPTION OF THE DRAWINGS
[0015] The drawings are of illustrative embodi ments. They do not illustrate all embodi ments. Other embodi ments may be used in addition or instead. Details that may be apparent or unnecessary may be omitted to save space or for more effective illustration. Some embodi ments may be practiced with additional components or steps and/or without all of the components or steps that are illustrated. When the same numeral appears in different drawings, it refers to the same or like components or steps.
[0016] FIG. 1 depicts a table with notations used for an exemplary quantal response embodiment according to the present disclosure.
[0017] FIG. 2 depicts an algorithm used for an exemplary quantal response embodiment of the present disclosure.
[0018] FIG. 3 includes FIGS. 3(a) and 3(b), which display two examples of approximations of nonlinear objective functions over partitioned domains and after variable substitution.
[0019] FIG. 4 includes FIGS. 4(a)-4(f), which depict a solution quality and runtime comparison, without assignment constraints, for examples of the three algorithms GOSAQ, PASAQ, and BRQR.
[0020] FIG. 5 includes FIGS. 5(a)-5(f), which depict a solution quality and runtime comparison, with assignment constraints, for examples of the three algorithms GOSAQ, PASAQ, and BRQR.
[0021] FIG. 6 depicts an example search tree for solving Bayesian games.
[0022] FIG. 7 is a diagram showing steps of creating internal search nodes according to an example of the HUNTER algorithm.
[0023] FIG. 8 is a diagram depicting an example of internal nodes according to an example of the HUNTER algorithm.
[0024] FIG. 9 depicts an example of the HUNTER algorithm.
[0025] FIG. 10 depicts an example of the convex hull of H, clconvH.
[0026] FIG. 11 depicts an experimental analysis of HUNTER in runtime comparison with the HBGS and DOBSS algorithms.
[0027] FIG. 12 is an example of a Pareto frontier plotted for a Bi-Objective MOSG.
[0028] FIG. 13 depicts an example of the Iterative-ε-Constraints algorithm.
[0029] FIG. 14 depicts an example of an algorithm for recreating a sequence of CSOP problems generated by the Iterative-ε-Constraints algorithm that ensures b < v throughout.
[0030] FIG. 15 depicts a table with notations used for an exemplary MOSG algorithm according to the present disclosure.
[0031] FIG. 16 depicts an example of the ORIGAMI-M algorithm.
[0032] FIG. 17 depicts an example of the MINI-COV algorithm.
[0033] FIG. 18 depicts an example of the ORIGAMI-A algorithm.
[0034] FIG. 19 depicts an example of scaling up targets, according to the present disclosure.
[0035] FIG. 20 depicts a further example of scaling up targets, according to the present disclosure.
[0036] FIG. 21 depicts an example of scaling up objectives, according to the present disclosure.
[0037] FIG. 22 depicts an example of scaling down epsilon, according to the present disclosure.
[0038] FIG. 23 shows results using ORIGAMI-A under specified conditions.
[0039] FIG. 24 shows epsilon solution quality for MILP-PM and ORIGAMI-A.
[0040] FIG. 25 depicts a comparison of maximum objective loss for different epsilon values against uniformly weighted Bayesian security games.
DETAILED DESCRIPTION
[0041] Illustrative embodiments are now discussed and illustrated. Other embodiments may be used in addition or instead. Details which may be apparent or unnecessary may be omitted to save space or for a more effective presentation. Conversely, some embodiments may be practiced without all of the details which are disclosed.
[0042] The present disclosure provides different solution methodologies for addressing the issues of protecting and/or patrolling security domains, e.g., identified infrastructures or resources, with limited resources. The solution methodologies can provide optimal solutions to attacker-defender Stackelberg security games that are modeled on a real-world application of interest. These optimal solutions can be used for directing patrolling strategies and/or resource allocation for particular security domains. Three aspects of the present disclosure are described in the following Sections 1-3. Formulas presented in the sections are numbered separately for each section, while tables and figures are numbered together.
[0043] Section 1 - Computing Optimal Strategy Against Quantal Response in Security Games: GOSAQ and PASAQ algorithms
[0044] As was noted above, an aspect of the present disclosure is directed to computing an optimal strategy of one or more defenders against one or more attackers having a quantal response in security games.
[0045] To step beyond the first-generation deployments of attacker-defender security games, it may be desirable to relax the assumption of perfect rationality of the human adversary. Indeed, this assumption is a well-accepted limitation of classical game theory, and modeling human adversaries' bounded rationality is desirable. To this end, quantal response (QR) has provided very promising results to model human bounded rationality. However, in computing optimal defender strategies in real-world security games against a QR model of attackers, difficulties have been recognized: (1) solving a nonlinear non-convex optimization problem efficiently for massive real-world security games; and (2) addressing constraints on assigning security resources, which adds to the complexity of computing the optimal defender strategy.
[0046] An aspect of the present disclosure provides two new algorithms to address these difficulties: The global optimal strategy against quantal response (GOSAQ) algorith m can compute the globally opti mal defender strategy against a QR model of attackers when there are no resource constraints and gives an efficient heuristic otherwise; the piecewise linear approxi mation of opti mal strategy against quantal response (PASAQ) algorithm, in turn provides an efficient approxi mation of the opti mal defender strategy with or without resource constraints. These two novel algorithms are based on three key ideas: (i) use of a binary search method to solve the fractional opti mization problem efficiently, (ii) construction of a convex opti mization problem through a non-linear transformation, (iii) building a piecewise linear approxi mation of the non-linear terms in the problem.
Additional contributions of the disclosure include proofs of approxi mation bounds, detailed experi mental results showing the advantages of GOSAQ and PASAQ in solution quality over the benchmark algorithm (BRQR) and the efficiency of PASAQ.
[0047] QR assumes errors in human decision making and suggests that instead of strictly maximizing utility, individuals respond stochastically in games: the chance of selecting a non-optimal strategy increases as the associated cost decreases. The QR model has received widespread support in the literature in terms of its superior ability to model human behavior in games, including in recent multi-agent systems literature. An even more relevant study in the context of security games showed that defender security allocations assuming a quantal response model of adversary behavior outperformed several competing models in experiments with human subjects. QR is among the best-performing current models and one that allows tuning of the 'adversary rationality level' as explained later. Hence this model is one that can be practically used by security agencies desiring to not be locked into adversary models of perfect rationality.
[0048] Unfortunately, in computing optimal defender strategies in security games assuming an adversary with quantal response (QR-adversary), two major difficulties arise: (1) solving a nonlinear non-convex optimization problem efficiently for massive real-world security games; and (2) addressing resource assignment constraints in security games, which adds to the complexity of computing the optimal defender strategy. Yet, scaling up to massive security problems and handling constraints on resource assignments are essential to address real-world problems such as computing strategies for the Federal Air Marshals Service (FAMS) and the US Coast Guard (USCG).
[0049] The algorithm BRQR has been used to solve a Stackelberg security game with a QR-adversary. BRQR, however, is not guaranteed to converge to the optimal solution, as it uses a nonlinear solver with multi-starts to obtain an efficient solution to a non-convex optimization problem. Furthermore, such use of BRQR did not consider the resource assignment constraints that are included in this disclosure. Nevertheless, GOSAQ and PASAQ are compared herein to the performance of BRQR, since it is the benchmark algorithm. Another existing algorithm that efficiently computes the Quantal Response Equilibrium only applies to cases where all the players have the same level of errors in their quantal response, a condition not satisfied in security games. In particular, in security games, the defender's strategy is based on a computer-aided decision-making tool, and therefore it is a best response. Adversaries, on the other hand, are human beings who may have biases and preferences in their decision making, so they are modeled with a quantal response. Therefore, new algorithms have been developed, as presented herein, to compute the optimal defender strategy when facing a QR-adversary in real-world security problems.
[0050] In the present disclosure, the following five contributions are provided. First, an algorithm called GOSAQ is provided to compute the defender optimal strategy against a QR-adversary. GOSAQ uses a binary search method to iteratively estimate the global optimal solution rather than searching for it directly, which would require solving a nonlinear and non-convex fractional problem. It also uses a nonlinear variable transformation to convert the problem into a convex problem. GOSAQ leads to an ε-optimal solution, where ε can be arbitrarily small. Second, another algorithm called PASAQ is provided to approximate the optimal defender strategy. PASAQ is also based on binary search. It then converts the problem into a Mixed-Integer Linear Programming problem by using a piecewise linear approximation. PASAQ leads to an efficient approximation of the global optimal defender strategy and provides an arbitrarily near-optimal solution with a sufficiently accurate linear approximation. Third, it is shown that GOSAQ and PASAQ can solve not only problems without resource assignment constraints, such as for the LAX police, but also problems with resource assignment constraints, such as problems for FAMS and USCG. Fourth, the correctness/approximation-bound proofs of GOSAQ and PASAQ are provided. Fifth, detailed experimental analysis is provided on the solution quality and computational efficiency of GOSAQ and PASAQ, illustrating that both GOSAQ and PASAQ achieve better solution quality and runtime scalability than the previous benchmark algorithm BRQR. Indeed, PASAQ can potentially be applied to most of the real-world deployments of the Stackelberg Security Game, including ARMOR and IRIS, that are based on a perfect rationality model of the adversary. This may improve the performance of such systems when dealing with human adversaries.
[0051] For a statement of the problem, consider a Stackelberg Security Game (SSG) with a single leader and at least one follower, where the defender plays the role of the leader and the adversary plays the role of the follower. The defender and attacker may represent organizations and need not be single individuals. The following notation is used to describe a SSG, also listed in Table 1 shown in FIG. 1. The defender has a total of M resources to protect a set of targets T = {1, ..., |T|}. The outcomes of the SSG depend only on whether or not the attack is successful. So given a target i, the defender receives reward R_i^d if the adversary attacks a target that is covered by the defender; otherwise, the defender receives penalty P_i^d. Correspondingly, the attacker receives penalty P_i^a in the former case, and reward R_i^a in the latter case. Note that a key property of SSGs is that while the games may be non-zero-sum, R_i^d > P_i^d and R_i^a > P_i^a, ∀i [9]. In other words, adding resources to cover a target helps the defender and hurts the attacker.
[0052] The jth individual defender strategy can be denoted as A_j, which is an assignment of all the security resources. Generally, A_j can be represented as a column vector A_j = (A_ij)^T, where A_ij indicates whether or not target i is covered by assignment j. Let A = {A_j} be the set of feasible assignments of resources, and let a_j be the probability of selecting strategy j. Given this probability of selecting defender strategies, the likelihood of protecting any specific target i can be computed as the marginal x_i = Σ_{A_j∈A} a_j A_ij. The marginals x_i clearly sum to M, the total number of resources. Previous work has shown that defender strategies in SSGs can be represented in terms of these marginals, leading to more concise equivalent representations. In particular, the defender's expected utility if the adversary attacks target i can be written as:
[0053] U_i^d(x_i) = x_i R_i^d + (1 − x_i) P_i^d
[0054] and the adversary's expected utility on attacking target i is
[0055] U_i^a(x_i) = x_i P_i^a + (1 − x_i) R_i^a
[0056] These marginal coverage vectors can be converted to a mixed strategy over actual defender strategies when there are no resource constraints, such as in ARMOR.
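The relation between a mixed strategy over feasible assignments and the induced marginal coverage, x_i = Σ_{A_j∈A} a_j A_ij, can be sketched as follows. This is an illustrative aid only; the function name and the tiny example game are invented for illustration and are not part of the claimed method.

```python
# Sketch: marginal coverage induced by a mixed strategy over assignments.

def marginal_coverage(assignments, probs):
    """assignments: list of 0/1 lists, one per feasible assignment A_j;
    probs: mixed strategy (a_j) over those assignments, summing to 1."""
    num_targets = len(assignments[0])
    # x_i = sum_j a_j * A_ij for each target i
    return [sum(a * A[i] for a, A in zip(probs, assignments))
            for i in range(num_targets)]

# Hypothetical game: M = 2 resources, three targets, two feasible
# assignments mixed with equal probability.
A = [[1, 1, 0], [1, 0, 1]]
x = marginal_coverage(A, [0.5, 0.5])
# x = [1.0, 0.5, 0.5]; the marginals sum to M = 2.
```

Note that each assignment here uses all M resources, which is why the marginals sum exactly to M, as stated above.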
[0057] In the presence of constraints on assignments of resources, marginals may result which cannot be converted to probabilities over individual strategies. However, as is shown below, this difficulty can be addressed if the set A provides a complete description of defender strategies. In this case, a constraint can be added enforcing that the marginals are obtained from a convex combination of these feasible defender strategies.
[0058] In SSGs, the goal is to compute a mixed strategy for the leader to commit to based on her knowledge of the adversary's response. More specifically, given that the defender has limited resources (e.g., she may need to protect 8 targets with 3 guards), she must design her strategy to optimize against the adversary's response to maximize effectiveness.
[0059] Optimal Strategy against Quantal Response
[0060] In this section of the present disclosure, a QR-adversary is assumed, i.e., one with a quantal response (q_i, i ∈ T) to the defender's mixed strategy x = (x_i, i ∈ T). The value q_i is the probability that the adversary attacks target i, computed as
[0061] q_i(x) = e^{λ U_i^a(x_i)} / Σ_{j∈T} e^{λ U_j^a(x_j)}   (1)
[0062] where λ > 0 is the parameter of the quantal response model, which represents the error level in the adversary's quantal response. Simultaneously, the defender maximizes her utility (given her computer-aided decision-making tool):
[0063] U^d(x) = Σ_{i∈T} q_i(x) U_i^d(x_i)
[0064] Therefore, in domains without constraints on assigning the resources, the problem of computing the optimal defender strategy against a QR-adversary can be written in terms of marginals as:
[0065] P1: max_x Σ_{i∈T} q_i(x) U_i^d(x_i)
s.t. Σ_{i∈T} x_i ≤ M
0 ≤ x_i ≤ 1, ∀i ∈ T
[0066] Problem P1 has a polyhedral feasible region and a non-convex fractional objective function.
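The P1 objective, combining the quantal response of Equation (1) with the defender's expected utility, can be sketched numerically as follows. The payoff values below are hypothetical, and the function is an illustrative aid rather than a statement of the claimed method.

```python
import math

# Sketch: evaluate the P1 objective U^d(x) for given marginals x.
def defender_utility(x, Rd, Pd, Ra, Pa, lam):
    # Per-target expected utilities (attacker and defender)
    Ua = [xi * Pa[i] + (1 - xi) * Ra[i] for i, xi in enumerate(x)]  # U_i^a
    Ud = [xi * Rd[i] + (1 - xi) * Pd[i] for i, xi in enumerate(x)]  # U_i^d
    # Quantal response of Equation (1)
    w = [math.exp(lam * u) for u in Ua]
    Z = sum(w)
    q = [wi / Z for wi in w]
    return sum(qi * ui for qi, ui in zip(q, Ud))

# Two hypothetical targets, M = 1 resource split as marginals [0.6, 0.4].
x = [0.6, 0.4]
print(defender_utility(x, Rd=[5, 3], Pd=[-2, -4], Ra=[4, 6], Pa=[-1, -3], lam=0.76))
```

Note that setting λ = 0 makes the attack distribution uniform over targets, consistent with the role of λ as the adversary's error level.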
[0067] Resource Assignment Constraint
[0068] In many real-world security problems, there are constraints on assigning the resources. For example, in the FAMS problem [7], an air marshal is scheduled to protect 2 flights (targets) out of M total flights, so the total number of possible schedules is the number of flight pairs. However, not all of the schedules are feasible, since the flights scheduled for an air marshal have to be connected, e.g., an air marshal cannot be on a flight from A to B and then on a flight from C to D. A resource assignment constraint implies that the feasible assignment set A is restricted; not all combinatorial assignments of resources to targets are allowed. Hence, the marginals on targets, x, are also restricted.
[0069] Definition 1. Consider a marginal coverage x to be feasible if and only if there exists a_j ≥ 0, A_j ∈ A, such that Σ_{A_j∈A} a_j = 1 and, for all i ∈ T, x_i = Σ_{A_j∈A} a_j A_ij.
[0071] In fact, (a_j) is the mixed strategy over all the feasible assignments of the resources. In order to compute the defender's optimal strategies against a QR-adversary in the presence of resource-assignment constraints, P2 must be solved. The constraints in P1 are modified to enforce feasibility of the marginal coverage.
[0072] P2: max_x Σ_{i∈T} q_i(x) U_i^d(x_i)
s.t. Σ_{i∈T} x_i ≤ M
x_i = Σ_{A_j∈A} a_j A_ij, ∀i ∈ T
Σ_{A_j∈A} a_j = 1
0 ≤ a_j ≤ 1, ∀A_j ∈ A
[0073] BINARY SEARCH METHOD
[0074] Solving P1 and P2 is needed to compute the optimal defender strategy, which requires optimally solving a non-convex problem, in general an NP-hard problem [16]. In this section, the basic structure of using a binary search method to solve the two problems is described. However, further efforts are required to convert this skeleton into actual efficiently runnable algorithms; the additional details will be filled in in the next two sections.
[0075] For notational simplicity, the symbols θ_i, β_i and α_i are defined in Table 2. The numerator and denominator of the objective function in P1 and P2 are denoted by N(x) and D(x):
[0076] Table 2: Symbols for Targets in SSG
θ_i = e^{λ R_i^a}; β_i = λ(R_i^a − P_i^a); α_i = R_i^d − P_i^d
With these symbols, N(x) = Σ_{i∈T} θ_i e^{−β_i x_i}(α_i x_i + P_i^d) and D(x) = Σ_{i∈T} θ_i e^{−β_i x_i}.
[0077] A key idea of the binary search method is to iteratively estimate the global optimal value (p*) of the fractional objective function of P1, instead of searching for it directly. Let X_f be the feasible region of P1 (or P2). Given a real value r, it can be known whether r ≤ p* by checking:
[0078] ∃x ∈ X_f, s.t. rD(x) − N(x) ≤ 0   (2)
[0079] Justification is now given for correctness of the binary search method to solve any generic fractional programming problem max_{x∈X_f} N(x)/D(x) for any functions N(x) and D(x) > 0.
[0080] Lemma 1. For any real value r ∈ ℝ, one of the following two conditions holds:
[0081] (a) r ≤ p* ⟺ ∃x ∈ X_f, s.t. rD(x) − N(x) ≤ 0
[0082] (b) r > p* ⟺ ∀x ∈ X_f, rD(x) − N(x) > 0
[0083] PROOF. Only (a) is proven, as (b) is proven similarly. '⇐': since ∃x such that rD(x) ≤ N(x), this means that r ≤ N(x)/D(x) ≤ p*. '⇒': since P1 optimizes a continuous objective over a closed convex set, there exists an optimal solution x* such that p* = N(x*)/D(x*) ≥ r, which, upon rearranging, gives rD(x*) − N(x*) ≤ 0.
[0084] As shown in FIG. 2, Algorithm 1 describes the basic structure of the binary search method. Given the payoff matrix (PM) and the total number of security resources (numRes), Algorithm 1 first initializes the upper bound (U_0) and lower bound (L_0) of the defender expected utility on Line 2. Then, in each iteration, r is set to be the mean of U and L. Line 6 checks whether the current r satisfies Equation (2). If so, p* ≥ r, and the lower bound of the binary search needs to be increased; in this case, a valid strategy is also returned. Otherwise, p* < r, and the upper bound of the binary search should be decreased. The search continues until the upper bound and lower bound are sufficiently close, i.e., U − L ≤ ε. The number of iterations in Algorithm 1 is bounded by O(log((U_0 − L_0)/ε)). Specifically for SSGs, the upper and lower bounds can be estimated as follows:
[0085] Lower bound: Let s_u be any feasible defender strategy. The defender utility of using s_u against an adversary's quantal response is a lower bound of the optimal solution of P1. A simple example of s_u is the uniform strategy.
[0086] Upper bound: Since P_i^d ≤ U_i^d ≤ R_i^d, it follows that U_i^d ≤ max_{i∈T} R_i^d. The defender's utility is computed as Σ_{i∈T} q_i U_i^d = U^d, where U_i^d is the defender utility on target i and q_i is the probability that the adversary attacks target i. Thus, the maximum R_i^d serves as an upper bound of U^d.
[0087] Turning now to feasibility checking, which is performed in Step 6 of Algorithm 1. Given a real number r ∈ ℝ, in order to check whether Equation (2) is satisfied, the following problem, CF-OPT, is introduced:
[0088] CF-OPT: min_{x∈X_f} rD(x) − N(x)
[0089] Let δ* be the optimal objective value of the above optimization problem. If δ* ≤ 0, Equation (2) must be true. Therefore, by solving the new optimization problem and checking whether δ* ≤ 0, it can be determined whether a given r is larger or smaller than the global maximum. However, the objective function in CF-OPT is still non-convex; therefore, solving it directly is still a hard problem. Two methods to address this will be introduced in the next two sections.
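The binary search skeleton described above can be sketched as follows. The feasibility oracle here is a toy with a known optimum, standing in for an actual CF-OPT solve; the function names are invented for illustration.

```python
# Sketch of the Algorithm 1 skeleton: binary search on r, assuming an
# oracle check(r) that reports whether min_x r*D(x) - N(x) <= 0
# (i.e., whether r <= p*) and, if so, returns a witness strategy.

def binary_search(check, lower, upper, eps=0.01):
    best = None
    while upper - lower > eps:
        r = (lower + upper) / 2.0
        feasible, strategy = check(r)
        if feasible:            # r <= p*: raise the lower bound, keep strategy
            lower, best = r, strategy
        else:                   # r > p*: lower the upper bound
            upper = r
    return lower, best

# Toy oracle with hidden optimum p* = 3.7 (stands in for solving CF-OPT):
L, s = binary_search(lambda r: (r <= 3.7, "strategy"), lower=-10.0, upper=10.0)
# L converges to within eps of 3.7
```

The loop runs O(log((U_0 − L_0)/ε)) times, matching the iteration bound stated above.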
[0090] GOSAQ: ALGORITHM 1 + VARIABLE SUBSTITUTION
[0091] The Global Optimal Strategy Against Quantal response (GOSAQ) algorithm is now presented, which adapts Algorithm 1 to efficiently solve problems P1 and P2. It does so through the following nonlinear invertible change of variables:
[0092] y_i = e^{−β_i x_i}, ∀i ∈ T   (3)
[0093] GOSAQ with No Assignment Constraint
[0094] Focus first on applying GOSAQ to solve P1 for problems with no resource assignment constraints. Here, GOSAQ uses Algorithm 1, but with CF-OPT rewritten as follows, given the above variable substitution:
[0095] min_y Σ_{i∈T} θ_i (r − P_i^d) y_i + Σ_{i∈T} (θ_i α_i / β_i) y_i ln(y_i)
[0096] s.t. −Σ_{i∈T} (1/β_i) ln(y_i) ≤ M   (4)
[0097] e^{−β_i} ≤ y_i ≤ 1, ∀i ∈ T   (5)
[0098] The above optimization problem is referred to as GOSAQ-CP.
[0099] Lemma 2. Let Obj_CF(x) and Obj_GC(y) be the objective functions of CF-OPT and GOSAQ-CP, respectively, and let X_f and Y_f denote the feasible domains of CF-OPT and GOSAQ-CP, respectively. Then:
[00100] min_{x∈X_f} Obj_CF(x) = min_{y∈Y_f} Obj_GC(y)
[00101] The proof, omitted for brevity, follows from the variable substitution in Equation (3). Lemma 2 indicates that solving GOSAQ-CP is equivalent to solving CF-OPT. The following shows that GOSAQ-CP is actually a convex optimization problem.
[00102] Lemma 3. GOSAQ-CP is a convex optimization problem with a unique optimal solution.
[00103] PROOF. Both the objective function and the nonlinear constraint function (4) in GOSAQ-CP are shown to be strictly convex by taking second derivatives and showing that the Hessian matrices are positive definite. The fact that the objective is strictly convex implies that it can have only one optimal solution.
[00104] In theory, convex optimization problems like the one above can be solved in polynomial time through the ellipsoid method or an interior point method with the volumetric barrier function (in practice, there are a number of nonlinear solvers capable of finding the only Karush-Kuhn-Tucker (KKT) point). GOSAQ entails running Algorithm 1, performing Step 6 by solving GOSAQ-CP each time, which is polynomially solvable. Therefore, GOSAQ is a polynomial time algorithm.
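The equivalence asserted by Lemma 2 can be checked numerically: evaluating the CF-OPT objective at x and the transformed GOSAQ-CP objective at the corresponding y = e^{−β x} gives the same value. The sketch below is illustrative; the payoff-derived coefficients θ, β, α and P^d are hypothetical values, not data from the disclosure.

```python
import math

# CF-OPT objective: sum_i theta_i*(r - Pd_i)*exp(-beta_i*x_i)
#                   - theta_i*alpha_i*x_i*exp(-beta_i*x_i)
def cf_opt_obj(x, theta, beta, alpha, Pd, r):
    return sum(th * (r - p) * math.exp(-b * xi) - th * a * xi * math.exp(-b * xi)
               for th, b, a, p, xi in zip(theta, beta, alpha, Pd, x))

# Transformed objective after y_i = exp(-beta_i*x_i):
# sum_i theta_i*(r - Pd_i)*y_i + (theta_i*alpha_i/beta_i)*y_i*ln(y_i)
def gosaq_cp_obj(y, theta, beta, alpha, Pd, r):
    return sum(th * (r - p) * yi + (th * a / b) * yi * math.log(yi)
               for th, b, a, p, yi in zip(theta, beta, alpha, Pd, y))

theta, beta, alpha, Pd, r = [2.0, 1.5], [0.8, 1.2], [7.0, 9.0], [-2.0, -4.0], 1.0
x = [0.3, 0.7]
y = [math.exp(-b * xi) for b, xi in zip(beta, x)]
# The two objectives agree at corresponding points:
print(cf_opt_obj(x, theta, beta, alpha, Pd, r),
      gosaq_cp_obj(y, theta, beta, alpha, Pd, r))
```

The agreement follows because ln(y_i) = −β_i x_i, so (θ_i α_i/β_i) y_i ln(y_i) = −θ_i α_i x_i e^{−β_i x_i}, term by term.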
[00105] The bound on GOSAQ's solution quality is now shown.
[00106] Lemma 4. Let L* and U* be the lower and upper bounds of GOSAQ when the algorithm stops, and let x* be the defender strategy returned by GOSAQ. Then,
[00107] L* ≤ Obj_P1(x*) ≤ U*, where Obj_P1(x) denotes the objective function of P1.
[00108] PROOF. Given r, let δ*(r) be the minimum value of the objective function in GOSAQ-CP. When GOSAQ stops, δ*(L*) ≤ 0, because, from Lines 6-8 of Algorithm 1, updating the lower bound requires it. Hence, from Lemma 2, L*·D(x*) − N(x*) ≤ 0 ⟹ L* ≤ N(x*)/D(x*). Similarly, δ*(U*) > 0 ⟹ U* ≥ N(x*)/D(x*).
[00109] Theorem 1. Let x* be the defender strategy computed by GOSAQ. Then:
[00110] 0 ≤ p* − Obj_P1(x*) ≤ ε   (7)
[00111] PROOF. p* is the global maximum of P1, so p* ≥ Obj_P1(x*). Let L* and U* be the lower and upper bounds when GOSAQ stops. Based on Lemma 4, L* ≤ Obj_P1(x*) ≤ U*. Simultaneously, Algorithm 1 indicates that L* ≤ p* ≤ U*.
[00112] Therefore, 0 ≤ p* − Obj_P1(x*) ≤ U* − L* ≤ ε.
[00113] Theorem 1 indicates that the solution obtained by GOSAQ is an ε-optimal solution.
[00114] GOSAQ with Assignment Constraints
[00115] In order to address the assignment constraints, P2 needs to be solved. Note that the objective function of P2 is the same as that of P1; the difference lies in the extra constraints, which enforce the marginal coverage to be feasible. Therefore, Algorithm 1 is used once again with the variable substitution given in Equation (3), but GOSAQ-CP is modified as follows (referred to as GOSAQ-CP-C) to incorporate the extra constraints:
[00116] min_y Σ_{i∈T} θ_i (r − P_i^d) y_i + Σ_{i∈T} (θ_i α_i / β_i) y_i ln(y_i)
s.t. Constraints (4), (5)
[00117] −(1/β_i) ln(y_i) = Σ_{A_j∈A} a_j A_ij, ∀i ∈ T   (8)
[00118] Σ_{A_j∈A} a_j = 1   (9)
[00119] 0 ≤ a_j ≤ 1, ∀A_j ∈ A   (10)
[00120] Equation (8) is a nonlinear equality constraint that makes this optimization problem non-convex. There are no known polynomial time algorithms for generic non-convex optimization problems, which can have multiple local minima. Attempts can be made to solve such non-convex problems by using one of the efficient nonlinear solvers, but a Karush-Kuhn-Tucker (KKT) point would be obtained, which may be only locally optimal. There are a few research-grade global solvers for non-convex programs; however, they are limited to solving specific problems or small instances. Therefore, in the presence of assignment constraints, GOSAQ is no longer guaranteed to return the optimal solution, as it might be left with locally optimal solutions when solving the subproblems GOSAQ-CP-C.
[00121] PASAQ: ALGORITHM 1 + LINEAR APPROXIMATION
[00122] Since GOSAQ may be unable to provide a quality bound in the presence of assignment constraints (and, as shown later, may turn out to be inefficient in such cases), the Piecewise linear Approximation of optimal Strategy Against Quantal response (PASAQ) is proposed. PASAQ is an algorithm to compute the approximate optimal defender strategy. PASAQ has the same structure as Algorithm 1. The key idea in PASAQ is to use a piecewise linear function to approximate the nonlinear objective function in CF-OPT, and thus convert it into a Mixed-Integer Linear Programming (MILP) problem. Such a problem can easily include assignment constraints, giving an approximate solution for a SSG against a QR-adversary with assignment constraints.
[00123] In order to demonstrate the piecewise approximation in PASAQ, the nonlinear objective function of CF-OPT is rewritten as:
[00124] Σ_{i∈T} θ_i (r − P_i^d) e^{−β_i x_i} − Σ_{i∈T} θ_i α_i x_i e^{−β_i x_i}
[00125] The goal is to approximate the two nonlinear functions f_i^(1)(x_i) = e^{−β_i x_i} and f_i^(2)(x_i) = x_i e^{−β_i x_i} as two piecewise linear functions in the range x_i ∈ [0, 1], for i ∈ T. First, uniformly divide the range [0, 1] into K pieces (segments). Simultaneously, introduce a set of new variables {x_ik, k = 1..K} to represent the portion of x_i in each of the K pieces [(k−1)/K, k/K], k = 1..K. Therefore, x_ik ∈ [0, 1/K] and x_i = Σ_{k=1}^K x_ik. In order to ensure that {x_ik} is a valid partition of x_i, all x_ik must satisfy: x_ik > 0 only if x_ik' = 1/K, ∀k' < k. In other words, x_ik can be non-zero only when all the previous pieces are completely filled. FIG. 3, which includes FIGS. 3(a) and 3(b), displays two examples of such a partition.
[00126] Thus, the two nonlinear functions can be represented as piecewise linear functions using {x_ik}. Let {k/K, k = 0..K} be the K + 1 cut-points of the linear segments of function f_i^(1)(x_i), and {γ_ik, k = 1..K} be the slopes of each of the linear segments. Starting from f_i^(1)(0), the piecewise linear approximation of f_i^(1)(x_i), denoted as L_i^(1)(x_i), is:
[00127] L_i^(1)(x_i) = f_i^(1)(0) + Σ_{k=1}^K γ_ik x_ik = 1 + Σ_{k=1}^K γ_ik x_ik
[00128] Similarly, the piecewise linear approximation of f_i^(2)(x_i) is obtained, denoted as L_i^(2)(x_i):
[00129] L_i^(2)(x_i) = f_i^(2)(0) + Σ_{k=1}^K μ_ik x_ik = Σ_{k=1}^K μ_ik x_ik
[00130] where {μ_ik, k = 1..K} is the slope of each linear segment.
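The piecewise linear construction can be sketched as follows. The helper below is an illustrative interpolant on the uniform K-piece partition of [0, 1] (not the patented MILP), showing that the approximation error of f^(1)(x) = e^{−βx} shrinks as K grows; β and the function names are hypothetical.

```python
import math

# Sketch: K-piece linear interpolant of a function f on [0, 1].
def pwl_approx(f, K, x):
    """Evaluate the K-piece linear interpolant of f at x in [0, 1]."""
    k = min(int(x * K), K - 1)           # index of the piece containing x
    x0, x1 = k / K, (k + 1) / K          # cut-points of that piece
    slope = (f(x1) - f(x0)) * K          # slope gamma_{ik} on piece k
    return f(x0) + slope * (x - x0)

beta = 1.3
f1 = lambda x: math.exp(-beta * x)       # f^(1)

# Maximum approximation error over a fine grid of [0, 1]:
err = lambda K: max(abs(pwl_approx(f1, K, t / 1000.0) - f1(t / 1000.0))
                    for t in range(1001))

# Finer partitions approximate better: err(20) is much smaller than err(5).
print(err(5), err(20))
```

At the cut-points the interpolant matches f exactly; in between, the error shrinks roughly quadratically in 1/K for this smooth function, which is consistent with the 1/K error terms in the bounds below.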
[00131] PASAQ with No Assignment Constraint
[00132] In domains without assignment constraints, PASAQ consists of Algorithm 1, but with CF-OPT rewritten as follows:
[00133] min_{x,z} Σ_{i∈T} θ_i (r − P_i^d)(1 + Σ_{k=1}^K γ_ik x_ik) − Σ_{i∈T} θ_i α_i Σ_{k=1}^K μ_ik x_ik
[00134] s.t. Σ_{i∈T} Σ_{k=1}^K x_ik ≤ M   (11)
[00135] 0 ≤ x_ik ≤ 1/K, ∀i, k = 1..K   (12)
[00136] z_ik (1/K) ≤ x_ik, ∀i, k = 1..K−1   (13)
[00137] x_i(k+1) ≤ z_ik, ∀i, k = 1..K−1   (14)
[00138] z_ik ∈ {0, 1}, ∀i, k = 1..K−1   (15)
[00139] The above MILP formulation is referred to as PASAQ-MILP.
[00140] Lemma 5. The feasible region for x = (x_i = Σ_{k=1}^K x_ik, i ∈ T) of PASAQ-MILP is equivalent to that of P1.
[00141] JUSTIFICATION. The auxiliary integer variable z_ik indicates whether or not x_ik = 1/K. Equation (13) enforces that z_ik = 0 whenever x_ik < 1/K.
[00142] Simultaneously, Equation (14) enforces that x_i(k+1) is positive only if z_ik = 1. Hence, {x_ik, k = 1..K} is a valid partition of x_i, with x_i = Σ_{k=1}^K x_ik and x_i ∈ [0, 1]. Thus, the feasible region of PASAQ-MILP is equivalent to that of P1.
[00143] Lemma 5 shows that the solution provided by PASAQ is in the feasible region of P1. However, PASAQ approximates the minimum value of CF-OPT by using PASAQ-MILP and, furthermore, solves P1 approximately using binary search. Hence, an error bound needs to be shown on the solution quality of PASAQ.
[00144] Lemmas 6, 7 and 8 are first shown on the way to building the proof for the error bound. Due to space constraints, many proofs are abbreviated; full proofs have been derived and made available in an on-line appendix. Further, two constants, C1 and C2, which are decided by the game payoffs, are defined. The notation used is defined in Table 3. The following illustrates obtaining a bound on the difference between p* (the global optimum obtained from P1) and Obj_P1(x̃*), where x̃* is the strategy obtained from PASAQ. However, along the way, a bound has to be obtained for the difference between the objective function and its corresponding piecewise linear approximation.
[00145] Lemma 6. Let Ñ(x) = Σ_{i∈T} θ_i (α_i L_i^(2)(x_i) + P_i^d L_i^(1)(x_i)) and D̃(x) = Σ_{i∈T} θ_i L_i^(1)(x_i) > 0 be the piecewise linear approximations of N(x) and D(x), respectively. Then, ∀x ∈ X_f, both |Ñ(x) − N(x)| and |D̃(x) − D(x)| are bounded by a payoff-dependent constant multiplied by 1/K.
[00148] Lemma 7. The difference between the objective function of P1, Obj_P1(x), and its corresponding piecewise linear approximation, Obj̃_P1(x) = Ñ(x)/D̃(x), is bounded: |Obj_P1(x) − Obj̃_P1(x)| ≤ C1 (1/K).
[00149] PROOF.
[00150] |Obj_P1(x) − Obj̃_P1(x)| = |N(x)/D(x) − Ñ(x)/D̃(x)|
[00151] ≤ |N(x)/D(x) − Ñ(x)/D(x)| + |Ñ(x)/D(x) − Ñ(x)/D̃(x)|
[00152] = |N(x) − Ñ(x)|/D(x) + (Ñ(x)/D(x))·|D̃(x) − D(x)|/D̃(x)
[00153] Based on Lemma 6, each term is bounded by a payoff-dependent constant multiplied by 1/K; hence
[00154] |Obj_P1(x) − Obj̃_P1(x)| ≤ C1 (1/K)
[00155] Lemma 8. Let L̃* and L* be the final lower bounds of PASAQ and GOSAQ, respectively. Then,
[00156] L* − L̃* ≤ C1 (1/K) + C2 ε
[00157] Lemma 9. Let L̃* and Ũ* be the final lower and upper bounds of PASAQ, and let x̃* be the defender strategy returned by PASAQ. Then,
[00158] L̃* ≤ Obj̃_P1(x̃*) ≤ Ũ*
[00159] Theorem 2. Let x̃* be the defender strategy computed by PASAQ, and let p* be the global optimal defender expected utility. Then,
[00160] 0 ≤ p* − Obj_P1(x̃*) ≤ 2C1 (1/K) + (C2 + 1) ε
[00161] PROOF. The first inequality is implied since x̃* is a feasible solution. Furthermore,
[00162] p* − Obj_P1(x̃*) = (p* − L*) + (L* − L̃*) + (L̃* − Obj̃_P1(x̃*))
[00163] + (Obj̃_P1(x̃*) − Obj_P1(x̃*))
[00164] Algorithm 1 indicates that L* ≤ p* ≤ U*, hence p* − L* ≤ ε. Additionally, Lemmas 7, 8 and 9 provide upper bounds on Obj̃_P1(x̃*) − Obj_P1(x̃*), L* − L̃* and L̃* − Obj̃_P1(x̃*); therefore
[00165] p* − Obj_P1(x̃*) ≤ ε + C1 (1/K) + C2 ε + C1 (1/K) = 2C1 (1/K) + (C2 + 1) ε
[00166] Theorem 2 suggests that, given a game instance, the solution quality of PASAQ is bounded linearly by the binary search threshold ε and the piecewise linear accuracy 1/K. Therefore, the PASAQ solution can be made arbitrarily close to the optimal solution with a sufficiently small ε and a sufficiently large K.
[00167] PASAQ With Assignment Constraints
[00168] In order to extend PASAQ to handle the assignment constraints, PASAQ-MILP can be modified as follows, referred to as PASAQ-MILP-C:
[00169] min_{x,z,a} Σ_{i∈T} θ_i (r − P_i^d)(1 + Σ_{k=1}^K γ_ik x_ik) − Σ_{i∈T} θ_i α_i Σ_{k=1}^K μ_ik x_ik
s.t. Constraints (11)-(15)
[00170] Σ_{k=1}^K x_ik = Σ_{A_j∈A} a_j A_ij, ∀i ∈ T   (16)
[00171] Σ_{A_j∈A} a_j = 1   (17)
[00172] 0 ≤ a_j ≤ 1, ∀A_j ∈ A   (18)
[00173] PASAQ-MILP-C is a MILP, so it can be solved optimally with any MILP solver (e.g., CPLEX). It can be proved, similarly as for Lemma 5, that the above MILP formulation has the same feasible region as P2; hence, it leads to a feasible solution of P2. Furthermore, the error bound of PASAQ relies on the approximation accuracy of the objective function by the piecewise linear function and on the fact that the subproblem PASAQ-MILP-C can be solved optimally. Both conditions are unchanged from the cases without assignment constraints to the cases with assignment constraints. Hence, the error bound is the same as that shown in Theorem 2.
[00174] EXPERIMENTAL RESULTS
[00175] Verification experiments were performed and are described herein, separated into two sets: the first set focuses on cases where there is no constraint on assigning the resources; the second set focuses on cases with assignment constraints. In both sets, the solution quality and runtime of the two new algorithms, GOSAQ and PASAQ, are compared with the previous benchmark algorithm BRQR. The results were obtained using CPLEX to solve the MILP for PASAQ. For both BRQR and GOSAQ, the MATLAB toolbox function fmincon was used to solve nonlinear optimization problems. All experiments were conducted on a standard 2.00GHz machine with 4GB main memory. For each setting of the experiment parameters (i.e., number of targets, amount of resources and number of assignment constraints), 50 different game instances were used. In each game instance, payoffs R_i^d and R_i^a are chosen uniformly randomly from 1 to 10, while P_i^d and P_i^a are chosen uniformly randomly from −10 to −1; feasible assignments are generated by randomly setting each element A_ij to 0 or 1. For the parameter λ of the quantal response in Equation (1), the same value was used (λ = 0.76).
[00176] No Assignment Constraints
[00177] Experimental results comparing the solution quality and runtime of the three algorithms (GOSAQ, PASAQ and BRQR) are presented for cases without assignment constraints.
[00178] Solution Quality: For each game instance, GOSAQ provides the ε-optimal defender expected utility, BRQR presents the best local optimal solution among all the local optima it finds, and PASAQ leads to an approximated global optimal solution. The solution quality of the different algorithms was measured using the average defender expected utility over all 50 game instances. FIG. 4 includes Figures 4(a)-4(f), which show solution quality and runtime comparisons, without assignment constraints, for GOSAQ and PASAQ compared with the previous benchmark algorithm BRQR.
[00179] Figures 4(a), 4(c) and 4(e) show the solution quality results of the different algorithms under different conditions. In all three figures, the average defender expected utility is displayed on the y-axis. On the x-axis, Figure 4(a) varies the number of targets (|T|), keeping the ratio of resources (M) to targets and ε fixed as shown in the caption; Figure 4(c) varies the ratio of resources to targets, fixing targets and ε as shown; and Figure 4(e) varies the value of the binary search threshold ε. Given a setting of the parameters (|T|, M and ε), the solution qualities of the different algorithms are displayed in a group of bars. For example, in Figure 4(a), |T| is set to 50 for the leftmost group of bars, M is 5 and ε = 0.01. From left to right, the bars show the solution quality of BRQR (with 20 and 100 iterations), PASAQ (with 5, 10 and 20 pieces) and GOSAQ.
[00180] Key observations from Figures 4(a), 4(c) and 4(e) include: (i) The solution quality of BRQR drops quickly as the number of targets increases; increasing the number of iterations in BRQR improves the solution quality, but the improvement is very small. (ii) The solution quality of PASAQ improves as the number of pieces increases, and it converges to the GOSAQ solution once the number of pieces becomes larger than 10. (iii) As the number of resources increases, the defender expected utility also increases; the resource count does not impact the relative solution quality of the different algorithms. (iv) As ε becomes smaller, the solution quality of both GOSAQ and PASAQ improves. However, once ε becomes sufficiently small (< 0.1), no substantial improvement is achieved by further decreasing its value. In other words, the solution quality of both GOSAQ and PASAQ converges.
[00181] In general, BRQR has the worst solution quality and GOSAQ has the best solution quality. PASAQ achieves almost the same solution quality as GOSAQ when it uses more than 10 pieces.
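The convergence of PASAQ as the number of pieces grows mirrors a general property of piecewise linear approximation: for a smooth function, the maximum interpolation error shrinks as the number of segments increases. A minimal illustrative sketch (the target function below is an arbitrary smooth stand-in, not the actual PASAQ objective):

```python
import math

def piecewise_linear(f, a, b, k):
    """Linear interpolant of f on k equal segments of [a, b]."""
    xs = [a + (b - a) * i / k for i in range(k + 1)]
    ys = [f(v) for v in xs]
    def g(x):
        i = min(int((x - a) / (b - a) * k), k - 1)  # segment containing x
        t = (x - xs[i]) / (xs[i + 1] - xs[i])
        return ys[i] + t * (ys[i + 1] - ys[i])
    return g

def max_error(f, a, b, k, samples=1000):
    """Maximum deviation between f and its k-piece interpolant on a grid."""
    g = piecewise_linear(f, a, b, k)
    pts = [a + (b - a) * i / samples for i in range(samples + 1)]
    return max(abs(f(p) - g(p)) for p in pts)

# Smooth stand-in target (using lambda = 0.76 as in the experiments above).
f = lambda v: v * math.exp(-0.76 * v)
errors = [max_error(f, 0.0, 1.0, k) for k in (5, 10, 20)]
```

The errors for 5, 10 and 20 pieces decrease monotonically, which is consistent with the observation that PASAQ's quality gap closes once enough pieces are used.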
[00182] Runtime: The runtime results are presented in Figures 4(b), 4(d) and 4(f). In all three figures, the y-axis displays the runtime, and the x-axis displays the variable that was varied in order to measure its impact on the runtime of the algorithms. For BRQR, the runtime is the sum of the runtimes across all its iterations.
[00183] Figure 4(b) shows the change in runtime as the number of targets increases. The number of resources and the value of ε are shown in the caption. BRQR with 100 iterations is seen to run significantly slower than GOSAQ and PASAQ. Figure 4(d) shows the impact of the ratio of resources to targets on the runtime. The figure indicates that the runtime of the three algorithms is independent of the change in the number of resources. Figure 4(f) shows how the runtime of GOSAQ and PASAQ is affected by the value of ε. On the x-axis, the value of ε decreases from left to right. The runtime increases linearly as ε decreases exponentially. In both Figures 4(d) and 4(f), the number of targets and resources are displayed in the caption.
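The linear runtime growth as ε decreases exponentially is what one expects from the binary search inside GOSAQ and PASAQ: halving a bracket of width W until it is narrower than ε takes about log2(W/ε) iterations. A quick illustration (the interval width of 20 is an illustrative stand-in for the range of defender utilities):

```python
def bisect_iterations(width, eps):
    """Number of halvings needed to shrink a bracket of given width below eps."""
    count = 0
    while width > eps:
        width /= 2.0
        count += 1
    return count

# Each 10x tightening of epsilon adds only a constant handful of iterations
# (about log2(10) ~ 3.3), matching the trend observed in Figure 4(f).
iters = [bisect_iterations(20.0, eps) for eps in (0.1, 0.01, 0.001)]
```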
[00184] Overall, the results suggest that GOSAQ is the algorithm of choice when the domain has no assignment constraints. Clearly, BRQR has the worst solution quality, and it is the slowest of the set of algorithms. PASAQ has a solution quality that approaches that of GOSAQ when the number of pieces is sufficiently large (> 10), and GOSAQ and PASAQ also achieve comparable runtime efficiency. Thus, in cases with no assignment constraints, PASAQ offers no advantages over GOSAQ.
[00185] FIG. 5 includes Figures 5(a)-5(f), and shows solution quality and runtime comparisons, with assignment constraint(s), for GOSAQ and PASAQ compared with the previous benchmark algorithm BRQR.
[00186] With Assignment Constraints
[00187] In the second set, assignment constraints are introduced into the problem. The feasible assignments are randomly generated. Experimental results are presented on both solution quality and runtime.
[00188] Solution Quality: Figures 5(a) and 5(b) display the solution quality of the three algorithms with a varying number of targets and a varying number of feasible assignments. In both figures, the average defender utility is displayed on the y-axis. In Figure 5(a), the number of targets is displayed on the x-axis, and the ratio of the number of feasible assignments to the number of targets is set to 60. BRQR is seen to have very poor performance. Furthermore, there is very little gain in solution quality from increasing its number of iterations. While GOSAQ provides the best solution, PASAQ achieves almost identical solution quality when the number of pieces is sufficiently large (> 10). Figure 5(b) shows how solution quality is impacted by the number of feasible assignments, which is displayed on the x-axis. Specifically, the x-axis shows the number of assignment constraints, A, to be 20 times, 60 times and 100 times the number of targets. The number of targets is set to 60. Once again, BRQR has significantly lower solution quality, which drops as the number of assignments increases; and PASAQ again achieves almost the same solution quality as GOSAQ when the number of pieces is larger than 10.
[00189] Runtime: The runtime results are presented in Figures 5(c), 5(d), 5(e) and 5(f). In all experiments, 80 minutes was set as the cutoff. Figure 5(c) displays the runtime on the y-axis and the number of targets on the x-axis. It is clear that GOSAQ runs significantly slower than both PASAQ and BRQR, and slows down exponentially as the number of targets increases. Figure 5(e) shows extended runtime results for BRQR and PASAQ as the number of targets increases. PASAQ runs in less than 4 minutes with 200 targets and 12000 feasible assignments. BRQR runs significantly slower with a higher number of iterations.
[00190] Overall, the results suggest that PASAQ is the algorithm of choice when the domain has assignment constraints. Clearly, BRQR has significantly lower solution quality than PASAQ. PASAQ not only has a solution quality that approaches that of GOSAQ when the number of pieces is sufficiently large (> 10), but is also significantly faster than GOSAQ (which suffers exponential slowdown as the domain scales up).
[00191] Accordingly, the algorithms described above, including the GOSAQ and PASAQ algorithms, can provide a number of advantages in security games. GOSAQ can be used to find or guarantee the globally optimal solution in computing the defender strategy against an adversary's quantal response. The efficient approximation algorithm, PASAQ, can provide a more efficient computation of the defender strategy with nearly-optimal solution quality (compared to GOSAQ). These algorithms model the human adversaries' bounded rationality using the quantal response (QR) model. Further algorithms are also described for solving problems with resource assignment constraints. This work overcomes the difficulties in developing efficient methods to solve the massive security games arising in real applications, including solving a nonlinear and non-convex optimization problem and handling constraints on assigning security resources in designing defender strategies.
[00192] Section 2 - A Unified Method for Handling Discrete and Continuous Uncertainty in Bayesian Stackelberg Games (HUNTER algorithm)
[00193] Another aspect of the present disclosure is directed to a unified method of handling discrete and continuous uncertainty in Bayesian Stackelberg games, i.e., the HUNTER algorithm. Given their existing and potential real-world security applications, Bayesian Stackelberg games have received significant research interest. In these games, the defender acts as a leader, and the many different follower types model the uncertainty over discrete attacker types. Unfortunately, since solving such games is an NP-hard problem, scale-up has remained a difficult challenge.
[00194] This section of the present disclosure describes methods (or algorithms) for addressing Bayesian Stackelberg games of large scale, where Bayesian Stackelberg refers to a Stackelberg game in which the defender acts as a leader, and there are many different follower types which model the uncertainty over discrete attacker types. The algorithms described herein provide a unified approach to handling uncertainty not only over discrete follower types but also over other key continuously distributed real-world uncertainty, due to the leader's execution error, the follower's observation error, and continuous payoff uncertainty. To that end, an aspect of the present disclosure presents a new algorithm for Bayesian Stackelberg games, called HUNTER, which can scale up the number of types used. HUNTER combines one or more of the following five key features: i) efficient pruning via a best-first search of the leader's strategy space; ii) a novel linear program for computing tight upper bounds for this search; iii) the use of Bender's decomposition for solving the upper bound linear program efficiently; iv) efficient inheritance of Bender's cuts from parent to child; v) an efficient heuristic branching rule.
Experimental results have shown that HUNTER provides orders of magnitude speedups over the best existing methods for handling discrete follower types. HUNTER's efficiency for Bayesian Stackelberg games can be exploited to also handle continuous uncertainty using sample average approximation, as is described below in further detail. HUNTER-based approaches are experimentally shown to also outperform the latest robust solution methods under continuously distributed uncertainty.
[00195] INTRODUCTION
[00196] To address the challenge of discrete uncertainty, a novel algorithm for solving Bayesian Stackelberg games, called HUNTER, is described, preferably combining the following five key features. First, the HUNTER algorithm conducts a best-first search in the follower's best-response assignment space, which only expands a small number of nodes (within an exponentially large assignment space). Second, HUNTER computes tight upper bounds to speed up this search using a novel linear program. Third, HUNTER solves this linear program efficiently using Bender's decomposition. Fourth, the Bender's cuts generated in a parent node are shown to be valid cuts for its children, providing further speedups. Finally, HUNTER deploys a heuristic branching rule to further improve efficiency. Thus, the contribution here is in combining an AI search technique (best-first search) with multiple techniques from Operations Research (disjunctive programming and Bender's decomposition) to provide a novel, efficient algorithm; the application of these techniques for solving Stackelberg games had not been explored earlier, and thus their application towards solving these games, as well as their particular synergistic combination in HUNTER, are both novel. Experiments have shown that HUNTER can dramatically improve the scalability in the number of types over other existing approaches.
[00197] The present disclosure also shows, via sample average approximation, that HUNTER for Bayesian Stackelberg games can be used in handling continuously distributed uncertainty such as the leader's execution error, the follower's observation noise, and both players' preference uncertainty. For comparison, a class of Stackelberg games motivated by security applications is considered, and two existing robust solution methods, BRASS and RECON, are enhanced to handle such uncertainty. HUNTER is again shown to provide significantly better performance than BRASS and RECON. A final set of experiments, described herein, also illustrates HUNTER's ability to handle both discrete and continuous uncertainty within a single problem.
[00198] BACKGROUND AND NOTATION
[00199] This part of Section 2 of the present disclosure is focused on solving Bayesian Stackelberg games with discrete follower types, where, as noted previously, a Stackelberg game is a multi-party (e.g., two-person) game played by a leader and a follower. In Stackelberg games where the leader commits to a mixed strategy first, the follower observes the leader's strategy and responds with a pure strategy, maximizing his utility correspondingly. This set-up can be generalized by extending the definition of the leader's strategy space and the leader and follower utilities in two ways beyond what has previously been considered, and by allowing for a compact representation of constraints.
[00200] Assuming the leader's mixed strategy is an N-dimensional real column vector x ∈ R^N, bounded by a polytope Ax ≤ b, x ≥ 0, generalizes the constraint ∑_i x_i = 1 and allows for a compact strategy representation with constraints. Given a leader's strategy x, the follower maximizes his utility by choosing from J pure strategies. For each pure strategy j = 1, . . . , J played by the follower, the leader gets a utility of μ_j^T x + μ_{j,0} and the follower gets a utility of ν_j^T x + ν_{j,0}, where μ_j, ν_j are real vectors in R^N and μ_{j,0}, ν_{j,0} ∈ R. This use of the μ_{j,0}, ν_{j,0} terms generalizes the utility functions.
[00201] The leader's utility matrix U and the follower's utility matrix V are defined as the following,
Figure imgf000033_0001
[00203] Then, for a leader's strategy x, the leader's and the follower's expected utilities for the follower's J pure strategies are given by U^T (x^T, 1)^T and V^T (x^T, 1)^T, respectively.
[00204] A Bayesian extension to the Stackelberg game allows multiple types of players, each with its own payoff matrix. A Bayesian Stackelberg game with S follower types can be represented by a set of utility matrix pairs (U^1, V^1), . . . , (U^S, V^S), each corresponding to a type. A type s has a prior probability p^s representing the likelihood of its occurrence. The leader commits to a mixed strategy without knowing the type of the follower she faces. The follower, however, knows his own type s, and plays the best response j^s ∈ {1, . . . , J} according to his utility matrix V^s. A strategy profile in a Bayesian Stackelberg game is (x, j), a pair of the leader's mixed strategy x and the follower's response j, where j = ⟨j^1, . . . , j^S⟩ denotes a vector of the follower's responses for all types.
[00205] The solution concept of interest is a Strong Stackelberg Equilibrium (SSE), where the leader maximizes her expected utility assuming the follower chooses the best response and breaks ties in favor of the leader for each type. Formally, let u(x, j) = ∑_s p^s ((μ^s_{j^s})^T x + μ^s_{j^s,0}) denote the leader's expected utility, and let v^s(x, j^s) = (ν^s_{j^s})^T x + ν^s_{j^s,0} denote the follower's expected utility for a type s. Then, (x*, j*) is an SSE if and only if,

[00206] (x*, j*) = arg max_{(x, j)} { u(x, j) | v^s(x, j^s) ≥ v^s(x, j'), ∀ j' ≠ j^s, ∀ s }.
Figure imgf000034_0002
Table 4: Payoff matrices of a Bayesian Stackelberg game.
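Given the utility-matrix representation above, evaluating a candidate leader strategy amounts to computing each type's best response (breaking ties in the leader's favor, per the SSE definition) and the probability-weighted leader utility. A minimal sketch with small hypothetical matrices (not the Table 4 payoffs, which appear only as an image):

```python
def expected_utils(M, x):
    """Utilities m_j^T x + m_{j,0} for all J pure strategies; M is (N+1) x J."""
    n = len(x)
    return [sum(M[i][j] * x[i] for i in range(n)) + M[n][j]
            for j in range(len(M[0]))]

def evaluate(x, types):
    """Leader's expected utility when every follower type best-responds.

    types: list of (prior, U, V) with U, V shaped (N+1) x J as in the text;
    ties are broken in favor of the leader, per the SSE convention.
    """
    total = 0.0
    for p, U, V in types:
        lu = expected_utils(U, x)
        fu = expected_utils(V, x)
        best = max(fu)
        # Among follower-optimal responses, pick the one best for the leader.
        j = max((k for k in range(len(fu)) if abs(fu[k] - best) < 1e-9),
                key=lambda k: lu[k])
        total += p * lu[j]
    return total

# Hypothetical 2-target matrices (rows: x1, x2, constant term) -- NOT Table 4.
U = [[1, -1], [-1, 1], [0, 0]]
V = [[-1, 1], [1, -1], [0, 0]]
val = evaluate([0.5, 0.5], [(0.84, U, V), (0.16, U, V)])
```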
[00207] As an example, which will be returned to herein, a Bayesian Stackelberg game is considered with two follower types, where type 1 appears with probability 0.84 and type 2 appears with probability 0.16. The leader (defender) chooses a probability distribution for allocating one resource to protect the two targets, whereas the follower (attacker) chooses the best target to attack. The payoff matrices are shown in Table 4, where the leader is the row player and the follower is the column player. The utilities of the two types are identical except that a follower of type 2 gets a utility of 1 for attacking Target2 successfully, whereas one of type 1 gets 0. The leader's strategy is a column vector (x1, x2)^T representing the probabilities of protecting the two targets. Given one resource, the strategy space of the leader is x1 + x2 ≤ 1, x1 ≥ 0, x2 ≥ 0, i.e., A = (1, 1), b = 1. The payoffs in Table 4 can be represented by the following utility matrices,
Figure imgf000035_0001
[00210] Bayesian Stackelberg games have typically been solved via tree search, where one follower type is assigned to a pure strategy at each tree level. For example, FIG. 6 shows a search tree for the example game in Table 4. Four linear programs are solved, one for each leaf node. At each leaf node, the linear program provides an optimal leader strategy such that the follower's best response for every follower type is the chosen target at that leaf node; e.g., at the leftmost leaf node, the linear program finds the optimal leader strategy such that both type 1 and type 2 have a best response of attacking Target1. Comparing across leaf nodes, the overall optimal leader strategy can be obtained. In this case, the leaf node where type 1 is assigned to Target1 and type 2 to Target2 provides the overall optimal strategy.
[00211] Instead of solving an LP for all J^S leaf nodes, recent work uses a branch-and-bound technique to speed up the tree search. The key to efficiency in branch-and-bound is obtaining tight upper and lower bounds for internal nodes, i.e., for nodes shown by circles in FIG. 6, where subsets of follower types are assigned to particular targets. For example, in FIG. 6, suppose the left subtree has been explored; if it is then realized at the rightmost internal node (where type 1 is assigned to Target2) that the upper bound on solution quality is 0.5, the right subtree can be pruned without even considering type 2. One possible way of obtaining upper bounds is by relaxing the integrality constraints in the DOBSS MILP. Unfortunately, when the integer variables in DOBSS are relaxed, the objective can become arbitrarily large, leading to meaningless upper bounds. HBGS computes upper bounds by heuristically utilizing the solutions of smaller restricted games. However, the preprocessing involved in solving many small games can be expensive, and the bounds computed using heuristics can again be loose.
[00212] The HUNTER (handling uncertainty efficiently using relaxation) algorithm, based on the five key ideas described previously in this section, can provide a unified method for handling uncertainty in Bayesian Stackelberg games, and can facilitate real-world solutions to security domain problems.
[00213] Algorithm Overview
[00214] To find the optimal leader mixed strategy, HUNTER conducts a best-first search in the search tree that results from assigning follower types to pure strategies, such as the search tree in FIG. 6. Simply stated, HUNTER aims to search this space much more efficiently than HBGS. As discussed earlier, efficiency gains are sought by obtaining tight upper bounds and lower bounds at internal nodes in the search tree (where an internal node corresponds to a partial assignment in which a subset of follower types is fixed). To that end, as illustrated in FIG. 7, an upper bound LP is used within an internal search node. The LP returns an upper bound UB and a feasible solution x*, which is then evaluated by computing the follower's best response, providing a lower bound LB. The solution returned by the upper bound LP is also utilized in choosing a new type s* on which to create branches. To avoid having this upper bound LP itself become a bottleneck, it can be solved efficiently using, e.g., Bender's decomposition, which will be explained below.
[00215] FIG. 7 depicts the steps of creating internal search nodes for an embodiment of HUNTER.
[00216] To facilitate understanding of HUNTER's behavior on a toy game instance, see FIG. 8, which illustrates HUNTER's search tree in solving the example game from Table 4 above. To start the best-first search, at the root node, no types are assigned any targets yet; the upper bound LP is solved with the initial strategy space x1 + x2 ≤ 1, x1, x2 ≥ 0 (Node 1). As a result, an upper bound of 0.560 and the optimal solution x1 = 2/3, x2 = 1/3 are obtained. The solution returned is evaluated, and a lower bound of 0.506 is obtained. Using HUNTER's heuristics, type 2 is then chosen to create branches, by assigning it to Target1 and Target2 respectively. Next, a child node (Node 2) is considered in which type 2 is assigned to Target1, i.e., type 2's best response is to attack Target1. As a result, the follower's expected utility of choosing Target1 must be higher than that of choosing Target2, i.e., -x1 + x2 ≥ x1 - x2, simplified as x1 - x2 ≤ 0. Thus, in Node 2, the additional constraint x1 - x2 ≤ 0 is imposed on the strategy space, and an upper bound of 0.5 is obtained. Since this upper bound is lower than the current lower bound of 0.506, this branch can be pruned. Next, the other child node (Node 3) is considered, in which type 2 is assigned to Target2. This time the constraint -x1 + x2 ≤ 0 is added instead, and an upper bound of 0.506 is obtained. Since the upper bound coincides with the lower bound, the node does not need to be expanded further. Moreover, since both Target1 and Target2 have been considered for type 2, the algorithm can be terminated and 0.506 returned as the optimal solution value.
[00217] HUNTER's behavior is now discussed line-by-line (see Algorithm 2 in FIG. 9). The best-first search is initialized by creating the root node of the search tree, with no assignment of types to targets, and with the computation of the node's upper bound (Lines 2 and 3). The initial lower bound is obtained by evaluating the solution returned by the upper bound LP (Line 4). The root node is added to a priority queue of open nodes, which is internally sorted in decreasing order of upper bound (Line 5). Each node contains information about the partial assignment, the feasible region of x, the upper bound, and the Bender's cuts generated by the upper bound LP. At each iteration, the node with the highest upper bound is retrieved (Line 8); a type s* is selected for assignment to pure strategies (Line 9); the upper bounds of the node's child nodes are computed (Lines 12 and 14); the lower bound is updated using the new solutions (Line 15); and child nodes with upper bounds higher than the current lower bound are enqueued (Line 16). As shown later, Bender's cuts at a parent node can be inherited by its children, speeding up the computation (Line 12).
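The control flow of Algorithm 2 can be sketched as a generic best-first branch-and-bound loop, with the upper-bound LP and branching heuristic abstracted into callbacks; the toy bound/branch functions at the bottom are illustrative stand-ins, not HUNTER's actual relaxation:

```python
import heapq

def best_first_search(root, bound, candidate, branch, evaluate):
    """Best-first branch-and-bound skeleton in the spirit of Algorithm 2.

    bound(node): upper bound (the role of HUNTER's upper-bound LP);
    candidate(node): the relaxation's solution; evaluate(sol): its true value.
    """
    best_sol = candidate(root)
    lower = evaluate(best_sol)
    heap = [(-bound(root), 0, root)]       # max-heap via negated bounds
    tie = 1
    while heap:
        neg_ub, _, node = heapq.heappop(heap)
        if -neg_ub <= lower:               # best remaining bound cannot improve
            break
        for child in branch(node):
            sol = candidate(child)
            value = evaluate(sol)
            if value > lower:              # update incumbent lower bound
                lower, best_sol = value, sol
            if bound(child) > lower:       # prune children that cannot improve
                heapq.heappush(heap, (-bound(child), tie, child))
                tie += 1
    return lower, best_sol

# Toy demo: maximize over `values` by splitting index ranges (lo, hi).
values = [3, 7, 5, 2]
bound = lambda node: max(values[node[0]:node[1]])   # optimistic bound
candidate = lambda node: node[0]                    # left endpoint as solution
evaluate = lambda i: values[i]
branch = lambda node: ([] if node[1] - node[0] <= 1 else
                       [(node[0], (node[0] + node[1]) // 2),
                        ((node[0] + node[1]) // 2, node[1])])
result = best_first_search((0, 4), bound, candidate, branch, evaluate)
```

In the real algorithm a node would carry the partial type-to-target assignment and inherited Bender's cuts rather than an index range.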
[00218] In the rest of this section, the following are provided: 1) a presentation of the upper bound LP; 2) an example of how to solve it using Bender's decomposition; 3) verification of the correctness of passing down Bender's cuts from parent to child nodes; and 4) introduction of a heuristic branching rule.
[00219] Upper Bound Linear Program
[00220] A tractable linear relaxation of Bayesian Stackelberg games can be derived to provide an upper bound efficiently at each of HUNTER's internal nodes. Applying results on disjunctive programs provides a derivation of the convex hull for a single type. As is shown below, intersecting the convex hulls of all types provides a tractable, polynomial-size relaxation of a Bayesian Stackelberg game.
[00221] Convex hull of a Single Type
[00222] Considering a Stackelberg game with a single follower type (U, V), the leader's optimal strategy x* is the best among the optimal solutions of J LPs, where each LP restricts the follower's best response to one pure strategy. Hence the optimization problem can be represented as the following disjunctive program (i.e., a disjunction of "multiple LPs"),
[00223] max
Figure imgf000038_0001
[00225] where D_j and d_j are given by,
Figure imgf000038_0003
[00227] The feasible set of (1), denoted by H, is a union of J convex sets, each corresponding to a disjunctive term. The closure of the convex hull of H, clconv H, can be represented as shown in FIG. 10.
[00228] The intuition for this is that the continuous variables θ_j, with ∑_{j=1}^J θ_j = 1, are used to create all possible convex combinations of points in H: (x_j/θ_j, u_j/θ_j) represents a point in the convex set defined by the j-th disjunctive term in the original problem (1). Finally, since all the extreme points of clconv H belong to H, the disjunctive program (1) is equivalent to the linear program:
[00229] max_{x,u} { u | (x, u) ∈ clconv H }
[00230] Tractable Relaxation
[00231] Building on the convex hulls of individual types, the relaxation of a Bayesian Stackelberg game with S types is now derived. This game is written as the following disjunctive program,
Figure imgf000039_0001
[00233] s.t. Ax ≤ b, x ≥ 0 (2)
Figure imgf000039_0002
[00235] Returning to the toy example, the corresponding disjunctive program of the game in Table 4 can be written as,
[00236]    max 0.84u^1 + 0.16u^2    (3)
Figure imgf000039_0003
[00237] Denote the set of feasible points (x, u^1, . . . , u^S) of (2) by H̃. To avoid an expansion of (2) to a disjunctive normal form, which would result in a linear program with an exponential number (O(N·J^S)) of variables, a much more tractable, polynomial-size relaxation of (2) is given, as is explained below. Denote the feasible set of each type s, i.e., the set of points (x, u^s), by H^s, and define H̃* = {(x, u^1, . . . , u^S) | (x, u^s) ∈ clconv H^s, ∀s}. Then the following program is a relaxation of (2):
[00238]    max_{x, u^1, . . . , u^S} { ∑_s p^s u^s | (x, u^s) ∈ clconv H^s, ∀s }    (4)

[00239] Indeed, for any feasible point (x, u^1, . . . , u^S) in H̃, (x, u^s) must belong to H^s, implying that (x, u^s) ∈ clconv H^s. Hence H̃ ⊆ H̃*, implying that optimizing over H̃* provides an upper bound on the optimum over H̃. On the other hand, H̃* will in general have points not belonging to H̃, and thus the relaxation can lead to an overestimation.
[00240] For example, consider the disjunctive program in (3). The point (x1 = 2/3, x2 = 1/3, u^1 = 2/3, u^2 = 0) does not belong to H̃, since -x1 + x2 < 0 but u^2 = 0 > -x1 + x2 = -1/3. However, the point belongs to H̃*, because: i) (x1 = 2/3, x2 = 1/3, u^1 = 2/3) belongs to H^1 ⊆ clconv H^1; and ii) (x1 = 2/3, x2 = 1/3, u^2 = 0) belongs to clconv H^2, as it is the convex combination of two points in H^2, (x1 = 1/3, x2 = 2/3, u^2 = 1) and (x1 = 1, x2 = 0, u^2 = -1):

[00241] (2/3, 1/3, 0) = (1/2) × (1/3, 2/3, 1) + (1/2) × (1, 0, -1).
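The convex combination in paragraph [00241] can be checked directly with exact rational arithmetic (coordinates ordered as (x1, x2, u^2)):

```python
from fractions import Fraction as Fr

p1 = (Fr(1, 3), Fr(2, 3), Fr(1))       # a point in H^2
p2 = (Fr(1), Fr(0), Fr(-1))            # another point in H^2
# Equal-weight convex combination of the two points.
mix = tuple(Fr(1, 2) * a + Fr(1, 2) * b for a, b in zip(p1, p2))
# mix equals the relaxation point (x1, x2, u^2) = (2/3, 1/3, 0)
```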
[00242] The upper bound LP (4) has O(N·J·S) variables and constraints, and can be written as the following two-stage problem by explicitly representing clconv H^s:
Figure imgf000040_0001
[00244] s.t. Ax ≤ b, x ≥ 0 (5)
[00245] where u^s(x) is defined to be the optimal value of,

[00246]    max ∑_{j=1}^J u_j^s
           s.t. ∑_{j=1}^J x_j = x, x_j ≥ 0, ∀j,
                ∑_{j=1}^J θ_j = 1, θ_j ≥ 0, ∀j,    (6)
Figure imgf000041_0001
[00250] Although written in two stages, the above formulation is in fact a single linear program, as both stages are maximization problems and combining the two stages does not produce any non-linear terms. Formulations (5) and (6) are displayed separately in order to reveal the block structure that is used for further speedup, as explained below.
[00251] Note that so far, the relaxation has been derived only for the root node of HUNTER's search tree, without assigning any type to a pure strategy. This relaxation can also be applied to the other internal nodes of HUNTER's search tree. For example, if type s is assigned to pure strategy j, the leader's strategy space is further restricted by the addition of the best-response constraints for pure strategy j (of the form D_j x ≤ d_j); the resulting feasible region can again be written as A'x ≤ b', x ≥ 0, where A' and b' incorporate the additional constraints.
[00252] Bender's Decomposition
[00253] Although much easier than solving a full Bayesian Stackelberg game, solving the upper bound LP can still be computationally challenging. Here, the block structure of (4) observed above is invoked, which partitions it into (5) and (6), where (5) is a master problem and (6), for s = 1, . . . , S, gives S subproblems. This block structure allows the upper bound LP to be solved efficiently using multi-cut Bender's decomposition. Generally speaking, the computational difficulty of optimization problems increases significantly with the number of variables and constraints. Instead of considering all variables and constraints of a large problem simultaneously, Bender's decomposition can be used to partition the problem into multiple smaller problems, which can then be solved in sequence. For completeness, the technique is now briefly described.
[00254] In Bender's decomposition, the second-stage maximization problem (6) is replaced by its dual minimization counterpart, with dual variables π^s, η^s for s = 1, . . . , S:

[00255]    u^s(x) = min (π^s)^T x + η^s
[00256]    s.t.    (7)
Figure imgf000042_0001
Figure imgf000042_0002
[00257] Since the feasible region of (7) is independent of x, its optimal solution is attained at one of a finite number of extreme points (of the dual variables). Since u^s(x) is the minimum of (π^s)^T x + η^s over all possible dual points, the following inequality must hold in the master problem,
[00258]    u^s ≤ (π_k^s)^T x + η_k^s,   k = 1, . . . , K    (8)
[00259] where {(π_k^s, η_k^s)}, k = 1, . . . , K, are all the dual extreme points. Constraints of type (8) for the master problem are called optimality cuts (infeasibility cuts, another type of constraint, are not believed to be relevant for this problem).
[00260] Since there are typically exponentially many extreme points of the dual formulation (7), generating all constraints of type (8) may not be practical. Instead, Bender's decomposition can be used, which starts by solving the master problem (5) with a subset of these constraints to find a candidate optimal solution (x*, u^1*, . . . , u^S*). It then solves the S dual subproblems (7) to calculate u^s(x*). If all the subproblems have u^s(x*) = u^s*, the algorithm stops. Otherwise, for those with u^s(x*) < u^s*, the corresponding constraints of type (8) are added to the master program for the next iteration.
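The cut-generation loop just described can be illustrated on a one-dimensional stand-in, where the dual subproblem (7) is replaced by an explicit minimum over two known dual points; all numbers below are purely illustrative:

```python
# 1-D stand-in for Bender's decomposition: maximize u(x) on [0, 1],
# where u(x) = min over known dual points (pi, eta) of pi*x + eta.
duals = [(-1.0, 2.0), (1.0, 1.0)]      # illustrative dual extreme points

def subproblem(x):
    """Dual subproblem: value u(x) and the minimizing dual point (a cut)."""
    return min((pi * x + eta, (pi, eta)) for pi, eta in duals)

def master(cuts):
    """Tiny 1-D master: max over x of the min over current cuts; the optimum
    lies at an interval endpoint or at an intersection of two cuts."""
    xs = {0.0, 1.0}
    for pa, ea in cuts:
        for pb, eb in cuts:
            if pa != pb:
                xi = (eb - ea) / (pa - pb)
                if 0.0 <= xi <= 1.0:
                    xs.add(xi)
    return max((min(p * x + e for p, e in cuts), x) for x in xs)

cuts = [subproblem(0.0)[1]]            # seed with one optimality cut
while True:
    ub, x = master(cuts)               # candidate solution and upper bound
    val, cut = subproblem(x)           # true value and violated cut, if any
    if val >= ub - 1e-9:               # subproblem matches master: converged
        break
    cuts.append(cut)
```

Here the loop converges after adding a second cut, with the maximum u(0.5) = 1.5; in HUNTER the same pattern runs with S subproblems, one per follower type.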
[00261] Reusing Bender's Cuts
[00262] The upper bound LP computation can be further sped up at internal nodes of HUNTER's search tree by not creating all of the Bender's cuts from scratch; instead, Bender's cuts from the parent node can be reused in its children. Suppose u^s ≤ (π^s)^T x + η^s is a Bender's cut in the parent node. This means u^s cannot be greater than (π^s)^T x + η^s for any x in the feasible region of the parent node. Because a child node's feasible region is always more restricted than its parent's, it follows that u^s cannot be greater than (π^s)^T x + η^s for any x in the child node's feasible region, i.e., u^s ≤ (π^s)^T x + η^s must also be a valid cut for the child node.
[00263] Heuristic Branching Rules
[00264] Given an internal node in the search tree of HUNTER, the type to branch on next must be decided, i.e., the type for which J child nodes will be created at the next lower level of the tree. As described below, the type selected to branch on has a significant effect on efficiency. In some embodiments, a type can be selected whereby the upper bound at the resulting child nodes will decrease most significantly. To that end, HUNTER chooses the type whose θ^s returned by (6) violates the integrality constraint the most. Recall that θ^s is used to generate convex combinations. The motivation here is that if all of the θ^s returned by (6) are integer vectors, the solution of the upper bound LP (5) and (6) is a feasible point of the original problem (2), implying the relaxation already returns the optimal solution. More specifically, HUNTER chooses the type s* whose corresponding θ^{s*} has the maximum entropy, i.e., s* = arg max_s { -∑_{j=1}^J θ_j^s log θ_j^s }.
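The maximum-entropy branching rule can be sketched directly: among the types' θ vectors returned by the relaxation, pick the most fractional one (an integral θ means the relaxation is already feasible for that type):

```python
import math

def entropy(theta):
    """Shannon entropy -sum_j theta_j log theta_j (0 log 0 taken as 0)."""
    return -sum(t * math.log(t) for t in theta if t > 0)

def branch_type(thetas):
    """Index of the type whose theta vector from (6) is most fractional."""
    return max(range(len(thetas)), key=lambda s: entropy(thetas[s]))

# Type 0's theta is already integral (relaxation feasible for that type),
# so HUNTER-style branching picks the fractional type 1.
thetas = [[1.0, 0.0, 0.0], [0.4, 0.3, 0.3]]
chosen = branch_type(thetas)
```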
[00265] CONTINUOUS UNCERTAINTY IN STACKELBERG GAMES
[00266] HUNTER can be used or modified to handle continuous uncertainty via the sample average approximation technique. Below, an uncertain Stackelberg game model is introduced with continuously distributed uncertainty in the leader's execution, the follower's observation, and both players' utilities. Then it is shown that the uncertain Stackelberg game model can be written as a two-stage mixed-integer stochastic program, to which existing convergence results of the sample average approximation technique apply. Finally, it is shown that the sampled problems are equivalent to Bayesian Stackelberg games, and consequently can also be solved by HUNTER.
[00267] Uncertain Stackelberg Game Model
[00268] The following types of uncertainty in Stackelberg games with known distributions are considered. First, an assumption can be made that there is uncertainty in both the leader's and the follower's utilities U and V. Second, the leader's execution and the follower's observation can be assumed to be noisy. In particular, the executed and observed strategies are assumed to be linear perturbations of the intended strategy, i.e., when the leader commits to x, the actual executed strategy is y = F^T x + f and the strategy observed by the follower is z = G^T x + g, where (F, f) and (G, g) are uncertain. Here f and g are used to represent execution and observation noise that is independent of x. In addition, F and G are matrices allowing execution and observation noise to be modeled as linearly dependent on x. Note that G and g can be dependent on F and f. For example, an execution noise that is independent of x and follows a Gaussian distribution with zero mean can be represented using F = I_N and f ~ N(0, Σ), where I_N is the N×N identity matrix. Assume U, V, F, f, G, and g are random variables, following some known continuous distributions. A vector ξ = (U, V, F, f, G, g) can be used to represent a realization of the above inputs, and the notation ξ(ω) can be used to represent the corresponding random variable.
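The perturbation model just described can be sketched in a few lines (assumed helper names; a diagonal Gaussian covariance is used purely for illustration):

```python
import random

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def sample_execution_noise(n, sigma):
    """One realization (F, f) of execution noise that is independent of x:
    F = I_N and each f_i ~ N(0, sigma^2) (diagonal covariance assumed)."""
    return identity(n), [random.gauss(0.0, sigma) for _ in range(n)]

def executed_strategy(F, f, x):
    """y = F^T x + f: the strategy actually carried out when the leader
    commits to the intended strategy x."""
    n = len(x)
    return [sum(F[i][j] * x[i] for i in range(n)) + f[j] for j in range(n)]
```

With F = I_N and f = 0 the executed strategy equals the intended one; nonzero f shifts it, and a non-identity F makes the noise depend linearly on x.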
[00269] The uncertain Stackelberg game, as described in further detail below, can be written as a two-stage mixed-integer stochastic program. Let Q(x, ξ) be the leader's utility for a strategy x and a realization ξ, assuming the follower chooses the best response. The first stage maximizes the expectation of the leader's utility with respect to the joint probability distribution of ξ(ω), i.e.,
[00270] max { E[Q(x, ξ(ω))] | Ax ≤ b, x ≥ 0 }. The second stage computes
Q(x, ξ) = u_{i*}^T (F^T x + f),

[00271] where (9)

i* = argmax_i v_i^T (G^T x + g)
[00272] Sample Average Approximation
[00273] Sample average approximation is a popular solution technique for stochastic programs with continuously distributed uncertainty. It can be applied to solving uncertain Stackelberg games as follows. First, a sample ξ^1, ..., ξ^S of S realizations of the random vector ξ(ω) is generated. The expected value can then be approximated by the sample average. The sampled problem is given by,
[00274] max { (1/S) Σ_{s=1}^S Q(x, ξ^s) | Ax ≤ b, x ≥ 0 }    (10)
[00275] The sampled problem provides an increasingly tight statistical upper bound on the true problem as the number of samples grows; the number of samples required to solve the true problem to a given accuracy grows linearly in the dimension of x.
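As a toy illustration of the sample-average step (the names and the brute-force candidate search are assumptions for this sketch, not the patent's optimization procedure):

```python
def sample_average(Q, x, samples):
    # Approximate E[Q(x, xi)] by (1/S) * sum_s Q(x, xi^s).
    return sum(Q(x, xi) for xi in samples) / len(samples)

def saa_maximize(Q, candidates, samples):
    """Solve the sampled problem by brute force over a finite candidate set:
    pick the x maximizing the sample-average objective."""
    return max(candidates, key=lambda x: sample_average(Q, x, samples))
```

In the actual approach the maximization is performed by HUNTER over the full strategy polytope rather than a finite candidate list; the sketch only shows how the expectation is replaced by an average over drawn realizations.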
[00276] In the sampled problem, each sample corresponds to a tuple (U, V, F, f, G, g). The following proposition shows that each such tuple is equivalent to some tuple in which F = G = I_N and f = g = 0, implying the sampled execution and observation noise can be handled by simply perturbing the utility matrices.
[00277] PROPOSITION 1. For any leader's strategy x and follower's strategy j, both players get the same expected utilities in the two noise realizations (U, V, F, f, G, g) and (Ũ, Ṽ, I_N, 0, I_N, 0), where,

[00278] Ũ = FU + 1 f^T U, Ṽ = GV + 1 g^T V.
[00279] PROOF. Both players' expected utility vectors for both noise realizations are calculated to establish the equivalence:

[00280] u_j(x) = U_j^T (F^T x + f) = (FU_j)^T x + (f^T U_j)(1^T x) = Ũ_j^T x,

[00281] v_j(x) = V_j^T (G^T x + g) = (GV_j)^T x + (g^T V_j)(1^T x) = Ṽ_j^T x,

using the fact that 1^T x = 1 for any mixed strategy x.
[00282] A direct implication of Proposition 1 is that the sampled problem (10) and (9) is equivalent to a Bayesian Stackelberg game of S equally weighted types, with utility matrices (U^s, V^s), s = 1,...,S. Hence, via sample average approximation, HUNTER can be used to solve Stackelberg games with continuous payoff, execution, and observation uncertainty.
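A sketch of the utility-matrix perturbation implied by Proposition 1 follows. Note the precise matrix identity is an assumption here (the source gives the equation only as an image); the sketch takes Ũ = FU + 1 f^T U, which relies on mixed strategies satisfying 1^T x = 1:

```python
def absorb_noise(U, F, f):
    """Fold a sampled (F, f) into the utility matrix:
    Utilde[i][j] = (F U)[i][j] + f^T U_j, so that U_j^T (F^T x + f)
    equals Utilde_j^T x whenever the entries of x sum to 1.
    Assumed layout: U[k][j] is the utility of leader action k against
    follower action j."""
    n, m = len(U), len(U[0])
    return [[sum(F[i][k] * U[k][j] for k in range(n)) +
             sum(f[k] * U[k][j] for k in range(n))
             for j in range(m)] for i in range(n)]
```

With F = I_N and f = 0 the matrix is unchanged, matching the proposition's claim that sampled execution noise reduces to a payoff perturbation.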
[00283] A Unified Approach
[00284] By applying sample average approximation in Bayesian Stackelberg games with discrete follower types, both discrete and continuous uncertainty can be handled simultaneously using HUNTER. For this, each discrete follower type can be replaced by a set of samples of the continuous distribution, converting the original Bayesian Stackelberg game to a larger one. The resulting problem can again be solved by HUNTER, providing a solution robust to both types of uncertainty.
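The type-expansion step can be sketched as follows (hypothetical data layout: each type is a (U, V, weight) tuple and `sampler` draws one perturbed utility matrix per call):

```python
def expand_types(types, sampler, S):
    """Replace each discrete follower type by S samples of its continuous
    uncertainty, yielding a larger Bayesian game in which the type's
    probability weight is split equally among its samples."""
    expanded = []
    for U, V, weight in types:
        for _ in range(S):
            expanded.append((sampler(U), sampler(V), weight / S))
    return expanded
```

The total probability mass is preserved, so the result is again a well-formed Bayesian Stackelberg game, just with more types.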
[00285] EXPERIMENTAL RESULTS
[00286] To verify that HUNTER can handle both discrete and continuous uncertainty in Stackelberg games, three sets of experiments were conducted considering i) only discrete uncertainty, ii) only continuous uncertainty, and iii) both types of uncertainty. The utility matrices were randomly generated from a uniform distribution between -100 and 100. Results were obtained on a standard 2.8GHz machine with 2GB main memory, and were averaged over 30 trials.
[00287] Handling Discrete Follower Types
[00288] FIGS. 11(a)-11(d) show experimental analysis of HUNTER and runtime comparisons with HBGS and DOBSS. For discrete uncertainty, the runtime of HUNTER was compared with DOBSS and HBGS (specifically, HBGS-F, the most efficient variant), the two best known algorithms for general Bayesian Stackelberg games. These algorithms were compared, varying the number of types and the number of pure strategies per player. The tests used a cutoff time of one hour for all three algorithms.
[00289] FIG. 11(a) shows the performance of the three algorithms when the number of types increases. The games tested in this set have 5 pure strategies for each player. The x-axis shows the number of types, while the y-axis shows the runtime in seconds. As can be seen in FIG. 11(a),
HUNTER provides significant speed-up, of orders of magnitude over both HBGS and DOBSS (the line depicting HUNTER is almost touching the x-axis in FIG. 11(a)). For example, it was found that HUNTER can solve a Bayesian Stackelberg game with 50 types in 17.7 seconds on average, whereas neither HBGS nor DOBSS could solve an instance in an hour. FIG. 11(b) shows the performance of the three algorithms when the number of pure strategies for each player increases. The games tested in this set have 10 types. The x-axis shows the number of pure strategies for each player, while the y-axis shows the runtime in seconds. HUNTER again was shown to provide significant speed-up over both HBGS and DOBSS. For example, HUNTER on average was able to solve a game with 13 pure strategies in 108.3 seconds, but HBGS and DOBSS took more than 30 minutes.
[00290] The following analyzes the contributions of HUNTER's key components to its performance. First, the runtime of HUNTER with two search heuristics, best-first (BFS) and depth-first (DFS), is considered when the number of types is further increased. Setting the number of pure strategies for each player to 5, the number of types was increased from 10 to 200. In Table 5, the average runtime and average number of nodes explored in the search process are summarized. DFS, as seen, is faster than BFS when the number of types is small, e.g., 10 types. However, BFS was seen to always explore a significantly smaller number of nodes than DFS and to be more efficient when the number of types is large. For games with 200 types, the average runtime of BFS-based HUNTER was 20 minutes, highlighting its scalability to a large number of types. Such scalability is achieved by efficient pruning - for a game with 200 types, HUNTER explores on average 5.3×10^3 nodes with BFS and 1.1×10^4 nodes with DFS, compared to a total of 5^200 = 6.2×10^139 possible leaf nodes.
[00291] Table 5: Scalability of HUNTER to a large number of types
[00292] Second, the effectiveness of the two heuristics is tested: inheritance of Bender's cuts from parent node to child nodes and the branching rule utilizing the solution returned by the upper bound LP. The number of pure strategies for each agent was fixed to 5 and the number of types was increased from 10 to 50. In Figure 11(c), the runtime of three variants of HUNTER is shown: i) Variant-I does not inherit Bender's cuts and chooses a random type to create branches; ii) Variant-II does not inherit Bender's cuts and uses the heuristic branching rule; iii) Variant-III (HUNTER) inherits Bender's cuts and uses the heuristic branching rule. The x-axis represents the number of types while the y-axis represents the runtime in seconds. As can be seen, each individual heuristic helps speed up the algorithm significantly, showing their usefulness. For example, it was shown to take 14.0 seconds to solve an instance of 50 types when both heuristics were enabled (Variant-III) compared to 51.5 seconds when neither of them was enabled (Variant-I).
[00293] Finally, a consideration is made of the performance of HUNTER in finding quality bounded approximate solutions. To this end, HUNTER is allowed to terminate once the difference between the upper bound and the lower bound decreases to η, a given error bound. The solution returned is therefore an approximate solution provably within η of the optimal solution. In this set of experiments, 30 games were tested with 5 pure strategies for each player and 50, 100, and 150 types with varying error bound η from 0 to 10. As shown in Figure 11(d), HUNTER can effectively trade off solution quality for further speedup, indicating the effectiveness of its upper bound and lower bound heuristics. For example, for games with 100 types, HUNTER returned within 30 seconds a suboptimal solution at most 5 away from the optimal solution (the average optimal solution quality is 60.2). Compared to finding the global optimal solution in 178 seconds, HUNTER is able to achieve a six-fold speedup by allowing at most 5 quality loss.
[00294] Handling Continuous Uncertainty
[00295] For continuous uncertainty, ideally HUNTER would be compared with other algorithms that handle continuous execution and observation uncertainty in general Stackelberg games; however, no such algorithms are known to exist. Hence this investigation is restricted to the more specialized class of security games, so that two previous robust algorithms, BRASS and RECON, can be used in such a comparison. To introduce the uncertainty in these security games, it can be assumed that the defender's execution and the attacker's observation uncertainty each follows independent uniform distributions. That is, an assumption is made that for an intended defender strategy x = (x_1,...,x_N), where x_i represents the probability of protecting target i, the maximum execution error associated with target i is α_i and the actual executed strategy is y = (y_1,...,y_N), where y_i follows a uniform distribution between x_i - α_i and x_i + α_i for each i. Similarly, the maximum observation error for target i is assumed to be β_i and the actual observed strategy is z = (z_1,...,z_N), where z_i follows a uniform distribution between y_i - β_i and y_i + β_i for each i.
[00296] HUNTER was used with 20 samples and 100 samples to solve the problem above via sample average approximation as described previously. For each setting, HUNTER was run 20 times with different sets of samples and the best solution found was reported (as shown below, HUNTER's competitors were also run for 20 settings for selecting the best solutions). Having generated a solution with 20 or 100 samples, evaluating its actual quality is difficult in the continuous uncertainty model - certainly any analytical evaluation is extremely difficult. Therefore, to provide an accurate estimation of the actual quality, 10,000 samples were drawn from the uncertainty distribution and the solution was evaluated using these samples.
[00297] For comparison, two existing robust solution methods, BRASS and RECON, were considered. As experimentally tested, when its parameter ε is chosen carefully, the BRASS strategy is one of the top performing strategies under continuous payoff uncertainty. RECON assumes a maximum execution error α and a maximum observation error β, computing the risk-averse strategy for the defender that maximizes the worst-case performance over all possible noise realizations. To provide a more meaningful comparison, solutions of BRASS / RECON were found repeatedly with multiple settings of parameters, and reports are provided herein for the best one. For BRASS, 20 ε settings were tested, and for RECON, the setting α = β was made and 20 settings were tested.
[00298] For the experiments, 30 randomly generated security games were tested with five targets and one resource. The maximum execution and observation error was set to α = β = 0.1. The utilities in the game were drawn from a uniform distribution between -100 and +100. Nonetheless, the possible optimal solution quality existed in a much narrower range. Over the 30 instances tested, the optimal solution quality by any algorithm was found to vary between -26 and +17. In Table 6, the solution quality of HUNTER compared to BRASS and RECON is shown, respectively. In Table 6, #Wins shows the number of instances out of 30 where HUNTER returned a better solution than BRASS / RECON. Avg. Diff. shows the average gain of HUNTER over BRASS (or RECON), and the average solution quality of the corresponding algorithm (shown in the parentheses). Max. Diff. shows the maximum gain of HUNTER over BRASS (or RECON), and the solution quality of the corresponding instance and algorithm (shown in the parentheses). HUNTER, as shown with 20 and 100 samples, outperformed both BRASS and RECON on average. For example, RECON on average returned a solution with quality of -5.1, while even with 20 samples, the average gain HUNTER achieved over RECON is 0.6. The result is statistically significant, with paired t-test values of 8.9×10^-6 and 1.0×10^-3 for BRASS and RECON respectively. Indeed, when the number of samples used in HUNTER increased to 100, HUNTER was able to outperform both BRASS and RECON in every instance tested. Not only is the average difference in this case statistically significant, but the actual solution quality found by HUNTER - as shown by max difference - can be significantly better in practice than solutions found by BRASS and RECON.
[00299] Table 6: Quality gain of HUNTER against BRASS and RECON under continuous execution and observation uncertainty.
[00300] Handling Both Types of Uncertainty
[00301] In another experiment, Stackelberg games with both discrete and continuous uncertainty were considered. Since no previous algorithm is known to handle both types of uncertainty, the runtime results of HUNTER alone are shown. Tests were run on security games with five targets and one resource, and with multiple discrete follower types whose utilities were randomly generated. For each type, the same utility distribution and the same execution and observation uncertainty were used as previously described. Table 7 summarizes the runtime results of HUNTER for 3, 4, 5, and 6 follower types, and 10 and 20 samples per type. As shown, HUNTER can efficiently handle both kinds of uncertainty simultaneously. For example, HUNTER spends less than 4 minutes on average to solve a problem with 5 follower types and 20 samples per type.
[00302] Table 7: Runtime results (in seconds) of HUNTER for handling both discrete and continuous uncertainty.
[00303] CONCLUSIONS
[00304] With increasing numbers of real-world security applications of leader-follower Stackelberg games, it is critical to address uncertainty in such games, including discrete attacker types and continuous uncertainty such as the follower's observation noise, the leader's execution error, and uncertainty in both players' payoffs. Previously, researchers have designed specialized sets of algorithms to handle these different types of uncertainty; e.g., algorithms for discrete follower types have been distinct from algorithms that handle continuous uncertainty. However, in the real world, a leader may face all of this uncertainty simultaneously, and thus a single unified algorithm that handles all this uncertainty is desired.
[00305] To that end, a novel unified algorithm, called HUNTER, has been presented herein, which handles discrete and continuous uncertainty by scaling up Bayesian Stackelberg games. The HUNTER algorithm is able to provide speedups of orders of magnitude over existing algorithms. Additionally, using sample average approximation, HUNTER can handle continuously distributed uncertainty.
[00306] Section 3 - Multi-Objective Optimization for Security Games
[00307] As was noted above, an aspect of the present disclosure is directed to multi-objective optimization for security games. The aspect includes a treatment or description of multi-objective security games (MOSG), combining security games and multi-objective optimization. MOSGs have a set of Pareto optimal (non-dominated) solutions referred to herein as the Pareto frontier. The Pareto frontier can be generated by solving a sequence of constrained single-objective optimization problems (CSOP), where one objective is selected to be maximized while lower bounds are specified for the other objectives.
[00308] The burgeoning area of security games has focused on real-world domains where security agencies protect critical infrastructure from a diverse set of adaptive adversaries. There are security domains where the payoffs for preventing the different types of adversaries may take different forms (seized money, reduced crime, saved lives, etc.) which are not readily comparable. Thus, it can be difficult to know how to weigh the different payoffs when deciding on a security strategy. To address the challenges of these domains, a fundamentally different solution concept is described herein, multi-objective security games (MOSG), which combines security games and multi-objective optimization. Instead of a single optimal solution, MOSGs have a set of Pareto optimal (non-dominated) solutions referred to as the Pareto frontier. The Pareto frontier can be generated by solving a sequence of constrained single-objective optimization problems (CSOP), where one objective is selected to be maximized while lower bounds are specified for the other objectives. Techniques or algorithms as described herein for providing multi-objective optimization for security games can include the following features: (i) an algorithm, Iterative ε-Constraints, for generating the sequence of CSOPs; (ii) an exact approach for solving an MILP formulation of a CSOP (which also applies to multi-objective optimization in more general Stackelberg games); (iii) heuristics that achieve speedup by exploiting the structure of security games to further constrain a CSOP; (iv) an approximate approach for solving an algorithmic formulation of a CSOP, increasing the scalability of the approach with quality guarantees. Proofs on the level of approximation and detailed experimental evaluation of certain embodiments are provided below.
[00309] As was noted above, game theory is an increasingly important paradigm for modeling security domains which feature complex resource allocation. Security games, a special class of attacker-defender Stackelberg games, are at the heart of several major deployed decision-support applications. Such systems include ARMOR at LAX airport, IRIS deployed by the US Federal Air Marshals Service, GUARDS developed for the US Transportation Security Administration, and PROTECT used in the Port of Boston by the US Coast Guard.
[00310] In these applications, the defender is trying to maximize a single objective. However, there are domains where the defender has to consider multiple objectives simultaneously. For example, the Los Angeles Sheriff's Department (LASD) has stated that it needs to protect the city's metro system from ticketless travelers, common criminals, and terrorists. From the perspective of LASD, each one of these attacker types poses a unique threat (lost revenue, property theft, and loss of life). Given this diverse set of threats, selecting a security strategy is a significant challenge as no single strategy can minimize the threat for all attacker types. Thus, tradeoffs must be made, and protecting more against one threat may increase the vulnerability to another threat. However, it is not clear how LASD should weigh these threats when determining the security strategy to use. One could attempt to establish methods for converting the different threats into a single metric. However, this process can become convoluted when attempting to compare abstract notions such as safety and security with concrete concepts such as ticket revenue.
[00311] Bayesian security games have been used to model domains where the defender is facing multiple attacker types. The threats posed by the different attacker types are weighted according to the relative likelihood of encountering that attacker type. There are three potential factors limiting the use of Bayesian security games: (1) the defender may not have information on the probability distribution over attacker types, (2) it may be impossible or undesirable to directly compare and combine the defender rewards of different security games, and (3) only one solution is given, hiding the tradeoffs between the objectives from the end user.
[00312] As described below, for many domains a new game model, multi-objective security games (MOSG), can be utilized advantageously, which combines game theory and multi-objective optimization. For these models, the threats posed by the attacker types are treated as different objective functions which are not aggregated, thus eliminating the need for a probability distribution over attacker types. Unlike Bayesian security games which have a single optimal solution, MOSGs have a set of Pareto optimal (non-dominated) solutions which is referred to herein as the Pareto frontier. By presenting the Pareto frontier to the end user, they are able to better understand the structure of their problem as well as the tradeoffs between different security strategies. As a result, end users are able to make a more informed decision on which strategy to enact.
[00313] As described herein, MOSG solutions provide a set of algorithms for computing Pareto optimal solutions for MOSGs. Key features of such solutions include one or more of the following: (i) Iterative ε-Constraints, an algorithm for generating the Pareto frontier for MOSGs by producing and solving a sequence of constrained single-objective optimization problems (CSOP); (ii) an exact approach for solving a mixed-integer linear program (MILP) formulation of a CSOP (which can also apply to multi-objective optimization in more general Stackelberg games); (iii) heuristics that exploit the structure of security games to speed up solving CSOPs; and (iv) an approximate approach for solving CSOPs, which greatly increases the scalability of the MOSG approach while maintaining quality guarantees. Additionally, analysis of the complexity and completeness of the algorithms is provided, as well as experimental results.
[00314] EXEMPLARY DOMAINS
[00315] As described above, an example of a security domain to which a MOSG model can be applied is the stated scenario that the LASD must protect the Los Angeles metro system from ticketless travelers, criminals, and terrorists. Each type of perpetrator is distinct and presents a unique set of challenges. Thus, LASD may have different payoffs for preventing the various perpetrators. Targeting ticketless travelers will increase the revenue generated by the metro system as it will encourage passengers to purchase tickets. Pursuing criminals will reduce the amount of vandalism and property theft, increasing the overall sense of passenger safety. Focusing on terrorists could help to prevent or mitigate the effect of a future terrorist attack, potentially saving lives. LASD has finite resources with which to protect all of the stations in the city. Thus, it is not possible to protect all stations against all perpetrators at all times. Therefore, strategic decisions must be made, such as where to allocate security resources and for how long. These allocations should be determined by the amount of benefit they provide to LASD. However, if preventing different perpetrators provides different, incomparable benefits to LASD, it may be unclear how to decide on a strategy. In such situations, a multi-objective security game model could be of use, since the set of Pareto optimal solutions can explore the trade-offs between the different objectives. LASD can then select the solution they feel most comfortable with based on the information they have.
[00316] MULTI-OBJECTIVE SECURITY GAMES
[00317] A multi-objective security game is a multi-player game between a defender and n attackers, in which the defender tries to prevent attacks by covering targets T = {t_1, t_2, ..., t_|T|} using m identical resources which can be distributed in a continuous fashion amongst the targets and according to multiple different objectives. The defender's strategy can be represented as a coverage vector c ∈ C where c_t is the amount of coverage placed on target t and represents the probability of the defender successfully preventing any attack on t. C = {(c_t) | 0 ≤ c_t ≤ 1, Σ_{t∈T} c_t ≤ m} is the defender's strategy space.
The attacker i's mixed strategy a_i = (a_i^t) is a vector where a_i^t is the probability of attacking t.
[00318] U defines the payoff structure for an MOSG, with U_i defining the payoffs for the security game played between the defender and attacker i. U_i^{c,d}(t) is the defender's utility if t is chosen by attacker i and is fully covered by a defender resource. If t is not covered, the defender's penalty is U_i^{u,d}(t). The attacker's utility is denoted similarly by U_i^{c,a}(t) and U_i^{u,a}(t). A property of security games is that U_i^{c,d}(t) > U_i^{u,d}(t) and U_i^{u,a}(t) > U_i^{c,a}(t), which means that placing more coverage on a target is always beneficial for the defender and disadvantageous for the attacker. For a strategy profile (c, a_i) for the game between the defender and attacker i, the expected utilities for both agents are given by:
[00319] U_i^d(c, a_i) = Σ_{t∈T} a_i^t U_i^d(c_t, t), U_i^a(c, a_i) = Σ_{t∈T} a_i^t U_i^a(c_t, t),

[00320] where

[00321] U_i^d(c_t, t) = c_t U_i^{c,d}(t) + (1 - c_t) U_i^{u,d}(t) and U_i^a(c_t, t) = c_t U_i^{c,a}(t) + (1 - c_t) U_i^{u,a}(t) are the payoffs received by the defender and attacker i, respectively, if target t is attacked and is covered with c_t resources.
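The coverage-based payoff U(c_t, t) = c_t U^c(t) + (1 - c_t) U^u(t) described above translates directly into code (a sketch with assumed names; per-target utilities are passed as lists):

```python
def payoff(c_t, covered, uncovered):
    # U(c_t, t) = c_t * U^c(t) + (1 - c_t) * U^u(t): expected payoff on one
    # target given the coverage probability c_t.
    return c_t * covered + (1.0 - c_t) * uncovered

def expected_utility(c, a, covered, uncovered):
    """sum_t a_t * U(c_t, t): one player's expected utility given the
    defender coverage vector c and an attacker mixed strategy a over targets."""
    return sum(a_t * payoff(c_t, uc, uu)
               for a_t, c_t, uc, uu in zip(a, c, covered, uncovered))
```

The same two functions serve for either player; only the covered/uncovered utility lists change.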
[00322] The standard solution concept for a two-player Stackelberg game is Strong Stackelberg Equilibrium (SSE), in which the defender selects an optimal strategy based on the assumption that the attacker will choose an optimal response, breaking ties in favor of the defender. U_i^d(c) and U_i^a(c) can be denoted as the payoffs received by the defender and attacker i, respectively, when the defender uses the coverage vector c and attacker i attacks the best target while breaking ties in favor of the defender.
[00323] With multiple attackers, the defender's utility (objective) space can be represented as a vector U^d(c) = (U_i^d(c)). An MOSG defines a multi-objective optimization problem:

[00324] max ( U_1^d(c), ..., U_n^d(c) ).
[00325] Solving such multi-objective optimization problems is a fundamentally different task than solving a single-objective optimization problem. With multiple objective functions, tradeoffs exist between the different objectives such that increasing the value of one objective decreases the value of at least one other objective. Thus for multi-objective optimization, the traditional concept of optimality is replaced by Pareto optimality; definitions are provided below.
[00326] DEFINITION 1. (Dominance). A coverage vector c ∈ C is said to dominate c′ ∈ C if U_i^d(c) ≥ U_i^d(c′) for all i = 1,...,n and U_i^d(c) > U_i^d(c′) for at least one index i.
[00327] DEFINITION 2. (Pareto Optimality) A coverage vector c ∈ C is Pareto optimal if there is no other c′ ∈ C that dominates c. The set of non-dominated coverage vectors is called the Pareto optimal solutions C* and the corresponding set of objective vectors Ω = {U^d(c) | c ∈ C*} is called the Pareto frontier.
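Definitions 1 and 2 can be checked mechanically; the following sketch (illustrative names, finite point sets only) filters a set of objective vectors down to its non-dominated subset:

```python
def dominates(u, v):
    """u dominates v: at least as good in every objective and strictly
    better in at least one (Definition 1, applied to objective vectors)."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def pareto_frontier(points):
    # Keep only the non-dominated objective vectors (Definition 2).
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

Note the brute-force filter is quadratic in the number of points; it is meant only to make the definitions concrete, not to generate the frontier of an MOSG, which is the job of the CSOP machinery below.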
[00328] The present disclosure provides algorithms to find Pareto optimal solutions in MOSGs. If there are a finite number of Pareto optimal solutions, it is preferable to generate all of them for the end user. If there are an infinite number of Pareto optimal solutions, it is impossible to generate all the Pareto optimal solutions. In this case, a subset of Pareto optimal solutions can be generated, which can approximate the true Pareto frontier with quality guarantees.
[00329] MOSGs build on security games and multi-objective optimization. The relationship of MOSGs to previous work in security games, and in particular Bayesian security games, has already been reviewed above. In this section, the research on multi-objective optimization will be primarily reviewed. There are three representative approaches for generating the Pareto frontier in multi-objective optimization problems. These include weighted summation, where the objective functions are assigned weights and aggregated, producing a single Pareto optimal solution. The Pareto frontier can then be explored by sampling different weights. Another approach is multi-objective evolutionary algorithms (MOEA). Evolutionary approaches such as NSGA-II are capable of generating multiple approximate solutions in each iteration. However, due to their stochastic nature, both weighted summation and MOEA cannot bound the level of approximation for the generated Pareto frontier. This lack of solution quality guarantees may be unacceptable for security domains.
[00330] The third approach is the ε-constraint method, in which the Pareto frontier is generated by solving a sequence of CSOPs. One objective is selected as the primary objective to be maximized while lower bound constraints are added for the secondary objectives. The original ε-constraint method discretizes the objective space and solves a CSOP for each grid point. This approach is computationally expensive since it exhaustively searches the objective space of secondary objectives. There has been work to improve upon the original ε-constraint method, such as proposing an adaptive technique for constraint variation that leverages information from solutions of previous CSOPs. However, such a method requires solving O(l^{n-1}) CSOPs, where l is the number of solutions in the Pareto frontier. Another approach, the augmented ε-constraint method, reduces computation by using infeasibility information from previous CSOPs. However, this approach only returns a predefined number of points and thus cannot bound the level of approximation for the Pareto frontier.
[00331] Approaches for solving MOSGs according to the present disclosure significantly modify the idea of the ε-constraint method for application to security domains that demand both efficiency and quality guarantees when providing decision support. Exemplary embodiments only need to solve O(nk) CSOPs and can provide approximation bounds.
[00332] ITERATIVE ε-CONSTRAINTS
[00333] The ε-constraint method formulates a CSOP for a given set of constraints b, producing a single Pareto optimal solution. The Pareto frontier is then generated by solving multiple CSOPs produced by modifying the constraints in b. Below, a presentation of the Iterative ε-Constraints algorithm is given, which is an algorithm for systematically generating a sequence of CSOPs for an MOSG. These CSOPs can then be passed to a solver Φ to return solutions to the MOSG. Following portions of the disclosure present 1) an exact MILP approach, which can guarantee that each solution is Pareto optimal, and 2) a faster approximate approach for solving CSOPs.
[00334] Algorithm for Generating CSOPs
[00335] Iterative ε-Constraints uses one or more (preferably all) of the following four key features: 1) The Pareto frontier for an MOSG can be found by solving a sequence of CSOPs. For each CSOP, U_1^d(c) is selected as the primary objective, which will be maximized. Lower bound constraints b are then added for the secondary objectives U_2^d(c), ..., U_n^d(c). 2) The sequence of CSOPs is iteratively generated by exploiting previous Pareto optimal solutions and applying Pareto dominance. 3) It is possible for a CSOP to have multiple coverage vectors c that maximize U_1^d(c) and satisfy b. Thus, lexicographic maximization is used to ensure that the CSOP solver Φ only returns Pareto optimal solutions. 4) It may be impractical (even impossible) to generate all Pareto optimal points if the frontier contains a large number of points, e.g., if the frontier is continuous. Therefore, a parameter ε is used to discretize the objective space, trading off solution efficiency versus the degree of approximation in the generated Pareto frontier.
[00336] FIG. 12 depicts an example of a Pareto frontier for a bi-objective MOSG. A simple MOSG example is now presented with two objectives and ε = 5. FIG. 12 shows the objective space for the problem as well as several points representing the objective vectors for different defender coverage vectors. In this problem, U_1^d will be maximized while b_2 constrains U_2^d. The initial CSOP is unconstrained (i.e., b_2 = -∞), thus the solver Φ will maximize U_1^d and return solution A = (100, 10). Based on this result, it is known that any point v = (v_1, v_2) (e.g., B) is not Pareto optimal if v_2 < 10, as it would be dominated by A. A new CSOP is then generated, updating the bound to b_2 = 10 + ε. Solving this CSOP with Φ produces solution C = (80, 25), which can be used to generate another CSOP with b_2 = 25 + ε. Both D = (60, 40) and E = (60, 60) satisfy b_2, but only E is Pareto optimal. Lexicographic maximization ensures that only E is returned and dominated solutions are avoided. The method then updates b_2 = 60 + ε and Φ returns F = (30, 70), which is part of a continuous region of the Pareto frontier from U_2^d = 70 to U_2^d = 78. The parameter ε causes the method to select a subset of the Pareto optimal points in this continuous region. In particular, this example returns G = (10, 75), and in the next iteration (b_2 = 80) the CSOP is found to be infeasible and the method terminates. The algorithm returns a Pareto frontier of A, C, E, F, and G.
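By way of non-limiting illustration, the bound-update loop just described can be sketched in Python. The finite point set stands in for the solver Φ; point B = (90, 8) is an assumed dominated point (only v_2 < 10 is stated above), and lexicographic maximization is approximated by a tuple-ordered max:

```python
import math

# Hypothetical objective vectors from the FIG. 12 walkthrough; B = (90, 8)
# is an assumed coordinate for the dominated point B.
POINTS = [(100, 10), (90, 8), (80, 25), (60, 40), (60, 60), (30, 70), (10, 75)]

def solve_csop(b2):
    """Stand-in for the solver Phi: lexicographically maximize U1 then U2
    subject to U2 >= b2, so only Pareto optimal points are returned."""
    feasible = [p for p in POINTS if p[1] >= b2]
    if not feasible:
        return None  # CSOP is infeasible
    return max(feasible, key=lambda p: (p[0], p[1]))

def iterative_eps_constraints(eps):
    frontier, b2 = [], -math.inf  # initial CSOP is unconstrained
    while True:
        sol = solve_csop(b2)
        if sol is None:           # infeasible CSOP terminates the search
            return frontier
        frontier.append(sol)
        b2 = sol[1] + eps         # tighten the secondary-objective bound
```

With ε = 5, this reproduces the sequence A, C, E, F, G from the example: D is skipped by the lexicographic tie-break, and the bound b_2 = 80 renders the final CSOP infeasible.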
[00337] Algorithm 3, shown in FIG. 13, systematically updates a set of lower bound constraints b to generate the sequence of CSOPs. Each time a CSOP is solved, a portion of the (n - 1)-dimensional space formed by the secondary objectives is marked as searched, with the rest divided into n - 1 subregions (by updating b for each secondary objective). These n - 1 subregions are then recursively searched by solving CSOPs with updated bounds. This systematic search forms a branch and bound search tree with a branching factor of n - 1. As the depth of the tree increases, the CSOPs become more constrained, eventually becoming infeasible. If a CSOP is found to be infeasible, no child CSOPs are generated, because they are guaranteed to be infeasible as well. The algorithm terminates when the entire secondary objective space has been searched.
[00338] Two modifications can be made to improve the efficiency of Algorithm 3: 1) preventing redundant computation resulting from multiple nodes having an identical set of lower bound constraints, by recording the lower bound constraints for all previous CSOPs in a list called previousBoundsList; and 2) preventing the solving of CSOPs which are known to be infeasible based on previous CSOPs, by recording the lower bound constraints for all infeasible CSOPs in a list called infeasibleBoundsList.
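By way of non-limiting illustration, the recursive search with both pruning lists can be sketched in Python. The CSOP solver is abstracted as a callback, so the sketch is independent of whether the MILP approach or ORIGAMI-A is used; all names are illustrative:

```python
import math

def bounds_dominate(b1, b2):
    """True if every lower bound in b1 is at least the corresponding bound
    in b2; the CSOP for b1 is then at least as constrained as that for b2."""
    return all(x >= y for x, y in zip(b1, b2))

def generate_frontier(solve_csop, n, eps):
    """Sketch of Algorithm 3: solve_csop(b) returns an objective vector v,
    or None if the CSOP defined by lower-bound vector b is infeasible."""
    frontier = []
    previous_bounds = []    # previousBoundsList
    infeasible_bounds = []  # infeasibleBoundsList

    def recurse(b):
        if b in previous_bounds:       # identical CSOP already solved
            return
        previous_bounds.append(b)
        v = solve_csop(b)
        if v is None:
            infeasible_bounds.append(b)
            return
        frontier.append(v)
        for i in range(1, n):          # n - 1 child CSOPs
            child = list(b)
            child[i] = v[i] + eps
            child = tuple(child)
            # prune children at least as constrained as a known-infeasible CSOP
            if any(bounds_dominate(child, s) for s in infeasible_bounds):
                continue
            recurse(child)

    recurse(tuple([-math.inf] * n))
    return frontier
```

For the bi-objective case, passing in a brute-force solver over the points of FIG. 12 reproduces the frontier A, C, E, F, G.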
[00339] Approximation Analysis
[00340] Assume the full Pareto frontier is Ω and the objective space of the solutions found by the Iterative ε-Constraints method is Ω_ε.

[00341] THEOREM 3. Solutions in Ω_ε are non-dominated, i.e., Ω_ε ⊆ Ω.
[00342] PROOF. Let c* be the coverage vector such that U^d(c*) ∈ Ω_ε, and assume that it is dominated by a solution from a coverage vector c'. That means U_i^d(c') ≥ U_i^d(c*) for all i = 1, ..., n, and for some j, U_j^d(c') > U_j^d(c*). This means that c' was a feasible solution for the CSOP for which c* was found to be optimal. Furthermore, the first time the objectives differ, the solution c' is better and would have been selected in the lexicographic maximization process. Therefore U^d(c*) ∉ Ω_ε, which is a contradiction. □
[00343] Given the approximation introduced by ε, one immediate question is how to characterize the efficiency loss. Here, a bound measuring the largest efficiency loss is defined:

[00344] ρ(ε) = max_{v ∈ Ω\Ω_ε} min_{v' ∈ Ω_ε} max_{1≤i≤n} (v_i - v'_i)

[00345] This approximation measure computes the maximum distance between any point v ∈ Ω \ Ω_ε on the frontier and its "closest" point v' ∈ Ω_ε computed by the algorithm. The distance between two points is the maximum difference over the objectives.
[00346] THEOREM 4. ρ(ε) ≤ ε.
[00347] PROOF. It suffices to prove Theorem 4 by showing that for any v ∈ Ω \ Ω_ε, there is at least one point v' ∈ Ω_ε such that v'_1 ≥ v_1 and v'_i ≥ v_i - ε for i > 1.

[00348] Algorithm 4, shown in FIG. 14, recreates the sequence of CSOP problems generated by Iterative ε-Constraints while ensuring that the bound b ≤ v throughout. Since Algorithm 4 terminates when b is not updated, this means that v'_i + ε > v_i for all i > 1. Summarizing, the final solution b and v' = U^d(Φ(b)) satisfy b ≤ v and v'_i ≥ v_i - ε for all i > 1. Since v is feasible for the CSOP with bound b, but U^d(Φ(b)) = v' ≠ v, it follows that v'_1 > v_1. □
[00349] Given Theorem 4, the maximum distance for every objective between any missed Pareto optimal point and the closest computed Pareto optimal point is bounded by ε. Therefore, as ε approaches 0, the generated Pareto frontier approaches the complete Pareto frontier in the measure ρ(ε). For example, if there are k discrete solutions in the Pareto frontier and the smallest distance between any two is δ, then setting ε = δ/2 makes Ω_ε = Ω. In this case, since each solution corresponds to a non-leaf node in the search tree, the number of leaf nodes is no more than (n - 1)k. Thus the algorithm solves at most O(nk) CSOPs.
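Given a reference frontier and the generated subset, the measure ρ(ε) can be computed directly for finitely many points. A sketch (the point values below are hypothetical):

```python
def approximation_error(full_frontier, generated):
    """rho(eps): the largest, over Pareto optimal points v missed by the
    algorithm, of the distance to the closest generated point v', where the
    distance is the maximum per-objective difference max_i (v_i - v'_i)."""
    missed = [v for v in full_frontier if v not in generated]
    if not missed:
        return 0.0
    return max(
        min(max(vi - wi for vi, wi in zip(v, w)) for w in generated)
        for v in missed
    )
```

For instance, if the point (20, 72) on a continuous stretch of the frontier is missed but (30, 70) is generated, the error contributed is max(20 - 30, 72 - 70) = 2, within the ε = 5 bound of Theorem 4.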
[00350] MILP APPROACH
[00351] Previously, a high-level search algorithm for generating the Pareto frontier by producing a sequence of CSOPs was introduced. An exact approach is presented below for defining and solving a mixed-integer linear program (MILP) formulation of a CSOP for MOSGs. It is then shown how heuristics that exploit the structure and properties of security games can be used to improve the efficiency of the MILP formulation.
[00352] Exact MILP Method
[00353] As stated above, to ensure Pareto optimality of solutions, lexicographic maximization is required to sequentially maximize all the objective functions. Thus, for each CSOP, n MILPs must be solved in the worst case, where each MILP is used to maximize one objective. For the λth MILP in the sequence, the objective is to maximize the variable d_λ, which represents the defender's payoff for security game λ. This MILP is constrained by having to maintain the previously maximized values d_j for 1 ≤ j < λ as well as satisfy the lower bound constraints b_k for λ < k ≤ n.
[00354] A lexicographic MILP formulation for a CSOP for MOSGs is presented in FIG. 14. Equation (1) is the objective function, which maximizes the defender's payoff for objective λ, d_λ.
Equation (2) defines the defender's payoff. Equation (3) defines the optimal response for attacker j. Equation (4) constrains the feasible region to solutions that maintain the values of objectives maximized in previous iterations of lexicographic maximization. Equation (5) guarantees that the lower bound constraints in b will be satisfied for all objectives which have yet to be optimized.
[00355] If a mixed strategy is optimal for the attacker, then so are all the pure strategies in the support of that mixed strategy. Thus, only the pure strategies of the attacker are considered. Equations (6) and (7) constrain attackers to pure strategies that attack a single target. Equations (8) and (9) specify the feasible defender strategy space.
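The lexicographic maximization that this sequence of MILPs implements can be illustrated with a brute-force Python sketch over a finite set of candidate objective vectors. The candidate vectors and bounds below are hypothetical; in the actual formulation, each pass is one MILP, with Equation (4) holding the earlier objectives at their maxima:

```python
def lexicographic_maximize(candidates, bounds):
    """candidates: objective vectors U^d(c) of feasible coverage vectors;
    bounds: lower-bound vector b, where the primary objective (index 0) is
    unconstrained. Returns the lexicographically maximal feasible vector,
    or None if the CSOP is infeasible."""
    n = len(bounds)
    feas = [v for v in candidates
            if all(v[i] >= bounds[i] for i in range(1, n))]
    if not feas:
        return None
    for i in range(n):                 # the i-th maximization in the sequence
        best = max(v[i] for v in feas)
        feas = [v for v in feas if v[i] == best]  # hold earlier maxima fixed
    return feas[0]
```

With candidates (60, 40), (60, 60), and (80, 25) and bound b_2 = 30, the first pass leaves the two vectors tied at 60 on the primary objective, and the second pass selects (60, 60), the Pareto optimal one.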
[00356] Once the MILP has been formulated, it can be solved using an optimization software package such as CPLEX. It is possible to increase the efficiency of the MILP formulation by using heuristics to constrain the decision variables. A simple example of a general heuristic which can be used to achieve speedup is placing an upper bound on the defender's payoff for the primary objective. Assume d_1 is the defender's payoff for the primary objective in the parent CSOP and d'_1 is the defender's payoff for the primary objective in the child CSOP. As each CSOP is a maximization problem, it must hold that d_1 ≥ d'_1, because the child CSOP is more constrained than the parent CSOP. Thus, the value of d_1 can be passed to the child CSOP to be used as an upper bound on the objective function.
[00357] FIG. 15 shows MILP formulation definitions for an embodiment. The MILP is a variation of the optimization problem formulated previously for security games. The same variations can be made to more generic Stackelberg games, such as those used for DOBSS, giving a formulation for multi-objective Stackelberg games in general.
[00358] Exploiting Game Structures
[00359] In addition to placing bounds on the defender payoff, it is possible to constrain the defender coverage in order to improve the efficiency of the MILP formulation. Thus, an approach for translating constraints on defender payoff into constraints on defender coverage is realized. This approach, shown in FIG. 16 as Algorithm 5 and referred to herein as ORIGAMI-M, achieves this translation by computing the minimum coverage needed to satisfy a set of lower bound constraints b such that U_i^d(c) ≥ b_i for 1 ≤ i ≤ n. This minimum coverage is then added to the MILP in FIG. 14 as constraints on the variable c, reducing the feasible region and leading to significant speedup, as verified in experiments.
[00360] ORIGAMI-M is a modified version of the ORIGAMI algorithm and borrows many of its key concepts. At a high level, ORIGAMI-M starts off with an empty defender coverage vector c, a set of lower bound constraints b, and m defender resources. An attempt is made to compute a coverage c which uses the minimum defender resources to satisfy constraints b. If a constraint b_i is violated, i.e., U_i^d(c) < b_i, ORIGAMI-M updates c by computing the minimum additional coverage necessary to satisfy b_i. Since the focus is on satisfying the constraint on one objective at a time, the constraints for objectives that were satisfied in previous iterations may become unsatisfied again. The reason is that additional coverage may be added to the target that was attacked by this attacker type, causing it to become less attractive relative to other alternatives for the attacker, and possibly reducing the defender's payoff by changing the target that is attacked. Therefore, the constraints in b should be checked repeatedly until quiescence (no changes are made to c for any b_j). If all m resources are exhausted before b is satisfied, then the CSOP is infeasible.
[00361] The process for calculating minimum coverage for a single constraint b_i is built on two properties of security games: (1) the attacker chooses the optimal target; (2) the attacker breaks ties in favor of the defender. The set of optimal targets for attacker i for coverage c is referred to as the attack set, Γ_i(c). Accordingly, adding coverage on a target t ∉ Γ_i does not affect attacker i's strategy or payoff. Thus, if c does not satisfy b_i, only adding coverage to targets in Γ_i is considered. Γ_i can be expanded by increasing coverage such that the payoff for each target in Γ_i is equivalent to the payoff for the next most optimal target. Adding an additional target to the attack set cannot hurt the defender, since the defender receives the optimal payoff among targets in the attack set.
[00362] Referring to Algorithm 5 in FIG. 16, the idea of ORIGAMI-M is to expand the attack set Γ_i until b_i is satisfied. The order in which targets are added to Γ_i is by decreasing value of U_i^a(c_t, t). Sorting these values, so that U_i^a(c_{t_1}, t_1) ≥ U_i^a(c_{t_2}, t_2) ≥ ... ≥ U_i^a(c_{t_n}, t_n), means Γ_i(c) starts with only target t_1. Assume that the attack set includes the first q targets. To add the next target, the attacker's payoff for all targets in Γ_i must be reduced to U_i^a(c_{t_{q+1}}, t_{q+1}) (Line 11). However, it might not be possible to do this: once a target t is fully covered by the defender, there is no way to decrease the attacker's payoff below U_i^{c,a}(t). Thus, if max_{1≤j≤q} U_i^{c,a}(t_j) > U_i^a(c_{t_{q+1}}, t_{q+1}) (Line 7), then it is impossible to induce adversary i to attack target t_{q+1}. In that case, the attacker's payoff for targets in the attack set can only be reduced to max_{1≤j≤q} U_i^{c,a}(t_j) (Line 8). Then, for each target t ∈ Γ_i, the amount of additional coverage, addedCov[t], necessary to reach the required attacker payoff is computed (Line 13). If the total amount of additional coverage exceeds the amount of remaining coverage, then addedCov is recomputed and each target in the attack set is assigned a ratio of the remaining coverage so as to maintain the attack set (Line 17). There is then a check to see whether c + addedCov satisfies b_i (Line 18). If b_i is still not satisfied, then the coverage c is updated to include addedCov (Line 26) and the process is repeated for the next target (Line 28).
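The attack-set expansion at the heart of these steps can be sketched for a single attacker type. Payoffs follow the standard security-game interpolation U_i^a(c_t, t) = c_t·U_i^{c,a}(t) + (1 - c_t)·U_i^{u,a}(t); the payoff values and budget below are hypothetical, and the sketch omits the b_i feasibility checks and the MIN-COV refinement:

```python
def expand_attack_set(u_unc, u_cov, m):
    """Level the attacker's payoffs across a growing attack set, in the
    spirit of ORIGAMI-M's inner loop. u_unc[t] / u_cov[t] are the attacker's
    uncovered / fully covered payoffs (u_unc[t] > u_cov[t]); m is the
    defender resource budget. Returns the resulting coverage vector."""
    order = sorted(range(len(u_unc)), key=lambda t: -u_unc[t])
    cov = [0.0] * len(u_unc)

    def cov_for_payoff(t, x):
        # coverage on t that brings the attacker's payoff down to x
        return (x - u_unc[t]) / (u_cov[t] - u_unc[t])

    q = 1
    while q < len(order):
        x = u_unc[order[q]]                    # payoff of the next target
        if any(u_cov[t] > x for t in order[:q]):
            break                              # next target is non-inducible
        add = [cov_for_payoff(t, x) - cov[t] for t in order[:q]]
        if sum(cov) + sum(add) > m:
            break                              # defender resources exceeded
        for t, a in zip(order[:q], add):
            cov[t] += a
        q += 1                                 # target joins the attack set
    return cov
```

For example, with uncovered payoffs (10, 6, 4), fully covered payoffs of 0, and m = 1, the sketch returns coverage (0.6, 1/3, 0), which levels the attacker's payoff at 4 across all three targets.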
[00363] If c + addedCov expands Γ_i and exceeds b_i, it may be possible to use fewer defender resources and still satisfy b_i. Thus, the algorithm MIN-COV, shown as Algorithm 6 in FIG. 17, is used to compute, for each t' ∈ Γ_i, the amount of coverage needed to induce an attack on t' which yields a defender payoff of b_i. For each t', MIN-COV generates a defender coverage vector c', which is initialized to the current coverage c. The coverage on t' is updated such that the defender payoff for t' is b_i, yielding an attacker payoff U_i^a(c'_{t'}, t') (Line 6). The coverage for every other target t ∈ T\{t'} is updated, if needed, to ensure that t' remains in Γ_i, i.e., U_i^a(c'_{t'}, t') ≥ U_i^a(c'_t, t) (Line 9). After this process, c' is guaranteed to satisfy b_i. From the set of defender coverage vectors, MIN-COV returns the c' which uses the least amount of defender resources. If, while computing the additional coverage to be added, either Γ_i is the set of all targets or all m security resources are exhausted, then both b_i and the CSOP are infeasible.
[00364] If b is satisfiable, ORIGAMI-M will return the minimum coverage vector c* that satisfies b. This coverage vector can be used to replace Equation (8) with c*_t ≤ c_t ≤ 1.

[00365] ORIGAMI-A
[00366] In the previous section, heuristics to improve the efficiency of the described MILP approach were shown. However, solving MILPs, even when constrained, is computationally expensive. Thus, ORIGAMI-A, shown as Algorithm 7 in FIG. 18, is presented as an extension to ORIGAMI-M which eliminates the computational overhead of MILPs for solving CSOPs. The key idea of ORIGAMI-A is to translate a CSOP into a feasibility problem which can be solved using ORIGAMI-M. A series of these feasibility problems is then generated, using binary search in order to approximate the optimal solution to the CSOP. As a result, this algorithmic approach is much more efficient.
[00367] ORIGAMI-M computes the minimum coverage vector necessary to satisfy a set of lower bound constraints b. As the MILP approach is an optimization problem, lower bounds are specified for the secondary objectives but not the primary objective. This optimization problem is converted into a feasibility problem by creating a new set of lower bound constraints b+, formed by adding a lower bound constraint for the primary objective to the constraints b. This lower bound constraint can be set to b+_1 = min_{t∈T} U_1^{u,d}(t), the lowest defender payoff for leaving a target uncovered. Now, instead of finding the coverage c which maximizes U_1^d(c) and satisfies b, ORIGAMI-M can be used to determine whether there exists a coverage vector c such that b+ is satisfied.
[00368] ORIGAMI-A finds an approximately optimal coverage vector c by using ORIGAMI-M to solve a series of feasibility problems. This series is generated by sequentially performing binary search on the objectives, starting with the initial lower bounds defined in b+. For objective i, the lower and upper bounds for the binary search are, respectively, b+_i and max_{t∈T} U_i^{c,d}(t), the highest defender payoff for covering a target. At each iteration, b+ is updated by setting b+_i = (upper + lower)/2 and then passed as input to ORIGAMI-M. If b+ is found to be feasible, then the lower bound is updated to b+_i and c is updated to the output of ORIGAMI-M; otherwise, the upper bound is updated to b+_i. This process is repeated until the difference between the upper and lower bounds reaches the termination threshold α. Before proceeding to the next objective, b+_i is set to U_i^d(c) in case the binary search terminated on an infeasible problem. After searching over each objective, ORIGAMI-A will return a coverage vector c such that U_1^d(c*) - U_1^d(c) ≤ α, where c* is the optimal coverage vector for a CSOP defined by b.
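One pass of this binary search can be sketched generically in Python, with the feasibility test (ORIGAMI-M run on b+) abstracted as a callback; the threshold value used below is hypothetical:

```python
def binary_search_bound(feasible, lower, upper, alpha):
    """Find, to within alpha, the largest bound value for which feasible()
    holds, assuming feasible(lower) is True and feasible(upper) is False.
    In ORIGAMI-A, feasible(b) would run ORIGAMI-M on b+ with the current
    objective's lower bound set to b."""
    while upper - lower > alpha:
        mid = (upper + lower) / 2
        if feasible(mid):
            lower = mid    # the bound can be pushed higher
        else:
            upper = mid    # the bound is too demanding
    return lower
```

If feasibility holds exactly for bound values up to some threshold, the returned value lies within alpha below that threshold.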
[00369] The solutions found by ORIGAMI-A are no longer Pareto optimal. Let Ω_α be the objective space of the solutions found by ORIGAMI-A. Its efficiency loss can be bounded using the approximation measure ρ(ε, α) = max_{v∈Ω} min_{v'∈Ω_α} max_{1≤i≤n} (v_i - v'_i).
[00370] THEOREM 5. ρ(ε, α) ≤ max{ε, α}.
[00371] PROOF. Similar to the proof of Theorem 4, for each point v ∈ Ω, Algorithm 4 can be used to find a CSOP with constraints b which is solved using ORIGAMI-A with coverage c such that 1) b_i ≤ v_i for i > 1 and 2) v'_i ≥ v_i - ε for i > 1, where v' = U^d(c).
[00372] Assume that the optimal coverage is c* for the CSOP with constraints b. It follows that U_1^d(c*) ≥ v_1, since the coverage resulting in point v is a feasible solution to the CSOP with constraints b. ORIGAMI-A will terminate if the difference between the lower bound and upper bound is no more than α. Therefore, v'_1 ≥ U_1^d(c*) - α. Combining the two results, it follows that v'_1 ≥ v_1 - α.
[00373] Therefore, for any point missing in the frontier v ∈ Ω, a point v' ∈ Ω_α can be found such that 1) v'_1 ≥ v_1 - α and 2) v'_i ≥ v_i - ε for i > 1. It then follows that ρ(ε, α) ≤ max{ε, α}. □
[00374] EVALUATION
[00375] An evaluation was performed by running the full algorithm in order to generate the Pareto frontier for randomly generated MOSGs. For the experiments, the defender's covered payoff U_i^{c,d}(t) and attacker's uncovered payoff U_i^{u,a}(t) were uniformly distributed integers between 1 and 10 for all targets. Conversely, the defender's uncovered payoff U_i^{u,d}(t) and attacker's covered payoff U_i^{c,a}(t) were uniformly distributed integers between -1 and -10. Unless otherwise mentioned, the setup for each experiment is 3 objectives, 25 targets, ε = 1.0, and α = 0.001. The amount of defender resources m was fixed at 20% of the number of targets. For experiments comparing multiple formulations, all formulations were tested on the same set of MOSGs. A maximum cap on runtime for each sample was set at 1800 seconds; the MILP formulations were solved using CPLEX version 12.1. The results were averaged over 30 trials.
[00376] Runtime Analysis
[00377] Five MOSG formulations were evaluated. The baseline MILP formulation is referred to as MILP-B; the MILP formulation adding a bound on the defender's payoff for the primary objective is MILP-P. MILP-M uses ORIGAMI-M to compute bounds on defender coverage. MILP-P can be combined with MILP-M to form MILP-PM. The algorithmic approach using ORIGAMI-A will be referred to by name. For the number of targets, all five formulations for solving CSOPs were evaluated. ORIGAMI-A and the fastest MILP formulation, MILP-PM, were then selected to evaluate the remaining factors. Results are shown in FIGS. 19-22.
[00378] Effect of the Number of Targets: This section presents results showing the efficiency of the different MOSG formulations as the number of targets is increased. In FIG. 19, the x-axis represents the number of targets in the MOSG. The y-axis is the number of seconds needed by Iterative ε-Constraints to generate the Pareto frontier using the different formulations for solving CSOPs. The baseline MILP formulation, MILP-B, was observed to have the highest runtime for each number of targets tested. By adding an upper bound on the defender payoff for the primary objective, MILP-P was observed to yield a runtime savings of 36% averaged over all numbers of targets compared to MILP-B. MILP-M used ORIGAMI-M to compute lower bounds for defender coverage, resulting in a reduction of 70% compared to MILP-B. Combining the insights from MILP-P and MILP-M, MILP-PM achieved an even greater reduction of 82%. Removing the computational overhead of solving MILPs, ORIGAMI-A was the most efficient formulation, with a 97% reduction. For 100 targets, ORIGAMI-A required 4.53 seconds to generate the Pareto frontier, whereas MILP-B took 229.61 seconds, a speedup of more than 50 times. Even compared to the fastest MILP formulation, MILP-PM at 27.36 seconds, ORIGAMI-A still achieved a 6-times speedup. A t-test yielded p-value < 0.001 for all comparisons of different formulations when there are 75 or 100 targets.
[00379] An additional set of experiments was conducted to determine how MILP-PM and ORIGAMI-A scale up for an order-of-magnitude increase in the number of targets, by testing on MOSGs with between 200 and 1000 targets. Based on the trends seen in the data, it was concluded that ORIGAMI-A significantly outperforms MILP-PM for MOSGs with a large number of targets. Therefore, the number of targets in an MOSG is not believed to be a prohibitive bottleneck for generating the Pareto frontier using ORIGAMI-A. See FIG. 20.
[00380] Effect of the Number of Objectives: Another key factor in the efficiency of the Iterative ε-Constraints algorithm is the number of objectives, which determines the dimensionality of the objective space that Iterative ε-Constraints must search. Experiments for MOSGs with between 2 and 6 objectives were run. For these experiments, the number of targets was fixed at 10. FIG. 21 shows the effect of scaling up the number of objectives. The x-axis represents the number of objectives, whereas the y-axis indicates the average time needed to generate the Pareto frontier. For both MILP-PM and ORIGAMI-A, an exponential increase in runtime was observed as the number of objectives is scaled up. For both approaches, the Pareto frontier was computed in under 5 seconds for 2 and 3 objectives, whereas with 6 objectives neither approach was able to generate the Pareto frontier before the runtime cap of 1800 seconds. These results show that the number of objectives, and not the number of targets, is the key limiting factor in solving MOSGs.
[00381] Effect of Epsilon: A third critical factor in the running time of Iterative ε-Constraints is the value of the ε parameter, which determines the granularity of the search process through the objective space. In FIG. 22, results are shown for ε values of 0.1, 0.25, 0.5, and 1.0. Both MILP-PM and ORIGAMI-A were observed to have a sharp increase in runtime as the value of ε is decreased, due to the rise in the number of CSOPs solved. For example, with ε = 1.0, the average Pareto frontier consisted of 49 points, whereas for ε = 0.1 that number increased to 8437. Because ε is applied to the (n - 1)-dimensional objective space, the increase in runtime resulting from decreasing ε is exponential in the number of secondary objectives. Thus, using small values of ε can be computationally expensive, especially if the number of objectives is large.
[00382] Effect of the Similarity of Objectives: In previous experiments, all payoffs were sampled from a uniform distribution, resulting in independent objective functions. However, it is possible that in a security setting the defender could face multiple attacker types which share certain similarities, such as the same relative preferences over a subset of targets. To evaluate the effect of objective similarity on runtime, a single security game was used to create a Gaussian function with standard deviation σ from which all the payoffs for an MOSG are sampled. FIG. 23 shows the results for using ORIGAMI-A to solve MOSGs with between 3 and 7 objectives, using σ values between 0 and 2.0 as well as uniformly distributed objectives. For σ = 0, the payoffs for all security games are the same, resulting in a Pareto frontier consisting of a single point. In this extreme example, the number of objectives did not impact the runtime. However, as the number of objectives was increased, less dissimilarity between the objectives was needed before the runtime started increasing dramatically. For 3 and 4 objectives, the amount of similarity had negligible impact on runtime. The experiments with 5 objectives were observed to time out after 1800 seconds for the uniformly distributed objectives, whereas the experiments with 6 objectives were observed to time out at σ = 1.0, and with 7 objectives at σ = 0.5. Thus, it is possible to scale to a larger number of objectives if there is similarity between the attacker types.

[00383] Solution Quality Analysis
[00384] Effect of Epsilon: If the Pareto frontier is continuous, only a subset of that frontier can be generated. Thus, it is possible that one of the Pareto optimal points not generated by Iterative ε-Constraints would be the most preferred solution, were it presented to the end user. As was proved earlier, the maximum utility loss for each objective resulting from this situation is bounded by ε. Experiments were conducted to empirically verify the bounds and to determine whether the actual maximum objective loss was less than ε.
[00385] Ideally, the Pareto frontier generated by Iterative ε-Constraints would be compared to the true Pareto frontier. However, the true Pareto frontier may be continuous and impossible to generate, thus the true frontier was simulated by using ε = 0.001. Due to the computational complexity associated with such a value of ε, the number of objectives was fixed at 2. FIG. 24 shows the results for ε values of 0.1, 0.25, 0.5, and 1.0. The x-axis represents the value of ε, whereas the y-axis represents the maximum objective loss when comparing the generated Pareto frontier to the true Pareto frontier. It was observed that the maximum objective loss was less than ε for each value of ε tested. At ε = 1.0, the average maximum objective loss was only 0.63 for both MILP-PM and ORIGAMI-A. These results verify that the bounds for the MOSG algorithms presented herein are correct and that, in practice, a better approximation of the Pareto frontier can be generated than the bounds might suggest.
[00386] Comparison against Uniform Weighting: The MOSG model was introduced, in part, because it eliminates the need to specify a probability distribution over attacker types a priori. However, even if the probability distribution is unknown, it is still possible to use the Bayesian security game model with a uniform distribution. Experiments were conducted to show the potential benefit of using MOSGs over Bayesian security games in such cases. The maximum objective loss sustained by using the Bayesian solution, as opposed to a point in the Pareto frontier generated by Iterative ε-Constraints, was computed. If v' is the solution to a uniformly weighted Bayesian security game, then the maximum objective loss is max_{v∈Ω_ε} max_{1≤i≤n} (v_i - v'_i). FIG. 25 shows the results for ε values of 0.1, 0.25, 0.5, and 1.0. At ε = 1.0, the maximum objective loss was observed to be only 1.87 and 1.85 for MILP-PM and ORIGAMI-A, respectively. Decreasing ε all the way to 0.1 was shown to increase the maximum objective loss by less than 12% for both algorithms. These results suggest that ε has limited impact on maximum objective loss, which is a positive result, as it implies that solving an MOSG with a large ε can still yield benefits over a uniformly weighted Bayesian security game.
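The loss computation used in this comparison is straightforward; given the generated frontier and the objective vector of the uniform-Bayesian solution (the values below are hypothetical):

```python
def max_objective_loss(frontier, v_bayes):
    """max over frontier points v of max_i (v_i - v'_i): the worst loss on
    any single objective from adopting the Bayesian solution v_bayes
    instead of a point on the generated Pareto frontier."""
    return max(max(vi - bi for vi, bi in zip(v, v_bayes)) for v in frontier)
```

For a frontier {(100, 10), (60, 60)} and a Bayesian solution (70, 40), the loss is max(100 - 70, 60 - 40) = 30.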
[00387] Exemplary embodiments of the described algorithms are presented below with reference to FIGS. 12-18; these exemplary embodiments are described by way of example, and other algorithms and variations of, or additions/deletions to, the described algorithms may be used within the scope of the present disclosure.
[00388] Algorithm 3: Iterative-Epsilon-Constraints
[00389] Line 1: This is a heuristic that checks to see if a solution has already been computed for the lower bound vector, b. If it has, then this subproblem (CSOP) is pruned (not computed), helping to speed up the algorithm.
[00390] Line 2: If the CSOP is not pruned, then b is added to the list of lower bound vectors that have been computed. So if any CSOP in the future has a lower bound vector identical to b, it will be pruned.
[00391] Line 3: This is the CSOP (defined by b) being passed to the CSOP solver which returns the solution c.
[00392] Line 4: Checks to see if c is a feasible solution.
[00393] Line 5: Given that c is a feasible solution (a coverage vector over targets), the vector v represents the payoffs for the defender (one payoff for each objective).
[00394] Line 6: For each feasible solution found, n - 1 CSOPs are
generated where n is the number of objectives for the defender.
[00395] Line 7: The lower bound vector b is copied into a new vector b'.

[00396] Line 8: The lower bound for objective i in b' is now updated to the payoff for objective i obtained by solution c. A discretization factor epsilon is added to allow for a tradeoff between runtime and the granularity of the Pareto frontier.
[00397] Line 9: b' is compared against the list of infeasible lower bound vectors. If there exists a member, s, in that list for which the bounds for each objective in b' are greater than or equal to the corresponding bound in s, then it is known that b' is also infeasible and should be pruned.
[00398] Line 10: Recursive call to Iterative-Epsilon-Constraints on the updated lower bound vector b'.
[00399] Line 11: If solution c is infeasible (from Line 4), then b is added to the list of infeasible lower bound vectors.
[00400] Figure 14: MILP Formulation
[00401] Line 1: The objective function maximizes the defender's payoff for objective lambda.
[00402] Line 2: Specifies the defender's payoffs for each objective and each target given the defender's and attackers' strategies.
[00403] Line 3: Specifies the attacker payoff for each attacker type and each target given the defender's and attackers' strategies.
[00404] Line 4: Guarantees that the payoffs for objectives maximized in previous iterations of the lexicographic maximization will be maintained.
[00405] Line 5: Guarantees that the lower bound constraints in b will be satisfied for all objectives which have yet to be optimized.
[00406] Line 6: Limits the attackers to pure strategies (either they attack a target or they don't).
[00407] Line 7: Ensures that each attacker only attacks a single target.
[00408] Line 8: Specifies that the amount of coverage placed on each target is between 0 and 1, since these values represent marginal probabilities.
[00409] Line 9: Specifies that the total amount of coverage placed on all targets is less than the total number of defender resources, m.

[00410] Algorithm 5: ORIGAMI-M
[00411] Line 1: Initializes c (the solution to be returned by ORIGAMI-M) to an empty coverage vector (no coverage on any targets).
[00412] Line 2: A while-loop that is repeated while the lower bound constraint for any objective in b is not satisfied by the defender payoffs produced by the current solution c.
[00413] Line 3: Sorts the list of targets in descending order according to attacker type i's payoff for attacking each target given the current coverage c.
[00414] Line 4: Sets the variable left to the amount of remaining defender resources and the variable next (which represents the index in the sorted list of the next target to be added to the attack set) to 2.
[00415] Line 5: A while-loop that is repeated while there remain targets to be added to the attack set.
[00416] Line 6: A new coverage vector addedCov (which will eventually be added to c) is initialized.
[00417] Line 7: Checks to see if there is a target in the current attack set which, regardless of the amount of coverage placed on it, will prevent the next target from being added to the attack set.
[00418] Line 8: If Line 7 is true, sets variable x equal to the fully-covered payoff for attacker type i on the target preventing the next target from being added to the attack set.
[00419] Line 9: The variable noninducibleNextTarget is set to true to indicate later that Line 7 was true.
[00420] Line 11: If Line 7 is false, sets variable x equal to the payoff for attacker type i for attacking the next target to be added given the current coverage c.
[00421] Line 12: A for-loop over each target currently in the attack set.
[00422] Line 13: Calculates the amount of coverage that needs to be added such that each target in the attack set yields the same payoff to attacker type i as the next target to be added to the attack set, given c.
[00423] Line 14: Checks to see if the amount of additional coverage needed to add the next target to the attack set is greater than the amount of remaining defender resources.
[00424] Line 15: If Line 14 is true, resourcesExceeded is set to true.
[00425] Lines 16/17: If Line 14 is true, then addedCov is recomputed and each target in the attack set is assigned a ratio of the remaining coverage so as to maintain the attack set.
[00426] Line 18: Checks if combining the coverage vectors (c and addedCov) produces a coverage vector which yields a defender payoff for objective i which satisfies the lower bound on objective i, b_i.
[00427] Line 19: MIN-COV is called to see if it is possible to use even fewer resources than the combined coverage vector while still satisfying the lower bound constraint on objective i. The result is stored as c'.
[00428] Line 20: Checks to see if the solution returned by MIN-COV is feasible.
[00429] Line 21: If Line 20 is true, the current solution c is updated to c'.
[00430] Line 22: If this line is reached, the program/algorithm breaks out of the while-loop.
[00431] Line 23: If Line 18 is false, a check is made to see if either the amount of defender resources has been exceeded or it is not possible to add the next target to the attack set.
[00432] Line 24: If Line 23 is true, the lower bound constraints in b cannot be satisfied and the CSOP is infeasible. Thus, ORIGAMI-M terminates.
[00433] Line 26: If Lines 18 and 23 are false, the coverage vector addedCov is added to the current coverage vector c.
[00434] Line 27: If Lines 18 and 23 are false, the amount of resources used to add the next target to the attack set is subtracted from the amount of remaining defender resources.
[00435] Line 28: If Lines 18 and 23 are false, the next variable is incremented to indicate that another target has been added to the attack set.
[00436] Line 29: Once the while-loop (Line 5) has completed, a check is made to see if all targets have been added to the attack set.
[00437] Line 30: If Line 29 is true, a check is made to see if there are any remaining defender resources.
[00438] Line 31: If Line 30 is true, MIN-COV is called, which determines how best to allocate the remaining defender resources and returns that coverage vector as c.
[00439] Line 32: A check is now made to see if the coverage vector returned by MIN-COV is feasible.
[00440] Line 33: If the coverage vector is infeasible, the lower bound constraints in b cannot be satisfied and the CSOP is infeasible. Thus, ORIGAMI-M terminates.
[00441] Line 35: If Line 30 is false, the lower bound constraints in b cannot be satisfied and the CSOP is infeasible. Thus, ORIGAMI-M terminates.
[00442] Line 36: Returns the solution c.
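The core arithmetic behind Lines 7-13 of ORIGAMI-M — how much coverage makes each attack-set target yield a given attacker payoff x — can be sketched with the standard security-game payoff model (the attacker gets reward R_t if target t is uncovered and penalty P_t if covered, with the expected payoff linear in the coverage). The helper names below are illustrative assumptions, not from the patent:

```python
# Coverage arithmetic used when growing the attack set (cf. Lines 12-13 above).
# Payoff model (assumed): attacking target t with coverage cov yields the
# attacker cov * P[t] + (1 - cov) * R[t], with reward R[t] > penalty P[t].

def attacker_payoff(cov, reward, penalty):
    """Attacker's expected payoff for a target with coverage cov."""
    return cov * penalty + (1.0 - cov) * reward

def coverage_for_payoff(x, reward, penalty):
    """Coverage that makes the attacker's expected payoff exactly x."""
    return (reward - x) / (reward - penalty)

def added_coverage(attack_set, c, R, P, x):
    """Extra coverage per attack-set target so each yields payoff x (Line 13)."""
    return {t: max(0.0, coverage_for_payoff(x, R[t], P[t]) - c[t])
            for t in attack_set}
```

For example, with attacker rewards R = {0: 5, 1: 3}, penalties P = {0: −1, 1: −2}, and no existing coverage, bringing target 0 down to target 1's uncovered payoff of 3 requires coverage 1/3 on target 0 — exactly the equalization step that admits target 1 into the attack set.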
[00443] Algorithm 6: MIN-COV
[00444] Line 2: Initializes c* (the solution to be returned by MIN-COV) to an empty coverage vector (no coverage on any targets).
[00445] Line 3: Initializes the variable minResources to m (the total number of defender resources).
[00446] Line 4: A for-loop over each target t' in the attack set for attacker type i induced by the coverage vector c.
[00447] Line 5: Initializes a new coverage vector c' with c.
[00448] Line 6: Computes the amount of coverage needed on target t' to give the defender a payoff of b_i for objective i if attacker type i attacks target t'.
[00449] Line 7: A for-loop over the set of targets minus target t'.
[00450] Line 8: Checks to see if the payoff for attacker type i is greater for attacking target t than target t' given the current amount of coverage placed on both targets.
[00451] Line 9: If it is, the coverage for target t is recomputed so that attacking target t yields the same payoff to attacker type i as target t'.
[00452] Line 10: Checks to see if c' satisfies the lower bound on defender payoff for objective i AND uses less total coverage than minResources.
[00453] Line 11: If Line 10 is true, sets c* (the best solution found so far) to c'.
[00454] Line 12: If Line 10 is true, sets the minResources variable to the amount of resources used in c'.
[00455] Line 13: Returns the solution c*.
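MIN-COV's per-candidate loop (Lines 4-12 above) can be sketched under the same linear payoff model: make each target t' the attacked target, cover t' just enough to meet the defender's bound, raise coverage elsewhere so no other target is strictly more attractive to the attacker, and keep the cheapest vector that works. All names and the two-target data in the test are illustrative assumptions, not the patented code:

```python
# Illustrative sketch of MIN-COV (Algorithm 6 above). Rd/Pd are defender
# reward/penalty, Ra/Pa are attacker reward/penalty, b is the defender's
# lower bound for objective i, and m is the resource budget.

def min_cov(targets, c, Rd, Pd, Ra, Pa, b, m):
    def defender_payoff(cov, t):        # defender payoff if t is attacked
        return cov * Rd[t] + (1.0 - cov) * Pd[t]
    def attacker_payoff(cov, t):        # attacker payoff for attacking t
        return cov * Pa[t] + (1.0 - cov) * Ra[t]
    best, best_resources = None, m      # Lines 2-3
    for t_prime in targets:             # Line 4: candidate attacked target
        c_new = dict(c)                 # Line 5
        # Line 6: just enough coverage on t' so the defender payoff reaches b
        c_new[t_prime] = max(c[t_prime],
                             (b - Pd[t_prime]) / (Rd[t_prime] - Pd[t_prime]))
        x = attacker_payoff(c_new[t_prime], t_prime)
        for t in targets:               # Lines 7-9: no target may beat t'
            if t != t_prime and attacker_payoff(c_new[t], t) > x:
                c_new[t] = (Ra[t] - x) / (Ra[t] - Pa[t])
        used = sum(c_new.values())
        # Lines 10-12: keep the cheapest vector that satisfies the bound
        if defender_payoff(c_new[t_prime], t_prime) >= b and used < best_resources:
            best, best_resources = c_new, used
    return best                         # Line 13 (None if no candidate works)
```

With two targets, the candidate that meets the bound while equalizing attacker payoffs more cheaply wins, mirroring the minResources bookkeeping in Lines 10-12.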
[00456] Algorithm 7: ORIGAMI-A
[00457] Line 1: Initializes c (the solution to be returned by ORIGAMI-A) to an empty coverage vector (no coverage on any targets).
[00458] Line 2: Computes the lowest possible value for the primary objective (the target with the lowest payoff for the defender when fully uncovered).
[00459] Line 3: Initializes a new lower bound vector b+ as the union of the bound on the primary objective (objective 1) computed in Line 2 and the lower bound vector b for a given CSOP.
[00460] Line 4: A for-loop specifying that a binary search will be performed over the defender payoff for each of the n objectives.
[00461] Line 5: The variable lower is initialized to the lower bound for objective i in b+.
[00462] Line 6: The variable upper is initialized to the highest possible value for objective i (the target with the highest payoff for the defender when fully covered).
[00463] Line 7: A while-loop specifying that the binary search over the payoff for objective i continues until the termination condition is reached (i.e., the difference between the upper and lower bounds of the binary search is less than alpha).
[00464] Line 8: Computes the new lower bound for objective i in b+ by bisecting the interval between the upper and lower bounds of the search (hence binary search).
[00465] Line 9: ORIGAMI-M is called, passing the updated lower bound vector b+, and returns the solution c'.
[00466] Line 11: If c' is infeasible then the upper variable is updated with the lower bound for objective i from b+.
[00467] Line 13: If c' is feasible then c (the best solution found thus far) is updated to c' and the lower variable is updated with the lower bound for objective i from b+.
[00467] Line 1 3 : If c' is feasible then c (the best solution found thus far) is updated to c' and the lower variable with the lower bound for objective i from b+.
[00468] Line 14: Once the binary search over objective i has terminated, the lower bound for objective i in b+ is updated to the defender payoff for objective i produced by solution c.
[00469] Line 1 5 : Return the solution c.
[00470] MOSG - CONCLUSION
[00471] A new model, multi-objective security games (MOSG), has been developed, as described herein, for domains where security forces balance multiple objectives. Advantageous features include: 1) Iterative ε-Constraints, a high-level approach for transforming MOSGs into a sequence of CSOPs; 2) exact MILP formulations, both with and without heuristics, for solving CSOPs; and 3) ORIGAMI-A, an approximate approach for solving CSOPs. Bounds on both the complexity and the solution quality of the MOSG approaches were provided; a detailed experimental comparison of the different approaches was also presented, and these results confirmed the efficacy of the MOSG approach under the tested circumstances.
[00472] Accordingly, the present disclosure provides different solution methodologies for addressing the issues of protecting and/or patrolling security domains, e.g., identified infrastructures or resources, with limited resources. The solution methodologies can provide optimal solutions to attacker-defender Stackelberg security games that are modeled on a real-world application of interest. These optimal solutions can be used for directing patrolling strategies and/or resource allocation for particular security domains. It will be understood that any of the algorithms in accordance with the present disclosure can be considered a means for solving a Stackelberg game modeling a particular security domain.
[00473] The above-described features, such as algorithms and methods and portions thereof, and applications can be implemented as and/or facilitated by software processes that are specified as a set of instructions recorded on a computer readable storage medium (also referred to as computer readable medium). When these instructions are executed by one or more processing unit(s) (e.g., one or more processors, cores of processors, or other processing units), they cause the processing unit(s) to perform the actions indicated in the instructions. Examples of computer readable media include, but are not limited to, CD-ROMs, flash drives, RAM chips, hard drives, EPROMs, etc. The computer readable media do not include carrier waves and electronic signals passing wirelessly or over wired connections.
[00474] In this specification, the term "software" is meant to include firmware residing in read-only memory or applications stored in magnetic storage or flash storage, for example, a solid-state drive, which can be read into memory for processing by a processor. Also, in some implementations, multiple software technologies can be implemented as sub-parts of a larger program while remaining distinct software technologies. In some implementations, multiple software technologies can also be implemented as separate programs. Finally, any combination of separate programs that together implement a software technology described here is within the scope of the subject technology. In some implementations, the software programs, when installed to operate on one or more electronic systems, define one or more specific machine implementations that execute and perform the operations of the software programs.
[00475] A computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, object, or other unit suitable for use in a computing environment. A computer program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, subprograms, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
[00476] These functions described above can be implemented in digital electronic circuitry, in computer software, firmware, or hardware. The techniques can be implemented using one or more computer program products. Programmable processors and computers can be included in or packaged as mobile devices. The processes and logic flows can be performed by one or more programmable processors and by programmable logic circuitry. General and special purpose computing devices and storage devices can be interconnected through communication networks.
[00477] Some implementations include electronic components, for example microprocessors, storage, and memory that store computer program instructions in a machine-readable or computer-readable medium (alternatively referred to as computer-readable storage media, machine-readable media, or machine-readable storage media). Some examples of such computer-readable media include RAM, ROM, read-only compact discs (CD-ROM), recordable compact discs (CD-R), rewritable compact discs (CD-RW), read-only digital versatile discs (e.g., DVD-ROM, dual-layer DVD-ROM), a variety of recordable/rewritable DVDs (e.g., DVD-RAM, DVD-RW, DVD+RW, etc.), flash memory (e.g., SD cards, mini-SD cards, micro-SD cards, etc.), magnetic or solid state hard drives, read-only and recordable Blu-Ray® discs, ultra density optical discs, any other optical or magnetic media, and floppy disks. The computer-readable media can store a computer program that is executable by at least one processing unit and includes sets of instructions for performing various operations. Examples of computer programs or computer code include machine code, for example as produced by a compiler, and files including higher-level code that are executed by a computer, an electronic component, or a microprocessor using an interpreter.
[00478] While the above discussion may refer to microprocessors or multi-core processors that execute software, some implementations can be performed by one or more integrated circuits, for example application specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs). In some implementations, such integrated circuits execute instructions that are stored on the circuit itself.
[00479] As used in this specification and any claims of this application, the terms "computer", "server", "processor", and "memory" all refer to electronic or other technological devices. These terms exclude people or groups of people. For the purposes of the specification, the terms display or displaying means displaying on an electronic device. As used in this specification and any claims of this application, the terms "computer readable medium" and "computer readable media" are entirely restricted to tangible, physical objects that store information in a form that is readable by a computer. These terms exclude any wireless signals, wired download signals, and any other ephemeral signals.
[00480] To provide for interaction with a user, implementations of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user's client device in response to requests received from the web browser.
[00481] The subject matter described in this specification can be implemented in a computing system that includes a back end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back end, middleware, or front end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network ("LAN") and a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks).
[00482] The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. In some aspects of the disclosed subject matter, a server transmits data (e.g., an HTML page) to a client device (e.g., for purposes of displaying data to and receiving user input from a user interacting with the client device). Data generated at the client device (e.g., a result of the user interaction) can be received from the client device at the server.
[00483] It is understood that any specific order or hierarchy of steps in the processes disclosed is an illustration of example approaches. Based upon design preferences, it is understood that the specific order or hierarchy of steps in the processes may be rearranged, or that all illustrated steps be performed. Some of the steps may be performed simultaneously. For example, in certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components illustrated above should not be understood as requiring such separation, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
[00484] The components, steps, features, objects, benefits and advantages that have been discussed are merely illustrative. None of them, nor the discussions relating to them, are intended to limit the scope of protection in any way. Numerous other embodiments are also contemplated. These include embodiments that have fewer, additional, and/or different components, steps, features, objects, benefits and advantages. These also include embodiments in which the components and/or steps are arranged and/or ordered differently.
[00485] Unless otherwise stated, all measurements, values, ratings, positions, magnitudes, sizes, and other specifications that are set forth in this specification, including in the claims that follow, are approximate, not exact. They are intended to have a reasonable range that is consistent with the functions to which they relate and with what is customary in the art to which they pertain.
[00486] All articles, patents, patent applications, and other publications that have been cited in this disclosure are incorporated herein by reference.
[00487] The phrase "means for" when used in a claim is intended to and should be interpreted to embrace the corresponding structures and materials that have been described and their equivalents. Similarly, the phrase "step for" when used in a claim is intended to and should be interpreted to embrace the corresponding acts that have been described and their equivalents. The absence of these phrases in a claim means that the claim is not intended to and should not be interpreted to be limited to any of the corresponding structures, materials, or acts or to their equivalents.
[00488] The scope of protection is limited solely by the claims that now follow. That scope is intended and should be interpreted to be as broad as is consistent with the ordinary meaning of the language that is used in the claims when interpreted in light of this specification and the prosecution history that follows and to encompass all structural and functional equivalents. Notwithstanding, none of the claims are intended to embrace subject matter that fails to satisfy the requirement of Sections 101, 102, or 103 of the Patent Act, nor should they be interpreted in such a way. Any unintended embracement of such subject matter is hereby disclaimed.
[00489] Except as stated immediately above, nothing that has been stated or illustrated is intended or should be interpreted to cause a dedication of any component, step, feature, object, benefit, advantage, or equivalent to the public, regardless of whether it is or is not recited in the claims.
[00490] The terms and expressions used herein have the ordinary meaning accorded to such terms and expressions in their respective areas, except where specific meanings have been set forth. Relational terms such as first and second and the like may be used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between them. The terms "comprises," "comprising," and any other variation thereof when used in connection with a list of elements in the specification or claims are intended to indicate that the list is not exclusive and that other elements may be included. Similarly, an element preceded by "a" or "an" does not, without further constraints, preclude the existence of additional elements of the identical type.
[00491] The Abstract is provided to help the reader quickly ascertain the nature of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. In addition, various features in the foregoing Detailed Description are grouped together in various embodiments to streamline the disclosure. This method of disclosure is not to be interpreted as requiring that the claimed embodiments require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter lies in less than all features of a single disclosed embodiment. Thus, the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as separately claimed subject matter.

Claims

CLAIMS
What is claimed is:
1. A computer-executable program product for determining a defender's patrolling strategy within a security domain and according to a Stackelberg game in which the attackers have a quantal response (QR) strategy, the computer-executable program product comprising a non-transitory computer-readable medium with resident computer-readable instructions, the computer readable instructions comprising instructions for: fixing the policy of a defender to a mixed strategy according to a Stackelberg game for a security domain including a set of targets that the defender covers, wherein the defender has limited resources; formulating an optimization problem for a strategy an attacker follows, wherein the optimization problem is for the optimal response to the leader's policy, wherein the attacker's strategy is a quantal response (QR) strategy; maximizing the payoff of the defender, given that the attacker uses an optimal response that is a function of the defender's policy, and formulating the problem as a non-convex fractional objective function having a polyhedral feasible region; performing a binary search to solve the problem, wherein the binary search includes iteratively estimating a global optimal value of the fractional objective function; reformulating the defender payoff problem as a convex objective function by performing a non-linear variable substitution; solving the convex objective function to find the optimal solution, wherein the defender's strategy for the security domain is determined; and directing a patrolling strategy of the defender within the security domain based on the optimal solution.
2. The computer-executable program product of claim 1, wherein the step of solving the convex objective function to find the optimal solution comprises using a piecewise linear function to approximate the nonlinear objective function, wherein the objective function is converted to a mixed-integer linear program (MILP).
3. The computer-executable program product of claim 1, wherein the computer-readable instructions comprise the optimization problem having resource assignment constraints.
4. The computer-executable program product of claim 2, wherein the computer-readable instructions comprise the MILP having resource assignment constraints.
5. The computer-executable program product of claim 2, wherein the MILP is of the form:
Figure imgf000086_0001
subject to ∑ ∑xik <M
iei^ k=l subject to 0<xft <-^, Vz, k = \...K,
K subject to zik—≤xik, Vz, k = \...K-\ subject to xik(k+l)≤zik, Vz, k = \...K-\., subject to zfte{0,l}, Vz, k = \...K-\ subject to ∑xik = Vz e
subject to aj =l, and subject to 0≤<2j <1, AjG Ji and wherein τ is the set of targets; /' e T denotes target i ; x, is the
Probability that target /' is covered by a resource; R is the defender reward for covering /' if it is attacked; p d. is the defender penalty on not covering /' if it is attacked; R a. is the attacker reward for attacking /' if it is not covered; p a. is the attacker penalty on attacking /' if it is covered; A is the set of defender strategies; Aj e A denotes fh strategy; ay is the probability for defender to choose strategy Aj ; and, M is the total number of resources.
6. A system for determining a defender's patrolling strategy within a security domain, the system comprising: a memory; a processor having access to the memory and configured to: fix the policy of a defender to a mixed strategy according to a Stackelberg game for a security domain including a set of targets that the defender covers, wherein the defender has limited resources; formulate an optimization problem for a strategy an attacker follows, wherein the optimization problem is for the optimal response to the leader's policy, wherein the attacker's strategy is a quantal response (QR) strategy; maximize the payoff of the defender, given that the attacker uses an optimal response that is a function of the defender's policy, and formulating the problem as a non-convex fractional objective function having a polyhedral feasible region; perform a binary search to solve the problem, wherein the binary search includes iteratively estimating a global optimal value of the fractional objective function; reformulate the defender payoff problem as a convex objective function by performing a non-linear variable substitution; solve the convex objective function to find the optimal solution, wherein the defender's strategy for the security domain is determined; and direct a patrolling strategy of the defender within the security domain based on the optimal solution.
7. The system of claim 6, wherein the processor is further configured to formulate an optimization problem for a strategy for a plurality of attackers.
8. The system of claim 6, wherein the processor is further configured to formulate the objective function for a plurality of defenders.
9. The system of claim 6, wherein the processor is configured to solve the convex objective function to find the optimal solution by using a piecewise linear function to approximate the nonlinear objective function, wherein the objective function is converted to a mixed-integer linear program (MILP).
10. The system of claim 6, wherein the processor is configured to solve the optimization problem using a means for solving a Stackelberg game modeling the security domain.
11. A computer-executable program product for determining a defender's patrolling strategy within a security domain and according to a Bayesian Stackelberg game model, the computer-executable program product comprising a non-transitory computer-readable medium with resident computer-readable instructions, the computer readable instructions comprising instructions for: fixing the policy of a defender to a mixed strategy according to a Stackelberg game for a security domain including a set of targets that the defender covers; formulating an optimization problem for a strategy of each of a plurality of different attacker types, wherein each different type of attacker has its own optimization problem with its own respective payoff matrix for the optimal response to the leader's policy; formulating the strategy of the defender as an optimization problem with a defender objective function; formulating a search tree having a plurality of levels and a plurality of leaf nodes, wherein one attacker type is assigned to a pure strategy at each tree level, and wherein each leaf node is represented by a linear program that provides an optimal leader strategy such that the attacker's best response for every attacker type is the chosen target at that leaf node; performing a best-first search in the search tree; obtaining upper and lower bounds at internal nodes in the search tree; solving the defender objective function to find the optimal solution, wherein the defender's strategy for the security domain is determined; and directing a patrolling strategy of the defender within the security domain based on the optimal solution.
12. The computer-executable program product of claim 11, wherein the step of obtaining upper and lower bounds at internal nodes in the search tree comprises using an upper-bound (UB) linear program (LP) within an internal search node to produce an upper bound (UB) and a feasible solution.
13. The computer-executable program product of claim 12, wherein the feasible solution is utilized to produce a lower bound (LB) for the search, by determining the follower best response to the feasible solution.
14. The computer-executable program product of claim 12, wherein the computer-readable instructions comprise instructions for solving the upper-bound LP using Bender's decomposition.
15. The computer-executable program product of claim 14, wherein the computer-readable instructions further comprise instructions for reusing Bender's cuts from a parent node of the leaf nodes for those in its child nodes.
16. A computer-executable program product for determining a defender's patrolling strategy within a security domain and according to a Stackelberg game model, the computer-executable program product comprising a non-transitory computer-readable medium with resident computer-readable instructions, the computer readable instructions comprising instructions for: fixing the policy of a defender to a mixed strategy according to a Stackelberg game for a security domain including a set of targets that the defender covers; formulating an optimization problem for a strategy an attacker follows, wherein the optimization problem is for the optimal response to the leader's policy; formulating the strategy of the defender as an optimization problem with multiple defender objective functions; solving the defender objective functions to find a Pareto frontier representing multiple Pareto optimal solutions, wherein the defender's strategy for the security domain is determined based on the Pareto frontier; and directing a patrolling strategy of the defender within the security domain based on a selected Pareto optimal solution of the Pareto frontier.
17. The computer-executable program product of claim 16, wherein the Pareto frontier is determined using the Iterative ε-Constraints algorithm.
18. The computer-executable program product of claim 17, wherein the step of using the Iterative ε-Constraints algorithm includes formulating multiple constrained single-objective optimization problems (CSOPs).
19. The computer-executable program product of claim 18, wherein the computer-readable instructions comprise instructions for formulating the multiple CSOPs in MILP form.
20. The computer-executable program product of claim 19, wherein the computer-readable instructions comprise instructions for solving the MILP.
PCT/US2013/034854 2012-05-24 2013-04-01 Optimal strategies in security games Ceased WO2013176784A1 (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US13/479,884 US8364511B2 (en) 2007-10-15 2012-05-24 Agent security via approximate solvers
US13/479,884 2012-05-24
US201261651799P 2012-05-25 2012-05-25
US61/651,799 2012-05-25
US13/838,566 US8702817B2 (en) 2009-11-20 2013-03-15 Method of manufacturing solid electrolytic capacitor
US13/838,566 2013-03-15

Publications (1)

Publication Number Publication Date
WO2013176784A1 true WO2013176784A1 (en) 2013-11-28

Family

ID=49624235

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2013/034854 Ceased WO2013176784A1 (en) 2012-05-24 2013-04-01 Optimal strategies in security games

Country Status (1)

Country Link
WO (1) WO2013176784A1 (en)



Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090099987A1 (en) * 2007-10-15 2009-04-16 University Of Southern California Decomposed optimal bayesian stackelberg solver
US20100114541A1 (en) * 2008-10-30 2010-05-06 Honeywell International Inc. Enumerated linear programming for optimal strategies

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHRISTOPHER K. ET AL.: "Robust Bayesian methods for Stackelberg security games", 9TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2010, pages 1467 - 1468 *
MILIND TAMBE ET AL.: "Game theory and human behavior: challenges in security and sustainability", SECOND INTERNATIONAL CONFERENCE ON ALGORITHMIC DECISION THEORY TABLE OF CONTENTS, 2011, pages 320 - 330 *
RONG YANG ET AL.: "Improving resource allocation strategy against human adversaries in security games", TWENTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2011, pages 458 - 464 *

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108600014A (en) * 2018-04-27 2018-09-28 哈尔滨工业大学深圳研究生院 A storage resource distribution method based on Stackelberg games
CN108964051A (en) * 2018-08-27 2018-12-07 广西大学 A preventive scheduling and recovery scheduling coordination optimization strategy construction method for power systems
CN108964051B (en) * 2018-08-27 2021-10-12 广西大学 Method for constructing preventive scheduling and recovery scheduling coordination optimization strategy of power system
CN112292698A (en) * 2019-05-15 2021-01-29 创新先进技术有限公司 Determining action selection guidelines for an execution device
CN110752626A (en) * 2019-12-12 2020-02-04 厦门大学 A rolling optimization scheduling method for active distribution network
CN113626834A (en) * 2020-05-06 SAP SE Generation of optimal program variants
CN111832801B (en) * 2020-05-29 2022-07-05 中国人民解放军国防科技大学 Chemical industry park patrol car scheduling method, system and medium based on cooperation mechanism
CN111832801A (en) * 2020-05-29 2020-10-27 中国人民解放军国防科技大学 Method, system and medium for dispatching patrol vehicle in chemical industry park based on cooperation mechanism
CN112099522B (en) * 2020-08-12 2023-06-30 中国人民解放军陆军工程大学 Multi-UAV cooperative ground attack mission planning method and terminal equipment
CN112099522A (en) * 2020-08-12 2020-12-18 中国人民解放军陆军工程大学 Multi-UAV coordinated ground attack mission planning method and terminal equipment
CN112149361A (en) * 2020-10-10 2020-12-29 中国科学技术大学 Adaptive optimal control method and device for linear system
CN112149361B (en) * 2020-10-10 2024-05-17 中国科学技术大学 A linear system adaptive optimal control method and device
CN112487431B (en) * 2020-12-02 2022-07-15 浙江工业大学 Optimal Steady-State Policy Solving Method for Intrusion Detection System Based on Incomplete Information
CN112487431A (en) * 2020-12-02 2021-03-12 浙江工业大学 Method for solving optimal steady-state strategy of intrusion detection system based on incomplete information
CN112989357A (en) * 2021-03-09 2021-06-18 中国人民解放军空军工程大学 Multi-stage platform dynamic defense method based on signal game model
CN113722853B (en) * 2021-08-30 2024-03-05 河南大学 A swarm intelligence evolutionary engineering design constraint optimization method oriented to intelligent computing
CN113722853A (en) * 2021-08-30 2021-11-30 河南大学 Intelligent calculation-oriented group intelligent evolutionary optimization method
CN116306903A (en) * 2022-11-30 2023-06-23 浙江浙能乐清发电有限责任公司 A Robust Adversarial Training Framework for Multi-Agent Reinforcement Learning Energy Systems
CN116248335A (en) * 2022-12-20 2023-06-09 中国人民解放军战略支援部队信息工程大学 Network attack and defense strategy selection method and system based on intelligent evolution game
CN117221020A (en) * 2023-11-09 2023-12-12 南京财经大学 Network security policy optimization method for electric vehicle charging grid
CN117221020B (en) * 2023-11-09 2024-02-02 南京财经大学 A network security strategy optimization method for electric vehicle charging network
CN119312112A (en) * 2024-10-17 2025-01-14 南京大学 A clustering method, system and device based on double-layer optimization
CN119204083A (en) * 2024-11-26 2024-12-27 烟台大学 A cooperative model incentive method and system based on Stackelberg game
CN119995950A (en) * 2025-01-15 2025-05-13 广州大学 A method of APT node isolation defense based on population training
CN120074590A (en) * 2025-03-31 2025-05-30 西北工业大学 Air-ground anti-interference transmission game method and system based on active intelligent super surface

Similar Documents

Publication Publication Date Title
WO2013176784A1 (en) Optimal strategies in security games
US20130273514A1 (en) Optimal Strategies in Security Games
Cozzo et al. Multiplex networks: basic formalism and structural properties
Stergiopoulos et al. Risk mitigation strategies for critical infrastructures based on graph centrality analysis
Lejeune et al. Solving chance-constrained optimization problems with stochastic quadratic inequalities
Lokhov et al. Optimal deployment of resources for maximizing impact in spreading processes
Jain et al. An evolutionary many-objective optimization algorithm using reference-point based nondominated sorting approach, part II: Handling constraints and extending to an adaptive approach
Hastie et al. Additive models, trees, and related methods
Luhandjula Fuzzy optimization: Milestones and perspectives
Rahwan et al. Constrained coalition formation
Torres-Jiménez et al. Evaluation of system efficiency using the Monte Carlo DEA: The case of small health areas
Issac et al. Don’t play the odds, play the man: Estimating the driving potency of factors engendering knowledge hiding behaviour in stakeholders
Lu et al. A case study to demonstrate a Pareto Frontier for selecting a best response surface design while simultaneously optimizing multiple criteria
Neil et al. Modeling operational risk in financial institutions using hybrid dynamic Bayesian networks
Crampton et al. The bi-objective workflow satisfiability problem and workflow resiliency
Ramirez-Marquez Deterministic network interdiction optimization via an evolutionary approach
Paparistodimou et al. A network science-based assessment methodology for robust modular system architectures during early conceptual design
Cerdeira et al. Species specific connectivity in reserve-network design using graphs
Strimas‐Mackey et al. Reserve design to optimize the long‐term persistence of multiple species
McAvoy et al. Structural symmetry in evolutionary games
Fan et al. Robust optimization of graph partitioning involving interval uncertainty
Ager Improving the evaluation of spatial optimization models for prioritizing landscape restoration and wildfire risk reduction investments
Denuit et al. Conditional mean risk sharing in the individual model with graphical dependencies
Keane et al. Using machine learning to predict links and improve Steiner tree solutions to team formation problems-a cross company study
Overstall et al. A strategy for Bayesian inference for computationally expensive models with application to the estimation of stem cell properties

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13794510

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13794510

Country of ref document: EP

Kind code of ref document: A1