## Programme 1 Architectures parall#les, bases de donn#es, r#seaux et syst#mes distribu#s

INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE

On submodular value functions of dynamic programming

Eitan Altman & Ger Koole

N? 2658

Octobre 1995

PROGRAMME 1

apport de recherche

ISSN 0249-6399

On submodular value functions of dynamic programming

Eitan Altman & Ger Koole

Programme 1 Architectures parall les, bases de donn es, r seaux et syst mes distribu s Projet Mistral Rapport de recherche n 2658 Octobre 1995 23 pages

in complex Dynamic programming (DPs). We consider in particular DPs that include concatenation and linear combinations of standard DP operators, as well as combination of maximizations and minimizations. These DPs have many applications and interpretations, both in stochastic control (and stochastic zero-sum games) as well as in the analysis of (noncontrolled) discrete-event dynamic systems. The submodularity implies the monotonicity of the selectors appearing in the DP equations, which translates, in the context of stochastic control and stochastic games, to monotone optimal policies. Our work is based on the score-space approach of Glasserman and Yao. Key-words: dynamic programming, submodularity, admission control, stochastic games

Abstract: We investigate in this paper submodular properties of the value function arrizing

(R sum : tsvp) Email: {altman,gkoole}@sophia.inria.fr

Unite de recherche INRIA Sophia-Antipolis ? 2004 route des Lucioles, BP 93, 06902 SOPHIA-ANTIPOLIS Cedex (France) Telephone : (33) 93 65 77 77 – Telecopie : (33) 93 65 77 65 ?? ??

valeur qui vient de la programmation dynamique (PD) complexe. En particulier nous consid rons des successions et des combinaisons lin aires des op rateurs standards de la PD, et aussi des combinaisons de maximisation et minimisation. Ces programmes dynamiques ont plusieurs applications en contr le stochastique (et jeux stochastiques sommes nulles), mais aussi dans l'analyse des syst mes v nements discrets non contr l s. La sous-modularit implique la monotonie des actions qui appara ssent dans les quations de la PD. Ces propri t s conduisent des politiques optimales monotones pour le contr le stochastique et les jeux stochastiques. Notre travail est bas sur la m thode des compteurs de Glasserman et Yao. Mots-cl : programmation dynamique, sous-modularit , contr le d'admission, jeux stochastiques

R sum : Dans cet article, nous tudions les propri t s sous-modulaires de la fonction de

Fonctions de valeur sous-modulaires et programmation dynamique

On submodular value functions

3

1 Introduction

Dynamic programming has been a key tool for the study of control of stochastic systems in di erent areas of applications. White has presented 48] in 1985 a survey of applications of Markov decision processes where the results have been implemented or have had some in uence on decisions. Fields of applications covered were economy, marketing, nance, hydroelectricity, maintenance, hospital admission scheduling, pavement management, shing, environmental studies, control of reservoirs, agriculture policy, chemical and biological control. Most of the applications were handled using dynamic programming (dp) techniques. Meanwhile, there has been an important impact of dp techniques in stochastic control and optimization in areas other than those covered by White's survey. In the eld of telecommunication networks, it has been used in bu er management for ATM (asynchronous transfer mode) switches, in mobile communications 36], and in telephone networks (e.g. 16]). Applications in airline management can be found in 29]. Dynamic programming techniques have been successfully applied in di erent areas in image processing. For example, in 33] (Chapter 13) it is used for reconstruction of shape from shading, and in 37] (and references therein) it is used for line detection. Another type of problems (not related to control) in which dp has proven to be a tool of major importance is the calculation of shortest or longest paths in graphs. For example, the computation of execution times of tasks on parallel computers 13] requires dp for the evaluation of maximum path on some graphs. The maximization in the dp is due to the fact that a task can be executed only after the execution of all its predecessor tasks have been completed; it is thus related to and type of constraints (see 11]). Problems involving shortest paths on graphs are related to or type constraints; thus, the minimum delay it takes for a message to arrive to a destination in a network is often computed using dp with minimization (the Bellman-Ford algorithm, see Bertsekas and Gallager 14], and other algorithms 39]). Quite often, the direct use of dp for computing performance measures or optimal control policies is impossible due to the curse of dimensionality . However, by exploiting some structure of the problem, dp can be used to establish some properties of the optimal control, and in particular, one can show that optimal controls belong to some small subclass of policies that can be characterized by some parameters. The original control problem can then be transformed to a simpler optimization problem over this parameter space. In admission control and ow control problems into queueing networks it has been shown already in the beginning of the seventies ( 45, 51]) that, under some conditions, one may restrict to threshold policies (or more generally, to monotone policies). Computation of an optimal policy within these class of policies goes back to Naor 38]. Similar results were shown to hold in the control of service vacations, see 50, 24, 44]. Other types of structural results were obtained for optimal scheduling of service: 28] (for which a c xed priority rule is obtained), and LEPT ( 15]) and optimal routing to queues (join the shortest queue, 49], and optimal switching curves 21]).

RR n 2658

4

Eitan Altman & Ger Koole

Since the above results were obtained, in the sixtees and seventees, there was a growing interest in identifying general properties of problems involving dp, that produce structural properties of optimal policies, and several abstract approaches have been proposed. For example, White 48] has proposed an abstract approach for monotonicity of optimal policies in stochastic control problems with noisy information. Results on the control of generalized semi-Markov processes (GSMP) were obtained in 18, 19], that are based on submodularity properties of the value functions. Their methodology is quite related to Weber and Stidham 46] who consider a continuous time model of a cycle of queues with Poisson arrivals, and show that submodularity for every pair of service events propagates. They also indicate how their results can be applied to other models. In the applications as well as in the abstract frameworks described in the previous paragraphs, we focused on standard dp equations having a form, in the simplest cases, of a minimization of (or maximization) over some sets of actions of the sum of: (i) an immediate cost; plus (ii) some future value after one step. More generally, they had the form of an immediate cost, plus the sum of minimizations of di erent immediate costs plus a future value after one step. There exist however, more complex forms of dp, which we shall call complex dp . Here are some examples: (i) dp's with both maximization and minimization (or the val operator of matrix games) have already been used long ago as a tool for solving zero-sum dynamic games, see e.g. 43]. This may re ect dynamic control problems where two controllers have opposite objectives, or also a situation of dynamic control by a single controller with a worst case type of a criterion 7, 5, 32]. Recently, this type of dp was used in the context of (non-controlled) discrete event systems, for the performance evaluation of parallel processing with both or and and constraints 30]. (ii) Convex combinations of dp operations. This situation arises in team problems with special information structure (see 41, 27]), and in control problems with a random environment 25, 26, 7]. (iii) Coupling of the dp equation with other linear equations. This occurred in equations describing the time evolution of free choice Petri-nets (which are useful for the modeling of parallel computing), see 12]. (iv) Composition of di erent Dynamic programming operators. This situation arrizes when several (possibly controlled) events happen consecutively, immediately after each other. In many cases, the above phenomena appear together (e.g. 7]). There has been growing research literature in recent years on structural properties in speci c problems involving complex dp. Structural properties in stochastic zero-sum games were obtained in 1, 3, 2, 5, 6, 7, 32]. (For results in non-zero sum stochastic games, see 22, 23, 35, 4].) Structural results were obtained in problems involving composition and convex combinations of dynamic programming operators, see e.g. 25, 26, 7]. The starting point of this paper is the framework of Glasserman and Yao 18, 19] for analyzing monotone structural properties in dp. The analysis is based on the submodularity of value functions that imply the monotonicity properties of optimal strategies. An essential contribution of Glasserman and Yao 18, 19] was to identify a natural context of analysis

INRIA

On submodular value functions

5

of monotonicity. Indeed, they showed that under fairly general conditions, one may obtain monotonicity in some augmented state space of the scores (these are counters of di erent types of events). One may then obtain monotonicity with respect to the original state space if some further (more restrictive) conditions on the structure of the problem are satis ed. The purpose of this paper is to extend the abstract framework of 18, 19] to complex dp. In particular, we shall consider dp involving compositions and convex combinations of dp operators, and combination of both minimization and maximization operators. We further investigate monotonicity in min-max types of dp, and monotonicity in some xed parameters of the problem. In the context of control, the monotonicity results mean that a certain parameter should be controlled monotonically as function of some state parameter. In section 2 we start by formulating the value function in terms of the scores, using complex dp. Given a mapping from scores into states, we derive the value function as formulated in terms of the states. The advantages of using complex dp are explained. Then we present our three main examples: the control of a single queue, the cycle and the tandem of queues studied in 46], and a fork-join queue. For all three it is shown how complex dp allows us to study at the same time di erent variants of the models, like both discrete and continuous time models. Section 3 contains the main result. First we study an unbounded score space, and it is shown how submodularity, which is the crucial property, propagates. Bounds to the score space (which prevent for example a negative queue length in the state space model) are introduced by taking costs outside the bounded score space in nite, and it is shown how submodularity leads to the wanted monotonicity properties. The results are illustrated with the examples, showing for example in the fork-join queue that servers work harder if there were more arrivals, but also that, if one server manages to reduce his queue length fast, all others start working harder to meet up. We study games in section 4. For technical reasons we con ne ourselves to two events, assuming that one of them has a maximization instead of a minimization operator. Again monotonicity results hold, and are illustrated with the examples.

2 Model

Our approach in this section is to start by de ning a value function, based on a set of dp operators. We shall present two variants of the dp. The rst corresponds to working in the so called score space (or model state), and the second corresponds to the so called system state. Quantities that appear in the dp equations may have di erent interpretations, which will be presented later. We introduce two sets that, together, have the role of states of a system described by dp equations: The countable set of environment states, denoted by V ,

2.1 The value function: scores and environment states

RR n 2658

6

Eitan Altman & Ger Koole

the set of scores D = (Dv ; v 2 V ), where Dv INm are the scores available at environment state v (with IN = f0; 1; 2; : : :g, and m some xed number). An element (v; d), with v 2 V and d 2 Dv is called a (model) state. One important reason for adding the environment states (with respect to the basic model in 18, 19]) is that the submodularity properties of the value function (and hence, the monotonicity results) might hold only for a subset of scores. Without the extra environment states, it is required to have submodularity simultaneously for all scores. (In fact, 18, 19] allow for a weaker condition, i.e. to have either submodularity or supermodularity. However, there is a gap in the proof for that more general setting in Lemma 4.2. See 20].) Another advantage of introducing the environment states is that in the context of stochastic control, it allows us to relax the independence and exponential assumptions on the time between transitions, which was previously considered ( 18, 19]), and is quite unnatural in many applications, such as telecommunication networks. The role and interpretation of scores and environment states will become clear in many examples later on. In 18, 19] the scores are de ned as counters of events in the context of GSMPs. For every 1 i m and every function f (de ned on all (v; d) with v 2 V and d 2 Dv ) we de ne the dynamic programming (dp) operator Ti by

Ti f (v; d) = hi (v; d) + min fci( ) + f (v; d + ei) + (1 ? )f (v; d)g

i i

(1)

Ti f (v; d) = hi (v; d) + f (v; d) if d + ei 62 Dv . In the rst case we call event i active, in the latter case inactive. Both hi and ci are arbitrary cost functions, 0 i i 1, and ei = (0; : : : ; 0; 1; 0; : : : ; 0) with the 1 in ith position. Now we de ne the value function. Let Sr for 1 r R represent a sequence of dp operators, i.e., Sr is of the form Sr = Ti1 : : : Tia . The value function is recursively de ned by X r X qvw Sr Jn (w; d): (2) Jn+1 (v; d) = vw

Here we call vw the transition probabilities of the environment. We assume that w vw = 1 for each v, and that J0 is given. Furthermore, to prevent leaving the state space, we assume that Dv Dw if vw > 0. The function Jn is our main subject of investigation. In Section 3 we formulate conditions on D, fi, ci and J0 under which certain properties of Jn propagate. From this we will be able to derive monotonicity properties of the optimal policy. But let us rst consider which systems can be analyzed using this type of value function.

w2V

1

if d + ei 2 Dv and

r R

P

INRIA

On submodular value functions

7

2.2 Model states versus system states

Quite often, there may exist a state space X that describes the system and is simpler and smaller" (in some sense) than (V; D). E.g., consider an M/M/1 queue; its state can be described by two counters, one counting the arrivals and the other the departures (this would correspond to the D component in the previous state description); alternatively, the queue can be described by a single system variable describing its length. As in 18, 19], it is in the rst formulation (i.e. of the enlarged state space (V; D)) that monotonicity will be obtained in a natural way, and under some further assumptions, these properties can be translated to monotonicity properties with respect to X . We next obtain a dynamic programming equation for the new state X that corresponds to (2. We assume that there exists a function which maps the model states (v; d) into the ~ system states such that X = f (v; d) j v 2 V; d 2 Dv g. We will construct a value function Jn ~n( (v; d)) = Jn (v; d), and such that minimizing actions correspond. The on X such that J main problem is the fact that a system state can be represented by several model states, i.e., is not injective. ~ ~ We de ne J0 , hi : X ! IR, and assume that for all (v1; d1 ) and (v2; d2 ), di 2 Dvi , 1 ~ such that (v ; d1 ) = (v2; d2 ), J0 ( (v1; d1 )) = J0(v1; d1 ) = J0(v2 ; d2 ) and ~ i( (v1; d1 )) = h 1 1 2 2 hi(v ; d ) = hi (v ; d ). This means that all direct costs in corresponding states are equal. Now we de ne the dp operators for the system. To assure that the same events are active in corresponding model states, we assume that for all (v1; d1 ) and (v2; d2 ), di 2 Dvi , such that (v1 ; d1 ) = (v2; d2 ), an event is either active or inactive in both (v1; d1 ) and (v2; d2 ). To every event in the model space (resulting in a transition say from (v; d) to (v; d + ei)) corresponds a transition in the system state, say from x = (v; d) to Aix = (v; d + ei). ~ Thus de ne the dp operators Ti by ~ Tif~(x) = ~ i(x) + min fci( ) + f~(Aix) + (1 ? )f~(x)g h

i i

~ h Ti f~(x) = ~ i(x) + f~(x) ~if~(x) = Ti f (v; d). if Aix 62 X . From this it follows that T Finally, we have to consider the changes in the environment. Here the conditions are somewhat complicated; the reasons for this choice are well illustrated in the admission control example, later in this section. We assume that if there are d1, d2 such that (v1; d1 ) = r r (v2; d2 ), then v1 w1 = v2 w2 and qv1 w1 = qv2 w2 if (w1 ; d1 ) = (w2; d2 ). Now we can r ~ and q by ~ (v;d) (w;d) = vw and qr (v;d) (w;d) = qvw . unambiguously de ne ~ ~ ~ ~ ~ Finally, if Sr = Ti1 : : : Tia , let Sr = Ti1 : : : Tia . The value function in the system state is de ned by X X r ~ qxy Sr Jn(y): ~ ~ ~ Jn+1 (x) = ~xy ~ It can now be shown inductively that Jn ( (v1; d1 )) = Jn (v1; d1 ) = Jn (v2; d2 ) if (v1; d1 ) = (v2; d2 ).

y

1

if Aix 2 X and

r R

RR n 2658

8

Eitan Altman & Ger Koole

Often the events have a physical interpretation in the system state, like an arrival to a queue. By de ning the system state through the model, and the way Jn is de ned, the system state cannot depend on the order of events. This property is called permutability in 18]. There the analysis is done in the other direction: starting from a permutable system a value function in terms of the scores is derived.

Remark In the next section we will show certain properties of the value function Jn hold

also for Ti1 :::Tia Jn . Then, after showing that the same properties are preserved while taking linear combinations, we conclude that the same properties hold for Jn+1 . This approach however can also be applied to other models, falling outside the scope of this paper. For example, the optimality of the c rule for discrete time models follows directly from the result for the continuous time version, as given in Hordijk & Koole 26]. Note that this discrete time result is proven directly in Weishaupt 47]. This ends the description of the modeling phase.

2.3 Interpretation of the dynamic programming

In the context of Markov decision processes (MDPs), (2) may re ect the following. The problem is already in a time-discretized form (rather than a continuous time problem). The macro stages: There are some basic time periods (called macro-stages) parametrized by n, and Jn (v; d) is the value corresponding to n more steps to go, given that the initial state is (v; d). The state: The state space of the MDP is a tuple (v; d; r; k). r denotes a micro-state (or phase) within the period n, and k will denotes a micro-stage within that phase. k can be taken to be 0 at the beginning of a macro-stage. Actions, transitions and costs: At the beginning of a macro-stage we are at some state r (v; d; 0). With probability vw qvw we move instantaneously to environment state w, and to a type r micro-state (phase), 1 r R. We may denote the new state as (w; d; (r; 0)). If the transition was to some phase r then a speci c r-type sequence of ar instantaneous controlled transitions occur, and the state evolution has the form (w; d; (r; 0)) ! (w; d1 ; (r; 1)) ! ::: ! (w; dar ?1 ; (r; ar ? 1)) ! (w; dar ; 0): We may thus consider a second time counter that counts these micro stages within the phase r. The state after the kth instantaneous transition within a r-type sequence (r phase) has the form (w; dk ; (r; k)). The transition probabilities within consecutive micro-stages are functions of the current state and action. At the kth micro-stage, an action 2 j(k;r) ; j(k;r) ] is chosen, the state moves to (w; d + ej(k;r) ; (r; k + 1)) with probability i, and to (w; d; (r; k + 1)) with probability 1 ? . (Here, j (k; r) de nes which dp operator is used at the kth micro-stage in the rth macro-stage.) Moreover,

INRIA

On submodular value functions

9

an immediate cost of hj(k;r) (w; d) + cj(k;r) ( ) is charged. After ar micro transition we arrive to a new macro-stage with a state of the form (w; dar ; 0). In the above description we have in fact two time parameters, one - counting the time period n (the macro-stage), and a second, within each time period, which counts the micro-stage. This distinction may become important when one is interested in monotonicity properties in time. Standard objectives are expected costs over a nite interval, the expectedP average costs and discounted costs. Note that discounting can be introduced by taking w vw = , where < 1 is the discount factor. There is a large literature on the existence of optimal policies and the convergence of Jn to the discounted or average costs. Although general conditions can be given, we refrain from doing so and deal with these issues for each model separately. In applications, a macro-stage often corresponds to models with either (i) a xed time period (slotted time); these models are called discrete-time models. (ii) sampled versions of continuous time models. In that case, the duration of a macro-stage is exponentially distributed. This model is the one obtained by the uniformization techniques due to Lippman 34] (see also Serfozo 42]). (This is the type of model that was studied in 18, 19].) We call these the time-discretized models. A basic restrictive feature of the time-discretized models, as studied in 18, 19], is that only one single transition may occur at a time (i.e. at a macro-stage). On the other hand, in discrete time models, typically several transitions may occur in each time-period (macrostage). This makes discrete time models usually harder to analyze. However, if we assume that in a discrete time model events happen one after each other, possibly instantaneously, we can model it with consecutive dp operators, each representing a single event. This results typically in convex combinations of dp operators, although more complex constructions can be useful. Thus our dp formulation (2) enables to handle these situations as well. Consider a Markov policy that chooses when there are n more macro-stages to go and when the micro-state (phase) and micro-stage are (k; r), a minimisor appearing in 0 Tik (r) Jn (v; d; k; r), k = 0; 1; :::; a(r) ? 1, where J 0 (v; d; k; r) = Tik+1 (r):::Tia (r) Jn (v; d). It is well known that this policy is optimal under fairly general conditions (see e.g. 40]).

2.4 In nite horizon

The dynamic programming equation (2) may appear in the form of a xed point equation:

J (v; d) =

X

w 2V

vw

1

X

r R

r qvw Sr J (w; d):

(3)

Under fairly general conditions, there exists a solution to (3) and it is unique within some class of functions (see e.g. 40]). Moreover, in the context of control, J (v; d) equals the optimal cost (the minimum cost over all policies) for a control problem with an in nite horizon (in nitely many macrostages) and with initial state (v; d). Consider a stationary policy that chooses when the

RR n 2658

10

Eitan Altman & Ger Koole

0 micro-state (phase) and micro-stage are (k; r), a minimisor appearing if Tik (r) Jn (v; d; k; r), k = 0; 1; :::; a(r) ? 1, where J 0 (v; d; k; r) = Tik+1 (r):::Tia(r) J (v; d). It is well known that this policy is optimal under fairly general conditions (see e.g. 40]). This happens to be true for both continuous time models that are time-discretized (where discretization is done at time instants that are exponentially distributed, 18, 19]) or for real continuous time control (see e.g. Fleming and Soner 17] Ch. 3). Finally, J (v; d) can be computed using value iteration, i.e. by calculating Jn inductively in (2) and taking the limit as n ! 1. We present in the next sections conditions for the submodularity of Jn. This will imply the submodularity of the limit J , and hence the existence of monotone policies for both the continuous time control problem as well as its discretized version.

Admission control model This example consists of a single queue for which the arrivals can be controlled. To model this we take two events, the rst corresponding to arrivals, the second to departures. For the moment we assume that the environment consists of a single state (which we omit from the notation). Thus the model state is d = (d1; d2 ), and D = fd j d1 d2 g. The two dynamic programming operators can for example be de ned by T1 f (d) = h(d) + 0 min f(1 ? )K + f (d + e1) + (1 ? )f (d)g

and if d1 > d2 and

1

2.5 Examples

T2 f (d) = h(d) + min f? R + f (d + e2 ) + (1 ? )f (d)g

2 2

T2 f (d) = h(d) + f (d) otherwise. Here K can be interpreted as blocking costs, R as a reward for serving a customer, and h as holding costs. If 2 = 2 the departures are uncontrolled. The system state is de ned by (d) = d1 ? d2 . Thus if (d1) = (d2), both d1 and d2 ~ are either active or inactive, given the de nition of D. This ensures that Jn (d) = Jn ( (d)) if the obvious conditions (on transition probabilities and cost functions) are satis ed. The value function of a continuous time model (sampled at exponentially distributed times) could now be de ned by Jn+1 (d) = p1T1 Jn(d) + p2 T2 Jn (d): i (Formally, this means an environment with V = fvg, vv = 1, qvv = pi and Si = Ti, for i = 1; 2 and p1 + p2 = 1.) This is indeed the value function of a queue with Poisson arrivals (with rate p1 1 ) and exponential service times (with rate p2 2), for an arbitrary constant . A discrete time model could have the value function Jn+1 (d) = T1 T2 Jn (d):

INRIA

On submodular value functions

11

Now 1 and 2 serve as arrival and departure probabilities. This simple model can be generalized in various ways. A rst way would be to allow batch arrivals. Assume that customers arrive in batches of size B . This can easily be modeled, just by taking D = fd j Bd1 d2 g and (d) = Bd1 ? d2 . Another generalization would be a model in which there are both controlled an uncontrolled arrivals. The simplest way to model this would be by taking 1 > 0. However, this doesn't always work, as in the case of a discrete time model in which we can have both types of arrivals at the same epoch. This can be solved by letting the uncontrolled arrivals be part of the environment, i.e., v counts the number of uncontrolled arrivals. Now Dv = fd j v + d1 d2g, and let vv and vv+1 be independent of v, with vv + vv+1 = 1. Note that it follows that Dv Dv+1 , and that J (v; d) depends only on v + d1 ? d2 (which is equal to (v; d)), as long as the cost functions do. This makes that all conditions given in this section are satis ed. The last generalization we discuss is that to general arrival streams. Take i = 1, let vw be the transition probabilities of some Markov chain. If this Markov chain does not 1 change state, a departure can occur, and thus qvv = 1 with S1 = T2 . If the Markov chain 2 changes state, an arrival is generated at the transition with probability qvw (for v 6= w). The 3 2 corresponding dp operator is S2 = T1 . With probability qvw = 1 ? qvw no arrival occurs, and thus S3 is the null event. Let again Dv = fd j d1 d2 g. This results in X 1 2 3 Jn+1 (v; d) = vw qvw T2 Jn (v; d) + qvw T1 Jn (v; d) + qvw Jn (v; d) ; which is the value functions of a continuous time model (sampled at exponentially distributed times) where the arrivals are generated by a Markov arrival process (MAP). Approximation results exist showing that this type of arrival process is dense in the set of all arrival processes (Asmussen & Koole 10]). Of course, the MAP could at the same time be used to govern the departure process, really giving it the interpretation of an environment.

w

Tandem model In Weber & Stidham 46] a cycle of m queues is considered. Thus customers leaving queue i join queue i + 1 (with queue m + 1 identi ed with queue 1). The service rate at each queue can be controlled. Furthermore, there are uncontrolled exogenous arrivals at each queue. We model this by taking d = (d1; : : : ; dm ) and v = (v1; : : : ; vm ), where di represents a customer leaving center i, P vi represents an uncontrolled arrival to and center i. De ne si = ?ei + ei+1 , and (v; d) = i (viei + disi ). The state space is de ned by Dv = fd j (v; d) 0g. In 46] the continuous time model is studied, having a value function of the form X i X qvv TiJn (v; d): Jn+1 (v; d) = vv+ei Jn (v + ei ; d) + vv

It is readily seen that this choice satis es the conditions formulated in this section. In the next section we will see that the results in 46] are a subset of ours. The results proven there hold also for several related models, like the discrete time version, or the model with arrivals according to an MAP, as described in the admission control model.

i i

RR n 2658

12

Eitan Altman & Ger Koole

Fork-join queue Our third example is that of a fork-join queue. A fork-join queue consists of two or more parallel queues. Jobs arrive at the system according to a Poisson process, and on arrival they place exactly one task in each queue (the fork primitive). Take jV j = 1. Let operator T1 represent the arrival event, and operator Ti , 2 i m, the departure event at queue i ? 1. Then (d) = (d1 ? d2; : : : ; d1 ? dm ), where the ith component represents the number of customers in the ith queue. The number of customers in state x is given by maxi xi, if we assume that the queues work in a FIFO manner. This, or a convex function thereof, are interesting cost functions. In the next section we will see how the optimal service rate at each of the queues changes as the state changes. Also the admission control can be studied. This model can be generalized in various ways, similarly to the previous examples, e.g. to arrivals according to an MAP or to batch arrivals. Combinations with the previous examples are also possible, resulting for example in a cycle of fork-join queues.

We call a function f D-submodular if f (v; d) + f (v; d + ei + ej ) f (v; d + ei) + f (v; d + ej ) for all i 6= j , all v 2 V , and for all d such that d, d + ei + ej , d + ei and d + ej 2 Dv . Let us rst assume that Dv = INm for all v. This means that each event is always active. ( ( For Sr = Ti1 : : : Tia de ne Srb) = Tib : : : Tia . Thus Srb) consists of the a ? b last operators of ( Sr . With Sra+1) we denote the identity. Then we have the following result.

3.1 The unconstrained model

3 Monotone selectors

Lemma 3.1 If hi for all i and J are D-submodular, then Srb Jn (and in particular Jn )

are also D-submodular for all n, r and b.

0 ( )

Proof We show that if a function f is submodular in (v; d) for some i and j (i 6= j ), it propagates through Tk , for an arbitrary k. Assuming that Jn is D-submodular, it then ( follows that Srb) Jn is D-submodular. By noting that convex combinations of submodular functions are again submodular, it is then easily shown that Jn+1 is also D-submodular. Thus, we assume that (dropping v in the notation) f (d) + f (d + ei + ej ) f (d + ei) + f (d + ej ) (i 6= j ). We have to show that Tk f (d)+ Tkf (d + ei + ej ) Tk f (d + ei)+ Tk f (d + ej ), while event k is active in all 4 states. Note rst that hk is submodular. Denote with 1 ( 2) the minimizing for score d + ei (d + ej ). Assume that 1 2 (the other case is similar by symmetry). First consider the case i 6= k. Note that

Tk f (d) + Tk f (d + ei + ej ) hk (d) + ck ( 1) + 1 f (d + ek ) + (1 ? 1 )f (d)+ hk (d + ei + ej ) + ck ( 2) + 2 f (d + ei + ej + ek ) + (1 ? 2 )f (d + ei + ej ):

INRIA

On submodular value functions

13

Summing

and gives

hk (d) + hk (d + ei + ej ) hk (d + ei) + hk (d + ej ); ck ( 1) + ck ( 2) ck ( 1 ) + ck ( 2 ); 1 f (d + ek ) + 1 f (d + ei + ej + ek ) 1 f (d + ei + ek ) + 1 f (d + ej + ek ); ( 2 ? 1 )f (d + ej ) + ( 2 ? 1 )f (d + ei + ej + ek ) ( 2 ? 1 )f (d + ei + ej ) + ( 2 ? 1 )f (d + ej + ek );

(4) (5) (6)

(1 ? 1 )f (d) + (1 ? 1 )f (d + ei + ej ) (1 ? 1 )f (d + ei) + (1 ? 1 )f (d + ej )

Tk f (d) + Tk f (d + ei + ej ) Tk f (d + ei ) + Tk f (d + ej ): Note that (5) doesn't hold if i = k. If i = k, we start with Ti f (d) + Ti f (d + ei + ej ) hi(d) + ci( 2 ) + 2 f (d + ei ) + (1 ? 2 )f (d)+ hi(d + ei + ej ) + ci( 1 ) + 1 f (d + 2ei + ej ) + (1 ? 1 )f (d + ei + ej ): The inequality follows as the previous one if we replace (4) (6) by 1 f (d + ei ) + 1 f (d + 2ei + ej ) 1 f (d + 2ei ) + 1 f (d + ei + ej ) and (1 ? 2 )f (d) + (1 ? 2 )f (d + ei + ej ) (1 ? 2 )f (d + ei ) + (1 ? 2 )f (d + ej ): true Further generality can be obtained by letting i and ci depend on the environment. As this adds merely to the notation, we will not do this. From lemma 3.1 the monotonicity of the optimal policy can be derived. (Note that we use increasing and decreasing in the non-strict sense throughout the paper.) Theorem 3.2 For the unconstrained case, if hi for all i and J0 are D-submodular, then the optimal control of event i is increasing in dk , for all i 6= k.

Proof Ti f (v; d) can be rewritten as

i i

hi(v; d) + min fci( ) + (f (v; d + ei) ? f (v; d))g + f (v; d):

If f is D-submodular, then f (v; d + ei + ej ) ? f (v; d + ej ) f (v; d + ei) ? f (v; d). From this the result follows easily. The above Theorem should be understood as follows: consider some macro-stage r and micro-stage k. There exists an optimal policy according to which the action j(k;r) (v; d) (see de nitions in Subsection 2.3) is used at state (v; d) and the action j(k;r) (v; d + ei) is used at state (v; d + ei) where i 6= j (k; r), and j(k;r) (v; d + ei) j(k;r) (v; d).

RR n 2658

14

Eitan Altman & Ger Koole

3.2 In nite costs

As it stands, the result is not very useful. For example, in the admission control model it would allow for the departure event always to be active, which could result in negative queue lengths. To prevent this, we need the Dv to be proper subsets of INm , and to show that D-submodularity still propagates. A simple way to deal with this problem is extending Dv to INm by taking h(v; d) = 1 if d 62 Dv , and then applying lemma 3.1. For this to yield nite costs we have to assume that non-permanent events (i.e., events for which there are (v; d) and i such that d 2 Dv and d + ei 62 Dv ) are controllable to 0 (i.e., i = 0). (These terms come from 18] and 46], respectively.) This ensures that the control can be such that in nite costs states can be avoided, i.e., the optimal action for dp operator Ti in (v; d) with d + ei 62 Dv will be = 0 as Jn (v; d + ei) = 1. (We assume that 0 1 = 0.) Note that controlling the active event i at 0 is equivalent to i being nonactive, if we assume that ci(0) = 0. The condition that Dv Dw if vw > 0 ensures that in nite costs are avoided due to a change of state in the environment. Furthermore, we assume that if, for some v, d + ei and d + ej 2 Dv , then so are d and d + ei + ej . (This is called compatible in 46], p. 208.) This insures that if h and J0 are submodular on Dv , then so they are on INm . Of course, this assumes that ci (0) = 0 for all non-permanent i (this condition is forgotten in 18]). In fact, the above condition can be simpli ed, as argued in theorem 5.2 of 18]: for all (v; d) such that d 2 Dv , we assume that if d + ei ; d + ej 2 Dv , then d + ei + ej 2 Dv . This is equivalent to saying that Dv is closed under maximization. In 18] it is shown that this max-closure is equivalent to both permutability and non-interruption. An event i is called non-interruptive if in case events i and j are active in x 2 Dv , this implies that i is also active in x + ej 2 Dv , for all j 6= i. Using theorem 3.2, we arrive at the following.

Corollary 3.3 If (i) hi for all i and J0 are D-submodular, (ii) non-permanent events are controllable to 0 and have ci(0) = 0, (iii) Dv is closed under maximization, then the optimal control of event i is increasing in dj , for all i 6= j .

Often ci is concave (in particular, linear) for some event i. It is readily seen that the optimal control in this case is either i or i . This type of control, where only the extremal values of the control interval are used, is sometimes called bang-bang control.

Example (admission control model (continued)) We apply the results obtained so far to the admission control model. To do this, we take 2 = 0 to assure that this nonpermanent event is controllable to 0. It is readily seen that both events are non-interruptive. Therefore all that is needed to prove the submodularity of Jn is the submodularity of h and J0 . We have (d) = (d + e1 + e2 ), this means that having the direct system costs ~ convex is h a su cient condition for the existence of an optimal threshold policy. (Note that the direct

INRIA

On submodular value functions

15

~ Note that this condition is weaker than convexity of h. Another generalization discussed in the previous section are arrivals modeled by an MAP. Now h(v; ) needs to be submodular for every v. Note that there are no restrictions on the costs between di erent states of the environment. This allows for many types of di erent costs functions. A generalization we did not yet discuss is a nite bu er. This generalization is direct as = 0. 1

costs need not be increasing.) The monotonicity results translated to the system state give that the admission probabilities decrease and the service probabilities increase as the queue length increases. As the cost for controlling the events is linear in , this leads to bang-bang control, i.e., at a certain threshold level the control switches from the low to the high rate or vice versa. Let us also consider the model with batch arrivals of size B . As (d) = Bd1 ? d2 , submodularity of h results in the following su cient condition for ~ : h ~ ~ h(x) + ~ (x + B ? 1) ~ (x ? 1) + h(x + B ); x > 0: h h

Example (tandem model (continued)) Let us apply corollary 3.3 to the cycle of queues

as studied in Weber & Stidham 46]. The service events at the queues are non-permanent, and they can indeed be controlled to 0. In 46] ci(0) need not be 0 if i is a departure event; instead of this an event is always active and the control 0 is selected if the corresponding queue is empty. This is obviously equivalent. The max-closure is easily veri ed. Rewriting the submodularity in terms of the system states gives the inequalities (2) in 46]. They take as cost functions additive functions, which are convex in each component of the state space. This is a su cient condition for the corresponding hi to be submodular. Indeed, some P ~ algebra shows that if hi (x) = j gj (xj ), then h(d + ei + ej ) + h(d) = h(d + ei) + h(d + ej ) if ji ? j j > 1, and h(d + ei + ei+1 ) + h(d) ? h(d + ei) ? h(d + ei+1) = 2gi+1 (xi+1) ? gi+1 (xi+1 ? 1) ? gi+1 (xi+1 + 1) 0 by convexity, for x = (d). Corollary 3.3 now gives us the monotonicity of the optimal control for each n. Together with results on the existence of average cost optimal policies (on which, for this speci c model, is elaborated in section 3 of 46]), we arrive at Weber & Stidham's main result on the existence of optimal monotone average costs policies ( 46], p. 206). We assumed that exogenous arrivals were part of the environment. The reason for this is that, for a reasonable type of cost function as in 46], submodularity does not hold: we expect that the control in queue j is decreasing in the number of arrivals at queue j + 1. However, Weber & Stidham 46] study also a model where customers departing from queue m leave the system. Then it makes sense to include the arrivals to the rst queue as an event, giving that the optimal control of each server is increasing in the number of arrivals at the rst queue.

Example (fork-join queue (continued)) The main interest for the fork-join queue is the control of the servers. We can apply corollary 3.3 if we assume that they are controllable

RR n 2658

16

Eitan Altman & Ger Koole

to 0 with ci(0) = 0. We take hi(d) = max2 j m dj ? d1 (for some or all i), which corresponds to ~ (x) = max1 j m?1 xj . It is easily checked that hi is D-submodular and that D is closed h under maximization. Thus the optimal control of a departure event is increasing in the other events. This means that an arrival increases the optimal rate, but also a departure at another queue does.

3.3 Projection

Extending the cost function from D to INm has several drawbacks, one being the necessity of the assumption that non-permanent events are controllable to 0. For example, if we were in the admission control model just to control arrivals, i.e., if we would take T2 f (d) = h(d) + f (d + e2 ) if d + e2 2 D and T2 f (d) = h(d) + f (d) if d + e2 62 D, then the second event is non-permanent but not controllable to 0. This problem would not exist if, in the case where the second event is controllable to zero (and R = 0), the optimal control would always be equal to 1. This appears to be the case if h(d) h(d + e2). In the admission ~ control model this means that we should assume that h(x) ~ (x + 1), i.e., the direct costs h are increasing in the queue length. This method is called projection in 18], and it has also been applied to the control of a single queue with delayed noisy information in Altman & Koole 8]. We formalize the ideas, by rst considering a model in which the non-permanent events are controllable to 0. After that we show that this model is equivalent to the one we are interested in. Theorem 3.4 If (i) hi for all i and J0 are D-submodular, (ii) hj (v; x + ei) hj (v; x) for all non-permanent i, j and x such that x and x + ei 2 Dv , (iii) non-permanent events are controllable to 0 and have ci( ) = 0, (iv) Dv is closed under maximization, then the optimal control of event i is i if x + ei 2 Dv .

( costs outside the score space it follows that Sr ) Jn is D-submodular. We show inductively (b) (b) that Sr Jn (v; d + ei) Sr Jn (v; d) for i non-permanent and for all d; d + ei 2 Dv . Thus assume that f satis es all conditions. We show rst that Tk f (v; d + ei) Tk f (v; d) if d; d + ei 2 Dv . This follows easily if both d + ei + ek and d + ek 2 Dv , or if d + ek 62 Dv . Now assume that d + ei + ek 62 Dv . By (iv) also d + ek 62 Dv , and the inequality still holds. Finally we show Ti f (v; d + ei) Ti f (v; d). This follows easily. This theorem states that if we have a non-permanent event which is not controllable to 0, and if hk (v; d + ei) hk (v; d) for all k and appropriate d, then we might as well make the event controllable to 0 as the event will always be controlled at the highest rate possible. This leads to the following. Corollary 3.5 If (i) hi for all i and J0 are D-submodular,

Proof The conditions are a superset of thoseb in corollary 3.3, and thus by taking in nite

INRIA

On submodular value functions

17

(ii) hk (v; d + ei) hk (v; d) for all k, d, d + ei 2 Dv , and ci( ) = 0 for all non-permanent i, (iii) Dv is closed under maximization, then the optimal control of event i is increasing in dk , for all i 6= k.

Note that the condition on the ci cannot be found in 18]. Of course the corollaries 3.3 and 3.5 can be combined in a single model.

Example (admission control model (continued)) As is argued above, if the departure process is non-controllable, the condition that the direct costs are increasing makes that the monotonicity result still holds. This model is thus an example where the corollaries 3.3 and 3.5 are combined: corollary 3.3 is used for the arrival event, corollary 3.5 is used for the departure event. Remark Besides taking in nite costs outside the state space and projection other methods

are possible to deal with the boundary. An example of such a model can be found in 31]. There a tandem model with nite bu ers is studied. The costs for rejecting a customer are equal to 1, and these are the only costs in the system (however, this implies implicitly costs for queues that are full, since the controller is forced to reject an arriving customer when the queue is full). The boundary is dealt with by including, besides submodularity, inequalities (formulated in the system space) of the form f~(x + ei ) 1 + f~(x).

3.4 Admission control model with random batches

In the admission control model we considered a generalization to batch arrivals. This could be further analyzed by assuming that each batch has a random size. However, this would mean that becomes a random function, and D would depend on its realization. Thus we cannot apply the theory developed above directly. An elegant solution is to assume that d1 does not count the number of batches, but the number of customers that have arrived. If we assume that a batch of k customers has probability k , this would result in

T1 f (d) = 0 min f(1 ? )K +

1

X

k

k f (d + ke1 ) + (1 ?

)f (d)g:

Instead of submodularity, we show that the following inequality propagates:

f (d) +

X

k

k f (d + ke1 + e2 )

X

k

k f (d + ke1 ) + f (d + e2 ):

Copying the proof of lemma 3.1 for the current case is easily done. As the arrival event is permanent, no complications arise due to boundary issues: taking in nite costs outside D = fd1 d2 g retains submodularity. With regard to the costs in the system state, the

RR n 2658

18

Eitan Altman & Ger Koole

~ situation is similar to that in the case of xed batch sizes. The su cient condition on h is as follows: ~ ~ (x) + X k ~ (x + k ? 1) ~ (x ? 1) + X k h(x + k); x > 0: h h h

k k

Again convexity of ~ implies the above inequality. h Note that if the departures cannot be controlled we have to assume again that the costs are monotone. Above we showed that the rate at which an event should be controlled increases as other events have occurred more often. One would conjecture that (under certain conditions) the reverse holds for the event itself, i.e., that the optimal control for an event is decreasing in d . This would mean proving convexity in d . However, we were not able to prove convexity of Jn in one or all components. In 46] however an argument is given showing that, under stationary conditions, the optimal control of event i is decreasing in di for all i. As it applies also to our other examples, we give it here. We consider problems with an in nite horizon cost. First note that there exist discounted and average optimal policies that are stationary, i.e., they depend only on the state (and on the micro-stage), and not on the time (i.e. on the macro-stage). This means that the rate at which event i should be controlled is increased if another event j occurs. However, in the cycle of queues, the same system state is reached if all transitions have red once. This means that also the optimal control of event i should be the same as before the ring of all events. As the optimal rate increased with the ring of each event except event i itself, this means that the optimal control of event i should be decreasing in di. The same applies to the admission control model and the fork-join queue. Note that in the case of batch arrivals of size B the other event(s) have to re B times each to reach the same system state.

3.5 Convexity and stationarity

4 Dynamic programming involving maximization and minimization

In this section we restrict ourselves to two events, where in the second event the minimization is replaced by maximization. We thus consider the same Dynamic programming as in (2), where the min in (1) is replaced by max for i = 2. Let T1 be the minimizing operator, and T2 the maximizing operator. This type of dynamic programming equation is used in stochastic (or Markov) games (see 43] having two players, each one controlling a di erent event, both having the same objective function, which is maximized by one and minimized by the other, i.e., a zero sum

INRIA

On submodular value functions

19

setting. Another application of such dynamic programming appears in the context of (noncontrolled) discrete event systems, and is used for the performance evaluation of parallel processing with both or and and constraints 30]. Now we show that in this setting submodularity still propagates, rst for the unconstrained case. The reason for considering only two events is that we cannot prove the lemma for m > 2. Lemma 4.1 If hi for all i and J0 are D-submodular, then Sr(b) Jn is also D-submodular for all n, r and b.

Proof Propagating T1 is similar to lemma 3.1. Thus consider T2 . Denote with 1 ( 2 ) the maximizing for score d (d + ei + ej ). First assume that 1 2 . Summing h2(d) + h2 (d + e1 + e2) h2(d + e1 ) + h2 (d + e2 ); c2 ( 1 ) + c2 ( 2) c2 ( 1) + c2 ( 2 ); 2 f (d + e2 ) + 2 f (d + e1 + 2e2 ) 2 f (d + e1 + e2 ) + 2 f (d + 2e2 ); ( 1 ? 2 )f (d + e2) + ( 1 ? 2 )f (d + e1 + e2) ( 1 ? 2 )f (d + e2 ) + ( 1 ? 2 )f (d + e1 + e2 ); and (1 ? 1 )f (d) + (1 ? 1 )f (d + e1 + e2) (1 ? 1 )f (d + e1 ) + (1 ? 1 )f (d + e2 ) gives T2 f (d) + T2 f (d + e1 + e2) T2 f (d + e1 ) + T2 f (d + e2 ); where we took 1 as (suboptimal) action in d + e1 and 2 in d + e2 . If 1 2 , we sum 1 f (d + e2 ) + 1 f (d + e1 + 2e2 ) 1 f (d + e1 + e2 ) + 1 f (d + 2e2 ); ( 2 ? 1 )f (d) + ( 2 ? 1 )f (d + e1 + e2) ( 2 ? 1 )f (d + e1) + ( 2 ? 1 )f (d + e2); ( 2 ? 1)f (d + e2) + ( 2 ? 1 )f (d + e1 + 2e2) ( 2 ? 1 )f (d + e1 + e2 ) + ( 2 ? 1 )f (d + 2e2); and (1 ? 2 )f (d) + (1 ? 2 )f (d + e1 + e2) (1 ? 2 )f (d + e1 ) + (1 ? 2 )f (d + e2); together with the inequalities for h2 and c2 . From this lemma the monotonicity of the optimal policy for the unbounded case follows. Because the maximizing actions are chosen in T2 , the optimal control for this operator is decreasing in d1. Theorem 4.2 If hi for all i and J0 are D-submodular, then the optimal control of event 1 is increasing in d2 , and the optimal control of event 2 is decreasing in d1.

RR n 2658

20

Eitan Altman & Ger Koole

To deal with the boundary, we cannot immediately apply corollary 3.3 or 3.5, as the maximization does not avoid 1-cost states. This calls for costs ?1 outside the state space. However, to maintain submodularity, we would have to replace the max-closure condition. We will not investigate this further, as in the example below the maximizing operator is permanent. Although the setting in this section is that of a game, the optimal policies are nonrandomized as the decisions do not occur simultaneously. Monotonicity that involves also randomization is studied in 9] in the more general framework of non zero-sum stochastic games. In particular, for the zero-sum case, the dynamic programming equations have a value" operator instead of a min and max operators (see 1, 3, 5]).

Example (admission control model (continued)) We consider here our admission control model, but with T1 the departure event and T2 the arrival event. Note that T2 , the maximizing event, is permanent. This model can be seen as a queue with controlled service which is operated under worst case conditions. Intuitively, under worse conditions, customers arrive if the queue is already full. This is indeed what follows from theorem 4.2. Thus, if c2 is linear, there is a threshold (possibly 0) such that arrivals are generated at the lowest rate below the threshold, and at the highest rate above it.

References

1] E. Altman. Flow control using the theory of zero-sum Markov games. IEEE Transactions on Automatic Control, 39:814 818, 1994. 2] E. Altman. A Markov game approach for optimal routing into a queueing network. Technical Report 2178, INRIA Sophia Antipolis, 1994. 3] E. Altman. Monotonicity of optimal policies in a zero sum game: A ow control model. Advances of Dynamic Games and Applications, pages 269 286, 1994. 4] E. Altman. Non zero-sum stochastic games in admission, service and routing control in queueing systems. Submitted, 1995. 5] E. Altman and A. Hordijk. Zero-sum Markov games and worst-case optimal control of queueing systems. Technical Report TW-94-01, Leiden University, 1994. 6] E. Altman, A. Hordijk, and F.M. Spieksma. Contraction conditions for average and -discount optimality in countable state markov games with unbounded rewards. Technical Report TW-93-16, Leiden University, 1993. 7] E. Altman and G.M. Koole. Stochastic scheduling games and Markov decision arrival processes. Computers and Mathematics with Applications, 26(6):141 148, 1993. 8] E. Altman and G.M. Koole. Control of a random walk with noisy delayed information. Systems and Control Letters, 24:207 213, 1995.

INRIA

On submodular value functions

21

9] E. Altman and G.M. Koole. Submodular dynamic games. Working paper, 1995. 10] S. Asmussen and G.M. Koole. Marked point processes as limits of Markovian arrival streams. Journal of Applied Probability, 30:365 372, 1993. 11] F. Baccelli, G. Cohen, G.J. Olsder, and J.P. Quadrat. Synchronization and Linearity. Wiley, 1992. 12] F. Baccelli and B. Gaujal. Stationary regime and stability of free-choice Petri nets. In Proceedings of the 11th International Conference on Analysis and Optimization of Systems, 1994. 13] F. Baccelli and Z. Liu. On the execution of parallel programs on multiprocessor systems a queueing theory approach. Journal of the ACM, 37:373 414, 1990. 14] D.P. Bertsekas and R.G. Gallager. Data Networks. Prentice-Hall, 1987. 15] J. Bruno, P. Downey, and G.N. Frederickson. Sequencing tasks with exponential service times to minimize the expected ow time or makespan. Journal of the ACM, 28:100 113, 1981. 16] E.A. Feinberg and M.I. Reiman. Optimality of randomized trunk reservation. Probability in the Engineering and Informational Sciences, 8:463 489, 1994. 17] W.H. Fleming and H.M. Soner. Controlled Markov Processes and Viscosity Solutions. Springer-Verlag, 1993. 18] P. Glasserman and D.D. Yao. Monotone optimal control of permutable GSMPs. Mathematics of Operations Research, 19:449 476, 1994. 19] P. Glasserman and D.D. Yao. Monotone Structure in Discrete Event Systems. Wiley, 1994. 20] P. Glasserman and D.D. Yao. Addendum to Monotone optimal control of permutable GSMPs . In preparation, 1995. 21] B. Hajek. Optimal control of two interacting service stations. IEEE Transactions on Automatic Control, 29:491 499, 1984. 22] R. Hassin and M. Haviv. Equilibrium strategies and the value of information in a two line queueing system with threshold jockeying. Stochastic Models, 10:415 435, 1994. 23] M. Haviv. Stable strategies for processor sharing systems. European Journal of Operations Research, 52:103 106, 1991. 24] D. P. Heyman. Optimal operating policies for M jGj1 queueing systems. Operations Research, 16:362 382, 1968.

RR n 2658

22

Eitan Altman & Ger Koole

25] A. Hordijk and G.M. Koole. On the assignment of customers to parallel queues. Probability in the Engineering and Informational Sciences, 6:495 511, 1992. 26] A. Hordijk and G.M. Koole. On the optimality of LEPT and c rules for parallel processors and dependent arrival processes. Advances in Applied Probability, 25:979 996, 1993. 27] K. Hsu and S.I. Marcus. Decentralized control of nite state Markov processes. In Proceedings of the 19th IEEE Conference on Decision and Control, pages 143 148, 1980. 28] N.K. Jaiswal. Priority Queues. Academic Press, 1968. 29] S. Janakiram, S. Stidham, Jr., and A. Shaykevich. Airline yield management with overbooking, cancellations and no-shows. Technical Report UNC/OR/TR94-9, University of N.C. at Chapel Hill, 1994. 30] A. Jean-Marie and G.J. Olsder. Analysis of stochastic min-max systems: Results and conjectures. Technical Report 93-94, Delft University of Technology, 1993. 31] G.M. Koole and Z. Liu. Nonconservative service for minimizing cell loss in ATM networks. In Proceedings of the 33rd Allerton Conference on Communication, Control, and Computing, 1995. To appear. 32] H.-U. K enle. On the optimality of (s; S )-strategies in a minimax inventory model with average cost criterion. Optimization, 22:123 138, 1991. 33] H.J. Kushner and P. Dupuis. Numerical Methods for Stochastic Control Problems in Continuous Time. Springer-Verlag, 1992. 34] S.A. Lippman. Applying a new device in the optimization of exponential queueing systems. Operations Research, 23:687 710, 1975. 35] S.A. Lippman and J.W. Mamer. Preemptive innovation. Journal of Economic Theory, 61:104 119, 1993. 36] U. Madhow, M.L. Honing, and K. Steiglitz. Optimization of wireless resources for personal communications mobility tracking. In Proceedings of IEEE Infocom '94, pages 577 584, 1994. 37] N. Merlet and J. Zerubia. A curvature-dependent energy function for detecting lines in satellite images. In Proceedings 8th SCIA, Tromso, Norway, 1993. 38] P. Naor. On the regulation of queueing size by levying tolls. Econometrica, 37:15 24, 1969. 39] A. Orda, R. Rom, and M. Sidi. Minimum delay routing in stochastic networks. IEEE/ACM Transactions on Networking, 1:187 198, 1993.

INRIA

On submodular value functions

23

40] M.L. Puterman. Markov Decision Processes. Wiley, 1994. 41] F.C. Schoute. Decentralized control in packet switched satellite communication. IEEE Transactions on Automatic Control, 23:362 371, 1978. 42] R.F. Serfozo. An equivalence between continuous and discrete time Markov decision processes. Operations Research, 27:616 620, 1979. 43] L.S. Shapley. Stochastic games. Proccedings National Academy of Science USA, 39:1095 1100, 1953. 44] M.J. Sobel. Optimal average-cost policy for a queue with start-up and shut down costs. Operations Research, 17:145 162, 1969. 45] S. Stidham, Jr. Socially and individually optimal control of arrivals to a GI jM j1 queue. Management Science, 24:1598 1610, 1970. 46] R.R. Weber and S. Stidham, Jr. Optimal control of service rates in networks of queues. Advances in Applied Probability, 19:202 218, 1987. 47] J. Weishaupt. Optimal myopic policies and index policies for stochastic scheduling problems. ZOR - Mathematical Methods of Operations Research, 40:75 89, 1994. 48] D.J. White. Real applications of Markov decision processes. Interfaces, 15:73 83, 1985. 49] W. Winston. Optimality of the shortest line discipline. Journal of Applied Probability, 14:181 189, 1977. 50] M. Yadin and P. Naor. Queueing systems with removable service station. Operations Research Quaterly, 14:393 405, 1963. 51] U. Yechiali. On optimal balking rules and toll charges in a GI jM j1 queueing process. Operations Research, 19:349 370, 1971.

RR n 2658

Unite de recherche INRIA Lorraine, Technopole de Nancy-Brabois, Campus scienti?que, ? ? ` 615 rue du Jardin Botanique, BP 101, 54600 VILLERS LES NANCY Unite de recherche INRIA Rennes, Irisa, Campus universitaire de Beaulieu, 35042 RENNES Cedex ? Unite de recherche INRIA Rhone-Alpes, 46 avenue Felix Viallet, 38031 GRENOBLE Cedex 1 ? ? ? Unite de recherche INRIA Rocquencourt, Domaine de Voluceau, Rocquencourt, BP 105, 78153 LE CHESNAY Cedex ? Unite de recherche INRIA Sophia-Antipolis, 2004 route des Lucioles, BP 93, 06902 SOPHIA-ANTIPOLIS Cedex ?

? Editeur INRIA, Domaine de Voluceau, Rocquencourt, BP 105, 78153 LE CHESNAY Cedex (France) ISSN 0249-6399

### 相关推荐

- Programme 1--- Architectures parallles, bases de donnes, rseaux et systmes distribus
- Programme 1 Architectures parall#les, bases de donn#es, r#seaux
- Programme 1--- Architectures parall`eles, bases de donn'ees, r'eseaux et syst`emes distribu
- ARCHITECTURES SYSTOLIQUES ET PARALL
- Les yeux et les cheveux
- La culture et les arts en France
- Les vetements et la mode
- DEA Réseaux et Systèmes Distribués Building and Querying a Repository of BPEL Process Sp
- 01-La veille dame et les soldes

### 最新更新

- Programme 1 Architectures parall#les, bases de donn#es, r#seaux et syst#mes distribu#s
- 大学生心理健康主题班会_图文.ppt精品文档
- 浅论构建绿色税收体系的设想(一)
- 桃花江的品牌资质分析报告
- 最实用的线下推广网站
- 客户说发货慢怎么回复_物流走件慢回复话术
- 镇江恒隆电气设备有限公司灌云第一分公司企业信用报告-天眼查
- 除湿机机械系统故障分析_论文
- 【优质文档】2018年标准二手房转让合同范本-优秀word范文 (9页)
- (2018人教部编版)语文九年级上学期同步练*课件：13.诗词三首 (共52张PPT)
- 【2019】部编语文六年级上册：类文阅读-3古诗词三首-阅读提升,知识拓展word版
- 东莞市中堂嘉诚织带厂(企业信用报告)
- 桃花江Taohuajiang的品牌资质分析报告
- 玉溪师范学院政法学院08政教团支部2009-2010学年上学期工作计划
- 年产xxx套污泥处置处理设备新建项目可行性研究报告
- 错别字调查报告作文

### 猜你喜欢

- 基于Web 的测控工程项目管理系统研究及其应用
- 关于校园消防安全知识
- 江苏省发展高新技术条例(2002修正)
- GB 50017-2003《钢结构设计规范》应用讲座
- 亚太财产保险有限公司衡水中心支公司(企业信用报告)- 天眼查
- 为了健康，不要贪食冷饮
- 如何建立小学低年级学生的数感
- 2019贵州地区大堂经理职位薪酬报告
- 情感教育在初中道德与法治教学中的渗透剖析
- 我的老师——老官上任数把火
- 成人高考备考技巧
- 秋学期小学三年级语文上册第二单元7听听秋的声音导学案新人教版
- 综合实践活动课程“积分制”学习成效评价的优势
- 【精编完整版】山亭区10万亩优质长红枣基地产业化建设项目可研报告
- 工艺案例~印刷包装普及版资料 43页PPT文档
- 基于Moodle的翻转课堂教学模式设计
- 推荐K12学*山东省郯城县红花镇中考历史复*八上第18课战略大决战*题02新人教版
- 最新competitive intelligence architecture精品资料
- 20XX二年级家长会班主任发言稿精选
- 2012浙江省驾校考试科目一自动档考试技巧与口诀
- 《遥远的地方有一片海》读后感100字
- 哈尔滨卓烨快递有限公司(企业信用报告)- 天眼查
- 九鼎内衣整体品牌设计提案.ppt
- 【免费下载】领导干部如何写个人述职述廉报告
- 青春期怎么预防长青春痘？消除青春痘的小方法
- java8 - 方法引用(method referrance)
- 青岛华建阳光电力科技有限公司(企业信用报告)- 天眼查
- 六年级语文下册精品课件-5.鲁滨孙漂流记人教部编版(共35张PPT)
- 2019-2020年七年级英语下册 Unit 5 Amazing things Reading小练习(新版)牛津版
- 《阿长与山海经》读书笔记6篇
- 宣城市文景餐饮管理有限公司企业信用报告-天眼查
- 论我国《物权法》中的拾得遗失物制度-最新文档资料
- 【推荐下载】城市河流及其生态治理规划研究
- mysql负载均衡与同步_MySQL多主同步 负载均衡
- 组态软件概述组态王软件的安装组态王的使用入门课件-精选文档
- 乙烯基醋酸酯气体监测报警系统(网络版)
- 2018年3月6日是什么日子？农历是多少？ 农历7月16日日子好吗
- 转基因食品的利与弊(1)
- 关于梦的优美的句子
- 激励员工有策略_成长故事
- 新形势下群众文化建设及发展策略的论文
- 厦门兴元洲进出口贸易有限公司(企业信用报告)- 天眼查