LicenseJuggler

From GridWiki

Motivation - Why are we doing this?

Enterprises with a sustained need for compute power, such as EDA companies, have been at the forefront of Grid Computing for many years. To satisfy that need, these enterprises often operate self-contained and autonomous DRMSs (Distributed Resource Management Systems) such as Sun Grid Engine, spread over different buildings, locations and sometimes even geographical regions; the number of DRMS installations can run into the dozens. Omnipresent cost pressure and the vision of the "Global Grid" feed a growing demand to consolidate dispersed DRMS-based infrastructures. Consolidating a dispersed DRMS-based infrastructure can be a non-trivial task because of a variety of often conflicting objectives.

One of the most commonly raised consolidation objectives is how EDA tool software licenses can reasonably be shared amongst locally dispersed DRMS installations. In contrast to centralizing the actual compute hardware, putting license sharing into practice is comparatively easy and still helps cut costs. The "License Juggler" described here should be seen as a proof of concept on the road towards a related feature/product/solution that helps Sun Grid Engine administrators put license sharing into practice rather than just talking about it :-)

Feedback is welcome via the Grid Engine users@ mailing list.

Copyright statement

Note: This document is licensed under the Grid Engine project SISSL license.

Building Blocks

  • License juggler package: contains everything needed for license sharing amongst Sun Grid Engine installations

What's inside the juggler package?

The license juggler package contains an example setup in which 10 licenses are shared amongst three different locations:

        emea/policy
        emea/rsh
        amer/policy
        amer/rsh
        apac/policy
        apac/rsh
        global_policy

and a few small files (Bourne shell scripts and a C program)

        juggler
        juggler_remote
        schedule.c

that make up the actual license juggler.

How to use the license juggler?

The following sections give an overview of the key license juggler concepts. Once license juggler is set up, the "juggler" script can be started to optimize enterprise-wide license utilization:

        # juggler emea,apac,amer lic1,lic2,lic3,lic4,lic5,lic6,lic7,lic8,lic9,lic10 3
        Fri Jul  8 14:34:54 MEST 2005: ======= new interval =======
        Fri Jul  8 14:34:54 MEST 2005: ======= updating cluster information =======
        emea/rsh juggler_remote utilized lic1,lic2,lic3,lic4,lic5,lic6,lic7,lic8,lic9,lic10
        apac/rsh juggler_remote utilized lic1,lic2,lic3,lic4,lic5,lic6,lic7,lic8,lic9,lic10
        amer/rsh juggler_remote utilized lic1,lic2,lic3,lic4,lic5,lic6,lic7,lic8,lic9,lic10
        emea/rsh juggler_remote required lic1,lic2,lic3,lic4,lic5,lic6,lic7,lic8,lic9,lic10
        amer/rsh juggler_remote required lic1,lic2,lic3,lic4,lic5,lic6,lic7,lic8,lic9,lic10
        apac/rsh juggler_remote required lic1,lic2,lic3,lic4,lic5,lic6,lic7,lic8,lic9,lic10
        Fri Jul  8 14:35:10 MEST 2005: =======          scheduling         =======
        lic1: cluster=emea rr_index=0   demand=6   util=2   min=0   max=4   targetted=3   assigned=2  
        lic1: cluster=apac rr_index=2   demand=4   util=4   min=0   max=4   targetted=2   assigned=2  
        lic1: cluster=amer rr_index=1   demand=6   util=2   min=0   max=4   targetted=3   assigned=2  
        lic1: global=8   demand=16  util=8   assigned=6  
        lic2: cluster=emea rr_index=0   demand=30  util=0   min=0   max=4   targetted=1   assigned=0  
        lic2: cluster=apac rr_index=2   demand=43  util=1   min=0   max=4   targetted=0   assigned=0  
        lic2: cluster=amer rr_index=1   demand=45  util=0   min=0   max=4   targetted=0   assigned=0  
        lic2: global=1   demand=118 util=1   assigned=0  
        lic3: cluster=emea rr_index=0   demand=36  util=4   min=0   max=4   targetted=4   assigned=4  
        lic3: cluster=apac rr_index=2   demand=36  util=4   min=0   max=4   targetted=4   assigned=4  
        lic3: cluster=amer rr_index=1   demand=32  util=8   min=0   max=15  targetted=12  assigned=12 
        lic3: global=20  demand=104 util=16  assigned=20 
        lic4: cluster=emea rr_index=0   demand=14  util=0   min=0   max=4   targetted=2   assigned=0  
        lic4: cluster=apac rr_index=2   demand=10  util=4   min=0   max=4   targetted=1   assigned=1  
        lic4: cluster=amer rr_index=1   demand=14  util=0   min=0   max=4   targetted=1   assigned=0  
        lic4: global=4   demand=38  util=4   assigned=1  
        lic5: cluster=emea rr_index=0   demand=14  util=2   min=0   max=4   targetted=2   assigned=2  
        lic5: cluster=apac rr_index=2   demand=15  util=1   min=0   max=4   targetted=1   assigned=1  
        lic5: cluster=amer rr_index=1   demand=15  util=0   min=0   max=2   targetted=1   assigned=1  
        lic5: global=4   demand=44  util=3   assigned=4  
        lic6: cluster=emea rr_index=0   demand=1   util=1   min=0   max=4   targetted=1   assigned=1  
        lic6: cluster=apac rr_index=2   demand=2   util=0   min=0   max=4   targetted=1   assigned=1  
        lic6: cluster=amer rr_index=1   demand=1   util=0   min=0   max=3   targetted=1   assigned=1  
        lic6: global=3   demand=4   util=1   assigned=3  
        lic7: cluster=emea rr_index=0   demand=3   util=1   min=0   max=4   targetted=3   assigned=3  
        lic7: cluster=apac rr_index=2   demand=4   util=0   min=0   max=4   targetted=4   assigned=4  
        lic7: cluster=amer rr_index=1   demand=2   util=0   min=0   max=9   targetted=2   assigned=2  
        lic7: global=9   demand=9   util=1   assigned=9  
        lic8: cluster=emea rr_index=0   demand=60  util=0   min=0   max=4   targetted=3   assigned=3  
        lic8: cluster=apac rr_index=2   demand=60  util=0   min=0   max=4   targetted=2   assigned=2  
        lic8: cluster=amer rr_index=1   demand=30  util=0   min=0   max=5   targetted=2   assigned=2  
        lic8: global=7   demand=150 util=0   assigned=7  
        lic9: cluster=emea rr_index=0   demand=24  util=0   min=0   max=4   targetted=3   assigned=3  
        lic9: cluster=apac rr_index=2   demand=24  util=0   min=0   max=4   targetted=2   assigned=2  
        lic9: cluster=amer rr_index=1   demand=12  util=0   min=0   max=8   targetted=3   assigned=3  
        lic9: global=8   demand=60  util=0   assigned=8  
        lic10: cluster=emea rr_index=0   demand=8   util=0   min=0   max=4   targetted=4   assigned=4  
        lic10: cluster=apac rr_index=2   demand=16  util=0   min=0   max=4   targetted=3   assigned=3  
        lic10: cluster=amer rr_index=1   demand=8   util=0   min=0   max=10  targetted=3   assigned=3  
        lic10: global=10  demand=32  util=0   assigned=10 
        Fri Jul  8 14:35:10 MEST 2005: ======= distribte assigned licenses =======
        emea/rsh juggler_remote assigned lic1=2,lic2=0,lic3=4,lic4=0,lic5=2,lic6=1,lic7=3,lic8=3,lic9=3,lic10=4,
        apac/rsh juggler_remote assigned lic1=2,lic2=0,lic3=4,lic4=1,lic5=1,lic6=1,lic7=4,lic8=2,lic9=2,lic10=3,
        amer/rsh juggler_remote assigned lic1=2,lic2=0,lic3=12,lic4=0,lic5=1,lic6=1,lic7=2,lic8=2,lic9=3,lic10=3,
          :
          :

Please note that for actual scheduling "juggler" requires a "schedule" binary for the target platform to be available in the juggler directory. On a Solaris machine with the Sun Studio compiler installed, "schedule.c" can be compiled using

        # cc -o schedule schedule.c

Policies

First, familiarize yourself with the policy concepts available for the global policy and the cluster-local policies. The contents of the example files illustrate those concepts. In the "global_policy" file

        # -----------------------------------
        # name         max
        # -----------------------------------
        lic1           8
        lic2           1
        lic3           20
        lic4           4
        lic5           4
        lic6           3
        lic7           9
        lic8           7
        lic9           8
        lic10          10
        # -----------------------------------

and in the local "policy" files

        # -----------------------------------
        # name         min         max
        # -----------------------------------
        lic1           2           4
        lic2           0           4
        lic3           2           15
        lic4           1           4
        lic5           0           2
        lic6           0           3
        lic7           1           9
        lic8           2           5
        lic9           0           8
        lic10          2           10
        # -----------------------------------

three different parameters can be specified for each license:

  • the maximum amount available enterprise-wide
  • the minimum amount that shall always be available at a local cluster irrespective of the actual load situation
  • the maximum amount as an upper limit for each local cluster

These parameters set the boundary conditions. Within those conditions the license juggler transfers licenses from/to local clusters in order to optimize license utilization at enterprise level, based on each local cluster's demand.
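The local boundary conditions can be illustrated with a small Bourne shell helper. This is only a sketch under stated assumptions, not the logic implemented in "schedule.c": the function name clamp_share is hypothetical, and it only assumes the three-column "policy" file layout shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the juggler package): clamp a proposed
# share of one license to the [min, max] range from a <cluster>/policy file.
# usage: clamp_share <policy_file> <license> <proposed_share>
clamp_share() {
    awk -v lic="$2" -v t="$3" '
        $1 ~ /^#/ { next }                 # skip comment lines
        $1 == lic {
            min = $2 + 0; max = $3 + 0; t = t + 0
            if (t < min) t = min           # never go below the local minimum
            if (t > max) t = max           # never exceed the local maximum
            print t
            found = 1
            exit
        }
        END { if (!found) exit 1 }         # license not listed in the policy
    ' "$1"
}
```

For example, with the local policy shown above, a proposed share of 6 for lic1 would be reduced to the local maximum of 4.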

Bridging the gap

One of the assumptions is that license juggler must work with locally dispersed, self-contained and autonomous DRM systems, so very little can be assumed in terms of shared infrastructure amongst the DRM system installations. For example, requiring a file system to be shared between these installations would be absolutely unacceptable. The only constraint of the license juggler POC is that, from the machine where license juggler runs, an rsh-like command can be used to connect to each local cluster.

For each cluster, license juggler requires a <cluster>/rsh script wrapper that allows "juggler_remote" to be run remotely for three different purposes:

  • retrieving information about current license demand
  • retrieving information about current license utilization
  • changing number of available licenses based on decision made by license juggler

The <cluster>/rsh script wrapper is used to start the "juggler_remote" script in the three corresponding modes:

        # apac/rsh juggler_remote required lic1,lic2,lic3,lic4,lic5,lic6,lic7,lic8,lic9,lic10
        lic10 13
        lic1 7
        lic2 12
        lic3 4
        lic4 4
        lic5 2       
        lic6 41
        lic7 10
        lic8 7
        lic9 1
        # apac/rsh juggler_remote utilized lic1,lic2,lic3,lic4,lic5,lic6,lic7,lic8,lic9,lic10
        lic1 1
        lic2 4
        lic5 1
        lic6 4
        # apac/rsh juggler_remote assigned lic1=3,lic2=1,lic3=4,lic4=2,lic5=2,lic6=1,lic7=2,lic8=3,lic9=3,lic10=4,

The purpose of the rsh script wrappers is to hide the details of each local cluster (cluster administrative host, directory where the "juggler_remote" script is kept, Sun Grid Engine settings file, ...) behind an abstract interface.
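What such a wrapper hides can be sketched as follows. This is an illustration only, not the wrapper shipped in the package: the host name, both paths and the use of ssh as the rsh-like command are hypothetical placeholders that would have to be adapted to each cluster.

```shell
#!/bin/sh
# Sketch of a <cluster>/rsh wrapper. All defaults below are hypothetical.
ADMIN_HOST=${ADMIN_HOST:-admin.apac.example.com}   # cluster administrative host
SGE_SETTINGS=${SGE_SETTINGS:-/opt/sge/default/common/settings.sh}
JUGGLER_DIR=${JUGGLER_DIR:-/opt/juggler}           # where juggler_remote is kept
REMOTE_SHELL=${REMOTE_SHELL:-ssh}                  # any rsh-like command

# Build the command line that is executed on the administrative host:
# source the Grid Engine settings, change to the script directory, run it.
remote_command() {
    printf '. %s && cd %s && ./%s\n' "$SGE_SETTINGS" "$JUGGLER_DIR" "$*"
}

# Run "juggler_remote <mode> <arguments>" on the administrative host.
cluster_rsh() {
    "$REMOTE_SHELL" "$ADMIN_HOST" "$(remote_command "$@")"
}

# In a stand-alone wrapper script the last line would be: cluster_rsh "$@"
```

Keeping all cluster-specific values in a handful of variables means "juggler" itself never needs to know where or how a cluster is reached.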

Can license juggler be used with Sun Grid Engine only?

No. The only requirement is that the DRMS has a means to manage license capacities that is comparable to Sun Grid Engine global consumables. The sample rsh script wrappers assume that Sun Grid Engine clusters are contacted, but they could be written similarly for earlier Sun Grid Engine versions or for other DRMSs such as LSF or PBS, provided the "juggler_remote" script is customized accordingly.

Using license juggler

What is required at each local cluster?

With a Sun Grid Engine cluster, two things must be done before license juggler can connect to it:

  • for each enterprise-wide managed license a complex entry (see centry(5)) needs to be configured to define properties of the consumable
        #name               shortcut   type        relop requestable consumable default  urgency 
        #----------------------------------------------------------------------------------------
        lic1                l1         INT         <=    YES         YES        0        0
        lic2                l2         INT         <=    YES         YES        0        0
        lic3                l3         INT         <=    YES         YES        0        0
        lic4                l4         INT         <=    YES         YES        0        0
        lic5                l5         INT         <=    YES         YES        0        0
        lic6                l6         INT         <=    YES         YES        0        0
        lic7                l7         INT         <=    YES         YES        0        0
        lic8                l8         INT         <=    YES         YES        0        0
        lic9                l9         INT         <=    YES         YES        0        0
        lic10               l10        INT         <=    YES         YES        0        0
        # >#< starts a comment but comments are not saved across edits --------

As a matter of course, the consumable must also be requested using the -l option of qsub(1) for jobs that depend on the license.

  • the "juggler_remote" script must be available locally and runnable at the administrative host chosen in the <cluster>/rsh script wrapper. To be sure it works you can try running the three "juggler_remote" operations manually as shown above.
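How the "assigned" mode might change the number of available licenses on a Sun Grid Engine cluster can be sketched as below. This is an assumption about one plausible approach, not the contents of the shipped "juggler_remote" script: it sets each license as a complex value of the global host with qconf -mattr, and the function name apply_assigned is hypothetical.

```shell
#!/bin/sh
# Sketch only: apply an assignment string such as "lic1=3,lic2=1," by
# setting the global consumable capacities with qconf.
QCONF=${QCONF:-qconf}

# usage: apply_assigned <lic=count,lic=count,...>
apply_assigned() {
    old_ifs=$IFS
    IFS=,
    set -f                      # no file name expansion while splitting
    set -- $1                   # split the list at the commas
    set +f
    IFS=$old_ifs
    for pair in "$@"; do
        case $pair in
            ?*=?*) "$QCONF" -mattr exechost complex_values "$pair" global ;;
        esac                    # anything not of the form name=value is skipped
    done
}
```

A job that depends on a license would then request it at submission time, for example with "qsub -l lic1=1" followed by the job script.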

Normal operation

The "juggler" script requires three parameters to be passed:

  # juggler <cluster_list> <license_list> <sleep_interval>

For each <cluster> in the comma-separated <cluster_list> there must be a corresponding sub-directory containing a "policy" and an "rsh" file. Note that the <cluster> directory is also used to store cluster-specific temporary information.

It is assumed that there is an entry in the "global_policy" file for each license specified in the comma-separated <license_list>. Similarly, each <cluster>/policy file should contain an entry for each license. Once juggler is running, it is safe to change the global maximum and the local minimum/maximum parameters of a license in the policy files at any time. Changing the set of clusters or licenses, however, requires "juggler" to be restarted.
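These assumptions can be checked before "juggler" is started. The following sketch is not part of the package; the function names has_entry and check_setup are hypothetical, and the check is meant to be run from the juggler directory.

```shell
#!/bin/sh
# Succeeds if <file> exists and has a line whose first field is <license>.
# usage: has_entry <file> <license>
has_entry() {
    [ -f "$1" ] &&
        awk -v lic="$2" '$1 == lic { found = 1 } END { exit !found }' "$1"
}

# Report every missing file or policy entry; returns non-zero if any is found.
# usage: check_setup <cluster_list> <license_list>
check_setup() {
    clusters=$(echo "$1" | tr ',' ' ')
    licenses=$(echo "$2" | tr ',' ' ')
    status=0
    for cluster in $clusters; do
        [ -f "$cluster/rsh" ] || { echo "missing $cluster/rsh" >&2; status=1; }
        for lic in $licenses; do
            has_entry "$cluster/policy" "$lic" ||
                { echo "no entry for $lic in $cluster/policy" >&2; status=1; }
        done
    done
    for lic in $licenses; do
        has_entry global_policy "$lic" ||
            { echo "no entry for $lic in global_policy" >&2; status=1; }
    done
    return $status
}
```

It takes the same first two arguments as "juggler", for example "check_setup emea,apac,amer lic1,lic2".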

For monitoring purposes "juggler" logs relevant information to a "messages" file. The "messages" file shows, for example, the load situation of each cluster and the resulting share assignments, separately for each license.

The <sleep_interval> controls how long "juggler" stays inactive after a scheduling interval. Please note that a short sleep interval allows licenses to be transferred promptly, but it imposes some stress on each local cluster because of the three remote operations that are performed each interval for all licenses managed at enterprise level.

Sun Grid Engine maintenance shutdown and network outages

During a maintenance shutdown of a local DRMS, "juggler_remote" can neither retrieve new information about license demand/utilization nor apply the most recent license share entitlement to the cluster. The same is true for partial network outages. License juggler takes this into account by assuming that the most recent information about license demand/utilization is still valid. Apart from that, license juggler simply continues operation to ensure that license sharing amongst the reachable clusters is not interrupted.
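The fallback to the most recent information can be sketched as follows. This is an illustration of the idea, not the code used by "juggler": the function name fetch_or_cached and the cache file names are hypothetical, and the sketch assumes the <cluster>/rsh wrapper is a Bourne shell script that exits non-zero when the cluster cannot be reached.

```shell
#!/bin/sh
# Query one cluster and fall back to the last successful answer if the
# cluster is down or unreachable.
# usage: fetch_or_cached <cluster> <required|utilized> <license_list>
fetch_or_cached() {
    cache="$1/last_$2"              # hypothetical cache file in <cluster>/
    if sh "$1/rsh" juggler_remote "$2" "$3" > "$cache.new" 2>/dev/null; then
        mv "$cache.new" "$cache"    # fresh data replaces the cached copy
    else
        rm -f "$cache.new"          # keep the most recent information
    fi
    [ -f "$cache" ] && cat "$cache" # fails if the cluster was never reached
}
```

Because stale data is reused only for the unreachable cluster, scheduling for the remaining clusters proceeds with current information.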
