Motivation - Why are we doing this?
Enterprises with a large need for compute power, such as EDA companies, have been at the forefront of Grid Computing for many years. To satisfy their needs, many of these enterprises operate self-contained and autonomous DRMS (Distributed Resource Management Systems) such as Sun Grid Engine, spread over different buildings, locations and sometimes even geographical regions; sometimes the number of DRMS installations goes into the dozens! Omnipresent cost pressure and the vision of the "Global Grid" increasingly drive demand to consolidate dispersed DRMS-based infrastructures. Such a consolidation can be a non-trivial task because of a variety of often conflicting objectives.
One of the most commonly raised consolidation objectives is how EDA tool software licenses can reasonably be shared amongst geographically dispersed DRMS installations. Compared with centralizing the actual compute hardware, putting license sharing into practice is relatively easy, and it helps cut costs anyway. The "License Juggler" described here should be seen as a proof of concept on the road towards a feature/product/solution/whatever that helps Sun Grid Engine administrators put license sharing into practice rather than just talking about it :-)
Please feel invited to provide feedback via the Grid Engine users@ mailing list.
Note: This document is under Grid Engine project SISSL license.
- at least two independent Sun Grid Engine installations are required
- the license juggler package, which contains everything needed for license sharing amongst the Sun Grid Engine installations
What's inside the juggler package?
The license juggler package contains an example setup in which 10 licenses are shared amongst three different locations:
emea/policy emea/rsh
amer/policy amer/rsh
apac/policy apac/rsh
global_policy
plus a few small files (Bourne shell scripts and a C program)
juggler juggler_remote schedule.c
that make up the actual license juggler.
How to use the license juggler?
The following sections give an overview of the key license juggler concepts. Once license juggler is set up, the "juggler" script can be started to optimize enterprise-wide license utilization:
# juggler emea,apac,amer lic1,lic2,lic3,lic4,lic5,lic6,lic7,lic8,lic9,lic10 3
Fri Jul 8 14:34:54 MEST 2005: ======= new interval =======
Fri Jul 8 14:34:54 MEST 2005: ======= updating cluster information =======
emea/rsh juggler_remote utilized lic1,lic2,lic3,lic4,lic5,lic6,lic7,lic8,lic9,lic10
apac/rsh juggler_remote utilized lic1,lic2,lic3,lic4,lic5,lic6,lic7,lic8,lic9,lic10
amer/rsh juggler_remote utilized lic1,lic2,lic3,lic4,lic5,lic6,lic7,lic8,lic9,lic10
emea/rsh juggler_remote required lic1,lic2,lic3,lic4,lic5,lic6,lic7,lic8,lic9,lic10
amer/rsh juggler_remote required lic1,lic2,lic3,lic4,lic5,lic6,lic7,lic8,lic9,lic10
apac/rsh juggler_remote required lic1,lic2,lic3,lic4,lic5,lic6,lic7,lic8,lic9,lic10
Fri Jul 8 14:35:10 MEST 2005: ======= scheduling =======
lic1: cluster=emea rr_index=0 demand=6 util=2 min=0 max=4 targetted=3 assigned=2
lic1: cluster=apac rr_index=2 demand=4 util=4 min=0 max=4 targetted=2 assigned=2
lic1: cluster=amer rr_index=1 demand=6 util=2 min=0 max=4 targetted=3 assigned=2
lic1: global=8 demand=16 util=8 assigned=6
lic2: cluster=emea rr_index=0 demand=30 util=0 min=0 max=4 targetted=1 assigned=0
lic2: cluster=apac rr_index=2 demand=43 util=1 min=0 max=4 targetted=0 assigned=0
lic2: cluster=amer rr_index=1 demand=45 util=0 min=0 max=4 targetted=0 assigned=0
lic2: global=1 demand=118 util=1 assigned=0
lic3: cluster=emea rr_index=0 demand=36 util=4 min=0 max=4 targetted=4 assigned=4
lic3: cluster=apac rr_index=2 demand=36 util=4 min=0 max=4 targetted=4 assigned=4
lic3: cluster=amer rr_index=1 demand=32 util=8 min=0 max=15 targetted=12 assigned=12
lic3: global=20 demand=104 util=16 assigned=20
lic4: cluster=emea rr_index=0 demand=14 util=0 min=0 max=4 targetted=2 assigned=0
lic4: cluster=apac rr_index=2 demand=10 util=4 min=0 max=4 targetted=1 assigned=1
lic4: cluster=amer rr_index=1 demand=14 util=0 min=0 max=4 targetted=1 assigned=0
lic4: global=4 demand=38 util=4 assigned=1
lic5: cluster=emea rr_index=0 demand=14 util=2 min=0 max=4 targetted=2 assigned=2
lic5: cluster=apac rr_index=2 demand=15 util=1 min=0 max=4 targetted=1 assigned=1
lic5: cluster=amer rr_index=1 demand=15 util=0 min=0 max=2 targetted=1 assigned=1
lic5: global=4 demand=44 util=3 assigned=4
lic6: cluster=emea rr_index=0 demand=1 util=1 min=0 max=4 targetted=1 assigned=1
lic6: cluster=apac rr_index=2 demand=2 util=0 min=0 max=4 targetted=1 assigned=1
lic6: cluster=amer rr_index=1 demand=1 util=0 min=0 max=3 targetted=1 assigned=1
lic6: global=3 demand=4 util=1 assigned=3
lic7: cluster=emea rr_index=0 demand=3 util=1 min=0 max=4 targetted=3 assigned=3
lic7: cluster=apac rr_index=2 demand=4 util=0 min=0 max=4 targetted=4 assigned=4
lic7: cluster=amer rr_index=1 demand=2 util=0 min=0 max=9 targetted=2 assigned=2
lic7: global=9 demand=9 util=1 assigned=9
lic8: cluster=emea rr_index=0 demand=60 util=0 min=0 max=4 targetted=3 assigned=3
lic8: cluster=apac rr_index=2 demand=60 util=0 min=0 max=4 targetted=2 assigned=2
lic8: cluster=amer rr_index=1 demand=30 util=0 min=0 max=5 targetted=2 assigned=2
lic8: global=7 demand=150 util=0 assigned=7
lic9: cluster=emea rr_index=0 demand=24 util=0 min=0 max=4 targetted=3 assigned=3
lic9: cluster=apac rr_index=2 demand=24 util=0 min=0 max=4 targetted=2 assigned=2
lic9: cluster=amer rr_index=1 demand=12 util=0 min=0 max=8 targetted=3 assigned=3
lic9: global=8 demand=60 util=0 assigned=8
lic10: cluster=emea rr_index=0 demand=8 util=0 min=0 max=4 targetted=4 assigned=4
lic10: cluster=apac rr_index=2 demand=16 util=0 min=0 max=4 targetted=3 assigned=3
lic10: cluster=amer rr_index=1 demand=8 util=0 min=0 max=10 targetted=3 assigned=3
lic10: global=10 demand=32 util=0 assigned=10
Fri Jul 8 14:35:10 MEST 2005: ======= distribte assigned licenses =======
emea/rsh juggler_remote assigned lic1=2,lic2=0,lic3=4,lic4=0,lic5=2,lic6=1,lic7=3,lic8=3,lic9=3,lic10=4,
apac/rsh juggler_remote assigned lic1=2,lic2=0,lic3=4,lic4=1,lic5=1,lic6=1,lic7=4,lic8=2,lic9=2,lic10=3,
amer/rsh juggler_remote assigned lic1=2,lic2=0,lic3=12,lic4=0,lic5=1,lic6=1,lic7=2,lic8=2,lic9=3,lic10=3,
:
:
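The log prints both a "targetted" and an "assigned" value per cluster, but the text does not define how one becomes the other. The C sketch below is a hypothetical rule, not taken from schedule.c (the function name assign_licenses is made up): licenses in use are never revoked, a cluster at or above its target is simply capped at the target, and a cluster below its target receives extra licenses only from the pool that is currently free enterprise-wide. The order in which that free pool is handed out is an assumption.

```c
#include <assert.h>

/*
 * Hypothetical rule: turn per-cluster targets into the capacities that
 * are pushed to each cluster. Licenses currently in use are never revoked.
 * A cluster at or above its target is capped at the target and drains as
 * its jobs finish; a cluster below its target keeps what it uses and only
 * receives licenses from the pool that nobody holds right now.
 */
void assign_licenses(int global_max, int n, const int util[],
                     const int target[], int assigned[])
{
    assert(global_max >= 0 && n >= 0);

    int in_use = 0;
    for (int i = 0; i < n; i++)
        in_use += util[i];

    int free_pool = global_max - in_use;   /* licenses nobody holds */
    if (free_pool < 0)
        free_pool = 0;

    /* clusters are served in array order; the real order is an assumption */
    for (int i = 0; i < n; i++) {
        if (util[i] >= target[i]) {
            assigned[i] = target[i];        /* shrink, never revoke */
        } else {
            int want = target[i] - util[i]; /* extra licenses needed */
            int grant = want < free_pool ? want : free_pool;
            assigned[i] = util[i] + grant;
            free_pool -= grant;
        }
    }
}
```

For lic1 above (global 8, util 2/4/2, targets 3/2/3) every license is in use, so emea and amer stay at 2 while apac is cut back to 2, which gives the logged total of 6. For the sample run this rule happens to reproduce every "assigned" value, but that is a consistency check on one log, not a specification.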
Please note that for the actual scheduling, "juggler" requires a "schedule" binary for the target platform in the juggler directory. On a Solaris machine with the Sun Studio compiler installed, "schedule.c" can easily be compiled using
# cc -o schedule schedule.c
First, make yourself familiar with the policy concepts available for the global policy and the cluster-local policies. The contents of the example files illustrate these concepts. In the "global_policy" file
# -----------------------------------
# name    max
# -----------------------------------
lic1      8
lic2      1
lic3      20
lic4      4
lic5      4
lic6      3
lic7      9
lic8      7
lic9      8
lic10     10
# -----------------------------------
and in the local "policy" files
# -----------------------------------
# name    min   max
# -----------------------------------
lic1      2     4
lic2      0     4
lic3      2     15
lic4      1     4
lic5      0     2
lic6      0     3
lic7      1     9
lic8      2     5
lic9      0     8
lic10     2     10
# -----------------------------------
three different parameters can be specified for each license:
- the maximum number available enterprise-wide ("max" in "global_policy")
- the minimum number that shall always be available at a local cluster irrespective of the actual load situation ("min" in <cluster>/policy)
- the maximum number as an upper limit for each local cluster ("max" in <cluster>/policy)
These parameters set the boundary conditions. Within them, license juggler transfers licenses between local clusters to optimize enterprise-level license utilization based on each local cluster's demand.
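The text does not spell out how a license is split within these boundary conditions. One plausible approach is a round-robin "water-filling": each cluster first receives its local minimum (as far as the global maximum allows), then single licenses are handed out in turn until the global maximum is used up or every cluster has reached min(demand, local max). The C sketch below is not the code in schedule.c; the function compute_targets and the assumption that clusters are visited in rr_index order are illustrative. With the min=0 values printed in the sample log it happens to reproduce every "targetted" value there; its handling of non-zero minimums has not been checked against real output.

```c
#include <assert.h>

#define MAX_CLUSTERS 16

/*
 * Hypothetical round-robin water-filling for one license type.
 * order[] lists cluster indices in visiting order (assumed: by rr_index).
 * Step 1 grants local minimums irrespective of demand while the global
 * maximum allows. Step 2 hands out one license at a time to every cluster
 * still below min(demand, local max) until the pool is empty or nobody
 * can take more.
 */
void compute_targets(int global_max, int n, const int order[],
                     const int demand[], const int lmin[], const int lmax[],
                     int target[])
{
    assert(n >= 0 && n <= MAX_CLUSTERS);
    int pool = global_max;

    for (int i = 0; i < n; i++)
        target[i] = 0;

    /* 1. guarantee local minimums */
    for (int k = 0; k < n && pool > 0; k++) {
        int c = order[k];
        int give = lmin[c] < pool ? lmin[c] : pool;
        target[c] = give;
        pool -= give;
    }

    /* 2. distribute the rest one license at a time, round robin */
    int progress = 1;
    while (pool > 0 && progress) {
        progress = 0;
        for (int k = 0; k < n && pool > 0; k++) {
            int c = order[k];
            int cap = demand[c] < lmax[c] ? demand[c] : lmax[c];
            if (target[c] < cap) {
                target[c]++;
                pool--;
                progress = 1;
            }
        }
    }
}
```

Taking lic3 as an example: each cluster receives one license per round until emea and apac hit their local max of 4, after which the remaining licenses go to amer (max 15), giving 4/4/12.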
Bridging the gap
Because license juggler must work with geographically dispersed, self-contained and autonomous DRM systems, very little can be assumed about infrastructure shared among the DRM system installations. For example, requiring a file system to be shared among these installations would be absolutely unacceptable. The only constraint of the license juggler POC is that an rsh-like command can be used from the machine where license juggler runs to connect to each local cluster.
For each cluster, license juggler requires a <cluster>/rsh script wrapper that allows "juggler_remote" to be run remotely for three purposes:
- retrieving information about current license demand
- retrieving information about current license utilization
- changing the number of available licenses based on decisions made by license juggler
The <cluster>/rsh script wrapper starts the "juggler_remote" script in the three corresponding modes:
# apac/rsh juggler_remote required lic1,lic2,lic3,lic4,lic5,lic6,lic7,lic8,lic9,lic10
lic10 13
lic1 7
lic2 12
lic3 4
lic4 4
lic5 2
lic6 41
lic7 10
lic8 7
lic9 1
# apac/rsh juggler_remote utilized lic1,lic2,lic3,lic4,lic5,lic6,lic7,lic8,lic9,lic10
lic1 1
lic2 4
lic5 1
lic6 4
# apac/rsh juggler_remote assigned lic1=3,lic2=1,lic3=4,lic4=2,lic5=2,lic6=1,lic7=2,lic8=3,lic9=3,lic10=4,
The purpose of the rsh script wrappers is to hide the details of each local cluster (cluster administrative host, directory where the "juggler_remote" script is kept, Sun Grid Engine settings file, ...) behind an abstract interface.
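The wrapper interface is plain text: "required" and "utilized" print one "name count" pair per line, and "assigned" takes a comma-separated list of name=count items. A minimal C sketch of both directions follows. The helper names parse_counts and format_assigned are hypothetical, and treating a license that is missing from the output (as lic3 is in the "utilized" example) as zero is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse "name count" pairs as printed by the "required"/"utilized" modes.
 * counts[i] receives the value for names[i]; licenses absent from the
 * output are assumed to be 0. Returns the number of pairs read.
 */
int parse_counts(const char *text, int n, const char *const names[],
                 int counts[])
{
    char name[64];
    int value, used, pairs = 0;

    assert(text != NULL);
    for (int i = 0; i < n; i++)
        counts[i] = 0;

    /* %n records how many characters this pair consumed */
    while (sscanf(text, "%63s %d%n", name, &value, &used) == 2) {
        for (int i = 0; i < n; i++)
            if (strcmp(name, names[i]) == 0)
                counts[i] = value;
        text += used;
        pairs++;
    }
    return pairs;
}

/*
 * Build the argument of the "assigned" mode, e.g. "lic1=3,lic2=1,".
 * The trailing comma mirrors the example above. Returns 0 on success,
 * -1 if buf is too small.
 */
int format_assigned(char *buf, size_t size, int n, const char *const names[],
                    const int assigned[])
{
    size_t len = 0;

    if (size == 0)
        return -1;
    buf[0] = '\0';
    for (int i = 0; i < n; i++) {
        int w = snprintf(buf + len, size - len, "%s=%d,", names[i],
                         assigned[i]);
        if (w < 0 || (size_t)w >= size - len)
            return -1;                  /* output would be truncated */
        len += (size_t)w;
    }
    return 0;
}
```

Keeping the protocol this simple is what lets the rsh wrappers stay thin: any transport that passes text through unchanged will do.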
Can license juggler be used with Sun Grid Engine only?
No. The only requirement is that the DRMS offers a means to manage license capacities comparable to Sun Grid Engine global consumables. Although the sample rsh script wrappers assume that Sun Grid Engine clusters are contacted, they could similarly be written for earlier Sun Grid Engine versions or for other DRMS such as LSF or PBS, provided the "juggler_remote" script is customized accordingly.
Using license juggler
What is required at each local cluster?
With a Sun Grid Engine cluster, two things must be done before license juggler can interconnect:
- for each license managed enterprise-wide, a complex entry (see centry(5)) must be configured to define the properties of the consumable
#name   shortcut   type   relop   requestable   consumable   default   urgency
#----------------------------------------------------------------------------------------
lic1    l1         INT    <=      YES           YES          0         0
lic2    l2         INT    <=      YES           YES          0         0
lic3    l3         INT    <=      YES           YES          0         0
lic4    l4         INT    <=      YES           YES          0         0
lic5    l5         INT    <=      YES           YES          0         0
lic6    l6         INT    <=      YES           YES          0         0
lic7    l7         INT    <=      YES           YES          0         0
lic8    l8         INT    <=      YES           YES          0         0
lic9    l9         INT    <=      YES           YES          0         0
lic10   l10        INT    <=      YES           YES          0         0
# >#< starts a comment but comments are not saved across edits --------
Of course, jobs that depend on a license must also request the consumable using the qsub(1) -l option.
- the "juggler_remote" script must be available locally and runnable on the administrative host chosen in the <cluster>/rsh script wrapper. To make sure it works, you can try running the three "juggler_remote" operations manually as shown above.
The "juggler" script requires three parameters:
# juggler <cluster_list> <license_list> <sleep_interval>
For each <cluster> in the comma-separated <cluster_list> there must be a corresponding sub-directory with a "policy" and an "rsh" file. Note that the <cluster> directory is also used to store cluster-specific temporary information.
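Before the first interval it can pay off to check the command line against this directory layout. The sketch below splits a comma-separated list in place and uses POSIX access() to test that <cluster>/policy is readable and <cluster>/rsh is executable (the log invokes it directly as "emea/rsh"). The function names and the exact permission checks are assumptions; the text only requires that both files exist.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Split a comma-separated list such as "emea,apac,amer" in place
   (strtok overwrites the commas). Empty items are skipped.
   Returns the number of items stored in items[]. */
int split_list(char *list, char *items[], int max_items)
{
    int n = 0;

    assert(list != NULL && max_items >= 0);
    for (char *tok = strtok(list, ","); tok != NULL && n < max_items;
         tok = strtok(NULL, ","))
        items[n++] = tok;
    return n;
}

/* Hypothetical pre-flight check: <cluster>/policy must be readable and
   <cluster>/rsh executable. Returns 1 if both are fine, 0 otherwise. */
int cluster_dir_ok(const char *cluster)
{
    static const char *const files[2] = {"policy", "rsh"};
    static const int modes[2] = {R_OK, X_OK};
    char path[256];

    for (int i = 0; i < 2; i++) {
        int w = snprintf(path, sizeof path, "%s/%s", cluster, files[i]);
        if (w < 0 || (size_t)w >= sizeof path ||
            access(path, modes[i]) != 0) {
            fprintf(stderr, "juggler: %s/%s missing or not accessible\n",
                    cluster, files[i]);
            return 0;
        }
    }
    return 1;
}
```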
It is assumed that the "global_policy" file has an entry for each license specified in the comma-separated <license_list>. Similarly, each <cluster>/policy file should contain an entry for each license. Once juggler is running, it is safe to change the global maximum and the local minimum/maximum policy parameters of a license at any time. Changing the set of clusters or licenses, however, requires restarting "juggler".
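The policy files are simple whitespace-separated tables with '#' comment lines. The following hypothetical parser reads either layout ("name max" for global_policy, "name min max" for <cluster>/policy) from a string and reports the first license of <license_list> that has no entry, which is exactly the assumption stated above. The names parse_policy and first_missing and the 255-character line limit are illustrative choices.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN 64

struct policy_entry {
    char name[NAME_LEN];
    int min;   /* only in <cluster>/policy; stays 0 for global_policy */
    int max;
};

/* Parse policy text. Lines starting with '#' are comments. With local != 0
   each line is "name min max", otherwise "name max". Returns the number of
   entries stored in out[] (at most cap). */
int parse_policy(const char *text, int local, struct policy_entry out[],
                 int cap)
{
    int n = 0;

    assert(text != NULL && cap >= 0);
    while (*text != '\0' && n < cap) {
        const char *eol = strchr(text, '\n');
        size_t full = eol ? (size_t)(eol - text) : strlen(text);
        size_t len = full < 255 ? full : 255;   /* truncate overlong lines */
        char line[256];

        memcpy(line, text, len);
        line[len] = '\0';

        if (line[0] != '#') {
            struct policy_entry e = {0};
            int ok = local
                ? sscanf(line, "%63s %d %d", e.name, &e.min, &e.max) == 3
                : sscanf(line, "%63s %d", e.name, &e.max) == 2;
            if (ok)
                out[n++] = e;
        }
        text += full + (eol ? 1 : 0);           /* skip past the newline */
    }
    return n;
}

/* Index of the first license in names[] without a policy entry, or -1. */
int first_missing(int nlic, const char *const names[],
                  const struct policy_entry e[], int ne)
{
    for (int i = 0; i < nlic; i++) {
        int found = 0;
        for (int j = 0; j < ne && !found; j++)
            found = strcmp(names[i], e[j].name) == 0;
        if (!found)
            return i;
    }
    return -1;
}
```

Re-reading the files at the start of every interval would be one way to honour the statement that policy values may be changed while juggler is running.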
For monitoring purposes, "juggler" logs relevant information into a "messages" file. The "messages" file, for example, makes it possible to follow the per-cluster load situation and the related share assignments separately for each license.
The <sleep_interval> controls how long "juggler" stays inactive after a scheduling interval. Note that while a short sleep interval is good for transferring licenses promptly, it puts some stress on each local cluster because of the three remote operations performed for all enterprise-managed licenses in every interval.
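To put that stress into numbers: with three remote operations (utilized, required, assigned) per cluster and interval, an upper bound on "juggler_remote" invocations per hour follows directly from the sleep interval. The sketch ignores the time the update step itself takes (about 16 seconds in the sample log, from 14:34:54 to 14:35:10), so real rates are lower; the function name is hypothetical.

```c
#include <assert.h>

/* Upper bound on juggler_remote invocations per hour: 3 per cluster per
   interval, ignoring the duration of the update/scheduling steps.
   Returns -1 for a non-positive interval. */
long remote_calls_per_hour(int clusters, int sleep_interval_s)
{
    assert(clusters >= 0);
    if (sleep_interval_s <= 0)
        return -1;
    return 3L * clusters * (3600L / sleep_interval_s);
}
```

For the sample invocation (three clusters, 3 seconds) this gives at most 10800 calls per hour, i.e. 3600 per cluster; a 60-second interval brings the total down to 540.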
Sun Grid Engine maintenance shutdown and network outages
During a local DRMS maintenance shutdown, "juggler_remote" can neither retrieve new information about license demand/utilization nor apply the most recent license share entitlement to the cluster. The same is true during partial network outages. License juggler takes this into account by assuming that the most recent information about license demand/utilization is still valid. Apart from this, license juggler simply continues operating so that license sharing amongst the reachable clusters is not interrupted.
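The fallback described above amounts to caching the last successful answer per cluster. A minimal C sketch, assuming a hypothetical cluster_state record per cluster; the stale_intervals counter is an addition for monitoring and not something the text mentions.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_LIC 32

/* Last known demand/utilization of one cluster (hypothetical record). */
struct cluster_state {
    int demand[MAX_LIC];
    int util[MAX_LIC];
    bool valid;           /* true once one update has succeeded */
    int stale_intervals;  /* consecutive intervals without fresh data */
};

/*
 * Merge one update attempt into the cached state. ok == false models a
 * failed "juggler_remote required/utilized" call (maintenance shutdown,
 * network outage): the most recent values are kept unchanged and only the
 * staleness counter grows.
 */
void update_cluster(struct cluster_state *s, bool ok, int nlic,
                    const int demand[], const int util[])
{
    assert(nlic >= 0 && nlic <= MAX_LIC);
    if (ok) {
        memcpy(s->demand, demand, (size_t)nlic * sizeof demand[0]);
        memcpy(s->util, util, (size_t)nlic * sizeof util[0]);
        s->valid = true;
        s->stale_intervals = 0;
    } else {
        s->stale_intervals++;
    }
}
```

A zero-initialized record for a cluster that has never answered simply reports no demand and no utilization until its first successful update.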