Olesen-FLEXlm-Configuration

From GridWiki
Revision as of 09:56, 29 February 2008 by Olesen (Newly created)

Olesen FlexLM -- configuration

Configuration notes for the qlicserver program

  • This document was last updated: olesen 2008-02-29T11:07+0100

Preliminary Information

As of version 1.40, the command-line options and the configuration of the qlicserver have changed. The goal of the changes is to improve the ease of deployment and to increase the flexibility of the program in general. Editing the source code should no longer be required, but can be done if you either wish to operate without a configuration file, or would like to have some failsafe values hard-coded in. Although upgrading from a previous version should be straightforward, please do check whether there is now a different way of doing things or whether a particular option has been eliminated. The -h option usually provides a good starting point.

The most notable changes:

A configuration file can be used for most of the license configuration and operation parameters -- qlicserver.config

Resource limits controllable from a file or a directory of files -- qlicserver.limits

With supplementary XSLT sheets it allows web-based monitoring of the resources.

  • Optional qhost/qstat output

The results of these queries can be processed by appropriate XSLT sheets (e.g., within xmlqstat) to monitor the status of jobs and hosts.

  • Silent operation by default

The previous -q (quiet) option has been dropped in favour of explicitly requesting output with the output parameter.

Many program parameters can also be set in the configuration file or via the command-line.

  • More flexible daemon modes

It is now possible to operate like a true daemon (without a parent) or to monitor a particular process or process id.


The essential license accounting logic remains unchanged from previous versions.

Command-line options and parameters

As outlined in the usage provided by the program, the command-line encompasses both options and parameters:

   usage: qlicserver [OPTION] [PARAM]

The options primarily control the general mode of the program (i.e., setup, prolog query, daemon mode, etc.), whereas the parameters are used to provide finer-grained control.

Parameters

The parameters are supplied on the command-line with a simple keyword=value syntax. Parameters appearing on the command-line have precedence over those that may occur in a configuration file.
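The precedence rule can be illustrated with a minimal Python sketch. It is not taken from the qlicserver source; the function names parse_params and merge_params are hypothetical, and it assumes that arguments starting with "-" are options rather than parameters.

```python
def parse_params(argv):
    """Collect keyword=value arguments; everything else is left to option handling."""
    params = {}
    for arg in argv:
        key, sep, value = arg.partition("=")
        # Assumption: options such as -d never count as parameters.
        if sep and key and not key.startswith("-"):
            params[key] = value
    return params


def merge_params(config_params, cmdline_params):
    """Values given on the command-line win over those from a configuration file."""
    merged = dict(config_params)
    merged.update(cmdline_params)
    return merged


config = {"dir": "/srv/www/htdocs/xmldata", "output": "qlicserver.cache"}
cmdline = parse_params(["-d", "output=-", "delay=30"])
print(merge_params(config, cmdline))
```

Here the output value from the command-line replaces the one from the configuration file, while dir is kept because it was not given on the command-line.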

Command-line only parameters

A few parameters may only appear on the command-line and not in a configuration file.

config=FILE
By default, the qlicserver.config configuration file is searched for in the same directory as the qlicserver program itself. The config parameter can be used to specify an alternative configuration file. This is often useful in testing situations.
limits=FILE
By default, qlicserver.limits is searched for in the same directory as the qlicserver program itself. The limits parameter can be used to specify alternative limits.

Static parameters

Some parameters are deemed static, which means that their values are used when the program initially starts, but are not updated subsequently within the daemon loop. These parameters may appear on the command-line or in a configuration file.

delay=INT
This is the waiting period in seconds (positive integer) between queries in the daemon mode (invoked by the -d option). Invalid values are ignored. The default period is documented in the program usage.
ppid=(ppid | CMD | N)
For proper and reliable operation in the daemon mode (using the option -d), the program will normally not run as a true daemon, but instead watch the parent process from which it was started. This behaviour provides a means of handling controlled shutdown and avoids communicating with a non-existent sge_qmaster. An easy way to obtain this behaviour is to start the qlicserver program from a load-sensor script on the headnode. When the sge_execd is stopped or restarted, the load-sensor script will exit and the program will follow. This default behaviour is equivalent to specifying the string value ppid:

   ppid=ppid

As an alternative to starting the qlicserver program via a load-sensor, it could be started with an init-script and instructed to watch the pid of another command (process). For example, monitoring the sge_qmaster process:

   ppid=sge_qmaster

The specified process name can be any process currently running on the system (apart from a process named ppid, since that is already taken for the default). If you really wished, specifying ppid=vi will have the program exit when the vi editor closes!
For any remaining situations, a numeric process id can be specified. Specifying the process id 0 or 1 (the init process) causes the program to run as a true daemon:

   ppid=0
   ppid=1

For the default behaviour (ppid=ppid), a zero signal is sent to check if the process is still alive. For the other cases, a ps command will be used when the /proc file-system does not appear to exist.
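The two liveness checks mentioned above can be sketched in Python as follows. This is only an illustration for a POSIX system, not the program's own code; the function names are hypothetical and the ps fallback is left out.

```python
import os


def pid_is_alive(pid):
    """Return True if the process exists; signal 0 checks without delivering anything."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists, but belongs to another user.
        return True
    return True


def proc_entry_exists(pid, proc_root="/proc"):
    """Check via the /proc file-system; returns None when /proc is not available."""
    if not os.path.isdir(proc_root):
        return None  # the caller would need another method here, such as ps
    return os.path.isdir(os.path.join(proc_root, str(pid)))


print(pid_is_alive(os.getppid()))
```

A daemon loop would call such a check once per iteration and exit as soon as the watched process has gone.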


Standard Parameters

The remaining parameters can be specified on the command-line or in a configuration file. If the modification time of a configuration file changes between daemon loops, the file is re-read and the parameter values are updated. It is thus possible to change many parameters without restarting the program. The command-line parameters maintain their precedence throughout the life of the program.
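The re-read logic can be pictured with the following Python sketch. The ConfigWatcher class and its loader callback are hypothetical names used only for illustration, and keeping the last known values when the file disappears is an assumption rather than documented behaviour.

```python
import os


class ConfigWatcher:
    """Re-read a configuration file only when its modification time has changed."""

    def __init__(self, path, loader):
        self.path = path
        self.loader = loader  # callable that takes a path and returns a dict
        self.mtime = None
        self.params = {}

    def refresh(self, cmdline_params):
        """Call once per daemon loop; returns the effective parameters."""
        try:
            mtime = os.stat(self.path).st_mtime
        except OSError:
            mtime = None  # assumption: a missing file keeps the last known values
        if mtime is not None and mtime != self.mtime:
            self.params = self.loader(self.path)
            self.mtime = mtime
        effective = dict(self.params)
        effective.update(cmdline_params)  # the command-line keeps its precedence
        return effective
```

Calling refresh() twice without touching the file invokes the loader only once; after the file has been modified, the next call picks up the new values.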

dir=DIR
This provides a common output directory for the output files specified by the output, qhost and qstat parameters. For example,

   dir=/srv/www/htdocs/xmlcache \
   output=qlicserver.xml \
   qhost=qhost.xml \
   qstat=qstat.xml

instead of the more verbose form:

   output=/srv/www/htdocs/xmlcache/qlicserver.xml \
   qhost=/srv/www/htdocs/xmlcache/qhost.xml \
   qstat=/srv/www/htdocs/xmlcache/qstat.xml

The dir parameter will be ignored if an absolute path has already been specified. For example,

   dir=/srv/www/htdocs \
   output=xmlcache/qlicserver.xml \
   qstat=/srv/www2/htdocs/qstat.xml

output=FILE
This parameter specifies the file to which the current query status should be saved. The query status is written in the format described under "XML output format".
When using the -n option to conduct a dry-run, it can be useful to send the output to the stdout:

   output=-

qhost=FILE
This parameter specifies that a supplementary GridEngine qhost command be executed and the results saved to a file. This parameter is likely only useful when the generated data can be fed to some other diagnostic tool or web interface.
qstat=FILE
The GridEngine qstat command is issued as a central part of the qlicserver logic. This parameter specifies that these byproduct results also be saved to a file.
timeout=INT
The timeout in seconds (positive integer) to be used when executing system commands. Invalid values are ignored. The default timeout is documented in the program usage.
LM_LICENSE_FILE=STRING
This parameter overrides the identically named environment setting used for the FlexLM license server query.
lmutil=STRING
Instead of searching the path for the lmutil command needed for the FlexLM license server query, this parameter can be used to provide a fully qualified path to the lmutil command.
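How the dir parameter combines with the file parameters can be sketched in a few lines of Python. The function name resolve_output is hypothetical, and leaving the stdout marker "-" untouched is an assumption, since the text above only describes the handling of absolute paths.

```python
import posixpath


def resolve_output(dir_param, filename):
    """Prefix a relative file name with dir; absolute paths are left alone."""
    # Assumption: "-" (stdout) is never combined with the directory.
    if filename == "-" or not dir_param:
        return filename
    # posixpath.join() discards dir_param when filename is already absolute.
    return posixpath.join(dir_param, filename)


print(resolve_output("/srv/www/htdocs", "xmlcache/qlicserver.xml"))
print(resolve_output("/srv/www/htdocs", "/srv/www2/htdocs/qstat.xml"))
```

With the values from the last dir example, the first call yields /srv/www/htdocs/xmlcache/qlicserver.xml and the second leaves /srv/www2/htdocs/qstat.xml unchanged.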


Input Files

Although an XML syntax has been chosen for the input files, please don't try to show off by breaking the extremely lightweight XML parser used in the program. The parser tries to find specific XML tags and doesn't really care about much else. To make life generally easier, it does properly strip XML comments.

XML configuration file

The simplest explanation is provided by a sample qlicserver.config file with interspersed comments:

<?xml version="1.0"?>
<qlicserverConfig>
  <parameters type="overwrite">

If the optional [type] attribute for the <parameters> element has the value overwrite, any default parameters defined in the program source are cleared. The normal behaviour is to merge with existing values.

    <param name="dir">/srv/www/htdocs/xmldata</param>
    <param name="output">qlicserver.cache</param>
    <param name="LM_LICENSE_FILE">@serv1:@serv2</param>

The [name] attribute is mandatory for the <param> element.

  </parameters>
  <resources type="overwrite">

If the optional [type] attribute for the <resources> element has the value overwrite, any default resources defined in the program source are cleared. The normal behaviour is to merge with existing values.

    <resource name="abaqus" served="abaqus"/>
    <resource name="compile"/>

The [name] attribute is mandatory for the <resource> elements. The optional [served] attribute maps the license 'feature' served by the license manager to a GridEngine resource (complex) name. A resource without a [served] attribute is considered an internal resource.

    <resource name="nastran" served="NASTRAN" urgency="100"/>

The optional [urgency] attribute specifies how much extra weight to give to particular resources. This might be useful, for example, to increase the priority of more expensive licenses. It only affects the output generated by the -c option.

    <resource name="foam" total="40"/>

If the optional [total] attribute is given, this can be used to establish what the normal configuration of this internal resource should look like as noted in the section about "limits for internal resources". The [total] attribute has no meaning for served resources.


    <resource name="hexa" served="aihexa" type="track"/>

The optional [type] attribute with the value track indicates that this resource is not to be managed, but its status should be added to the output. This is primarily good for interactive licenses - to see who has used up all your licenses.

    <resource name="starp" served="hpcdomains" note="STAR-CD parallel"/>

Since the [note] attribute is not specified anywhere, we can use it to add some useful information for ourselves.

    <resource name="stars" served="starsuite" limit="2" note="STAR-CD serial"/>

The optional [limit] attribute specifies an upper limit on how many resources the GridEngine is allowed to manage. This can be used to prevent jobs from flooding the cluster and/or prevent jobs from consuming all the available licenses. If a negative value is specified, it is subtracted from the total to obtain the limit.

    <derived name="starcd">
     <element>starp</element>
     <element>stars</element>
    </derived>

Derived resources are combined from sub-elements and inherit the limits from their sub-elements.

  </resources>
</qlicserverConfig>
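For external tools that want to read the same configuration, a minimal Python sketch is given here. It uses the standard ElementTree parser rather than the lightweight parser of the program itself, the function name read_config is hypothetical, and the sample is a shortened version of the file shown above.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0"?>
<qlicserverConfig>
  <parameters>
    <param name="dir">/srv/www/htdocs/xmldata</param>
    <param name="output">qlicserver.cache</param>
  </parameters>
  <resources>
    <resource name="abaqus" served="abaqus"/>
    <resource name="compile"/>
    <resource name="starp" served="hpcdomains" note="STAR-CD parallel"/>
    <resource name="stars" served="starsuite" limit="2" note="STAR-CD serial"/>
    <derived name="starcd">
      <element>starp</element>
      <element>stars</element>
    </derived>
  </resources>
</qlicserverConfig>
"""


def read_config(text):
    """Return (params, resources, derived) as plain dictionaries."""
    root = ET.fromstring(text)
    params = {p.get("name"): (p.text or "").strip() for p in root.iter("param")}
    resources = {r.get("name"): dict(r.attrib) for r in root.iter("resource")}
    derived = {
        d.get("name"): [(e.text or "").strip() for e in d.findall("element")]
        for d in root.iter("derived")
    }
    return params, resources, derived


params, resources, derived = read_config(SAMPLE)
# A resource without a [served] attribute is an internal resource.
internal = sorted(name for name, attrs in resources.items() if "served" not in attrs)
print(params["dir"], internal, derived["starcd"])
```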

Resource limit files

When licenses are being used interactively and in the cluster, it can sometimes be necessary to temporarily adjust the number of managed resources. For this reason, the limits of the managed resources can also be adjusted by using an additional limits file or by a set of files.

When qlicserver.limits is a file, the limits are extracted from an XML structure like this:

<?xml version="1.0"?>
<qlicserverLimits>
  <limits>
    <limit name="gtpower" limit="7"/>
    <limit name="stars"   limit="2"/>
    <limit name="starp"   limit="20"/>
  </limits>
</qlicserverLimits>
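Reading such a limits file from another tool could look like the following Python sketch. It relies on the standard ElementTree parser, the function name read_limits is hypothetical, and silently skipping entries without a usable integer is an assumption.

```python
import xml.etree.ElementTree as ET

LIMITS = """<?xml version="1.0"?>
<qlicserverLimits>
  <limits>
    <limit name="gtpower" limit="7"/>
    <limit name="stars"   limit="2"/>
    <limit name="starp"   limit="20"/>
  </limits>
</qlicserverLimits>
"""


def read_limits(text):
    """Map each resource name to its integer limit."""
    limits = {}
    for node in ET.fromstring(text).iter("limit"):
        name, value = node.get("name"), node.get("limit")
        if name is None:
            continue
        try:
            limits[name] = int(value)
        except (TypeError, ValueError):
            continue  # assumption: entries without an integer are ignored
    return limits


print(read_limits(LIMITS))
```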

Using a single file is fine, but what if a user should be allowed to adjust the limits on a particular resource, but not touch the limits of any other resources? The simple solution used here is to split the resource limits across several files and use file permissions (or ACLs) to control access as required. For some sites, some form of authentication program could also be used when generating the file contents.

When qlicserver.limits is a directory, all the files in the directory that correspond to a resource name will be read. If a file does not exist, or has the incorrect permissions, the user cannot change the limits for a particular resource. For example,

   -rw-r--r--  user group   qlicserver.limits/resource1
   -rw-rw-r--  user group1  qlicserver.limits/resource2
   -rw-rw-r--  user group2  qlicserver.limits/resource3

The format of the files is very simple. A line containing a single integer (with possible whitespace) will be taken as the limit. If multiple lines match this criterion, only the final one will be used. For example,

   # "gtpower" limit modified by olesen 2008-01-31T09:00:00
   4

Removing this limit just requires any non-integer value:

   # "gtpower" limit modified by olesen 2008-01-31T09:05:00
   NONE
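The rule for these per-resource files is simple enough to sketch in Python. The function name limit_from_text is hypothetical, and accepting a leading minus sign is an assumption, since the text above only mentions integers.

```python
import re

# A line that holds nothing but one integer, with optional surrounding whitespace.
INTEGER_LINE = re.compile(r"^\s*(-?\d+)\s*$")


def limit_from_text(text):
    """Return the last integer-only line as the limit, or None if there is none."""
    limit = None
    for line in text.splitlines():
        match = INTEGER_LINE.match(line)
        if match:
            limit = int(match.group(1))  # later matches replace earlier ones
    return limit


print(limit_from_text('# "gtpower" limit modified by olesen 2008-01-31T09:00:00\n4\n'))
print(limit_from_text('# "gtpower" limit modified by olesen 2008-01-31T09:05:00\nNONE\n'))
```

The first call returns 4 and the second returns None, which corresponds to a removed limit.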

The qlic utility provides a simple means of specifying new limits:

   qlic resource=limit .. resource=limit  # set new limits

The -l option lists the current limits.

Resource limits

Normally, the total number of a served feature (license) reported by the server is to be managed by this program and ultimately by the GridEngine. There are, however, situations in which it is useful to restrict the number of resources to be managed:

  • Rectify buggy license server information

We've had this problem with a particular vendor daemon getting confused by a mix of maintained and non-maintained licenses. The server reports having an extra nonexistent license!

  • Reserve some licenses for external use

When licenses can be used interactively or in batch mode, it can be necessary to reserve some licenses for interactive use.

  • Prevent jobs from flooding the cluster

When a large number of licenses are available, it might be desirable to only use a fraction of them, rather than filling the entire cluster. This can also be useful when managing unlicensed and unlimited resources.


When resource limits are used, it can sometimes be useful to attach the normal limit directly to the resource within the configuration file as well as using the resource limit files. This provides a fallback value to be used when the file-based limit is removed. For internal resources, the situation is a little bit trickier. The limits given correspond directly to the resource value managed by the GridEngine. Thus the last file-based limit that was imposed will remain in effect until a new limit is imposed. To have the internal resource restored to a particular value when the limits are removed, it is necessary to supply additional information about the resource in the form of the optional [total] attribute.
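For a served resource, the interplay between the total, the limit from the configuration file and a file-based limit might be sketched as below. This is an interpretation of the description, not the program's code: the function name effective_limit is hypothetical, and both the clamping to the range 0..total and the handling of negative file-based limits are assumptions.

```python
def effective_limit(total, config_limit=None, file_limit=None):
    """Pick the number of licenses to manage for a served resource."""
    # A file-based limit wins; the configuration file limit is the fallback.
    limit = file_limit if file_limit is not None else config_limit
    if limit is None:
        return total  # no restriction: manage everything the server reports
    if limit < 0:
        limit = total + limit  # a negative limit is subtracted from the total
    return max(0, min(limit, total))  # assumption: clamp to the range 0..total


print(effective_limit(28, config_limit=-2))                # 26
print(effective_limit(28, config_limit=26, file_limit=20)) # 20
```

When the file-based limit is removed again, the second call falls back to the configuration file value and returns 26.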

Output

XML output format

The result of the query process can be saved in XML format to a file with the output parameter. The output strongly resembles the "XML configuration file" contents. A sample output is given here, with interspersed comments:

<?xml version="1.0"?>
<qlicserver release="1.42" releaseDate="2008-01-29">

The XML comment summarizes the qconf adjustment that was imposed.

  <query>
    <host>dealog01</host>
    <user>cfdadmin</user>
    
  </query>

The <query> section provides information about the host on which the program ran and the user who invoked the program.

  <parameters>
    <env name="SGE_ROOT">/opt/n1ge6</env>
    <env name="SGE_CELL">default</env>
    <env name="SGE_ARCH">lx26-x86</env>
    <env name="SGE_BINARY_PATH">/opt/n1ge6/bin/lx26-x86</env>
    <param name="LM_LICENSE_FILE">@serv1</param>
    <param name="dir">/opt/n1ge6/default/site</param>
    <param name="output">qlicserver.cache</param>
    <param name="qhost">qhost.cache</param>
    <param name="qstat">qstat.cache</param>
  </parameters>

The <parameters> section contains both <param> elements and <env> elements. The <env> elements are only printed if they are set. If a value can be set as a parameter or from the environment (eg, LM_LICENSE_FILE), preference will be given to printing the parameter setting.

  <resources>
    <derived name="starcd">
      <element>starp</element>
      <element>stars</element>
    </derived>

For general reference, the elements of any derived resources are given first.

    <resource name="abaqus" served="abaqus" total="12" extern="5" free="7">
      <user name="user1" host="host1" type="extern">5</user>
    </resource>
    <resource name="hexa" served="aihexa" total="2" free="2" type="track"/>
    <resource name="hyper" served="HyperWorks" total="0" type="track"/>

The resource distribution is given in a readable form. From this example, the total="0" shows that the HyperWorks license daemon was suffering from a slight problem at the moment.

    <resource name="starcd" total="28" limit="26" intern="24" waiting="12" free="2">
      <user name="user2" host="host2" type="intern">6</user>
      <user name="user3" host="host3" type="intern">6</user>
      <user name="user3" host="host4" type="intern">6</user>
      <user name="user3" host="host5" type="intern">6</user>
      <user name="user4" type="waiting">6</user>
      <user name="user3" type="waiting">6</user>
    </resource>

Information is provided not only about where a resource is in use, but also about the current demand. Since the starcd resource is derived, there is no [served] attribute. Job holds and execution after a particular time are ignored when building these values.

The <resource> attributes that may be written:

   name    served  total   limit
   extern  intern  waiting free    type

Extra notes and urgencies are not retained when printing the resources.

  </resources>
</qlicserver>
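As an illustration of how the output can be consumed by other tools, the following Python sketch sums the per-user counts of each resource. It uses the standard ElementTree parser on a shortened copy of the sample above; the function name usage_by_user is hypothetical.

```python
import xml.etree.ElementTree as ET

OUTPUT = """<?xml version="1.0"?>
<qlicserver release="1.42" releaseDate="2008-01-29">
  <resources>
    <resource name="abaqus" served="abaqus" total="12" extern="5" free="7">
      <user name="user1" host="host1" type="extern">5</user>
    </resource>
    <resource name="starcd" total="28" limit="26" intern="24" waiting="12" free="2">
      <user name="user2" host="host2" type="intern">6</user>
      <user name="user3" host="host3" type="intern">6</user>
      <user name="user3" host="host4" type="intern">6</user>
      <user name="user3" host="host5" type="intern">6</user>
      <user name="user4" type="waiting">6</user>
      <user name="user3" type="waiting">6</user>
    </resource>
  </resources>
</qlicserver>
"""


def usage_by_user(text):
    """Sum the counts per resource, keyed by (user name, usage type)."""
    summary = {}
    for res in ET.fromstring(text).iter("resource"):
        per_user = {}
        for user in res.findall("user"):
            key = (user.get("name"), user.get("type"))
            per_user[key] = per_user.get(key, 0) + int(user.text)
        summary[res.get("name")] = per_user
    return summary


usage = usage_by_user(OUTPUT)
print(usage["starcd"][("user3", "intern")])   # 18, spread over three hosts
print(usage["starcd"][("user3", "waiting")])  # 6
```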

Miscellaneous

  • XSLT sheets

XSLT sheets have been written to convert the qlicserver output, the qhost output and the qstat output to xhtml. They are currently being integrated with xmlqstat.

  • qlic program

The qlic utility has been updated for the XML output format.