Remediation campaigns

This page describes how the collector helps in planing and executing a remediation campaign on a module's rules.

As an example, we will consider an IT merger situation. At day 0, most infrastructure services are provided twice (dns, masters, ...). The Linux package repository services are merged first into a single common service.

A common practice would be to develop a script to execute in a ssh loop to reconfigure the servers. The drawbacks of this method are:

  • The code will be complex, as it needs to implement different identification methods of vendor, major version, minor version and architecture for Red Hat, Debian, SuSE, Ubuntu, Oracle Linux, CentOS, etc ...
  • The code must handle logging to help determine afterwards if the configuration has succeeded or failed
  • Code complexity augments the risk of producing bugs
  • The ssh loop logs will be hard to audit for errors, even with logs
  • A precise server list to feed to the ssh loop will be hard to produce : servers with the target configuration might be included, servers might be omitted
  • The ssh loop will likeky serialize the reconfiguration actions, meaning the global reconfiguration can take a long time
  • Once the servers on the list are reconfigured, you have no way to measure the drift back to old the configuration : restores, human habits, obscure configuration script not updated, ... all contribute to this drift back effect

This page presents the steps to a successful remediation campaign using the OpenSVC compliance framework in this scenario.

Ruleset design

All the servers are known to the collector, so the rulesets can be contextualized as:

+- it.sys.linux.repo (contextual ruleset, shown to all linux servers)
   |
   +- it.sys.linux.repo.apt (contextual ruleset, matching Debian and Ubuntu servers)
   |  |
   |  `- REPO_FILE_1 (file-class rule installing /etc/apt/sources.list/it.list with a content using variable substitution for OS_ARCH, OS_RELEASE, OS_UPDATE to format the repo url)
   |
   +- it.sys.linux.repo.zypper (contextual ruleset, matching SuSE servers)
   |  |
   |  `- REPO_FILE_1 (file-class rule installing /etc/zypper.d/it.repo with a content using variable substitution for OS_ARCH, OS_RELEASE, OS_UPDATE to format the repo url)
   |
   `- it.sys.linux.repo.yum (contextual ruleset, matching Red Hat, Oracle Linux and CentOS servers)
      |
      `- REPO_FILE_1 (file-class rule installing /etc/yum.repos.dl/it.repo with a content using variable substitution for OS_ARCH, OS_RELEASE, OS_UPDATE to format the repo url)

Module development

We will name the module it.sys.linux.repo.

With the above ruleset design, the module is executed with OSVC_COMP_REPO_FILE_1 set in its environment to a contextualized value.

The code is thus limited to executing the files compliance object with OSVC_COMP_REPO_FILE_ as the prefix parameter.

The ruleset can get more complicated, adding repository geo-affinity and setting additional repositories for VMware virtual machines for example, but the module code will stay that simple, unchanged.

Module deployment

The new module must be tested in a development box, commited in a tracker for auditability, and deployed in the module repository known to the OpenSVC agents through the node.repocomp node.conf parameter.

At this point the module is still not scheduled for periodic check runs by the agent, as it is not part of a moduleset.

Test on a representative server set

On a set of servers exercising all possible ruleset contextualizations, test the module using the commands:

# sudo nodemgr updatecomp

# sudo nodemgr compliance fix --module it.sys.linux.repo

Check the logs in the Compliance > Logs view or in the output of the fix command, verify that the package manager behaves as expected.

Periodic check Activation

We will consider all Linux servers have a default base moduleset attached, named it.sys.linux. This moduleset contains all the base system configuration modules : nameservers, timeservers, mailservers, printservers, internationalization settings, admin accounts, ...

Add the it.sys.linux.repo module to the it.sys.linux moduleset to activate the periodic checks. The default check period is once per week, on sundays. You can set the periodicity to once per day for more a responsive compliance system.

Remediation campaign

One period later, the collector has received all the check results of the it.sys.linux.repo module for all the Linux servers.

You can use this dataset in the Compliance > Status collector view to :

  • display only the results for the it.sys.linux.repo module
  • filter-out the servers with an already compliant check result
  • filter-out production servers
  • select the first 20 servers
  • trigger the fix action

The collector will spawn threads to execute the actions in parallel and thus minize the overall execution time.

Optionally, the action queue can be accessed by clicking on the gear icon next to the top-right seach box. In this tabular view you can see :

  • which actions are in queued/running/done state
  • the command execution stdout and stderr

Back to the Compliance > Status view, once the actions are all done, you can confirm that all the it.sys.linux.repo module check status are now compliant.

At this point if everything went as expected, you can unroll your campaign by selecting more servers and removing the scope-limiting column filters set previously.

The campaign can span multiple days, week or months. The collector will always keep track of the servers still misconfigured. Moreover, if fixed servers drift back to a non-compliant state they will return naturally in the campaign server list.