Category: Technology

Shell Script: Path To Script

We occasionally have to re-home our shell scripts, which means updating any static path values used within scripts. It’s quick enough to build a sed script to convert /old/server/path to /new/server/path, but it’s still extra work.
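For illustration, the conversion itself is a one-liner; the old and new paths below are the placeholder values from above, and the glob is just an example of targeting the affected scripts:

# paths and file glob are placeholders
sed -i 's|/old/server/path|/new/server/path|g' *.sh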

The dirname command works to provide a dynamic path value, provided you use the fully qualified path to run the script … but it fails spectacularly when someone runs ./scriptFile.sh and you're trying to use that path in, say, EXTRA_JAVA_OPTS. The "path" is just ., and Java doesn't have any idea what to do with "-Xbootclasspath/a:./more/path/goes/here.jar".

Voila, realpath gives you the fully qualified file path for /new/server/path/scriptFile.sh, ./scriptFile.sh, or even bash scriptFile.sh … and the dirname of a realpath is the fully qualified path where scriptFile.sh resides:

#!/bin/bash
DIRNAME="$(dirname "$(realpath "$0")")"
echo "${DIRNAME}"
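The resolved directory can then be used to build the Java option mentioned above, for example:

EXTRA_JAVA_OPTS="-Xbootclasspath/a:${DIRNAME}/more/path/goes/here.jar"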

Hopefully next time we've got to re-home our batch jobs, it will be a simple scp of the scripts and a sed of the old crontab content to use the new paths.

Switching run levels in systemd

Yeah, I know there aren’t actually run levels anymore. The systemd equivalent of running init 3 to boot into console mode is

systemctl isolate multi-user.target

And the equivalent of running init 5 to boot into GUI mode is

systemctl isolate graphical.target

This is a one-time thing, not a config change. If you want to permanently switch to console mode, you'd use systemctl set-default multi-user.target
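You can verify the current default with systemctl's get-default subcommand:

systemctl get-default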

Jenkins: Creating A Build Pipeline

Prerequisites:

You will need the “Git” plugin (https://plugins.jenkins.io/git).

You will need the "GitHub" plugin (https://plugins.jenkins.io/github).

Setting Up Access Within GitHub:

Log into GitHub and navigate to your repository. Click the “Settings” tab, then select “Developer settings” from the bottom of the left-hand menu. From the Developer Settings page, select “Personal access tokens”.

Click “Generate new token” to add a token for your Jenkins integration.

Provide a description for the token and select permissions – read access to the repo is sufficient.

Save the token and copy the secret text.

Setting Up Jenkins – Configuring GitHub Integration

On your Jenkins server, select “Manage Jenkins”

Select “Configure System”

Scroll down – possibly a lot – to the GitHub section. Click on the “Add GitHub Server” drop-down and select “GitHub Server”

Provide a name, the API URL is pre-populated. Next to Credentials, click the drop-down for “Add” and select “Jenkins”.

The credential kind is "Secret text": paste the token you copied from GitHub into the "Secret" field, and use your GitHub user ID as the "ID". Save the credential.

Select the new credential from the drop-down and test the connection.

Assuming the credentials verify successfully, you are done.

Using Jenkins – Creating A Basic Pipeline:

Click on “New Item”, create a new Freestyle project, and give it a descriptive name.

Since this is a GitHub project, I’m adding the project URL – that’s the actual project URL, not the URL for a specific branch or the path to clone the project.

As you scroll down, the tab will change to “Source Code Management”. Select “Git” and enter the URL used to clone the repository. If you have not already added credentials, click “Add”; otherwise select the appropriate credential from the drop-down menu. If you intend to build a branch other than master, correct the branch name.

Build triggers will depend on what exactly you want to happen. You can trigger new builds based on PRs or push activity. You can schedule a nightly build.

If there are a lot of changes, you may not wish to re-build the project every single time the repo changes. Conversely, if the repo rarely changes, nightly builds waste a lot of cycles.

Using the hook trigger requires that your Jenkins server be Internet-accessible, and as such it carries a non-zero risk of malicious access. You can expose your endpoint through a reverse proxy to have more control over service access. I have also experimented with using GitHub-provided metadata, https://api.github.com/meta, to restrict access to certain subnets. A potential attacker could still proxy their access by attempting to register your Jenkins endpoint in their own GitHub project … but that's a narrower attack vector than "anyone who can make a web call".

If you want to trigger builds based on changes within the GitHub project, you can configure Jenkins to automatically register webhooks or you can manually add the webhook to your project.

Manual Webhook creation: Within your project’s “Settings” tab, select “Webhooks” and then “Add webhook”.

Automatic Webhook creation: Manage Jenkins => Configure System. In the GitHub section, click the second “Advanced” button (with a notepad next to it).

Click the “Additional Actions” drop-down menu and select “Convert login and password to token”

Enter your credentials and click “Create token credentials”

A message will be displayed confirming the credential.

In this case, I will schedule a nightly build of the project. After selecting “Build periodically”, enter the cron-like expression to control when you want builds to occur. To avoid having a lot of project builds initiated at quarter-hour marks, use the modifier “H” to indicate a time range. In this example, the build will be triggered some time between 02:00 and 04:59. Since the value of H is a hash of the job name, the build time will be consistent (i.e. the time displayed below the schedule field will be the time used each cycle). This means it is still possible to have a number of builds scheduled simultaneously.
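As an example of such a schedule (the exact expression is an assumption, since the original depends on your needs), the following triggers one build per day at a hashed minute and hour between 02:00 and 04:59:

H H(2-4) * * *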

Time, by default, is relative to your Jenkins’ server JVM configuration. You can override that setting by adding a TZ directive at the beginning of the schedule field.
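For instance (the time zone name here is just an illustration), prefixing the schedule like this evaluates it in US Eastern time:

TZ=America/New_York
H H(2-4) * * *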

There are a number of pre-build and post-build actions you can take, and various add-on modules expand this functionality. You can manage builds, Docker containerization, and deployment into Kubernetes clusters from Jenkins build pipelines.

Once the job has been saved, you can run it immediately by returning to the dashboard. Click the little clock to the right of the item listing.

Once a build has been completed, the item’s workspace will contain the build and console output from the build job. If a job fails, console output is a good point to start troubleshooting.

LDAP Auth With Tokens

I've encountered a few web applications (including more than a handful of out-of-the-box, "I paid money for that", applications) that perform LDAP authentication/authorization every single time the user changes pages. Or reloads the page. Or, seemingly, looks at the site. OK, not the latter, but still. When the load balancer probe hits the service every second and your application's connection count is an order of magnitude over the probe's count … that's not a good sign!

On the handful of sites I’ve developed at work, I have used cookies to “save” the authentication and authorization info. It works, but only if the user is accepting cookies. Unfortunately, the IT types who typically use my sites tend to have privacy concerns. And the technical knowledge to maintain their privacy. Which … I get, I block a lot of cookies too. So I’ve begun moving to a token-based scheme. Microsoft’s magic cloudy AD via Microsoft Graph is one approach. But that has external dependencies — lose Internet connectivity, and your app becomes unusable. I needed another option.

There are projects on GitHub to authenticate a user via LDAP and obtain a token to “save” that access has been granted. Clone the project, make an app.py that connects to your LDAP directory, and you’re ready.

from flask_ldap_auth import login_required, token
from flask import Flask
import sys

app = Flask(__name__)
app.config['SECRET_KEY'] = 'somethingsecret'
app.config['LDAP_AUTH_SERVER'] = 'ldap://ldap.forumsys.com'
app.config['LDAP_TOP_DN'] = 'dc=example,dc=com'
app.register_blueprint(token, url_prefix='/auth')

@app.route('/')
@login_required
def hello():
    return 'Hello, world'

if __name__ == '__main__':
    app.run()

The authentication process is two step — first obtain a token from the URL http://127.0.0.1:5000/auth/request-token. Assuming valid credentials are supplied, the URL returns JSON containing the token. Depending on how you are using the token, you may need to base64 encode it (the httpie example on the GitHub page handles this for you, but the example below includes the explicit encoding step).

You then use the token when accessing subsequent pages, for instance http://127.0.0.1:5000/

import requests
import base64

API_ENDPOINT = "http://127.0.0.1:5000/auth/request-token"
SITE_URL = "http://127.0.0.1:5000/"

tupleAuthValues = ("userIDToTest", "P@s5W0Rd2T35t")

# Request a token using HTTP basic auth against the request-token endpoint
tokenResponse = requests.post(url=API_ENDPOINT, auth=tupleAuthValues)

if tokenResponse.status_code == 200:
    jsonResponse = tokenResponse.json()
    strToken = jsonResponse['token']
    print("The token is %s" % strToken)

    # Base64 encode the token for use in a Basic authorization header
    strB64Token = base64.b64encode(strToken.encode('utf-8')).decode('ascii')
    print("The base64 encoded token is %s" % strB64Token)

    strHeaders = {'Authorization': 'Basic {}'.format(strB64Token)}

    # Use the token to access the protected page
    responseSiteAccess = requests.get(SITE_URL, headers=strHeaders)
    print(responseSiteAccess.content)
else:
    print("Error requesting token: %s" % tokenResponse.status_code)

Run and you get a token which provides access to the base URL.

[lisa@linux02 flask-ldap]# python authtest.py
The token is eyJhbGciOiJIUzI1NiIsImV4cCI6MTUzODE0NzU4NiwiaWF0IjoxNTM4MTQzOTg2fQ.eyJ1c2VybmFtZSI6ImdhdXNzIn0.FCJrECBlG1B6HQJKwt89XL3QrbLVjsGyc-NPbbxsS_U:
The base64 encoded token is ZXlKaGJHY2lPaUpJVXpJMU5pSXNJbVY0Y0NJNk1UVXpPREUwTnpVNE5pd2lhV0YwSWpveE5UTTRNVFF6T1RnMmZRLmV5SjFjMlZ5Ym1GdFpTSTZJbWRoZFhOekluMC5GQ0pyRUNCbEcxQjZIUUpLd3Q4OVhMM1FyYkxWanNHeWMtTlBiYnhzU19VOg==
Hello, world

A cool discovery I made during my testing — a test LDAP server that is publicly accessible. I’ve got dev servers at work, I’ve got an OpenLDAP instance on Docker … but none of that helps anyone else who wants to play around with LDAP auth. So if you don’t want to bother populating directory data within your own test OpenLDAP … some nice folks provide a quick LDAP auth source.

Did you know … you can use mini-charts to visualize Excel data?

Data visualization, using charts and images, communicates data clearly and efficiently. But when you're trying to visualize statistics for several items, your chart can be anything but clear and hardly efficient to read. In this example, I've created a line chart depicting the monthly score for eight different people. While you can pick out obvious high or low performance, there's not a whole lot of information being communicated here.

Did you know Excel can create mini-charts, known as "sparklines", to visualize individual statistics and compare statistics across items? Select the data that you want to compare. From the Insert ribbon bar, look for the "Sparklines" section. I am going to use a "line" style sparkline.

The data range will be selected. Enter the range where you want the mini-charts to display – this can be the row under your data or the column next to your data, or it can be some completely different location.

By default, the y-axis range for each mini-chart depends on the values of the data contained in the chart. This makes comparing the charts a little difficult – the scale is different. In the example below, scores in the 30’s don’t look different than scores in the 80’s.

Click on one of the mini-charts, and a “Design” tab will appear on the ribbon bar. Select it. Under “Axis”, change the minimum and maximum values to “Same for All Sparklines”.

Now you can see how individual performance varied as well as compare individuals.

Blank values will show up as broken lines in the mini-charts. If you do not want to display a gap, return to the "Design" ribbon bar and select "Edit data", then select "Hidden & Empty Cells".

Select what you want instead of gaps – you can treat null values as zero or have a line drawn between the values on either side of the missing value.

Using Microsoft Graph

Single Sign-On: Microsoft Graph

End Result: This will allow in-domain computers to automatically log in to web sites and applications. Computers not currently logged into the company domain will, when they do not have an active authenticated session, be presented with Microsoft’s authentication page.
Requirements: The application must be registered on Microsoft Graph.

Beyond that, requirements are language specific – I will be demonstrating a pre-built Python example here because it is simple and straight-forward. There are examples for a plethora of other languages available at https://github.com/microsoftgraph

Process – Application Development:
Application Registration

To register your application, go to the Application Registration Portal (https://apps.dev.microsoft.com/). Elect to sign in with your company credentials.

You will be redirected to the company’s authentication page

If ADFS finds a valid token for you, you will be directed to the application registration portal. Otherwise you'll get the same logon page you see for many other MS cloud-hosted apps. Once you have authenticated, click "Add an app" in the upper right-hand corner of the page.

Provide a descriptive name for the application and click “Create”

Click “Generate New Password” to generate a new application secret. Copy it into a temporary document. Copy the “Application Id” into the same temporary document.

Click “Add Platform” and select “Web”

Enter the appropriate redirect/logout URLs (this will be application specific – in the pre-built examples, the post-authentication redirect URL is http://localhost:5000/login/authorized).

Delegated permissions impersonate the signed in user, application permissions use the application’s credentials to perform actions. I use delegated permissions, although there are use cases where application permissions would be appropriate (batch jobs, for instance).

Add any permissions your app requires – for simple authentication, the default delegated permission “User.Read” is sufficient. If you want to perform additional actions – write files, send mail, etc – then you will need to click “Add” and select the extra permissions.

Profile information does not need to be entered, but I have entered the "Home page URL" for all of my applications so I am confident that I know which registered app corresponds with which deployed application (i.e. eighteen months from now, I can still figure out which site is using the registered "ADSF Graph Sample" app and don't accidentally delete it when it is still in use).

Click Save. You can return to your “My Applications” listing to verify the app was created successfully.

Application Implementation:

To use an example app from Microsoft’s repository, clone it.

git clone https://github.com/microsoftgraph/python-sample-auth.git

Edit the config.py file and update the “CLIENT_ID” variable with your Application Id and update the “CLIENT_SECRET” variable with your Application Secret password. (As they note, in a production implementation you would hash this out and store it somewhere else, not just drop it in clear text in your code … also if you publish a screen shot of your app ID & secret somewhere, generate a new password or delete the app registration and create a new one. Which is to say, do not retype the info in my example, I’ve already deleted the registration used herein.)
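After the edit, the relevant lines in config.py look something like this (the values shown are placeholders, not real registration data):

# placeholder values - substitute your own Application Id and secret
CLIENT_ID = '00000000-0000-0000-0000-000000000000'
CLIENT_SECRET = 'placeholderApplicationSecret'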

Install the prerequisites using “pip install -r requirements.txt”

Then run the application – in the authentication example, there are multiple web applications that use different interfaces. I am running “python sample_flask.py”

Once it is running, access your site at http://localhost:5000

The initial page will load; click on “Connect”

Enter your company user ID and click “Next”

This will redirect to the company's sign-on page. For in-domain computers or computers that have already authenticated to ADFS, you won't have to enter credentials. Otherwise, you'll be asked to log on (and possibly perform the two-factor authentication verification).

Voila, the user is authenticated and you’ve got access to some basic directory info about the individual.

Process – Tenant Owner:
None! Any valid user within the tenant is able to register applications.
Implementation Recommendations:
There is currently no way to backup/restore applications. If an application is accidentally or maliciously deleted, a new application will need to be registered. The application’s code will need to be updated with a new ID and secret. Documenting the options selected when registering the application will ensure the application can be re-registered quickly and without guessing values such as the callback URL.

There is currently no way to assign ownership of orphaned applications. If the owner's account is terminated, no one can manage the application. The application continues to function, so it may be some time before anyone realizes the application is orphaned. For some period of time after the account is disabled, it may remain in the directory — which means a directory administrator could re-enable the account and set the password to a known value. Someone could then log into the Microsoft App Registration Portal under that ID and add new owners. Even if the ID has been deleted from the directory, it exists as a tombstone and can be restored for some period of time. Eventually, though, the account ceases to exist — at which time the only option would be to register a new app under someone else's ID and change the code to use the new ID and secret. Ensuring multiple individuals are listed as application owners helps avoid orphaned applications.

Edit the application and click the “Add Owner” button.

You can enter the person’s logon ID or their name in “last, first” format. You can enter their first name – with a unique first name, that may work. Enter “Robert” and you’re in for a lot of scrolling! Once you find the person, click “Add” to set them up as an owner of the application. Click “Save” at the bottom of the page to commit this change.

I have submitted a feature request to Microsoft both for reassigning orphaned applications within your tenant and for a mechanism to restore deleted applications — apparently their feature requests have a voting process, so it would be helpful if people would up-vote my feature request.

Ongoing Maintenance:
There is little ongoing maintenance – once the application is registered, it’s done.

Updating The Secret:

You can change the application secret via the web portal – this would be a good step to take when an individual has left the team, and can be done routinely as a proactive security step. Within the application, select "Generate New Password" and create a new secret. Update your code with the new secret and verify it works (roll-back is to restore the old secret to the config – it's still in the web portal and works). Once the application is verified to work with the new secret, click "Delete" next to the old one. Both the create time and first three characters of the secret are displayed on the site to ensure the proper one is removed.

Maintaining Application Owners:

Any application owner can remove other owners – were I to move to a different team, the owners I delegated could revoke my access. Just click the “X” to the far right of the owner you wish to remove.

 

 

Building A Jenkins Sandbox

You can use a pre-built docker container (the "long term support" iteration is published as jenkins/jenkins:lts), perform a local installation from https://jenkins.io/download/, add a package repo to your package manager config (http://pkg.jenkins-ci.org/redhat-stable/jenkins.repo for RedHat-based systems), or build it from the source repo. In this sandbox example, I will be using a Docker container.

Map the /var/jenkins_home value to something. This allows you to store user-specific data on your local drive, not within the Docker image. In my case, c: is shared in Docker and I'm using c:\docker\jenkins\jenkins_home to store the data.

I have a java cacerts file mounted to the container as well – my CA chain has been imported into this file, and the default password, changeit, is used. This will allow Java to trust internally signed certificates. The keystore password appears in the process listing (i.e. anyone who can run commands like "ps aux" or "ps -efww" will see this value), so while security best practices dictate the default password should be changed … don't change it to something like your root password!

We can now start the Docker container:

docker run -p 8080:8080 -p 50000:50000 -v /c/docker/jenkins/jenkins_home:/var/jenkins_home -v /c/docker/jenkins/cacerts:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/security/cacerts jenkins/jenkins:lts

Once the container is running, you can visit the management web site (http://localhost:8080) and install the modules you want – or just take the defaults (you’ll end up with ‘stuff’ you don’t need … I don’t use subversion, for instance, and don’t really need a plugin for it). For a sandbox, I accept the defaults and then use Jenkins => Manage Jenkins => Manage Plug-ins to remove obviously unnecessary ones. And add any that may be needed (e.g. if you are using Visual Studio solution files, add in the MSBuild plugin).

 

Configuring Authentication (LDAP)

First install the appropriate plug-in. Referrals cause authentication problems when using AD as an LDAP authentication source, so if you are using AD for authentication … use the Active Directory plugin.

Manage Jenkins => Configure Global Security. Under access control, select the radio button for “LDAP” or “Active Directory”. Configuration is implementation specific.

AD:

Click the button to expand the advanced configuration. You should not need to specify a domain controller if service records for the domain are present in DNS. The “Site” should be “UserAuth”. For the Bind DN, you can use your userid (user@domain.ccTLD or domain\uid format) with your password. Or you can create a dedicated service account – for a “real world” implementation, you would want a dedicated service account (using *your* account means you’ll need to update your Jenkins config whenever you change your password … and when you forget this update, auth fails).

A note about the group membership lookup strategy:

For some reason, Jenkins assumes recursive group memberships will be used (e.g. there is a “App XYZ DevOps Team” that is placed into the “Jenkins Users” group, and “Jenkins Users” is assigned authorizations within the system). Bit of a shame that “none” isn’t an option for cases where there isn’t hierarchical group membership being built out.

There are three lookup strategies available: recursive group queries, LDAP_MATCHING_RULE_IN_CHAIN, and the Token-Groups user attribute. There have been bugs in the "Automatic" strategy that caused timeout failures. Additionally, the group list returned by the three strategies is not identical … so it is possible to have inconsistent authorization results as different strategies are used. To ensure consistent behaviour, I select a specific strategy.

Token-Groups: If you are not using Distribution groups within Jenkins to assign authorization (and you probably shouldn’t since it’s a distribution group, not a security group), you can select the Token-groups user attribute to handle recursive group membership. Token-groups won’t work if you are using distribution groups within Jenkins, though, as only security groups show up in the token-groups attribute.

LDAP_MATCHING_RULE_IN_CHAIN: OID 1.2.840.113556.1.4.1941, LDAP_MATCHING_RULE_IN_CHAIN is an extended matching operator (something Microsoft added back in Windows 2003 R2) that can be used in LDAP filters:

(member:1.2.840.113556.1.4.1941:=cn=Bob,ou=ResourceUsers,dc=domain,dc=ccTLD)

This operator has known issues with high fan-outs and can cause hangs while data is retrieved. It is, however, a more efficient way of handling recursive group memberships. If your Jenkins groups contain only users, you will not encounter the known issue. If you are using nested groups, my personal recommendation would be to test each option and time logon activities … but if you do not wish to perform a test, this is a good starting option.

Recursive Group Queries: Jenkins issues a new LDAP query for each group – a lot of queries, but straight-forward queries. This is my last choice – i.e. if everything else hangs and causes poor user experience, try this selection.

For Active Directory domains that experience slow authentication through the AD plug-in regardless of the selected recursion scheme, I’ve set up the LDAP plug-in (it does not handle recursive group memberships) but it’s not a straight-forward configuration.

LDAP:

Click the button to expand the advanced server configuration. Enter the LDAP directory connection details. I usually start with clear text LDAP. Once the clear text connection tests successfully, the certificate trust can be established.

You can add a group search filter, but this is not required. If you require your group names to start with a specific string, e.g. my ITSS CSG organization's Jenkins server might use groups that start with ITSS-CSG-Jenkins, you can add a cn filter here to restrict the number of groups your implementation looks through to determine authorization. My filter, for example, is cn=ITSS-CSG-Jenkins*

Once everything is working with clear text, load the Root and Web CA public keys into your Java instance's cacerts file (if you have more than one instance of Java and don't know which one is being used … either figure out which one is actually being used or repeat the keytool commands for each cacerts file on your machine).
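A sketch of that import, assuming the CA public keys are saved as PEM files named root-ca.pem and web-ca.pem and the keystore still uses the default changeit password (adjust names and paths to your environment):

# root-ca.pem and web-ca.pem are placeholder file names
keytool -importcert -trustcacerts -noprompt -alias internal-root-ca -file root-ca.pem -keystore /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/security/cacerts -storepass changeit
keytool -importcert -trustcacerts -noprompt -alias internal-web-ca -file web-ca.pem -keystore /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/security/cacerts -storepass changeit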

In the Docker container, the file is /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/security/cacerts and I’ve mapped in from a locally maintained cacerts file that already contains our public keys for our CA chain.

Before saving your changes, make sure you TEST the connection.

Under Authorization, you can add any of your AD/LDAP groups and assign them rights (make sure your local back door account has full rights too!).

Finally, we want to set up an SSL web site. Request a certificate for your server’s hostname (make sure to include a SAN if you don’t want Chrome to call your cert invalid). Shell into the Docker instance, cd into $JENKINS_HOME, and scp the certificate pfx file.

Use the keytool command to create a JKS file from this PFX file – make sure the certificate (PFX) and keystore (JKS) passwords are the same.
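A sketch of that conversion, assuming the exported certificate is named jenkins.cert.file.pfx (a placeholder); the JKS name and password match the docker run command below:

# jenkins.cert.file.pfx is a placeholder name for your exported certificate
keytool -importkeystore -srckeystore jenkins.cert.file.pfx -srcstoretype PKCS12 -destkeystore jenkins.cert.file.jks -deststoretype JKS -srcstorepass keystorepassword -deststorepass keystorepassword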

Now remove the container we created earlier. Don't delete the local files, just "docker rm <containerid>" and create it again:

docker run --name jenkins -p 8443:8443 -p 50000:50000 -v /c/docker/jenkins/jenkins_home:/var/jenkins_home -v /c/docker/jenkins/cacerts:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/security/cacerts jenkins/jenkins:lts --httpPort=-1 --httpsPort=8443 --httpsKeyStore=/var/jenkins_home/jenkins.cert.file.jks --httpsKeyStorePassword=keystorepassword

Voila, you can access your server using an HTTPS URL. If you review the Jenkins documentation, they prefer leaving the Jenkins web server on http and using something like a reverse proxy to perform SSL offloading. This is reasonable in a production environment, but for a sandbox … there’s no need to bring up a sandbox Apache server just to configure a reverse proxy. Since we’re connecting our instance to the real user passwords, sending passwords around in clear text isn’t a good idea either. If only you will be accessing your sandbox (i.e. http://localhost) then there’s no need to perform this additional step. The server traffic to the LDAP / AD directory for authentication is encrypted. This encryption is just for the client communication with the web server.

 

Using Jenkins – System Admin Stuff

There are several "hidden" URLs that can be used to control the Jenkins service (LMGTFY, basically). When testing and playing with config parameters, restarting the service was a frequent operation, so I've included two service restart URLs here:

   https://jenkins.domain.ccTLD:8443/safeRestart ==> enter quiet mode, wait for running builds to complete, then restart

   https://jenkins.domain.ccTLD:8443/restart ==> Restart not so cleanly

Multiple discussions about creating a more fault tolerant authentication scheme within Jenkins exist on their ‘Issues’ site. Currently, you cannot use local accounts if the directory service is unavailable. Not a big deal if you’re on the company network and using one of our highly available directory solutions. Bit of a shocker, though, if your sandbox environment is on your laptop and you try to play with the instance when not on the company network. In production implementations, this would be a DR consideration (dependency on the directory being recovered). In a cloud-hosted implementation, this creates a dependency on network connectivity into the company.

As an emergency solution, you can disable security on your Jenkins installation. I'd also add some sort of firewall rule (OS-based or hardware firewall) to restrict console access to a trusted terminal server or workstation. To disable security, stop Jenkins, edit the config.xml file in $JENKINS_HOME, and find the <useSecurity> section. Change 'true' to 'false' and start Jenkins. You'll be able to access the console without credentials.
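The element in question in config.xml looks like this after the edit:

<useSecurity>false</useSecurity>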

Updating Jenkins Image

General practice for updating an application is not to update a container. Instead, download an updated image and recreate the container with the new image. I store the container initialization command along with the folder to which image directories are mapped. My file system has /path/to/docker/storage/AppName that contains a text file with the initialization command and folder(s) that are mapped into the container. This avoids having to find the proper initialization parameters when I upgrade the container.

To update the container, pull a new image, stop the container, remove the container, and create it again. That is:

docker stop jenkins
docker pull jenkins/jenkins:lts
docker rm jenkins
<whatever you used to create the container>

Agile Methodology Is Not Anarchy

For the past several years, my employer has been moving toward an Agile development methodology. There are some challenges when mapping this methodology into operations because it's not the same thing; but, surprisingly, those are not where I have experienced challenges. The biggest challenge during this transition is that some of my coworkers seem to think the methodology is that there are no rules.

A friend of mine, a fairly eccentric history professor, used to say that a little knowledge is a dangerous thing but you've got to emphasize LITTLE. And it seems like we're encountering a situation where Phil's emphasis holds true: the only thing garnered from Agile training is that the documentation and process from waterfall projects are no more. But breaking away from the large-scale view of a P.R.O.J.E.C.T. for Agile development is a bit like breaking a monolithic application out into microservices — it still needs to do all of the same 'stuff', it just does it differently. And there are still policies and procedures — even a microservice team is going to have a coding standard, a process for handling merges, a way of scheduling time off, and some basic idea of what their application needs to accomplish. Sure, the app's design will change incrementally over time. But it's not an emergent property like chaos/complexity theory.

Maybe the “what Agile means to me” mentality comes from failing to clearly map a development methodology into an operations framework. Maybe it’s just a good excuse to avoid components of work that they do not enjoy. To avoid “agile operations” becoming “no boring planning stuff!!!”, I’ve outlined ways in which the Scrum methodologies the company wishes to adopt can be used to streamline operations. It helps that our group is reorganizing into an operational/support group and an architecture/design group — I see a lot of places within the operations team where Scrum approaches make sense.

Backlog — prioritizing the ticket queue like a backlog and having support staff constantly pulling from the top of the list — not only is this an awesome way to avoid the guy who scans the queue for the easy jobs, but it ensures the most important problems are being resolved first. A universal set of stakeholders does not exist for the ticket queue — someone whose ticket is ranked fifteenth on the list may disagree, and they are welcome to add details explaining why the issue is more impacting than it seems on its face. But 90% or more of our tickets are "Sev3" — which basically means both "we want it done ASAP" and "it isn't a wide-spread high impact outage". Realistically, dozens of tickets do not have the exact same time constraint and impact. There is extra work for management in converting a ticket bucket into an ordered backlog, but the payoff is that tickets are resolved in an order that correlates to the importance of the issue. In addition to the ticket queue, routine maintenance tasks will be included in the backlog. And prioritized accordingly.

Very short sprints — while developers moving from Waterfall to Agile might start from a month (or two) long sprint and trim weeks as they evolve into the process, operations starts from the other end of the spectrum. Our norm is to grab a ticket, sort it, then look at the queue and grab another one. We are planning for hours, maybe a day or two. This means we might establish application access on Tuesday that isn’t needed until next Monday. Establish a sprint that lasts a week, and use the backlog to get tickets that have lower priority (either because the impact is lower or because resolution is not needed for a week) included in the sprint. Service interruptions, SEV1 and SEV2 tickets, will occur and should be assumed in the sprint planning (i.e. either take enough work that you think it will just get done with no service interruption tickets and accept that some tickets from the sprint will be incomplete or leave some space for service interruption tickets and have staff pull “bonus” tickets from the top of the backlog if they have no work toward the end of the sprint).

Estimation — going through the tickets and classifying each incident as a quick little task, something that will take a few hours, or a significant undertaking facilitates in sprint planning. It’s difficult to know how many tickets I can reasonably expect to include in a sprint if I cannot differentiate between a three minute config change and a three day application rollout.

Multi-tasking — Implementation, support, and ticket resolution tasks are no longer a big bucket of work that individuals attempt to multi-task to complete. There are distinct tasks that are completed in series. Some tickets require information from the user; put the ticket on hold until a response is received and move on to the next unit of work.

Velocity — historic data based on time estimates cannot be generated, but simple number of tickets per week pre- and post- can certainly be compared. And going forward, ticket counts can be weighted by estimation values.

Stand-ups are a bit of a mental sticking point for me. I can conceive the value of spending a few minutes reviewing what you’ve done, what you plan on doing, and ensuring there is a ready forum to discuss any sticking points (maybe someone else has encountered a similar situation and can offer assistance). Stand-ups could include a quick discussion of any priority shifts (escalations, service interruptions) too. *But* my experience with stand-ups has been the attendance test variety — stand-ups that were used to hurt individuals who didn’t make it to the office by 08:00. Or those who weren’t around at 16:50. I don’t think it’s reasonable to ask someone who got into an issue and worked until 7P to show up at 8A the next day. I also don’t think it is reasonable to expect someone who came in at 6A to continue working until 5P. Were a stand-up scheduled in the middle of the day, I might feel differently about them.

New Process Police

As an operational support group, we did not have a software development methodology. Doesn't mean we didn't develop software — one of the great things within operational support is the ability to automate day-to-day tasks to reduce workload. Why have someone check for application patches when a process can watch an RSS feed or file repository and notify us when there's an update? Why have someone clickity-click provisioning users into groups when the user can make a web request, the group owner can approve the request, and an automated process can add the user into the group? The end result of our automation programming is, well, quite a bit of software.

And with a small number of people, informal application development worked. Wasn’t ideal, but it worked. If you want to write in Java while I use C# … not ideal, but the alternative is that one of us needs to learn a new programming language. Problem is the next guy we hired uses VBS, the next guy uses PowerShell … and I’ll use perl for simpler processes. Then someone starts tweaking my code and buggers it up … and we’ve got to figure out what happened and roll back based on some tape backup.

To get our internal software development processes organized, I developed a process and ran a training session so everyone was familiar with both the process and the tools. Some of us have used the process well — don't edit production code, clone the repo locally, make a branch for your edits, test it and have another group member sign off on the changes, merge your branch back into master, test more, then pull the code into production. The majority, it seems, have not followed the process at all. Changes are made to the code running in production, not incorporated into the Git repo. Six months after the new development process went into place, half of our code has changes that were made improperly!

To an extent, I consider this a management problem … if the department doesn’t want software development to be a free-for-all, then the department managers need to ensure their staff follows the process. If the department wants everyone to do their own thing — then get rid of the process and declare our methodology as “do whatever you want”. The challenge for managers, though, is that they don’t know that someone has edited code in production and failed to commit their changes into the repo. If only there were some way to watch for improperly edited code and alert us promptly.

Other scripts I’ve found to perform a similar function attempt to parse ‘git status’ to identify all sorts of issues — but that doesn’t address the specific problem that I’ve got. To facilitate identifying offenders, I wrote a quick Python script that searches a directory tree for git repositories and alerts us when changes have not been staged for commit. If you’ve staged the changes for commit, that won’t be identified. But the particular problem we encounter frequently … there are alerts for that.
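A minimal sketch of that approach, assuming Python 3.7+, git on the PATH, and 'git status --porcelain' output; the search root and the alerting mechanism are placeholders to adapt to your environment:

import os
import subprocess

SEARCH_ROOT = "/path/to/code"  # placeholder: root of the directory tree to scan

def find_git_repos(root):
    """Walk the tree and yield directories that contain a .git folder."""
    for dirpath, dirnames, filenames in os.walk(root):
        if ".git" in dirnames:
            yield dirpath
            dirnames.remove(".git")  # no need to descend into the .git folder itself

def unstaged_changes(repo):
    """Return porcelain status lines for files with unstaged (or untracked) changes."""
    result = subprocess.run(
        ["git", "-C", repo, "status", "--porcelain"],
        capture_output=True, text=True, check=True
    )
    # In porcelain output, the second status column is the working-tree (unstaged) state
    return [line for line in result.stdout.splitlines() if len(line) > 1 and line[1] != " "]

if __name__ == "__main__":
    for repo in find_git_repos(SEARCH_ROOT):
        changes = unstaged_changes(repo)
        if changes:
            # placeholder alert: replace the print with whatever notification you use
            print("Unstaged changes in %s:" % repo)
            for line in changes:
                print("   %s" % line)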