Category: System Administration

October 24, 2018

Open Password Filter (OPF) Detailed Overview

When we began allowing users to initiate password changes in Active Directory and feed those passwords into the identity management system (IDM), it was imperative that the passwords set in AD comply with the IDM password policy. Otherwise passwords were set in AD that were not set in the IDM system or other downstream managed directories. Microsoft does not have a password policy that allows the same level of control as the Oracle IDM (OIDM) policy, however password changes can be passed to DLL programs for farther evaluation (or, as in the case of the hook that forwards passwords to OIDM – the DLL can just return TRUE to accept the password but do something completely different with the password like send it along to an external system). Search for secmgmt “password filters” (https://msdn.microsoft.com/en-us/library/windows/desktop/ms721882(v=vs.85).aspx) for details from Microsoft.

LSA makes three different API calls to all of the DLLs listed in the NotificationPackages registry hive. First, InitializeChangeNotify(void) is called when LSA loads. The only reasonable answer to this call is “true” as it advises LSA that your filter is online and functional.

When a user attempts to change their password, LSA calls PasswordFilter(PUNICODE_STRING AccountName, PUNICODE_STRING FullName, PUNICODE_STRING Password, BOOLEAN SetOperation) — this is the mechanism we use to enforce a custom password policy. The response to a PasswordFilter call determines if the password is deemed acceptable.

Finally, when a password change is committed to the directory, LSA calls PasswordChangeNotify(PUNICODE_STRING UserName, ULONG RelativeId, PUNICODE_STRING NewPassword) — this is the call that should be used to synchronize passwords into remote systems (as an example, the Oracle DLL that is used to send AD-initiated password changes into OIDM). In our password filter, the function just returns ‘0’ because we don’t need to do anything with the password once it has been committed.

Our password filter is based on the Open Password Filter project at (https://github.com/jephthai/OpenPasswordFilter). The communication between the DLL and the service is changed to use localhost (127.0.0.1). The DLL accepts the password on failure (this is a point of discussion for each implementation to ensure you get the behaviour you want). In the event of a service failure, non-compliant passwords are accepted by Active Directory. It is thus possible for workstation-initiated password changes to get rejected by the IDM system. The user would then have one password in Active Directory and their old password will remain in all of the other connected systems (additionally, their IDM password expiry date would not advance, so they’d continue to receive notification of their pending password expiry).

While the DLL has access to the user ID and password, only the password is passed to the service. This means a potential compromise of the service (obtaining a memory dump, for example) will yield only passwords. If the password change occurred at an off time and there’s only one password changed in that timeframe, it may be possible to correlate the password to a user ID (although if someone is able to stack trace or grab memory dumps from our domain controller … we’ve got bigger problems!

The service which performs the filtering has been modified to search the proposed password for any word contained in a text file as a substring. If the case insensitive banned string appears anywhere within the proposed password, the password is rejected and the user gets an error indicating that the password does not meet the password complexity requirements.

Other password requirements (character length, character composition, cannot contain UID, cannot contain given name or surname) are implemented through the normal Microsoft password complexity requirements. This service is purely analyzing the proposed password for case insensitive matches of any string within the dictionary file.

October 20, 2018

Did you know … that you can recover a deleted Teams channel?

Oh no, I didn’t mean to delete THAT!!! Sure, it asked me five times if I was sure that I was sure … and maybe that’s part of the problem – I see so many “are you sure” messages that I click OK a little too easily. Well, they say to err is human. And I must be exceptionally human ? Sometimes recovering my data requires a sheepish call to the Help Desk. But did you know you can recover deleted Teams channels?

I used the hamburger menu next to a channel to delete it. Oops!

I even read the first few words of the “are you sure” dialogue before clicking the “Delete” button. Except … oops! I didn’t want to delete that channel!

You can recover the channel immediately, all by yourself. Even if you’re not a team owner. From the hamburger menu next to the team, select “Manage team”.

On the Team management page, select “Channels”. You can expand “Deleted” and see the channel you just removed. Click “Restore”

Yet another prompt … click “Restore” again.

Voila, the channel is back. Along with all its content. Whew!

Just because channel recovery is self-service doesn’t mean no one will know that you’ve mis-clicked. The channel deletion event which appears in the “General” channel … well, it’s still there. You can up-vote a request for enhancement on Microsoft’s site … but it’s not like no one will every know about your mistake.

October 18, 2018

Did you know … You can control what members of a Microsoft Team group can do within the team?

When you create a new Team, members can create new channels, delete channels, add apps … they can do a lot of things. Did you know much of that is configurable? You can create a Team where individuals receive but cannot respond to posts. You can restrict your Team so only owners can remove channels.

From the hamburger menu next to your Team, select “Manage team”

On the Team management page, select the “Settings” tab.

Expand the “Member permissions” section. Now uncheck any permission you want to restrict to Team owners. There’s even a radio button near the bottom of this section so only Team owners can post to the “General” channel (if that’s the only channel, and members are prohibited from creating their own channels, you’ve got a broadcast-only Team space)

Scroll down and expand “Fun stuff” … you can prevent Gliphy content from being used in the Team (or change the filter used to determine which Gliphy content is appropriate), disable stickers, and disable memes.

October 17, 2018

Kernel Updates In GNOME

Since I usually do not install X11 ‘stuff’ on my Linux hosts — using the console interface — I do not have any experience installing kernel updates on “desktop” type systems. Evidently, the best practice is to drop out of the GUI into what I’d call init 3 then install the kernel updates. You can get random hangs and malfunctions when you attempt to update the kernel whilst in the graphic console.

October 12, 2018

Very specific time from journalctl log

Run:

journalctl -o short-iso-precise

Results:

2018-09-19T23:30:11.651010-0400 linux05.domain.ccTLD process[PID]: Log message here.

October 12, 2018

Recovering A Seriously Screwed Up Fedora System

The graphical interface on a Fedora 28 laptop was unavailable — buggered up video device/driver. Change to what used to be called run level 3, and we could not log in! We know the root password, but it would not take it. Single user is password protected too — and we were unable to log in there.

Normal recovery process:

Get to the grub menu, highlight the kernel you want to boot, and hit ‘e’ to edit it. Scroll down. On line that starts with linux16, change “rhgb quiet” to say “rd.break enforcing=0”
ctrl-x to boot

Once you get a shell:
mount -o remount,rw /sysroot
chroot /sysroot

Voila, you’ve got access to your files. Use vi to edit whatever has the box seriously screwed up (passwd if your problem is that you don’t know the root password) and you’re set. We reset the root password just in case. Aaaand … we still couldn’t log in on init 1 or init 3! And at this point I was feeling stubborn about getting logged into the box.

Now you can tweak up the system so it is not using sulogin when booting into single user mode but that isn’t a good way to install network-sourced packages. For some reason, we had to disable selinux before we could log into anything other than the graphical target. I’m sure there is a policy we could have tweaked, but it was far easier to disable the thing, boot into the multi-user target, sort the video driver, and then boot into the graphical target.

October 10, 2018

Debugging An Active Directory Custom Password Filter

A few years ago, I implemented a custom password filter in Active Directory. At some point, it began accepting passwords that should be rejected. The updated code is available at https://github.com/ljr55555/OpenPasswordFilter and the following is the approach I used to isolate the cause of the failure.

Technique #1 — Netcap on the loopback There are utilities that allow you to capture network traffic across the loopback interface. This is helpful in isolating problems in the service binary or inter-process communication. I used RawCap because it’s free for commercial use. There are other approaches too – or consult the search engine of your choice.

The capture file can be opened in Wireshark. The communication is done in clear text (which is why I bound the service to localhost), so you’ll see the password:

And response

To ensure process integrity, the full communication is for the client to send “test\n” then “PasswordToTest\n”, after which the server sends back either true or false.

Technique #2 — Debuggers Attaching a debugger to lsass.exe is not fun. Use a remote debugger — until you tell the debugger to proceed, the OS is pretty much useless. And if the OS is waiting on you to click something running locally, you are quite out of luck. A remote debugger allows you to use a functional operating system to tell the debugger to proceed, at which time the system being debugged returns to service.

Install the SDK debugging utilities on your domain controller and another box. Which SDK debugging tool? That’s going to depend on your OS. For Windows 10 and Windows Server 2012 R2, the Windows 10 SDK (Debugging Tools For Windows 10) work. https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/debugger-download-tools or Google it.

On the domain controller, find the PID of LSASS and write it down (472 in my example). Check the IP address of the domain controller (10.104.164.110 in my example).

From the domain controller, run:

dbgsrv.exe -t tcp:port=11235,password=s0m3passw0rd

Where port=11235 can be any un-used port and password=s0m3passw0rd can be whatever string you want … you’ve just got to use the same values when you connect from the client. Hit enter and you’ve got a debugging server. It won’t look like it did anything, but you’ll see the port bound on netstat

And the binary running in taskman

From the other box, run the following command (substituting the correct server IP, port, password, and process ID):

windbg.exe -y “srv:c:\symbols_pub*http://msdl.microsoft.com/downloads/symbols” -premote tcp:server=10.104.164.110,port=11235,password=s0m3passw0rd -p 472

This attaches your WinDBG to the debugging server & includes an internet-hosted symbol path. Don’t worry when it says “Debugee not connected” at the bottom – that just means the connection has not completed. If it didn’t connect at all (firewall, bad port number, bad password), you’d get a pop-up error indicating that the initial connection failed.

Wait for it … this may take a long time to load up, during which time your DC is vegged. But eventually, you’ll be connected. Don’t try to use the DC yet – it will just seem hung, and trying to get things working just make it worse. Once the debugger is connected, send ‘g’ to the debugger to commence – and now the DC is working again.

Down at the bottom of the command window, there’s a status (0:035> below) followed by a field where you enter commands. Type the letter g in there & hit enter.

The status will then say “Debuggee is running …” and you’re server is again responsive to user requests.

When you reach a failing test, pause the debugger with a break command (Debug=>Break, or Ctrl-Break) which will veg out the DC again. You can view the call stack, memory, etc.

To search the address space for an ASCII string use:

!for_each_module s -[1]a ${@#Base} L?${@#Size}  "bobbob"

Where “bobbob” is the password I had tested.

Alternately, run the “psychodebug” build where LARGEADDRESSAWARE is set to NO and you can search just the low 2-gig memory space (32-bit process memory space):

s -a 0 L?80000000 "bobbob"

* The true/false server response is an ASCII string, not a Boolean. *

Once you have found what you are looking for, “go” the debugger (F5, Debug=>Go, or ‘g’) to restore the server to an operational state. Break again when you want to look at something.

To disconnect, break and send “qd” to the debugger (quit and detach). If you do not detach with qd, the process being debugged terminates. Having lsass.exe terminate really freaks out the server, and it will go into an auto-recovery “I’m going to reboot in one minute” mode. It’ll come back, but detaching without terminating the process is a lot nicer.

Technique #3 – Compile a verbose version. I added a number of event log writes within the DLL (obviously, it’s not a good idea in production to log out candidate passwords in clear text!). While using the debugger will get you there eventually, half an hour worth of searching for each event (the timing is tricky so the failed event is still in memory when you break the debugger) … having each iteration write what it was doing to the event log was FAAAAAR simpler.

And since I’m running this on a dev DC where the passwords coming across are all generated from a load sim script … not exactly super-secret stuff hitting the event log.

Right now, I’ve got an incredibly verbose DLL on APP556 under d:\tempcsg\ljr\2\debugbuild\psychodebug\ … all of the commented out event log writes from https://github.com/ljr55555/OpenPasswordFilter aren’t commented out.

Stop the OpenPasswordFilter service, put the verbose DLL and executables in place, and reboot. Change some passwords, then look in the event viewer.

ERROR events are actual problems that would show up either way. INFORMATION events are extras. I haven’t bothered to learn how to properly register event sources in Windows yet 🙂 You can find the error content at the bottom of the “this isn’t registered” complaint:

You will see events for the following steps:

DLL starting CreateSocket

About to test password 123paetec123-Dictionary-1-2

Finished sendall function to test password123paetec123-Dictionary-1-2

Got t on test of paetec123-Dictionary-1-2

The final line will either say “Got t” for true or “Got f” for false.

Technique #4 – Running the code through the debugger. Whilst there’s no good way to get the “Notification Package” hook to run the DLL through the debugger, you can install Visual Studio on a dev domain controller and execute the service binary through the debugger. This allows you to set breakpoints and watch variable values as the program executes – which makes it a whole lot easier than using WinDBG to debug the production code.

Grab a copy of the source code – we’re going to be making some changes that should not be promoted to production, so I work on a temporary copy of the project and delete the copy once testing has completed.

Open the project in Visual Studio. Right-click OPFService in the “Solution Explorer” and select “Properties”

Change the build configuration to “Debug”

Un-check “Optimize code” – code optimization is good for production run, but it will wipe out variable values when you want to see them.

Set a breakpoint on execution – on the OPFDictionary.cs file, the loop checking to see if the proposed word is contained in the banned word list is a good breakpoint. The return statements are another good breakpoint as it pauses program execution right before a password test iteration has completed.

Build the solution (Build=>Build Solution). Stop the Windows OpenPasswordFilter service.

Launch the service binary through the debugger (Debug=>Start Debugging).

Because the program is being run interactively instead of through a service, you’ll get a command window that says “Press any key to stop the program”. Minimize this.

From a new command prompt, telnet to localhost on port 5995 (the telnet client is not installed by default, so you may need to use “Turn Windows features on or off” and enable the telnet client first).

Once the connection is established, use CTRL and ] to get into the telnet command prompt. Type set localecho … now you’ll be able to see what you are typing.

Hit enter again and you’ll return to the blank window that is your telnet client. Type test and hit enter. Then type a candidate password and hit enter.

Program execution will pause at the breakpoint you’ve set. Return to Visual Studio. Select Debug =>Window=>Locals to open a view of the variable values

View the locals at the breakpoint, then hit F5 if you want to continue.

If you’re set breakpoints on either of the return statements, program execution will also pause before the return … which gives you an opportunity to see which return is being used & compare the variable values again.

In this case, I submitted a password that was in the banned word list, so the program rightly evaluated line 56 to true and returns true.

October 9, 2018

Shell Script: Path To Script

We occasionally have to re-home our shell scripts, which means updating any static path values used within scripts. It’s quick enough to build a sed script to convert /old/server/path to /new/server/path, but it’s still extra work.

The dirname command works to provide a dynamic path value, provided you use the fully qualified path to run the script … but it fails spectacularly whens someone runs ./scriptFile.sh and you’re trying to use that path in, say, EXTRA_JAVA_OPTS. The “path” is just . — and Java doesn’t have any idea what to do with “-Xbootclasspath/a:./more/path/goes/here.jar”

Voila, realpath gives you the fully qualified file path for /new/server/path/scriptFile.sh, ./scriptFile.sh, or even bash scriptFile.sh … and the dirname of a realpath is the fully qualified path where scriptFile.sh resides:

#!/bin/bash
DIRNAME=`dirname $(realpath "$0")`
echo ${DIRNAME}

Hopefully next time we’ve got to re-home our batch jobs, it will be a simple scp & sed the old crontab content to use the new paths.

October 7, 2018

Switching run levels in systemd

Yeah, I know there aren’t actually run levels anymore. The systemd equivalent of running init 3 to boot into console mode is

systemctl isolate multi-user.target

And the equivalent of running init 5 to boot into GUI mode is

systemctl isolate graphical.target

This is a one-time thing, not a config change. If you want to permanently switch to console mode you’d use systemctl set-default multi-user.target

September 26, 2018

Building A Jenkins Sandbox

You can use a pre-built docker container (the “long term support” iteration is published as jenkins/jenkins:lts) or perform a local installation from https://jenkins.io/download/, add a package repo to your package manager config (http://pkg.jenkins-ci.org/redhat-stable/jenkins.repo for RedHat-based systems), or build it from the source repo. In this sandbox example, I will be using a Docker container.

Map the /var/Jenkins_home value to something. This allows you to store user-specific data on your local drive, not within the Docker image. In my case, c: is shared in Docker and I’m using c:\docker\jenkins\jenkins_home to store the data.

I have a java cacerts file mounted to the container as well – my CA chain has been imported into this file, and the default password, changeit, is used. This will allow Java to trust internally signed certificates. The keystore password appears as part of the process (i.e. anyone who can run commands like “ps aux” or “ps -efww” will see this value, so while security best practices dictate the default password should be changed … don’t change it to something like your root password!).

We can now start the Docker container:

docker run -p 8080:8080 -p 50000:50000 -v /c/docker/jenkins/jenkins_home:/var/jenkins_home -v /c/docker/jenkins/cacerts:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/security/cacerts jenkins/jenkins:lts

Once the container is running, you can visit the management web site (http://localhost:8080) and install the modules you want – or just take the defaults (you’ll end up with ‘stuff’ you don’t need … I don’t use subversion, for instance, and don’t really need a plugin for it). For a sandbox, I accept the defaults and then use Jenkins => Manage Jenkins => Manage Plug-ins to remove obviously unnecessary ones. And add any that may be needed (e.g. if you are using Visual Studio solution files, add in the MSBuild plugin).

Configuring Authentication (LDAP)

First install the appropriate plug-in – referrals cause authentication problems when using AD as the LDAP authentication source, if you are using AD for authentication … use the Active Directory plugin).

Manage Jenkins => Configure Global Security. Under access control, select the radio button for “LDAP” or “Active Directory”. Configuration is implementation specific.

AD:

Click the button to expand the advanced configuration. You should not need to specify a domain controller if service records for the domain are present in DNS. The “Site” should be “UserAuth”. For the Bind DN, you can use your userid (user@domain.ccTLD or domain\uid format) with your password. Or you can create a dedicated service account – for a “real world” implementation, you would want a dedicated service account (using *your* account means you’ll need to update your Jenkins config whenever you change your password … and when you forget this update, auth fails).

A note about the group membership lookup strategy:

For some reason, Jenkins assumes recursive group memberships will be used (e.g. there is a “App XYZ DevOps Team” that is placed into the “Jenkins Users” group, and “Jenkins Users” is assigned authorizations within the system). Bit of a shame that “none” isn’t an option for cases where there isn’t hierarchical group membership being built out.

There are three lookup strategies available: recursive group queries, LDAP_MATCHING_RULE_IN_CHAIN, and Toke-Groups user attribute. There have been bugs in the “Automatic” strategy that caused timeout failures. Additionally, the group list returned by the three strategies is not identical … so it is possible to have inconsistent authorization results as different strategies are used. To ensure consistent behaviour, I select a specific strategy.

Token-Groups: If you are not using Distribution groups within Jenkins to assign authorization (and you probably shouldn’t since it’s a distribution group, not a security group), you can select the Token-groups user attribute to handle recursive group membership. Token-groups won’t work if you are using distribution groups within Jenkins, though, as only security groups show up in the token-groups attribute.

LDAP_MATCHING_RULE_IN_CHAIN: OID 1.2.840.113556.1.4.1941, LDAP_MATCHING_GROUP_IN_CHAIN is an extended matching operator (something Microsoft added back in Windows 2003 R2) that can be used in LDAP filters:

(member:1.2.840.113556.1.4.1941:=cn=Bob,ou=ResourceUsers,dc=domain,dc=ccTLD)

This operator has known issues with high fan-outs and can cause hangs while data is retrieved. It is, however, a more efficient way of handling recursive group memberships. If your Jenkins groups contain only users, you will not encounter the known issue. If you are using nested groups, my personal recommendation would be to test each option and time logon activities … but if you do not wish to perform a test, this is a good starting option.

Recursive Group Queries: Jenkins issues a new LDAP query for each group – a lot of queries, but straight-forward queries. This is my last choice – i.e. if everything else hangs and causes poor user experience, try this selection.

For Active Directory domains that experience slow authentication through the AD plug-in regardless of the selected recursion scheme, I’ve set up the LDAP plug-in (it does not handle recursive group memberships) but it’s not a straight-forward configuration.

LDAP:

Click the button to expand the advanced server configuration. Enter the LDAP directory connection details. I usually start with clear text LDAP. Once the clear text connection tests successfully, the certificate trust can be established.

You can add a group search filter, but this is not required. If you request your group names start with a specific string, e.g. my ITSS CSG organization’s Jenkins server might use groups that start with ITSS-CSG-Jenkins, you can add a cn filter here to restrict the number of groups your implementation looks through to determine authorization. My filter, for example, is cn=ITSS-CSG-Jenkins*

Once everything is working with clear text, load the Root and Web CA public keys into your Java instance’s cacerts file (if you have more than once instance of Java and don’t know which one is being used … either figure out which one is actually being used or repeat the keytool commands for each cacerts file on your machine).

In the Docker container, the file is /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/security/cacerts and I’ve mapped in from a locally maintained cacerts file that already contains our public keys for our CA chain.

Before saving your changes, make sure you TEST the connection.

Under Authorization, you can add any of your AD/LDAP groups and assign them rights (make sure your local back door account has full rights too!).

Finally, we want to set up an SSL web site. Request a certificate for your server’s hostname (make sure to include a SAN if you don’t want Chrome to call your cert invalid). Shell into the Docker instance, cd into $JENKINS_HOME, and scp the certificate pfx file.

Use the keytool command to create a JKS file from this PFX file – make sure the certificate (PFX) and keystore (JKS) passwords are the same.

Now remove the container we created earlier. Don’t delete the local files, just “docker rm <containerid>” and create It again

docker run –name jenkins -p 8443:8443 -p 50000:50000 -v /c/docker/jenkins/jenkins_home:/var/jenkins_home -v /c/docker/jenkins/cacerts:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/security/cacerts jenkins/jenkins:lts –httpPort=-1 –httpsPort=8443 –httpsKeyStore=/var/jenkins_home/jenkins.cert.file.jks –httpsKeyStorePassword=keystorepassword

Voila, you can access your server using an HTTPS URL. If you review the Jenkins documentation, they prefer leaving the Jenkins web server on http and using something like a reverse proxy to perform SSL offloading. This is reasonable in a production environment, but for a sandbox … there’s no need to bring up a sandbox Apache server just to configure a reverse proxy. Since we’re connecting our instance to the real user passwords, sending passwords around in clear text isn’t a good idea either. If only you will be accessing your sandbox (i.e. http://localhost) then there’s no need to perform this additional step. The server traffic to the LDAP / AD directory for authentication is encrypted. This encryption is just for the client communication with the web server.

Using Jenkins – System Admin Stuff

There are several of “hidden” URLs that can be used to control the Jenkins service (LMGTFY, basically). When testing and playing with config parameters, restarting the service was a frequent operation, so I’ve included two service restart URLs here:

https://jenkins.domain.ccTLD:8443/safeRestart ==> enter quiet mode, wait for running builds to complete, then restart

https://jenkins.domain.ccTLD:8443/restart ==> Restart not so cleanly

Multiple discussions about creating a more fault tolerant authentication scheme within Jenkins exist on their ‘Issues’ site. Currently, you cannot use local accounts if the directory service is unavailable. Not a big deal if you’re on the company network and using one of our highly available directory solutions. Bit of a shocker, though, if your sandbox environment is on your laptop and you try to play with the instance when not on the company network. In production implementations, this would be a DR consideration (dependency on the directory being recovered). In a cloud-hosted implementation, this creates a dependency on network connectivity into the company.

As an emergency solution, you can disable security on your Jenkins installation. I’d also get some sort of firewall rule (OS-based or hardware firewall) to restrict console access to a trusted terminal server or workstation. To disable security, stop Jenkins. Edit the config.xml file in $JENKINS_HOME, and ifnd the <useSecurity> section. Change ‘true’ to ‘false’ and start Jenkins. You’ll be able to access the console without credentials.

Updating Jenkins Image

General practice for updating an application is not to update a container. Instead, download an updated image and recreate the container with the new image. I store the container initialization command along with the folder to which image directories are mapped. My file system has /path/to/docker/storage/AppName that contains a text file with the initialization command and folder(s) that are mapped into the container. This avoids having to find the proper initialization parameters when I upgrade the container.

To update the container, pull a new image, stop the container, remove the container, and create it again. That is:

docker stop jenkins
docker pull jenkins/jenkins:lts
docker rm jenkins
<whatever you used to create the container>