Category: System Administration

PowerShell: Mass Active Directory Password Changes

We have a bunch of accounts that function as extra mailboxes — all conveniently housed in on OU. The following PowerShell command sets the password for all of the accounts in one go. Not terribly useful for “real world” use … but useful for testing (and probably something I’ll end up using again)

$OUpath = ‘ou=Mail Aliases,dc=example,dc=com’
$strNewPassword = “What3v3rYu0W@nt1tT0B3”

Get-ADUser -Filter * -SearchBase $OUpath | Set-ADAccountPassword -Reset -NewPassword (ConvertTo-SecureString -AsPlainText $strNewPassword -Force)

ZFS Compression Ratio

We’ve got several PostgreSQL servers using ZFS file system for the database, and I needed to know how compressed the data is. Fortunately, there appears to be a zfs command that does exactly that: report the compression ratio for a zfs file system. Use zfs get compressratio /path/to/mount

Web Proxy Auto Discovery (WPAD) DNS Failure

I wanted to set up automatic proxy discovery on our home network — but it just didn’t work. The website is there, it looks fine … but it doesn’t work. Turns out Microsoft introduced some security idea in Windows 2008 that prevents Windows DNS servers from serving specific names. They “banned” Web Proxy Auto Discovery (WPAD) and Intra-site Automatic Tunnel Addressing Protocol (ISATAP). Even if you’ve got a valid wpad.example.com host recorded in your domain, Windows DNS server says “Nope, no such thing!”. I guess I can appreciate the logic — some malicious actor can hijack all of your connections by tunnelling or proxying your traffic. But … doesn’t the fact I bothered to manually create a hostname kind of clue you into the fact I am trying to do this?!?

I gave up and added the proxy config to my group policy — a few computers, then, needed to be manually configured. It worked. Looking in the event log for a completely different problem, I saw the following entry:

Event ID 6268

The global query block list is a feature that prevents attacks on your  network by blocking DNS queries for specific host names. This feature has caused the DNS server to fail a query with error code NAME ERROR for wpad.example.com. even though data for this DNS name exists in the DNS database. Other queries in all locally authoritative zones for other names
that begin with labels in the block list will also fail, but no event will be logged when further queries are blocked until the DNS server service on this computer is restarted. See product documentation for information about this feature and instructions on how to configure it.

The oddest bit is that this appears to be a substring ‘starts with’ query — like wpadlet or wpadding would also fail? A quick search produced documentation on this Global Query Blocklist … and two quick ways to resolve the issue.

(1) Change the block list to contain only the services you don’t want to use. I don’t use ISATAP, so blocking isatap* hostnames isn’t problematic:

dnscmd /config /globalqueryblocklist isatap

View the current blocklist with:

dnscmd /info /globalqueryblocklist

– Or –

(2) Disable the block list — more risk, but it avoids having to figure this all out again in a few years when a hostname starting with isatap doesn’t work for no reason!

dnscmd /config /enableglobalqueryblocklist 0

 

DIFF’ing JSON

While a locally processed web tool like https://github.com/zgrossbart/jdd can be used to identify differences between two JSON files, regular diff can be used from the command line for simple comparisons. Using jq to sort JSON keys, diff will highlight (pipe bars between the two columns, in this example) where differences appear between two JSON files. Since they keys are sorted, content order doesn’t matter much — it’s possible you’d have a list element 1,2,3 in one and 2,1,3 in another, which wouldn’t be sorted.

[lisa@fedorahost ~]# diff -y <(jq --sort-keys . 1.json) <(jq --sort-keys . 2.json )
{                                                               {
  "glossary": {                                                   "glossary": {
    "GlossDiv": {                                                   "GlossDiv": {
      "GlossList": {                                                  "GlossList": {
        "GlossEntry": {                                                 "GlossEntry": {
          "Abbrev": "ISO 8879:1986",                                      "Abbrev": "ISO 8879:1986",
          "Acronym": "SGML",                                  |           "Acronym": "XGML",
          "GlossDef": {                                                   "GlossDef": {
            "GlossSeeAlso": [                                               "GlossSeeAlso": [
              "GML",                                                          "GML",
              "XML"                                                           "XML"
            ],                                                              ],
            "para": "A meta-markup language, used to create m               "para": "A meta-markup language, used to create m
          },                                                              },
          "GlossSee": "markup",                                           "GlossSee": "markup",
          "GlossTerm": "Standard Generalized Markup Language"             "GlossTerm": "Standard Generalized Markup Language"
          "ID": "SGML",                                                   "ID": "SGML",
          "SortAs": "SGML"                                    |           "SortAs": "XGML"
        }                                                               }
      },                                                              },
      "title": "S"                                                    "title": "S"
    },                                                              },
    "title": "example glossary"                                     "title": "example glossary"
  }                                                               }
}                                                               }

Tableau: Upgrading from 2022.3.x to 2023.3.0

A.K.A. I upgraded and now my site has no content?!? Attempting to test the upgrade to 2023.3.0 in our development environment, the site was absolutely empty after the upgrade completed. No errors, nothing indicating something went wrong. Just nothing in the web page where I would expect to see data sources, workbooks, etc. The database still had a lot of ‘stuff’, the disk still had hundreds of gigs of ‘stuff’. But nothing showed up. I have experienced this problem starting with 2022.3.5 or 2022.3.11 and upgrading to 2023.3.0. I could upgrade to 2023.1.x and still have site content.

I wasn’t doing anything peculiar during the upgrade:

  • Run TableauServerTabcmd-64bit-2023-3-0.exe to upgrade the CLI
  • Run TableauServer-64bit-2023-3-0.exe to upgrade the Tableau binaries
  • Once installation completes, run open a ​new​ command prompt with ​Run as Administrator and launch “.\Tableau\Tableau Server\packages\scripts.20233.23.1017.0948\upgrade-tsm.cmd” –username username

The upgrade-tsm batch upgrades all of the components and database content. At this point, the server will be stopped. Start it. Verify everything looks OK – site is online, SSL is right, I can log in. Check out the site data … it’s not there!

Reportedly this is a known bug that only impacts systems that have been restored from backup. Since all of my servers were moved from Windows 2012 to Windows 2019 by backing up and restoring the Tableau sites … that’d be all of ’em! Fortunately, it is easy enough to make the data visible again. Run tsm maintenance reindex-search to recreate the search index. Refresh the user site, and there will be workbooks, data, jobs, and all sorts of things.

If reindexing does not sort the problem, tsm maintenance reset-searchserver should do it. The search reindex sorted me, though.

Linux – High Load with CIFS Mounts using Kernel 6.5.5

We recently updated our Fedora servers from 36 and 37 to 38. Since the upgrade, we have observed servers with very high load averages – 8+ on a 4-cpu server – but the server didn’t seem unreasonably slow. On the Unix servers I first used, Irix and Solaris, load average counts threads in a Runnable state. Linux, however, includes both Runnable and Uninterruptible states in the load average. This means processes waiting – on network calls using mkdir to a mounted remote server, local disk I/O – are included in the load average. As such, a high load average on Linux may indicate CPU resource contention but it may also indicate I/O contention elsewhere.

But there’s a third possibility – code that opts for the simplicity of the uninterrupted sleep without needing to be uninterruptible for a process. In our upgrade, CIFS mounts have a laundromat that I assume cleans up cache – I see four cifsd-cfid-laundromat threads in an uninterruptible sleep state – which means my load average, when the system is doing absolutely nothing, would be 4.

2023-10-03 11:11:12 [lisa@server01 ~/]# ps aux | grep " [RD]"
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1150 0.0 0.0 0 0 ? D Sep28 0:01 [cifsd-cfid-laundromat]
root 1151 0.0 0.0 0 0 ? D Sep28 0:01 [cifsd-cfid-laundromat]
root 1152 0.0 0.0 0 0 ? D Sep28 0:01 [cifsd-cfid-laundromat]
root 1153 0.0 0.0 0 0 ? D Sep28 0:01 [cifsd-cfid-laundromat]
root 556598 0.0 0.0 224668 3072 pts/11 R+ 11:11 0:00 ps aux

Looking around the Internet, I see quite a few bug reports regarding this situation … so it seems like a “ignore it and wait” problem – although the load average value is increased by these sleeping threads, it’s cosmetic. Which explains why the server didn’t seem to be running slowly even through the load average was so high.

https://lkml.org/lkml/2023/9/26/1144

Date: Tue, 26 Sep 2023 17:54:10 -0700
From: Paul Aurich 
Subject: Re: Possible bug report: kernel 6.5.0/6.5.1 high load when CIFS share is mounted (cifsd-cfid-laundromat in"D" state)

On 2023-09-19 13:23:44 -0500, Steve French wrote:
>On Tue, Sep 19, 2023 at 1:07 PM Tom Talpey <tom@talpey.com> wrote:
>> These changes are good, but I'm skeptical they will reduce the load
>> when the laundromat thread is actually running. All these do is avoid
>> creating it when not necessary, right?
>
>It does create half as many laundromat threads (we don't need
>laundromat on connection to IPC$) even for the Windows server target
>example, but helps more for cases where server doesn't support
>directory leases.

Perhaps the laundromat thread should be using msleep_interruptible()?

Using an interruptible sleep appears to prevent the thread from contributing
to the load average, and has the happy side-effect of removing the up-to-1s delay
when tearing down the tcon (since a7c01fa93ae, kthread_stop() will return
early triggered by kthread_stop).

~Paul

 

Redis Continually Receiving SIGTERM

I brought up a redis cluster — three servers which all logged basically nothing apart from the fact they were about to shut down. The service status showed as “Activating” — never started — and the server wasn’t doing anything useful.

The redis log reads:

2920940:signal-handler (1694019281) Received SIGTERM scheduling shutdown...
2921151:signal-handler (1694019374) Received SIGTERM scheduling shutdown...
2921518:signal-handler (1694019468) Received SIGTERM scheduling shutdown...
2921726:signal-handler (1694019561) Received SIGTERM scheduling shutdown...
2922133:signal-handler (1694019655) Received SIGTERM scheduling shutdown...
2922410:signal-handler (1694019748) Received SIGTERM scheduling shutdown...
2923173:signal-handler (1694019842) Received SIGTERM scheduling shutdown...
2923537:signal-handler (1694019935) Received SIGTERM scheduling shutdown...
2923747:signal-handler (1694020029) Received SIGTERM scheduling shutdown...
2924110:signal-handler (1694020122) Received SIGTERM scheduling shutdown...
2924319:signal-handler (1694020216) Received SIGTERM scheduling shutdown...
2924687:signal-handler (1694020309) Received SIGTERM scheduling shutdown...
2924900:signal-handler (1694020403) Received SIGTERM scheduling shutdown...
2925266:signal-handler (1694020496) Received SIGTERM scheduling shutdown...
2925467:signal-handler (1694020590) Received SIGTERM scheduling shutdown...

Turns out this is a hazard of copy/pasting a unit file from an older server — evidently redis cannot use a service type of “Forking” with systemd. To resolve the issue, edit /etc/systemd/system/redis.service and updating the type to “simple”. Use systemctl daemon-reload and then systemctl restart redis to launch redis with the new config … voila, I’ve got a cluster of three servers that are started and communicating.

Tableau: Workbooks and Views Created or Modified By a Specific Individual

I had a manager looking to locate a ‘something in Tableau’ that was created by a specific individual — in this case, it was a terminated employee so “just ask the person” was not a viable solution. I put together a query to list all workbooks owned by or modified by an individual:

SELECT w.id, w.name, w.description, w.owner_id, w.modified_by_user_id, owner_system_users.email AS owner_email, modified_system_users.email AS modifier_email
     FROM  public.workbooks AS w
      LEFT OUTER JOIN public.users AS owner_users on w.owner_id = owner_users.id
      LEFT OUTER JOIN public.users AS modified_users ON w.owner_id = modified_users.id
      LEFT OUTER JOIN public.system_users AS owner_system_users ON owner_system_users.id = owner_users.system_user_id
		LEFT OUTER JOIN public.system_users AS modified_system_users ON modified_system_users.id = modified_users.system_user_id
      WHERE owner_system_users.name = 'UserLogonID';
--      WHERE owner_system_users.email LIKE '%Smith%' OR modified_system_users.email = '%Smith%'
		;

As well as a query to identify all views owned by an individual:

SELECT views.*, owner_system_users.email AS owner_email
     FROM  public.views 
      LEFT OUTER JOIN public.users AS owner_users on views.owner_id = owner_users.id
      LEFT OUTER JOIN public.system_users AS owner_system_users ON owner_system_users.id = owner_users.system_user_id

      WHERE owner_system_users.name = 'UserLogonID';
--      WHERE owner_system_users.email LIKE '%Smith%' OR modified_system_users.email = '%Smith%'
		;

The email address based search is most reasonable — our email addresses are algorithmically based on our names, so we always know what the address would have been. Many contractors, however, don’t have Office 365 licenses or mailboxes … so I have to fall back to finding their logon ID in those cases.

Redis High Availability — Options

The two primary approaches to high availability with Redis are Redis Sentinel and Redis Cluster. There are also third-party solutions, but the provided budget is zero dollars. That … limiting.

Sentinel is the official high availability solution provided by Redis. It monitors Redis instances, detecting failures, and automatically handling failover to a replica. It also provides monitoring/alerting to advise administrators when a problem has been detected.

Sentinel does not provide much in the way of scalability (it adds additional ‘read only’ copies, but there is a single master) but this architecture better ensures consistency (i.e. the same data is present on all nodes). It does, however, promote a read replica to master in the event the master fails, so high availability is achieved.

More than half of the Sentinels need to consider a master down to invoke failover (quorum) – so we would want at least three nodes. We experienced issues with two-node Microsoft quorum-based clustering when the two nodes were unable to communicate. Each node considered its partner to be ‘down’ and decided to be the server in charge. And having two servers in charge corrupts data. With three nodes, should they all become separated … they cannot reach a quorum of two servers agreeing on states.

Cluster automatically distributes data across multiple Redis nodes (called shards). Doing so allows more data to be processed in parallel. Redis Cluster also supports replication and automatic failover.

Since clustering provides both high availability and scaling, if the write load is a consideration, this may be a preferred option; but distributed data means inconsistent data values may be encountered. If data consistency is paramount, clustering may be undesirable. Additionally, not all Redis clients support communicating with a clustered environment. We would need to have our vendor confirm that the application could use a clustered solution.

The minimum recommended environment for production is larger – six servers. This constitutes three master nodes and three replica nodes.

Tableau – Data Source Connection Info and Workbooks

I think I finally have a query that links workbooks where data sources are used and the connection information from the data_connections table!

-- Query to find all data sources and where they are used
select system_users.email
, datasources.id, datasources.name, datasources.created_at, datasources.updated_at, datasources.db_class, datasources.db_name, datasources.site_id
, data_connections.server, data_connections.dbclass
, sites.name as SiteName, projects.name as ProjectName, workbooks.name as WorkbookName
from datasources
left outer join data_connections on data_connections.datasource_id = datasources.id
left outer join users on users.id = datasources.owner_id
left outer join system_users on users.system_user_id = system_users.id
left outer join sites on datasources.site_id = sites.id
left outer join projects on datasources.project_id = projects.id
left outer join workbooks on datasources.parent_workbook_id = workbooks.id
order by datasources.name
;