Category: System Administration

Java Heap Stats with JStat

While there are plenty of third-party utilities for looking at the Java heap space, I just use jstat (in OpenJDK, this means installing the java-<Version>-openjdk-devel package).

JStat will display the following columns:

--------------------------------------------------------------------------------
S0C: Survivor space 0 size in K
S1C: Survivor space 1 size in K

S0U: Survivor space 0 usage in K
S1U: Survivor space 1 usage in K

--------------------------------------------------------------------------------

EC: Eden space size in K
EU: Eden space usage in K

--------------------------------------------------------------------------------

OC: Old space size in K
OU: Old space usage in K

--------------------------------------------------------------------------------

MC: Metaspace size in K
MU: Metaspace usage in K

--------------------------------------------------------------------------------

CCSC: Compressed class space size in K
CCSU: Compressed class space usage in K

--------------------------------------------------------------------------------

YGC: Young generation garbage collection count
YGCT: Young generation garbage collection total time in seconds

FGC: Full garbage collection count
FGCT: Full garbage collection total time in seconds

CGC: Concurrent garbage collection count
CGCT: Concurrent garbage collection time in seconds
GCT: Total garbage collection time in seconds

--------------------------------------------------------------------------------

This Stack Overflow answer does a good job of explaining the nomenclature and how stuff gets moved around in the heap space: https://stackoverflow.com/questions/13660871/jvm-garbage-collection-in-young-generation/13661014#13661014

Sample output — this command is for java PID 19356 and will list 100 lines 2 seconds apart (2000 ms)

server01:bin # jstat -gc 19356 2000 100
S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT GCT
68096.0 68096.0 0.0 64207.5 545344.0 319007.2 30775744.0 19221750.2 137452.0 124322.4 18860.0 15380.6 324697 14589.985 228 45.830 14635.815
68096.0 68096.0 0.0 64207.5 545344.0 386674.5 30775744.0 19221750.2 137452.0 124322.4 18860.0 15380.6 324697 14589.985 228 45.830 14635.815
68096.0 68096.0 0.0 64207.5 545344.0 457055.4 30775744.0 19221750.2 137452.0 124322.4 18860.0 15380.6 324697 14589.985 228 45.830 14635.815
68096.0 68096.0 0.0 64207.5 545344.0 485538.8 30775744.0 19221750.2 137452.0 124322.4 18860.0 15380.6 324697 14589.985 228 45.830 14635.815
68096.0 68096.0 0.0 64207.5 545344.0 505893.4 30775744.0 19221750.2 137452.0 124322.4 18860.0 15380.6 324697 14589.985 228 45.830 14635.815

And this is a time where a third-party tool would be helpful, but I never really ‘get’ what is and is not OK to install on servers, so I try not to install things. The *useful* bit of information in all of this is really the percent utilization: usage divided by size for each space.

That last grouping of counters I look at versus how long the PID has been running. If you’ve racked up a billion GCs and the PID has only been running for eight seconds, that is a crazy amount of garbage collection activity. If I’ve only had 3 GCs and the PID has been running for seven years, it hasn’t been doing much of anything. In between? I don’t really find the numbers useful unless I’ve got a baseline from normal operation.
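Since the percent utilization is the number I actually want, it is easy enough to compute from a single jstat sample. A minimal sketch, assuming jstat is on the PATH and reusing the example PID from above:

import subprocess

# Grab one "jstat -gc <pid>" sample: a header line plus one line of values
strPID = "19356"
strOutput = subprocess.run(["jstat", "-gc", strPID], capture_output=True, text=True).stdout
listLines = strOutput.strip().splitlines()
dictStats = dict(zip(listLines[0].split(), [float(strValue) for strValue in listLines[1].split()]))

# Usage divided by size for each space, reported as a percentage
for strSpace, strUsed, strSize in [("Eden", "EU", "EC"), ("Old", "OU", "OC"), ("Metaspace", "MU", "MC")]:
        print(f"{strSpace}: {100 * dictStats[strUsed] / dictStats[strSize]:.1f}% of {dictStats[strSize]:.0f}K used")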

Maintaining an /etc/hosts record

I encountered an oddity at work — there’s a server on an internally located public IP space. Because it’s public space, it is not allowed to communicate with the internal interface of some of our security group’s servers. It has to use their public interface (not technically, just a policy on which they will not budge). I cannot just use a DNS server that resolves the public copy of our zone because then we’d lose access to everything else, so we are stuck making an /etc/hosts entry. Except this thing changes IPs fairly regularly (hey, we’re moving from AWS to Azure; hey, let’s try CloudFlare; nope, that is expensive so change it back) and the service it provides is application authentication so not something you want randomly falling over every couple of months.

So I’ve come up with a quick script to maintain the /etc/hosts record for the endpoint.

# requires: dnspython (for dns.resolver); subprocess is part of the standard library

import dns.resolver
import subprocess

strHostToCheck = 'hostname.example.com' # PingID endpoint for authentication
strDNSServer = "8.8.8.8"         # Google's public DNS server
listStrIPs = []

# Get current assignment from hosts file
listCurrentAssignment = [ line for line in open('/etc/hosts') if strHostToCheck in line]

if len(listCurrentAssignment) >= 1:
        strCurrentAssignment = listCurrentAssignment[0].split("\t")[0]

        # Get actual assignment from DNS
        objResolver = dns.resolver.Resolver()
        objResolver.nameservers = [strDNSServer]
        objHostResolution = objResolver.query(strHostToCheck)

        for objARecord in objHostResolution:
                listStrIPs.append(objARecord.to_text())

        if len(listStrIPs) >= 1:
                # Fix /etc/hosts if the assignment there doesn't match DNS
                if strCurrentAssignment in listStrIPs:
                        print(f"Nothing to do -- hosts file record {strCurrentAssignment} is in {listStrIPs}")
                else:
                        print(f"I do not find {strCurrentAssignment} here, so now fix it!")
                        subprocess.call([f"sed -i -e 's/{strCurrentAssignment}\t{strHostToCheck}/{listStrIPs[0]}\t{strHostToCheck}/g' /etc/hosts"], shell=True)
        else:
                print("No resolution from DNS ... that's not great")
else:
        print("No assignment found in /etc/hosts ... that's not great either")

Tableau Query — Data Sources and the Workbooks Where They Are Used

I have found Tableau’s views of data sources to be … lacking. To provide a report of data sources, the database type, and where it is being used, I put together a query that locates all data sources (or filters by database type — specifically, I was trying to see who was using Snowflake) and lists the site, project, and workbook using the data source.

-- Data sources and what workbook they are used in
SELECT system_users.email , datasources.id, datasources.name, datasources.created_at,  datasources.updated_at, datasources.db_class, datasources.db_name
, datasources.site_id, sites.name AS SiteName, projects.name AS ProjectName, workbooks.name AS WorkbookName
FROM datasources 
LEFT OUTER JOIN users ON users.id = datasources.owner_id
LEFT OUTER JOIN system_users ON users.system_user_id = system_users.id
LEFT OUTER JOIN sites ON datasources.site_id = sites.id
LEFT OUTER JOIN projects ON datasources.project_id = projects.id
LEFT OUTER JOIN workbooks ON datasources.parent_workbook_id = workbooks.id
-- WHERE datasources.db_class = 'snowflake' 
ORDER BY datasources.created_at;


Web Redirection Based on Typed URL

I have no idea why I am so pleased with this simple HTML code, but I am! My current project is to move all of our Tableau servers to different servers running a newer version of Windows. When I first got involved with the project, it seemed rather difficult (there was talk of manually recreating all of the permissions on each item!!) … but some review of the vendor’s documentation led me to believe one could build a same-version server elsewhere (newer Windows, out in magic cloudy land, but the same Tableau version), back up the data from the old server, restore it to the new one, and be done. It’s not quite that simple: I had to clear out the SAML config and manually reconfigure it so the right elements get added into the Java keystore, access to the local PostgreSQL database needed to be manually configured, a whole bunch of database drivers needed to be installed, and the Windows registry of ODBC connections needed to be exported/imported. But the whole process was a lot easier than what I was first presented with.

Upgrading the first production server was mostly seamless, except users appear to have been using the server’s actual name. Instead of accessing https://tableau.example.com, they were typing abcwxy129.example.com. And passing that link around as “the link” to their dashboard. And, upon stopping the Tableau services on the old server … those links started to fail. Now, I could have just CNAMEd abcwxy129 over to tableau and left it at that. But letting users continue to do the wrong thing always seems to come back and haunt you (if nothing else, the OS folks own the namespace of servers & are allowed to re-use or delete those hostnames at will). So I wanted something that would take whatever https://tableau.example.com/#/site/DepartMent/workbooks/3851/Views kind of URL a user provided and give them the right address. And, since this was Windows, to do so with IIS without the hassle of integrating PHP or building a C# project. Basically, I wanted to do it within basic HTML. Which meant JavaScript.

And I did it — using such a basic IIS installation that the file is named something like iisstart.htm so I didn’t have to change the default page name. I also redirected 404 to / so any path under the old server’s name will return the redirection landing page.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
	<head>
		<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
		<title>This Tableau server has moved</title>
		
		<style type="text/css">
			<!--
			body {
				color:#000000;
				background-color:#eeeeee;
				margin:0;
			}
			-->
		</style>
	</head>
	<body>
		<div style="margin: 1em 2.5em">
			<h2>The Tableau server has moved.</h2>
			<p>The location you accessed, <span style="white-space: nowrap" id="oldurl"></span>, is no longer available.<br /><br /> Please update your link to use <span style="white-space: nowrap" id="newurl"></span></p>
		</div>
	
		<script>
			let strOldURL = window.location.href;

			let strNewURL = strOldURL.replace(/hostname.example.com/i,"tableau.example.com");
			strNewURL = strNewURL.replace(/otherhostname.example.com/i,"tableau.example.com");

			document.getElementById("oldurl").innerHTML = window.location.href;
			document.getElementById("newurl").innerHTML = "<a href=" + strNewURL + ">" + strNewURL + "</a>";
		</script>
	
	</body>
</html>

Exchange Disaster Recovery

I guess with everyone moving to magic cloudy pay-per-month Exchange, this isn’t such a concern anymore … but for those still running on-premise Exchange:

(1) Before you can restore your AD system state, you’ve got to build a server & bring up a temporary domain. There’s a “System Configuration” program that lets you select restarting in safe mode / Directory Services Restore Mode without having to time the F8 key or anything.

(2) The system state backup of a domain controller backs up a lot of stuff, including the registry, which tells the server what software and services are installed. This means it is not possible to just run the Exchange setup.exe with the disaster recovery option. Fortunately, I was able to copy the Exchange folder from Program Files off of a backup. Unfortunately, the Exchange services wouldn’t start because DLLs couldn’t register. Did a diff between the old server backup & the new one, copied any missing stuff from c:\windows\system32 and c:\windows\syswow64 and, voila, Exchange is starting. Couldn’t mount the edb file, though …

(3) Which brings me to eseutil: an attempt to replay the transaction logs (eseutil /r) and then repair the database as much as possible (eseutil /p) got me an EDB file that the Exchange server could mount.
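For reference, the syntax is roughly as follows; the E00 log prefix and the paths here are placeholders, not what was actually on this server:

eseutil /r E00 /l "D:\ExchangeLogs" /d "D:\ExchangeDB"
eseutil /p "D:\ExchangeDB\MailboxDatabase.edb"

The /r pass replays whatever transaction logs it can find, and /p is the lossy last-resort repair, so only run /p once the replay has gotten you as far as it can.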

Grafana — SSO With PingID (OAuth)

I enabled SSO in our development Grafana system today. Enabling SSO doesn’t produce a great user experience because there is a local ‘admin’ user with extra special rights that aren’t given to users put into the admin role, so you can’t do away with local logins entirely. If you just enable SSO, a new button is added under the logon dialogue that users can use to initiate an SSO authentication. That’s not great, though, since most users really should be using the SSO workflow, and people are absolutely going to be putting their login information into that really obvious set of text input fields.

Grafana has a configuration to bypass the logon form and always go down the OAuth authentication path:

# Set to true to attempt login with OAuth automatically, skipping the login screen.
# This setting is ignored if multiple OAuth providers are configured.
oauth_auto_login = true

Except, now, the rare occasion we need to use the local admin account requires us to set this to false, restart the service, do our thing, change the setting back, and restart the service again. Which is what we’ll do … but it’s not a great solution either.

 

Config to authenticate Grafana to PingID using OAuth

#################################### Generic OAuth ##########################
[auth.generic_oauth]
name = PingID
enabled = true
allow_sign_up = true
client_id = 12345678-1234-4567-abcd-123456789abc
client_secret = abcdeFgHijKLMnopqRstuvWxyZabcdeFgHijKLMnopqRstuvWxyZ
scopes = openid profile email
email_attribute_name = email:primary
email_attribute_path =
login_attribute_path = user
role_attribute_path =
id_token_attribute_name =
auth_url = https://login.example.com/as/authorization.oauth2
token_url = https://login.example.com/as/token.oauth2
api_url = https://login.example.com/idp/userinfo.openid
allowed_domains =
team_ids =
allowed_organizations =
tls_skip_verify_insecure = true
tls_client_cert =
tls_client_key =
tls_client_ca =

Mounting DD Raw Image File

And a final note from my disaster recovery adventure: I had to use ddrescue to copy as much data from a corrupted drive as possible (ddrescue /dev/sdb /mnt/vms/rescue/backup.raw --try-again --force --verbose). Once I had the image, what do you do with it? Fortunately, you can map the partitions inside a dd image file and mount them to copy data out. Note that kpartx -l only lists the partition mappings; kpartx -av backup.raw is what actually creates the /dev/mapper/loop0p* devices mounted below.

# Mounting DD image
2023-04-17 23:54:01 [root@fedora /]# kpartx -l backup.raw
loop0p1 : 0 716800 /dev/loop0 2048
loop0p2 : 0 438835200 /dev/loop0 718848

2023-04-17 23:55:08 [root@fedora /]# mount /dev/mapper/loop0p2 /mnt/recovery/ -o loop,ro
mount: /mnt/recovery: cannot mount /dev/loop1 read-only.
       dmesg(1) may have more information after failed mount system call.

2023-04-17 23:55:10 [root@fedora /]# mount /dev/mapper/loop0p2 /mnt/recovery/ -o loop,ro,norecovery

2023-04-18 00:01:03 [root@fedora /]# ll /mnt/recovery/
total 205G
drwxr-xr-x  2 root root  213 Jul 14  2021 .
drwxr-xr-x. 8 root root  123 Apr 17 22:38 ..
-rw-r--r--. 1 root root 127G Apr 17 20:35 ExchangeServer.qcow2
-rw-r--r--. 1 qemu qemu  10G Apr 17 21:42 Fedora.qcow2
-rw-r--r--. 1 qemu qemu  15G Apr 17 14:05 FedoraVarMountPoint.qcow2


Mounting a QCOW File

We had a power outage on Monday that took out the drive that holds our VMs. There are backups, but the backup drive copies had superblock errors and all sorts of issues. To recover our data, I learned all sorts of new things, the first being that you can mount a QCOW file and copy data out. You have to connect a network block device to the file; once it is connected, you can use fdisk to list the partitions on the drive and mount those partitions. In this example, I had a partition called nbd0p1 that I mounted to /mnt/data_recovery.

modprobe nbd max_part=2
qemu-nbd --connect=/dev/nbd0 /path/to/server_file.qcow2
fdisk /dev/nbd0 -l
mount /dev/nbd0p1 /mnt/data_recovery

Once you are done, unmount it and disconnect from the network block device.

umount /mnt/data_recovery
qemu-nbd --disconnect /dev/nbd0
rmmod nbd

ISC Bind – Converting Secondary Zone to Primary

Our power went out on Monday and, unfortunately, the SSD on the server with all of our VMs got corrupted. The main server has ISC Bind configured to host all of our internal DNS zones as secondaries … but, a day after the primary DNS server went down, those secondary copies hit their expiry timers and stopped answering. Luckily, you can convert a secondary zone to a primary. The problem is that the cached copy of the zone was … funky binary stuff.

Luckily there’s an executable to convert this into a text zone file — named-compilezone

named-compilezone -f raw -F text -o output_file_name zone_name input_file_name

So, to convert my rushworth.us zone:

named-compilezone -f raw -F text -o rushworth.us.db rushworth.us rushworth.us.db.bin

Then, in the named.conf file, change the zone type to “master” and remark out the masters line that points at the old primary. Change the “file” line to point at the newly created file. If you haven’t already done so, add “allow-query { any; };” so clients can actually query the zone.
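The converted stanza ends up looking something like this; the zone and file name are from the example above, and the remarked-out masters IP is just a placeholder for wherever the old primary lived:

zone "rushworth.us" {
        type master;
        file "rushworth.us.db";
        // masters { 192.0.2.53; };   // old secondary config, remarked out
        allow-query { any; };
};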