Category: Technology

Logstash – Key Value Parsing

The KV filter plug-in is a quick way to split key/value pairs from message data. An example syslog message where there is some prefix information followed by key/value pairs. In this case, each pair is separated by a semicolon and they keys and values are separated by a colon.

<140>1 2023-04-13T17:43:00+01:00 DEVICENAME5@10.1.2.3 EVENT 2693 [meta sequenceId="33"]"time-stamp":2023-04-13T17:43:00+01:00;"session-id":;"user-name":;"id":0;"type":CREATE;"entity":not-alarmed-event-notification

The first thing you need to do is to parse the message so the key/value pair data is in a single field.

"message" => "<%{POSINT:syslog_pri}>%{NUMBER:stuff} %{DATA:syslog_timestamp}+%{DATA:syslog_timestamp_offset} %{SYSLOGHOST:logsource}@%{DATA:sourceip} %{DATA:log_type} %{NUMBER:event_id} \[meta sequenceId=\"%{DATA:meta_sequence_id}\"\] %{GREEDYDATA:kvfields}"

Now that the data is available in kvfields, the kv filter can be used to parse the data. Indicate which character splits fields, which character splits the key and value, and what field is the source of the key/value pair data. Additionally, if you need to trim data from keys (trim_key) or values (trim_value), you can do so. In this case, each of the keys is quoted. I do not wish to carry the quotes through on the field name, so I am trimming the double-quote character from keys.

kv {
     field_split => ";"
     value_split  => ":"
     trim_key  => '"'
     source  => "kvfields"
}

You can recursively parse data, if needed, and the key/values parsed from a value will be sub-elements of the parent key.

Ruby

Sometimes more advanced logic is required to parse message content. There is a ruby filter plugin that allows you to run ruby code. As an example, the “attributes” key contains key/value pairs but the same delimiter is used for both key/value and the list of pairs.

<140>1 2023-04-13T17:57:00+01:00 DEVICENAME5@10.1.2.3 EVENT 2693 [meta sequenceId="12"] "time-stamp":2023-04-13T17:57:00+01:00;"session-id":;"user-name":;"id":0;"type":CREATE;"entity":not-alarmed-event-notification;"attributes":"condition-type;T-BE-FEC;condition-description;Bit Error Forward Error Correction HT = 325651545656;location;near-end;direction;ingress;time-period;1min;service-affect;NSA;severity-level;cleared;fm-entity;och-os-1/2/2;fm-entity-type;OCH-OS;occurrence-date-time;2023-04-13T17:55:55+01:00;alarm-condition-type;standing;extension-description;;last-severity-level;not-applicable;alarm-id;85332F351D9EA5FC7BB52C1C75F85B5527251155;"

If you break the string into an array on the delimiter, even elements are the key and the +1 odd element is the corresponding value.

ruby {
     code => "
          strattributes = event.get('[attributes]')
          arrayattributebreakout = strattributes.split(';')
          if arrayattributebreakout.count > 0
               arrayattributebreakout.each_with_index do |element,index|
               if index.even?
                    event.set(arrayattributebreakout[index], arrayattributebreakout[index+1])
               end
          end
       end
       "
}

 

 

Predicting the Future

That didn’t take longhttps://www.engadget.com/three-samsung-employees-reportedly-leaked-sensitive-data-to-chatgpt-190221114.html

Leaking data is obviously a big problem if the user base is “anyone with an internet connection”, but potentially not great even for an internal implementation of an AI chatbot.

Content management platforms, in the early days, had a big problem with search because the indexing engine had super-user rights – so searching for “acquisition” would give you links that you couldn’t read. Even if the titles didn’t tell you anything (does “Project OPUS” or “Project Golden Falcon” have any meaning to you?), the dates & authors told you something (hey, there’s a bunch of new docs the C-levels are creating about acquisitions this past few weeks … sure that doesn’t mean anything!). Eventually any halfway decent content management platform understood permissions and at least attempts to filter results based on what you have permission to view.

AI is different, unfortunately in a way that makes implementing that type of security more difficult. Other than individualizing the trained AIs for each user (so info you feed in is only going to be reflected in your future results) or not training based on user input (only use stuff that’s openly readable already) … it would be rather challenging to filter an implementation so it knows stuff it’s been told but doesn’t convey that information to unauthorized individuals.

Increasing Kibana CSV Report Max Size

The default size limit for CSV reports in Kibana is 10 meg. Since that’s not enough for some of our users, I’ve been testing increases to the xpack.reporting.csv.maxSizeBytes value.

We’re still limited by the ES http.max_content_length value — which the documentation seems pretty confident shouldn’t be increased because the system can become unstable. Increasing the max Kibana report size to 100mb just yields a different error because ES doesn’t like it. 75 exhausted the JavaScript heap (?) – which I could get around by setting  NODE_OPTIONS=–max_old_space_size=4096 … but that just led to the server abending whenever a report was run (in fact, I had to remove the reports I tried to run from the server to get everything back into a working state). Increasing the limit to 50 meg, though, didn’t do anything unreasonable in dev. So somewhere between 50 and 75 meg is our upper limit, and 50 seemed like a nice round number to me.

Notes on resource usage – Data is held in memory as a report is created. We’d see an increase in memory/CPU usage while the report is being generated (or, I guess more accurately, a longer time during which the memory/CPU usage is increased because if a 10 meg report takes 30 seconds to run then a 50 meg report is going to take 2.5 minutes to run … and the memory/CPU usage is pretty much the same during the “a report is running” period).

Then, though, the report is stashed in ElasticSearch for user(s) to retrieve within .reporting* indicies. And that’s where things get a little silly — architecturally, this is just another index; it ages off with a lifecycle policy if one exists. But it looks like they never created a lifecycle management policy. So you can still retrieve reports run a little over two years ago! We will certainly want to set up a policy to clean up old reports … just have to decide how long is reasonable.

 

Using Excel to turn week of month and day of week into actual dates

Our patching schedules are algorithmic – the 1st Tuesday of the month, the 3rd Wednesday of the month, etc. But that’s not particularly useful for notifying end users or for us to verify functionality after patching.

Graphical user interface, table Description automatically generated

Long term, I think we can pull the source data from a database and create appointment items each month for whatever list of servers will be patched that month based on a relative date (so no one has to add new servers or remove decommissioned servers). But, short term? I really wanted a way to see what date a server would be patched. So I created a but of a convoluted spreadsheet to produce this information based on a list of servers and patching schedule patterns.

There are two “extra” tabs used – “Dates” used to say what month and year I want the patching dates for

Graphical user interface, application, table, Excel Description automatically generated

And “ServerData” which provides a cross-reference between the server names and a useful description.

Graphical user interface, application, table, Excel Description automatically generated

There are then a series of formulae used to add columns to our source data. First, the “Function” is populated in column G with a VLOOKUP =VLOOKUP(B2,ServerData!A:B,2,FALSE)

Columns I and J break the “1st Saturday” into the two components – week of month and day of week –

I =LEFT(C2,3)
J =RIGHT(C2,LEN(C2)-4)

Columns K and L then map these components into numeric values I can use in a formula:

K =IF(I2=”1st”,1,IF(I2=”2nd”,2,IF(I2=”3rd”,3,IF(I2=”4th”,4,”Unscheduled”))))
L =IF(J2=”Sunday”,1,IF(J2=”Monday”,2,IF(J2=”Tuesday”,3,IF(J2=”Wednesday”,4,IF(J2=”Thursday”,5,IF(J2=”Friday”,6,IF(J2=”Saturday”,7,”Unscheduled”)))))))

And finally a formula in column H that turns the week of month and day of week values into an actual date within the month and year on the “Dates” tab:

H =DATE(Dates!$B$2,Dates!$A$2,1+7*K2)-WEEKDAY(DATE(Dates!$B$2,Dates!$A$2,8-L2))

Voila – I have a spreadsheet that says we should expect to see this specific list of servers being patched tonight.

Graphical user interface, application, table, Excel Description automatically generated

 

K8s 1.24.12 Upgrade

Trying to upgrade our dev Kubernetes environment to 1.24.12 … and we encountered what seems to be a fairly common error — unknown service runtime.v1alpha2.RuntimeService

kubeserver:~ # kubeadm init
I0323 13:53:26.492921   55320 version.go:256] remote version is much newer: v1.26.3; falling back to: stable-1.24
[init] Using Kubernetes version: v1.24.12
[preflight] Running pre-flight checks
        [WARNING Firewalld]: firewalld is active, please ensure ports [6443 10250] are open or your cluster may not function correctly
error execution phase preflight: [preflight] Some fatal errors occurred:
        [ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
        [ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
        [ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
        [ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
        [ERROR CRI]: container runtime is not running: output: E0323 13:53:26.741684   55340 remote_runtime.go:948] "Status from runtime service failed" err="rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService"
time="2023-03-23T13:53:26-05:00" level=fatal msg="getting status of runtime: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService"
, error: exit status 1
        [ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

We found a lot of people online with the same issue who (1) removed the config.toml and tried again, (2) changed the SystemdCGroup setting in the config, or uninstalled and reinstalled some/all of the components until it worked. Unfortunately, removing or modifying the config didn’t help. And removing and reinstalling everything wasn’t particularly appealing. However, we noticed that the same error was reported directly from containerd:

kubeserver:~ # crictl ps
E0323 13:53:07.061777   55228 remote_runtime.go:557] "ListContainers with filter from runtime service failed" err="rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService" filter="&ContainerFilter{Id:,State:&ContainerStateValue{State:CONTAINER_RUNNING,},PodSandboxId:,LabelSelector:map[string]string{},}"
FATA[0000] listing containers: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService

Looking at the plugins, there were some in an error state

kubeserver:~ # ctr plugins ls
TYPE                                  ID                       PLATFORMS      STATUS
io.containerd.content.v1              content                  -              ok
io.containerd.snapshotter.v1          aufs                     linux/amd64    skip
io.containerd.snapshotter.v1          btrfs                    linux/amd64    skip
io.containerd.snapshotter.v1          devmapper                linux/amd64    error
io.containerd.snapshotter.v1          native                   linux/amd64    ok
io.containerd.snapshotter.v1          overlayfs                linux/amd64    error
io.containerd.snapshotter.v1          zfs                      linux/amd64    skip

So … it seemed reasonable to look for errors in the messages log from containerd. And, yeah, we had all sorts of errors. Including a rather scary one about reformatting the file system!

Mar 23 13:24:51 kubeserver containerd: time="2023-03-23T13:24:51.726984260-05:00" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.overlayfs" error="/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs does not support d_type. If the backing filesystem is xfs, please reformat with ftype=1 to enable d_type support"

That would do it — we have a dedicated partition for the k8s stuff … and that volume is formatted the right way — xfs_info confirmed ftype=1

kubeserver:~ # xfs_info /kubernetes/
meta-data=/dev/mapper/kubernetes-kubernetes isize=512    agcount=4, agsize=131071744 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=524286976, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=255999, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

However containerd doesn’t really know anything about this volume, does it? The default location that containerd wants to use isn’t set up to support d_type. Editing /etc/containerd/config.toml, root now tells containerd to use our special partition for ‘stuff’ …

And we were able to run kubeadm init without error. Everything came up as it should have, and our k8s server was upgraded!

How AI is like Social Media

I’ve said about social media type platforms – if you are making an informed decision to trade privacy for convenience / information / entertainment / comradery … if you feel that you are getting a good “deal” in that trade? Then social media type platforms are awesome. If you only think you are getting something in the deal – unaware of what the platform owners are getting from you – then I find that problematic.

The same is true for the public AI models – some of what they are getting from you is just language training & beta testing. If I had a dozen people type questions, I have not gotten a robustly representative sample of how ‘people’ talk. Getting a few million people to provide samples of their linguistic quirks is absolutely an important part of producing universally useful natural language processing. The models are also asking you to flag anything odd / wrong / unsettling. Again, this is the public providing numbers to testing that has already been done.

But … if an AI is being trained by inputs, information you provide is being incorporated into the underlying patterns. Which isn’t to say they are saving the exact text I provided in a transaction. Even so, the information provided can, in a more general sense, become part of the AI’s knowledge base. And, if that knowledge base is used to serve results to the general public? Then it is possible for the AI to “leak” confidential information that I fed into transactions.

Tableau — Installing after Control Panel Uninstall

When attempting to install Tableau, I got an error telling me that “an older version of tableau is installed but the tsm administrative services are not running” … which, as I didn’t own the server until recently, was quite possibly true. But it was also uninstalled from the control panel, cleaned up on disk, and really not there to start.

I discovered that Tableau registers some environment variables — and the mere presence of these variables will cause the “it’s already installed, so I’m not going any farther” error. Deleting the environment variables allowed me to re-install the current Tableau version without problem.

To remove an environment variable, go to Control Panel > System > Advanced system settings — highlight the variable you want to remove (all of the TABLEAU* ones in this case) and click ‘Delete’.

Artificial Limits on Artificial Intelligence

I cannot say it surprises me that a company that considered “restart it” and “reboot it” to be suitable solutions to a whole host of memory management issues has decided that clearing the history after n interactions with their AI chatbot is the solution to not having it out looking for nuclear codes. From a functional standpoint, I can see where one or two “turns” is probably sufficient for the search engine to produce some salient information or relevant links. But, from a technological standpoint, it seems that an AI that becomes “confused” five or six exchanges into a conversation is vastly problematic; and introducing some artificial limit on the front-end to reset the history (1) eliminates a key feature of learning algorithms by preventing it from “learning” you and (2) is akin to turning your back on a tiger and considering the problem sorted.

Maple Mapping

For the last few years, we’ve talked about mapping out our maple trees — we track which ones we tap, when we tap them, and occasionally try to track how much sap the tree produced. Which is difficult when the tree is labeled as “second down from planter on driveway” or “the next, next one by neighbor”. It seemed like we should be able to use our phones to tag each location — ideally while there are still leaves on the trees so we could denote them as sugar, red, etc.

We settled on an open source app that uses Open StreetMap — https://github.com/osmandapp/OsmAnd/ — there’s no convenient way for Scott and I to simultaneously edit the data set, but we can export the file on one phone and import it onto the other so we are both looking at the same points. Each tree is numbered, and there is a note with the type of tree and how many taps.

Now we know we are at tree #27 (the phone’s location will show up as a blue dot).

Python Script: Checking Certificate Expiry Dates

A long time ago, before the company I work for paid for an external SSL certificate management platform that does nice things like clue you into the fact your cert is expiring tomorrow night, we had an outage due to an expired certificate. One. I put together a perl script that parsed output from the openssl client and proactively alerted us of any pending expiration events.

That script went away quite some time ago — although I still have a copy running at home that ensures our SMTP and web server certs are current. This morning, though, our K8s environment fell over due to expired certificates. Digging into it, the certs are in the management platform (my first guess was self-signed certificates that wouldn’t have been included in the pending expiry notices) but were not delivered to those of us who actually manage the servers. Luckily it was our dev k8s environment and we now know the prod one will be expiring in a week or so. But I figured it was a good impetus to resurrect the old script. Unfortunately, none of the modules I used for date calculation were installed on our script server. Seemed like a hint that I should rewrite the script in Python. So … here is a quick Python script that gets certificates from hosts and calculates how long until the certificate expires. Add on a “if” statement and a notification function, and we shouldn’t come in to failed environments needing certificate renewals.

from cryptography import x509
from cryptography.hazmat.backends import default_backend
import socket
import ssl
from datetime import datetime, timedelta

# Dictionary of hosts:port combinations to check for expiry
dictHostsToCheck = {
"tableau.example.com": 443       # Tableau 
,"kibana.example.com": 5601      # ELK Kibana
,"elkmaster.example.com": 9200   # ELK Master
,"kafka.example.com": 9093       # Kafka server
}
for strHostName in dictHostsToCheck:
    iPort = dictHostsToCheck[strHostName]

    datetimeNow = datetime.utcnow()

    # create default context
    context = ssl.create_default_context()

    # Do not verify cert chain or hostname so we ensure we always check the certificate
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    with socket.create_connection((strHostName, iPort)) as sock:
        with context.wrap_socket(sock, server_hostname=strHostName) as ssock:
            objDERCert = ssock.getpeercert(True)
            objPEMCert = ssl.DER_cert_to_PEM_cert(objDERCert)
            objCertificate = x509.load_pem_x509_certificate(str.encode(objPEMCert),backend=default_backend())

            print(f"{strHostName}\t{iPort}\t{objCertificate.not_valid_after}\t{(objCertificate.not_valid_after - datetimeNow).days} days")