SSSD LDAP Schema

I lost access to all of my Linux servers at work. And, unlike the normal report where nothing changed but xyz is now failing, I knew exactly what had happened: a new access request had been approved about ten minutes earlier. Looking at my ID, adding the new group membership had, for some reason, changed my account’s gid number to that new group. Except … that shouldn’t actually have dropped my access. If I needed the new group to be my primary group, I should have been able to use newgrp to switch contexts. Instead, I got prompted for a group password (which, yes, is a thing; no, no one uses it).

The hosts were set up to authenticate to AD using LDAP, and very successfully let me log in (or not, if I mistyped my password). They, however, would only see me as a member of my primary group. Well, today, I finally got a back door with sufficient access to poke around.

Turns out I was right — something was improperly configured, so groups were not being read from the directory but rather implied from the gid value. I added the configuration parameter ldap_schema to instruct SSSD to use member instead of memberUid for memberships. I used rfc2307bis as that’s the value I was familiar with. I expect “AD” would work as well, but we were well beyond AD 2008r2 and I didn’t really want to dig further into the nuanced differences between the two settings.
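For reference, a minimal sketch of the relevant sssd.conf domain section (the domain name, URI, and search base here are illustrative; ldap_schema is the line that mattered):

# /etc/sssd/sssd.conf -- illustrative values
[domain/example.com]
id_provider = ldap
ldap_uri = ldaps://ad.example.com
ldap_search_base = dc=example,dc=com
ldap_schema = rfc2307bis

After changing the value, restart sssd and flush its cache (sss_cache -E) so group lookups pick up the new attribute.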

From https://linux.die.net/man/5/sssd-ldap

ldap_schema (string)

Specifies the Schema Type in use on the target LDAP server. Depending on the selected schema, the default attribute names retrieved from the servers may vary. The way that some attributes are handled may also differ.

Four schema types are currently supported:

  • rfc2307
  • rfc2307bis
  • IPA
  • AD

The main difference between these schema types is how group memberships are recorded in the server. With rfc2307, group members are listed by name in the memberUid attribute. With rfc2307bis and IPA, group members are listed by DN and stored in the member attribute. The AD schema type sets the attributes to correspond with Active Directory 2008r2 values.
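For concreteness, here is roughly how the same group would look under the two styles (hypothetical LDIF; the names and DNs are made up):

# rfc2307: members listed by name in memberUid
dn: cn=linuxadmins,ou=groups,dc=example,dc=com
objectClass: posixGroup
gidNumber: 10001
memberUid: jdoe

# rfc2307bis: members listed by DN in member
dn: cn=linuxadmins,ou=groups,dc=example,dc=com
objectClass: groupOfNames
objectClass: posixGroup
gidNumber: 10001
member: uid=jdoe,ou=people,dc=example,dc=com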

Sumo Logic: Running Queries via API

This is my base script for using the Sumo Logic API to query logs and analyze data. This particular script finds hosts successfully sending syslog data through our firewall, looks up who owns the netblock (they weren’t all internal!), and checks our configuration management database (CMDB) to see if we have a host registered with the destination IP address of the syslog traffic.

import requests
from requests.auth import HTTPBasicAuth
import time
from collections import defaultdict
import cx_Oracle
import pandas as pd
import ipaddress
from datetime import datetime
from ipwhois import IPWhois
from ipwhois.exceptions import IPDefinedError

# Import credentials from a config file
from config import access_id, access_key, oracle_username, oracle_password

# Initialize Oracle Client
cx_Oracle.init_oracle_client(lib_dir=r"C:\Oracle\instantclient_21_15")
oracle_dsn = cx_Oracle.makedsn('cmdb_db.example.com', 1521, service_name='cmdb_db.example.com')

# Function to query Oracle database
def query_oracle_cmdb(strIPAddress):
    with cx_Oracle.connect(user=oracle_username, password=oracle_password, dsn=oracle_dsn) as connection:
        cursor = connection.cursor()
        query = """
            SELECT HOSTNAME, FRIENDLYNAME, STATUS, COLLECTIONTIME, RETIREDBYDISPLAYNAME, 
                    RETIREDDATETIME, SERVERAPPSUPPORTTEAM, SERVERENVIRONMENT
            FROM NBIREPORT.CHERWELL_CMDBDATA_FULL
            WHERE IPADDRESS = :ipaddy
        """
        cursor.execute(query, [strIPAddress])
        result = cursor.fetchone()
        cursor.close()
        return result if result else ("",) * 8

# Function to determine IP ownership
def get_ip_ownership(ip):
    # Define internal IP ranges
    internal_networks = [
        ipaddress.IPv4Network("10.0.0.0/8"),
        ipaddress.IPv4Network("172.16.0.0/12"),
        ipaddress.IPv4Network("192.168.0.0/16")
    ]
    
    # Check if the IP is internal
    ip_obj = ipaddress.IPv4Address(ip)
    if any(ip_obj in network for network in internal_networks):
        return "INTERNAL"
    
    # For external IPs, use ipwhois to get ownership info
    try:
        obj = IPWhois(ip)
        result = obj.lookup_rdap(depth=1)
        ownership = result['network']['name']
    except IPDefinedError:
        ownership = "Reserved IP"
    except Exception as e:
        print(f"Error looking up IP {ip}: {e}")
        ownership = "UNKNOWN"
    
    return ownership

# Base URL for Sumo Logic API
base_url = 'https://api.sumologic.com/api/v1'

# Define the search query
search_query = '''
(dpt=514)
AND _sourcecategory = "observe/perimeter/firewall/logs"
| where !(act = "deny")
| where !(act = "timeout")
| where !(act = "ip-conn")
| where (proto=17 or proto=6)
| count dst, act
'''

# Function to create and manage search jobs
def run_search_job(start_time, end_time):
    search_job_data = {
        'query': search_query,
        'from': start_time,
        'to': end_time,
        'timeZone': 'UTC'
    }

    # Create a search job
    search_job_url = f'{base_url}/search/jobs'
    response = requests.post(
        search_job_url,
        auth=HTTPBasicAuth(access_id, access_key),
        json=search_job_data
    )

    if response.status_code != 202:
        print('Error starting search job:', response.status_code, response.text)
        return None

    # Get the search job ID
    job_id = response.json()['id']
    print('Search Job ID:', job_id)

    # Poll for the search job status
    job_status_url = f'{search_job_url}/{job_id}'
    while True:
        response = requests.get(job_status_url, auth=HTTPBasicAuth(access_id, access_key))
        status = response.json().get('state', None)
        print('Search Job Status:', status)
        if status in ['DONE GATHERING RESULTS', 'CANCELLED', 'FAILED']:
            break
        time.sleep(5)  # Reasonable delay to prevent overwhelming the server

    return job_id if status == 'DONE GATHERING RESULTS' else None

# Function to retrieve results of a search job
def retrieve_results(job_id):
    dst_counts = defaultdict(int)
    results_url = f'{base_url}/search/jobs/{job_id}/messages'
    offset = 0
    limit = 1000

    while True:
        params = {'offset': offset, 'limit': limit}
        try:
            response = requests.get(results_url, auth=HTTPBasicAuth(access_id, access_key), params=params, timeout=30)
            if response.status_code == 200:
                results = response.json()
                messages = results.get('messages', [])
                
                for message in messages:
                    message_map = message['map']
                    dst = message_map.get('dst')
                    if dst:
                        dst_counts[dst] += 1
                
                if len(messages) < limit:
                    break

                offset += limit
            else:
                print('Error retrieving results:', response.status_code, response.text)
                break
        except requests.exceptions.RequestException as e:
            print(f'Error during request: {e}')
            time.sleep(5)
            continue

    return dst_counts

# Main execution
if __name__ == "__main__":
    # Prompt for the start date
    start_date_input = input("Enter the start date (YYYY-MM-DD): ")
    try:
        start_time = datetime.strptime(start_date_input, "%Y-%m-%d").strftime("%Y-%m-%dT00:00:00")
    except ValueError:
        print("Invalid date format. Please enter the date in YYYY-MM-DD format.")
        exit()

    # Use today's date as the end date
    end_time = datetime.now().strftime("%Y-%m-%dT00:00:00")

    # Create a search job
    job_id = run_search_job(start_time, end_time)
    if job_id:
        # Retrieve and process results
        dst_counts = retrieve_results(job_id)

        # Prepare data for Excel
        data_for_excel = []

        print("\nDestination IP Counts and Oracle Data:")
        for dst, count in dst_counts.items():
            oracle_data = query_oracle_cmdb(dst)
            ownership = get_ip_ownership(dst)
            # Use only Oracle data columns
            combined_data = (dst, count, ownership) + oracle_data
            data_for_excel.append(combined_data)
            print(combined_data)

        # Create a DataFrame and write to Excel
        df = pd.DataFrame(data_for_excel, columns=[
            "IP Address", "Occurrence Count", "Ownership",
            "CMDB_Hostname", "CMDB_Friendly Name", "CMDB_Status", "CMDB_Collection Time", 
            "CMDB_Retired By", "CMDB_Retired Date", "CMDB_Support Team", "CMDB_Environment"
        ])

        # Generate the filename with current date and time
        timestamp = datetime.now().strftime("%Y%m%d-%H%M")
        output_file = f"{timestamp}-sumo_oracle_data.xlsx"
        df.to_excel(output_file, index=False)
        print(f"\nData written to {output_file}")
    else:
        print('Search job did not complete successfully.')

AD passwordLastSet Times

I’m doing “stuff” in AD again, and have again come across Microsoft’s wild “100-nanosecond intervals elapsed since 1601” reference time, AKA “Windows file time”. In previous experience, I was just looking to calculate deltas (how long since that password was set?), so figuring out now, subtracting then, and converting the elapsed intervals into something a little less specific (days, for example) was fine. Today, though, I need to display a human-readable date and time in Excel. Excel, which has its own peculiar way of storing date-time values. Fortunately, I happened across a formula that works:

=((C2-116444736000000000)/864000000000)+DATE(1970,1,1)

Voila! For reference, 116444736000000000 is the Windows file time of the Unix epoch (January 1, 1970) and 864000000000 is the number of 100-nanosecond intervals in a day, so the formula converts the file time into days since 1970 and then shifts that onto Excel’s date serial.
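The same conversion is easy enough to script as well; here’s a quick Python sketch (the sample value is hypothetical):

from datetime import datetime, timedelta, timezone

def filetime_to_datetime(filetime):
    # Windows file time: 100-nanosecond intervals since 1601-01-01 UTC
    return datetime(1601, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=filetime // 10)

print(filetime_to_datetime(133497702000000000))  # a hypothetical passwordLastSet value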

Quick sed For Sanitizing Config Files

When sending configuration files to other people for reference, I like to redact any credential-type information … endpoints that allow you to post data without creds, auth configs, etc. Sometimes I replace the string with REDACTED and sometimes I just drop the line completely.

Make a copy of the config files elsewhere, then run sed:


# Retain parameter but replace value with REDACTED
sed -i 's|http_post_url: "https://.*"|http_post_url: "REDACTED"|' *.yaml

# Remove line from config
sed -i '/authorization: Basic/d' *.yaml
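To sanity-check a substitution before editing files in place, drop the -i flag and diff the output against the original (the file name here is illustrative):

sed 's|http_post_url: "https://.*"|http_post_url: "REDACTED"|' myconfig.yaml | diff myconfig.yaml -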

QR Code Generation

I put together a quick program that creates a “fancy” QR code for a specified URL in a specified color and drops a “logo” image into the center of the code.

import qrcode
from PIL import Image

def generate_qr_code_with_custom_color_and_logo():
    url = input("Please enter the URL for which you want to generate a QR code: ")

    rgb_input = input("Please enter the RGB values for the QR code color (e.g. 0,0,0 for black): ")
    
    try:
        rgb_color = tuple(map(int, rgb_input.split(',')))
        if len(rgb_color) != 3 or not all(0 <= n <= 255 for n in rgb_color):
            raise ValueError("Invalid RGB color value.")
    except Exception:
        print("Error parsing RGB values. Please make sure to enter three integers separated by commas.")
        return

    qr = qrcode.QRCode(
        version=1,  # controls the size of the QR Code
        error_correction=qrcode.constants.ERROR_CORRECT_H,  # high error correction for image insertion
        box_size=10,
        border=4,
    )
    qr.add_data(url)
    qr.make(fit=True)

    # Generate the QR code with the specified RGB color
    img = qr.make_image(fill_color=rgb_color, back_color="white")

    # Load the logo image
    logo_image_path = input("Please enter the logo for the center of this QR code: ")

    try:
        logo = Image.open(logo_image_path)
    except FileNotFoundError:
        print(f"Logo image file '{logo_image_path}' not found. Proceeding without a logo.")
        img.save("qr_code_with_custom_color.png")
        print("QR code has been generated and saved as 'qr_code_with_custom_color.png'.")
        return

    # Resize the logo image to fit in the QR code
    img_width, img_height = img.size
    logo_size = int(img_width * 0.2)  # The logo will take up 20% of the QR code width
    logo = logo.resize((logo_size, logo_size), Image.LANCZOS)  # Image.ANTIALIAS was removed in Pillow 10

    position = ((img_width - logo_size) // 2, (img_height - logo_size) // 2)

    img.paste(logo, position, mask=logo.convert("RGBA"))

    img.save("qr_code_with_custom_color_and_logo.png")

    print("QR code with a custom color and a logo image has been generated and saved as 'qr_code_with_custom_color_and_logo.png'.")

if __name__ == "__main__":
    generate_qr_code_with_custom_color_and_logo()

Voila!

Outlook Web Joyful Animations

I have gotten a few messages at work where it seems like someone went through extra effort to highlight the word “congratulations” and set an onMouseOver trigger that throws digital confetti.

After a while, I wondered how people did that. What other animations can you trigger? And it turns out the answer is … they didn’t! Microsoft has a setting called “Joyful Animations” that identifies a few phrases within messages you receive and sets these triggers.

JavaScript: Extracting Web Content You Cannot Copy

There are many times I need to copy “stuff” from a website that is structured in such a way that simply copying and pasting the table data is impossible. Screen prints work, but I usually want the data in Excel so I can add notations and such. In these cases, running JavaScript from the browser’s developer console lets you access the underlying text elements.

Right click on one of the text elements and select “Inspect”

Now copy the element’s XPath

Read the value — we don’t generally want just this one element … but the path down to the “tbody” tag looks like a reasonable place to find the values within the table.

/html/body/div[1]/div/div/div[2]/div[2]/div[2]/div/div[3]/div/div/div[3]/div/div/div/table/tbody/a[4]/td[2]/div/span[2]

Use JavaScript to grab all of the TD elements under the tbody:

// Define the XPath expression to select all <td> elements within the specific <tbody>
const xpathExpression = "/html/body/div[1]/div/div/div[2]/div[2]/div[2]/div/div[3]/div/div/div[3]/div/div/div/table/tbody//td";

// Use document.evaluate to get all matching <td> nodes
const nodesSnapshot = document.evaluate(xpathExpression, document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);

// Log the number of nodes found (for debugging purposes)
console.log("Total <td> elements found:", nodesSnapshot.snapshotLength);

// Iterate over the nodes and log their text content
for (let i = 0; i < nodesSnapshot.snapshotLength; i++) {
    let node = nodesSnapshot.snapshotItem(i);
    if (node) {
        const textContent = node.textContent.trim();
        if (textContent) { // Only log non-empty content
            console.log(textContent);
        }
    }
}

Voila! The output (redacted here) is just a list of values, one per line.

Kafka Metadata Mismatch

I’ll prefix this with a caveat — when the topic ID in Zookeeper and the one in Kafka’s partition.metadata files don’t match up … there’s something wrong beyond “hey, make this string the same as that string and try again”. Look for communication issues, disk issues, etc. that might lead to bad data. And, lacking any cause, the general recommendation is to drop and recreate the topic.

But! “Hey, I am going to delete all the data in this Kafka topic. Everyone good with that?” is seldom answered with a resounding “of course, we don’t actually want to keep the data we specifically configured a system to keep”. And you need to keep the topic around even though something has clearly gone sideways. Fortunately, you can manually update either the Zookeeper topic ID or the one on disk. I find it easier to update the one on disk in Kafka because that change can be reverted. Stop all of the Kafka servers, then run the following script on each one; it identifies all of the folders within the Kafka data directory for the topic and changes the wrong topic ID to the right one. Then start the Kafka servers again, and the topic should be functional. For now.

#!/bin/bash

# Function to check if Kafka is stopped
check_kafka_stopped() {
    if systemctl is-active --quiet kafka; then
        echo "Error: Kafka service is still running. Please stop Kafka before proceeding."
        exit 1
    else
        echo "Kafka service is stopped. Proceeding with backup and update."
    fi
}

# Define the base directory where Kafka stores its data
DATA_DIR="/path/to/data-kafka"

# Define the topic prefix to search for
TOPIC_PREFIX="MY_TOPIC_NAME-"

# Old and new topic IDs
OLD_TOPIC_ID="2xSmlPMBRv2_ihuWgrvQNA"
NEW_TOPIC_ID="57kBnenRTo21_VhEhWFOWg"

# Date string for backup file naming
DATE_STR=$(date +%Y%m%d)

# Check if Kafka is stopped
check_kafka_stopped

# Find all directories matching the topic pattern and process each one
for dir in $DATA_DIR/${TOPIC_PREFIX}*; do
    # Construct the path to the partition.metadata file
    metadata_file="$dir/partition.metadata"

    # Check if the metadata file exists
    if [[ -f "$metadata_file" ]]; then
        # Backup the existing partition.metadata file
        backup_file="$dir/partmetadata.$DATE_STR"
        echo "Backing up $metadata_file to $backup_file"
        cp "$metadata_file" "$backup_file"

        # Use sed to replace the line containing the old topic_id with the new one
        echo "Updating topic_id in $metadata_file"
        sed -i "s/^topic_id: $OLD_TOPIC_ID/topic_id: $NEW_TOPIC_ID/" "$metadata_file"
    else
        echo "No partition.metadata file found in $dir"
    fi
done

echo "Backup and topic ID update complete."

How do you get the Kafka metadata and Zookeeper topic IDs? In Zookeeper, ask it:

kafkaserver::fixTopicID # bin/zookeeper-shell.sh $(hostname):2181
get /brokers/topics/MY_TOPIC_NAME
{"removing_replicas":{},"partitions":{"2":[251,250],"5":[249,250],"27":[248,247],"12":[248,251],"8":[247,248],"15":[251,250],"21":[247,250],"18":[249,248],"24":[250,248],"7":[251,247],"1":[250,249],"17":[248,247],"23":[249,247],"26":[247,251],"4":[248,247],"11":[247,250],"14":[250,248],"20":[251,249],"29":[250,249],"6":[250,251],"28":[249,248],"9":[248,249],"0":[249,248],"22":[248,251],"16":[247,251],"19":[250,249],"3":[247,251],"10":[251,249],"25":[251,250],"13":[249,247]},"topic_id":"57kBnenRTo21_VhEhWFOWg","adding_replicas":{},"version":3}

Kafka’s is stored on disk in each folder for a topic partition:

kafkaserver::fixTopicID # cat /path/to/data-kafka/MY_TOPIC_NAME-29/partition.metadata
version: 0
topic_id: 2xSmlPMBRv2_ihuWgrvQNA

Once the data is processed, it would be a good idea to recreate the topic … but ensuring the two topic ID values match up will get you back online.
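As a sanity check once the brokers are back up: newer Kafka releases (2.8 and later) report the topic ID in the topic description, so you can confirm the on-disk value now matches Zookeeper without cat-ing metadata files (the broker address here is illustrative):

bin/kafka-topics.sh --describe --topic MY_TOPIC_NAME --bootstrap-server localhost:9092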

Python: Getting Active Directory Subnets

Like my script that pulls the AD site information, this one lets me see which subnets are defined and which sites are assigned to those subnets. I was able to quickly confirm that the devices that had problems communicating with Active Directory don’t have a site defined. Way back in 2000, we created a “catch all” 10.0.0.0/8 subnet and assigned it to the user authentication site. New networks on a whole different addressing scheme don’t have a site assignment. It should still work, but the application in question has historically had issues with going the “OK, list ’em all” route.

from ldap3 import Server, Connection, ALL, SUBTREE, Tls
import ssl
import getpass

# Attempt to import USERNAME and PASSWORD from config.py
try:
    from config import USERNAME, PASSWORD
except ImportError:
    USERNAME, PASSWORD = None, None

# Define constants
LDAP_SERVER = 'ad.example.com'
LDAP_PORT = 636

def get_subnets_and_sites(username, password):
    # Set up TLS configuration
    tls_configuration = Tls(validate=ssl.CERT_REQUIRED, version=ssl.PROTOCOL_TLSv1_2)

    # Connect to the LDAP server
    server = Server(LDAP_SERVER, port=LDAP_PORT, use_ssl=True, tls=tls_configuration, get_info=ALL)
    connection = Connection(server, user=username, password=password, authentication='SIMPLE', auto_bind=True)

    # Define the search base for subnets
    search_base = 'CN=Subnets,CN=Sites,CN=Configuration,DC=example,DC=com'  # Change this to match your domain's DN
    search_filter = '(objectClass=subnet)'  # Filter to find all subnet objects
    search_attributes = ['cn', 'siteObject']  # Retrieve the common name and site object references

    # Perform the search
    connection.search(search_base, search_filter, SUBTREE, attributes=search_attributes)

    # Extract and return subnets and their site assignments
    subnets_sites = []
    for entry in connection.entries:
        subnet_name = entry.cn.value
        site_dn = entry.siteObject.value if entry.siteObject else "No site assigned"
        subnets_sites.append((subnet_name, site_dn))

    return subnets_sites

def print_subnets_and_sites(subnets_sites):
    if subnets_sites:
        print("\nSubnets and their Site Assignments:")
        for subnet, site in subnets_sites:
            print(f"Subnet: {subnet}, Site: {site}")
    else:
        print("No subnets found in the domain.")

def main():
    # Prompt for username and password if not available in config.py
    username = USERNAME if USERNAME else input("Enter your LDAP username: ")
    password = PASSWORD if PASSWORD else getpass.getpass("Enter your LDAP password: ")

    subnets_sites = get_subnets_and_sites(username, password)
    print_subnets_and_sites(subnets_sites)

if __name__ == "__main__":
    main()

Python: Get Active Directory Sites

One downside of not administering the Active Directory domain anymore is that I don’t have the quick GUI tools that show you how “stuff” is set up. Luckily, the sites are all reflected in AD objects that can be read by any authenticated user:

from ldap3 import Server, Connection, ALL, SIMPLE, SUBTREE, Tls
import ssl
import getpass

# Attempt to import USERNAME and PASSWORD from config.py
try:
    from config import USERNAME, PASSWORD
except ImportError:
    USERNAME, PASSWORD = None, None

# Define constants
LDAP_SERVER = 'ad.example.com'
LDAP_PORT = 636

def get_all_sites(username, password):
    # Set up TLS configuration
    tls_configuration = Tls(validate=ssl.CERT_REQUIRED, version=ssl.PROTOCOL_TLSv1_2)

    # Connect to the LDAP server
    server = Server(LDAP_SERVER, port=LDAP_PORT, use_ssl=True, tls=tls_configuration, get_info=ALL)
    connection = Connection(server, user=username, password=password, authentication='SIMPLE', auto_bind=True)

    # Define the search base for sites
    search_base = 'CN=Sites,CN=Configuration,DC=example,DC=com'  # Update to match your domain's DN structure
    search_filter = '(objectClass=site)'  # Filter to find all site objects
    search_attributes = ['cn']  # We only need the common name (cn) of the sites

    # Perform the search
    connection.search(search_base, search_filter, SUBTREE, attributes=search_attributes)

    # Extract and return site names
    site_names = [entry['cn'].value for entry in connection.entries]
    return site_names

def print_site_names(site_names):
    if site_names:
        print("\nAD Sites:")
        for site in site_names:
            print(f"- {site}")
    else:
        print("No sites found in the domain.")

def main():
    # Prompt for username and password if not available in config.py
    username = USERNAME if USERNAME else input("Enter your LDAP username: ")
    password = PASSWORD if PASSWORD else getpass.getpass("Enter your LDAP password: ")

    site_names = get_all_sites(username, password)
    print_site_names(site_names)

if __name__ == "__main__":
    main()