Author: Lisa

PostgreSQL with File System Compression – VDO and ZFS

Our database storage is sizable. To reduce the financial impact of storing so much data, we opted to use a compressed file system. This allows us to maintain, for example, 8TB of data in under 2TB of space. Unfortunately, the ZFS file system we use to compress our data is no longer “built in” with newer versions of RedHat.

There are alternatives. BTRFS is a long-standing option, but it has reliability issues (we piloted BTRFS on one of the read-only replicas, and the compression ratio was nowhere near as good: the 2TB of ZFS data filled the 10TB BTRFS disk even with the better compression option, and I/O was so slow there was a continual replication backlog). RedHat introduced Virtual Data Optimizer (VDO) to replace ZFS. In theory, it’s better since it also deduplicates data (e.g. if every one of us saved the same PowerPoint presentation to the disk, only one copy would actually be stored). That’s great for email and file shares where a lot of people are likely to store the same information; it’s not so useful on a database server where there’s little to de-duplicate. It does, however, compress data … so we decided to try it out.

The results, unfortunately, are not spectacular. VDO does not allow you to do much customization of the compression: it’s on or off. I’ve found some people tweaking it in unsupported ways, but the impetus behind trying VDO was that it’s supported by RedHat; making unsupported changes to it defeats that purpose. And the compression we’re seeing is far less than we get with ZFS. Our existing servers run between 4.5x and 6x compression.

In VDO, however, we don’t even get a 2x compression factor: 11TB of information is stored in 8TB of space! That’s about 1.4x.

So, while we found the performance of VDO to be satisfactory and it’s really easy to use in newer RedHat releases … we’d have to increase our 20TB LUNs to 80TB to continue storing the data we store today. That seems like A Really Bad Idea(tm).

Seems like I’m going to have to sort out using OpenZFS on the new servers.

OpenSearch User’s Guide

Below is the user’s guide I put together for individuals who use the ElasticSearch system I support at work to help them transition to the OpenSearch platform we’ll be moving to later this year.

 

Overview

The SYSLOG platform is a set of applications commonly known as ELK (ElasticSearch, Logstash, and Kibana). We will be replacing ElasticSearch with OpenSearch in the upcoming months. The development platform has already been upgraded and is available for anyone who wants to check it out. The new website is called “OpenSearch Dashboards”. While maintaining a similar “look and feel” to the current Kibana platform, moving to OpenSearch provides us with new features such as machine learning for data analytics and rollup indices.

What is ElasticSearch?

ElasticSearch, based on the Lucene search software, is a distributed search and analytics application which ingests, stores, and indexes data. Kibana is a web-based front-end providing user access to data stored within ElasticSearch.

What is OpenSearch?

In short, it’s the same but different. OpenSearch is also based on the Lucene search software, is designed to be a distributed search and analytics application, and ingests/stores/indexes data. If it’s essentially the same thing, why does OpenSearch exist? ElasticSearch was initially licensed under the open-source Apache 2.0 license – a rather permissive free software license. ElasticCo did not agree with how their software was being used by Amazon; and, in 2021, the license for ElasticSearch was changed to Server Side Public License (SSPL). One of the requirements of SSPL is that anyone who implements the software and sells their implementation as a service needs to publish their source code under the SSPL license – not just changes made to the original program but all other software a user would require to run the software-as-a-service environment for themselves. Amazon used ElasticSearch for their Amazon Elasticsearch Service offering, but was unable/unwilling to continue doing so under the new license terms. In April of 2021, Amazon Web Services created a fork of ElasticSearch as the basis for OpenSearch.

Differences Between OpenSearch and ElasticSearch

After the OpenSearch fork was created, the product roadmap for ElasticSearch was driven by ElasticCo and the roadmap for OpenSearch was community driven (with significant oversight and input from Amazon) – this means the products are not identical although they provide the same core functionality. Elastic publishes a list of features unique to ElasticSearch, and the underlying machine learning algorithms are different. However, the important components of the “unique” feature list have been implemented in OpenSearch over time.

Requesting Access

Access to the OpenSearch environment can be requested via a Microsoft Forms form.

Managers can request access for their direct reports at https://forms.microsoft.com/Pages/ResponsePage.aspx?id=wbRnJe2w9UCu41jXxfPisjf18JL-t61BkoGkoNxr24lUN1laRDhaU0lRMzRHVzNZRkdCSTlXQUFNNSQlQCN0PWcu

Individuals can request access, to be approved by their manager, at https://forms.microsoft.com/Pages/ResponsePage.aspx?id=wbRnJe2w9UCu41jXxfPisjf18JL-t61BkoGkoNxr24lUMk0zRVJIQURTTFc0RkpKOTdINkREM0s4MiQlQCN0PWcu

Logging In

You will be able to log into the OpenSearch Dashboards site using the same https://syslogging.vip.windstream.net:5601/ URL. For now, though, the DEV OpenSearch installation is available at https://ltrkarkvm1577.mgmt.windstream.net:5601 – log in through PingID.

Once logged in, you will be at the OpenSearch Dashboards home page.

To navigate around the site, click on the hamburger button in the upper left-hand corner of the page.

Switching Tenants

Data from different sources are sorted into “tenants” – you may only have access to a single tenant. In the upper right-hand corner of the page, there will be a circle with a letter – click that circle. The current tenant will be displayed under your username (the tenant shown here is ‘admin_tenant’). On the same menu, there is an option to “Switch tenants”.

Clicking “Switch tenants” will bring up a new pane – you can choose the tenant you want from the drop-down. Click “Confirm” to switch to the selected tenant.

Searching

Searching: Discover Overview

To perform ad hoc queries against stored data, use the hamburger button to expand the left-hand navigation menu and select “Discover”.

“Discover” lets you perform ad hoc queries against the stored data using the same query language that was used in ElasticSearch.

You will notice that “documents” stored in OpenSearch have key:value pairs (field names and data, categories and info, etc) – in this example, “fields” such as “log.offset”, “agent.hostname”, “@timestamp”, “source”, “tags”, “hostname”, “input.type”, “index”, “Sourcetype”, “host.hostname”, “host.architecture”, etc exist. Each field has a value — “@timestamp” is August 17, 2022 at 16:21:20.518 – and you can use the search to find documents matching your criteria.
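For illustration only, a single document with fields like those might look something like the following (the field names are taken from the example above; the values are invented for this sketch):

{
  "@timestamp": "2022-08-17T16:21:20.518",
  "hostname": "bng04.lncl01-ne.us.windstream.net",
  "host.architecture": "x86_64",
  "source": "/var/log/messages",
  "input.type": "log",
  "log.offset": 123456,
  "tags": ["syslog"]
}

Every search in Discover is essentially a match against field/value pairs like these.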

Search Tips

Selecting the Time Range

When you search stored data, the default is to search the last 15 minutes of data. To change this time range, click on the calendar icon next to the “Last 15 minutes” text and select the time range you want to display.

To search for a specific time range, click on the “Show dates” text.

There will now be two fields for the start and end range of the time span. You can click on either the start or end time to set those values. Clicking on one, you will see three options – “Now” means the range start/end will be set to whatever the current date/time is when the search is performed or refreshed. “Relative” allows you to specify an interval (I know this happened in the last three hours, so I can search from 3 hours ago to now).

“Absolute” allows you to choose a date and time.

Viewing Data

When viewing log data, the screen is divided into several parts. The top right quadrant has a histogram showing how many matching documents have been found over the time interval.

Hovering your mouse over a column will show you the exact count and time interval – in this example, each column represents a 10-minute period, and 2,788 documents matched the search.

Below the histogram, you will see the records – the default display has the document timestamp and the entire message (Source). To view a single record, click the magnifying glass icon next to the record.

A new pane will appear on the right-hand side with the document details.

Clicking the “+” before a field value will filter the records to just those where the field has the value displayed.

In this example, we now have all records where “Action” is “Login”.

And finally, the left-hand column displays the fields found within the documents. Clicking the magnifying glass next to a field will display the five most frequently occurring values for that field – 7.7% of the records have “TELNET” as the port value in this example.

Clicking on the + next to the TELNET value will filter the searched records to just those where Port is TELNET – clicking the minus would filter records to everything where Port is not TELNET.

Selecting Displayed Fields

When viewing the field list, you can click the plus icon next to a field to change the values displayed in the table data.

The fields displayed in the table will be listed in “Selected Fields” at the top of the fields list.

Clicking the red X next to a selected field will remove it from the data table.

Basic Search

To search for a full string, indicate the field name that you want to search followed by a colon, then enter what you want to find. As an example, finding records from the BNG named bng04.lncl01-ne.us.windstream.net means entering:

hostname: bng04.lncl01-ne.us.windstream.net

Wildcard Search

You can use ? to match any single character – this is useful when you are searching for a number of hosts simultaneously (dns??.example.com will match dns01.example.com though dns99.example.com).

You can use * to match zero or more characters – this means dns*.example.com would match dns.example.com, dns1.example.com, and dns1234568793.example.com.
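The wildcard patterns use the same field:value syntax as a basic search. For example, either of these would match the DNS hosts described above:

hostname: dns??.example.com

hostname: dns*.example.com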

Substring Search

To search for a substring, use * around your search string – to look for documents from any of the devices with inmn01-sc in their name, search for:

"source": *inmn01-sc.us*

Search for a String That Contains Word Boundaries

When your search string contains something the tokenizer considers a word boundary (the “.” characters in hostnames, for example), you may find the results contain more than you want. Basically, searching for foo.bar will return any document matching foo and any document matching bar.

You can add * as a wildcard around your substring – to find any of the lncl01-ne.us devices, search for

hostname: *lncl01-ne.us*

Searching for Text That Contains Colons (:)

If you are searching for a string that contains colons (for example, a MAC address), you need to add quotes around the string:

message: "94:1c:56:1c:5b:11" and hostname: *lncl01-ne.us*

Excluding Data From a Search

If your query is returning too many records and there are some that aren’t relevant (in this case, we’re getting a lot of log data and don’t really care about the “stuff” from /var/log/messages, which is basically OS-related logs), you can exclude data by adding “not” to your query. As an example:

hostname: *lncl01-ne.us* and not source:"/var/log/messages"

Inspect

Inspect will show you information about the data presented in the search.

Clicking on “Request” will show you the API search request that has been made – this is useful for people building programmatic access to data. You can copy/paste the JSON text and submit it to the OpenSearch search API.
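As a rough sketch (the index pattern and query below are illustrative placeholders, not the exact request Inspect will show you), the copied JSON is the body of a standard _search API call:

POST /my-index-pattern-*/_search
{
  "size": 500,
  "query": {
    "bool": {
      "filter": [
        { "range": { "@timestamp": { "gte": "now-15m" } } },
        { "query_string": { "query": "hostname: *lncl01-ne.us* AND NOT source:\"/var/log/messages\"" } }
      ]
    }
  }
}

The request Inspect displays will include whatever time range, query, and filters your Discover session currently has applied.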

New Discover

In addition to the search that matches Kibana, OpenSearch also includes a new search interface. Click “Try new Discover” to switch (you can always click “Use legacy Discover” to switch back!).

The query language and basic functionality are the same, but information is presented in a different format.

Saving Searches

Clicking “Save” will bring up a new pane that allows you to save a query for future use.

Exporting Report Data

Saving a query allows you to click on “Reporting” to export data.

This will bring up a menu that allows you to generate a CSV file with the matching records.

Accessing Saved Queries

To access a saved query, select “Open”.

On the new menu, select the saved query you wish to view.

Creating Visualizations

General

To create a new visualization, use the hamburger button to expand the left-hand navigation menu and click “Visualize”. You will be able to view any existing visualizations.

Click “Create visualization” to create a new visualization.

You’ll need to select the type of visualization you wish to create.

TSVB (Time Series Visualization Builder)

The Time Series Visualization Builder (TSVB) is a GUI tool for building graphs from time series data. This means the x-axis will be datetime values and the y-axis will be the data you want to visualize over the time period. To create a new visualization of this type, select “TSVB” on the “New Visualization” menu.

Scroll down and select “Panel options” – here you specify the index you want to visualize. Select the field that will be used as the time for each document (e.g. if your document has a special field like eventOccuredAt, you’d select that here). I generally leave the time interval at ‘auto’ – although you might specifically want to present a daily or hourly report.

Once you have selected the index, return to the “Data” tab. First, select the type of aggregation you want to use. In this example, we are showing the number of documents for a variety of policies.

The “Group by” dropdown allows you to have chart lines for different categories (instead of just having the count of documents over the time series, which is what “Everything” produces) – to use document data to create the groupings, select “Terms”.

Select the field you want to group on – in this case, I want the count for each unique “policyname” value, so I selected “policyname.keyword” as the grouping term.

Voila – a time series chart showing how many documents are found for each policy name. Click “Save” at the top left of the chart to save the visualization.

Provide a name for the visualization, write a brief description, and click “Save”. The visualization will now be available for others to view or for inclusion in dashboards.

 

Map Visualizations

Before we can use map details in OpenSearch visualizations, we need to add fields with the geographic information. The first few steps are something the ELK admin staff will need to do in order to map source and/or destination IPs to geographic information. Once GeoIP information is available in the index pattern, select the “Maps” visualization.

Leave the road map layer there (otherwise you won’t see the countries!)

Select ‘Documents’ as the data source to link in ElasticSearch data.

 

Select the index pattern that contains your data source (if your index pattern does not appear, then Kibana doesn’t recognize the pattern as containing geographic fields … I’ve had to delete and recreate my index pattern so the geographic fields were properly mapped … but refreshing the fields should be sufficient).

And select the field(s) that contain geographic details:

You can name the layer.

And add a tooltip that will include the country code or name.

Under “Term joins”, add a new join. Click on “Join –select–” to link a field from the map to a field in your dataset.

In this case, I am joining the two-character country codes.

Normally, you can leave the “and use metric count” in place (the map is color coded by the number of requests coming from each country). If you want to add a filter, you can click the “where — add filter –” link to edit the filter.

In this example, I don’t want to filter the data, so I’ve left that at the default.

Click “Save & close” to save the changes to the map visualization. To view your map, you won’t find it under Visualizations – instead, click “Maps” along the left-hand navigation menu.

Voila – a map where the shading on a country gets darker the more requests have come from the country.

TimeLine

TimeLine can be used to build time series graphs.

 

This visualization type is a little cryptic – you need to enter a Timelion expression. Just change “.es(…)” to “.opensearch(…)” to retrieve data from OpenSearch.

If there is null data at a time value, TimeLine will draw a discontinuous line. You can modify this behavior by specifying a fit function.
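A minimal sketch of such an expression (the index pattern and query string here are placeholders, not values from our environment):

.opensearch(index=*, timefield=@timestamp, q='hostname: *lncl01-ne.us*').fit(carry).label('matching documents')

The .fit(carry) function carries the last known value forward across empty buckets, so the line stays continuous instead of breaking at nulls.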

Note that you’ll need to click “Update” to update the chart before you are able to save the visualization.

Vega

Vega is an experimental visualization type.

This is, by far, the most flexible but most complex approach to creating a visualization. I’ve used it to create the Sankey visualization showing the source and destination countries from our firewall logs. Both Vega and Vega-Lite grammars can be used. ElasticCo provides a getting started guide, and there are many examples online that you can use as the basis for your visualization.

Learning Vega

Both Vega and Vega-Lite are … not the best documented grammars I’ve ever encountered. This means there’s a lot of trial and error involved in getting a chart set up. There’s a Vega web editor that allows you to see the data tables that are being used to create graphs – this lets you confirm your transformations and such are functional.

Debugging Vega Graphs

If you open the browser’s developer console, you can access debugging information. This works when you are editing a visualization as well as when you are viewing one. To see a list of available functions, type VEGA_DEBUG. and a drop-down will show you what’s available. The command “VEGA_DEBUG.vega_spec” outputs pretty much everything about the chart.

To access the data set being graphed with the Vega-Lite grammar, use VEGA_DEBUG.view.data("source_0") — if you are using the Vega grammar, use the dataset name that you have defined.

Vega Sample – Horizontal Line with Baseline

This is the graph that’s used for the OTDR scans – someone identifies a baseline “as good as this circuit is going to get” loss value that is retained forever, and that value is used as a comparison for future scans. Sounded easy enough – draw a horizontal line at this y-value. Finding this y-value required using an array of data elements instead of a single data element. Drawing the line, however, was challenging.

There’s a “rule” mark that draws a straight line between two points. You cannot just say “draw a line at y from 0 to some large value that’s going to be off the graph”, and you cannot just get the max value of the axis.

The joinaggregate transformation method — which appends the value to each element of the data set — was essential because I needed to know the largest datetime value that would appear in the chart.

, {"type": "joinaggregate", "fields": ["transformedtimestamp"], "ops": ["max"], "as": ["maxtime"]}

The lookup transformation method — which can access elements from other data sets — allowed me to get that maximum timestamp value into the baseline data set. Except … lookup needs an exact match in the search field. Luckily, it does return a random (I presume either first or last … but it didn’t matter in this case because all records have the same max date value) record when multiple matches are found.

So I used a formula transformation method to add a constant to each record as well:

, {"type": "formula", "as": "pi", "expr": "PI"}

Now that there’s a record to be found, I can add the max time from our scan data into our baseline data:

, {"type": "lookup", "from": "scandata", "key": "pi", "fields": ["pi"], "values": ["maxtime"], "as": ["maxtime"]}

Voila — a chart with a horizontal line at the baseline loss value. Yes, I randomly copied a record to use as the baseline and selected the wrong one (which is why some scans are below the “good as it’s ever going to get” baseline value!). But … once we have live data coming into the system, we’ll have reasonable looking graphs.

The Vega spec for this graph:

{
"$schema": "https://vega.github.io/schema/vega/v4.json",
"description": "Scan data with baseline",
"padding": 5,


    "title": {
        "text": "Scan Data",
        "frame": "bounds",
        "anchor": "start",
        "offset": 12,
        "zindex": 0
      },
    "data": [
    {
        "name": "scandata",
        "url": {
            "%context%": true,
            "%timefield%": "@timestamp",
            "index": "traces-*",
            "body": {
            "sort": [{
                "@timestamp": {
                    "order": "asc"
                }
            }],
            "size": 10000,
            "_source":["@timestamp","Events.Summary.total loss"]
            }
        }
        ,"format": { "property": "hits.hits"}
        ,"transform":[
            {"type": "formula", "expr": "datetime(datum._source['@timestamp'])", "as": "transformedtimestamp"}
            , {"type": "joinaggregate", "fields": ["transformedtimestamp"], "ops": ["max"], "as": ["maxtime"]}
            , {"type": "formula", "as": "pi", "expr": "PI"}
        ]
    }
  ,
   {
        "name": "baseline",
        "url": {
            "%context%": true,
            "index": "baselines*",
            "body": {
                "sort": [{
                    "@timestamp": {
                        "order": "desc"
                    }
                }],
                "size": 1,
                "_source":["@timestamp","Events.Summary.total loss"]
            }
        }
        ,"format": { "property": "hits.hits" }
        ,"transform":[
                {"type": "formula", "as": "pi", "expr": "PI"}
                , {"type": "lookup", "from": "scandata", "key": "pi", "fields": ["pi"], "values": ["maxtime"], "as": ["maxtime"]}
        ]
  }
]      
,
    "scales": [
      {
        "name": "x",
        "type": "point",
        "range": "width",
        "domain": {"data": "scandata", "field": "transformedtimestamp"}
      },
      {
        "name": "y",
        "type": "linear",
        "range": "height",
        "nice": true,
        "zero": true,
        "domain": {"data": "scandata", "field": "_source.Events.Summary.total loss"}
      }
    ],
        "axes": [
      {"orient": "bottom", "scale": "x"},
      {"orient": "left", "scale": "y"}
    ],
     "marks": [
                {
            "type": "line",
            "from": {"data": "scandata"},
            "encode": {
              "enter": {
                "x": { "scale": "x", "field": "transformedtimestamp", "type": "temporal",
      "timeUnit": "yearmonthdatehourminute"},
                "y": {"scale": "y",       "type": "quantitative","field": "_source.Events.Summary.total loss"},
                "strokeWidth": {"value": 2},
                "stroke": {"value": "green"}
              }
            }
          }
                 ,        {
            "type": "rule",
            "from": {"data": "baseline"},
            "encode": {
              "enter": {
                "stroke": {"value": "#652c90"},
                "x": {"scale": "x", "value": 0},
                "y": {"scale": "y",      "type": "quantitative","field": "_source.Events.Summary.total loss"},
                "x2": {"scale": "x","field": "maxtime", "type": "temporal"},
                "strokeWidth": {"value": 4},
                "opacity": {"value": 0.3}
              }
            }
          }
     ]         
}

Other Types:

Area

Controls allow you to add a control bar to a dashboard – this enables the user to select date ranges, add filters, and otherwise control the data being displayed in the dashboard.

Coordinate Map

Data Table

Gantt Chart

Gauge

Goal

Heat Map

Horizontal Bar

Line

Markdown

Metric

PPL

Pie

Region Map

Tag Cloud

Vertical Bar

VisBuilder

Creating a Dashboard

General

To create a dashboard, use the hamburger button to expand the left-hand navigation menu and select “Dashboards”

Click “Create” to create a new dashboard

Select the type of dashboard you wish to create

Dashboard

Click “Add” to add existing visualizations to the dashboard (or create a new one).

Select the visualizations you want added, then click “Save” to save your dashboard.

Provide a name and brief description, then click “Save”.

Observability Dashboard

Provide a name for the dashboard

Use the “Add visualization” button to create new visualizations or link existing ones into the dashboard.

More details about the Observability plugin can be found at https://github.com/opensearch-project/dashboards-observability/wiki.

Tableau — Who Deleted That Workbook?!?

While Tableau doesn’t have anything nice like a ‘dumpster’ from which you can restore a deleted workbook, it does at least keep tables for historic events like workbook deletion. The following query finds records where a workbook with FOOBAR in its name was deleted. It lists all of the event info as well as info on the user who deleted it. Near as I can tell, the “created” date for the historical_events table is the date the workbook was deleted (from my restore, I know the workbook itself was created last year!)

SELECT historical_events.*, hist_workbooks.*, hist_users.*
FROM historical_events
left outer join historical_event_types on historical_event_types.type_id = historical_events.historical_event_type_id 
left outer join hist_workbooks on hist_workbooks.id = historical_events.hist_workbook_id 
left outer join hist_users on hist_users.id = historical_events.hist_actor_user_id 
WHERE historical_event_types.name = 'Delete Workbook'
and hist_workbooks.name like '%FOOBAR%'
;

Grilled Lamb

For Mother’s Day, we grilled lamb steaks. Anya made garlic sautéed corn, and we had some cherry whiskey (since there was cherry wood in the wood fire … figured that would match). Quick salt and pepper on the steaks, then seared over the fire and cooked for about 20 minutes. Rested inside, and we had a delicious meal.

Oracle SQLNET.ORA Active Directory Anonymous Binds

A very, very long time ago (2002-ish), we moved to using AD to store our Oracle connections — it’s far easier to edit the one connection entry in Active Directory than to distribute the latest connection file to every desktop and server in the company. Frankly, the files never get to the servers anyway; individuals enter the connections they need … and update them when something stops working and they find the new host/port/etc. Unfortunately, Oracle used an anonymous connection to retrieve the data, so we’ve had anonymous binds enabled in Active Directory ever since. I no longer support AD, so haven’t really kept up with it … until a coworker asked why this huge security vulnerability was specifically configured for our domain. And I gave him the whole history. While we were chatting, a quick search revealed that Oracle 21c and later clients can actually use a wallet for credentials in the sqlnet.ora file:

NAMES.LDAP_AUTHENTICATE_BIND = TRUE
NAMES.LDAP_AUTHENTICATE_BIND_METHOD = LDAPS_SIMPLE_AUTH
WALLET_LOCATION = (SOURCE = (METHOD = FILE)
  (METHOD_DATA = (DIRECTORY = /path/to/wallet.file)))

From https://www.oracle.com/a/otn/docs/database/oracle-net-active-directory-naming.pdf

 

A Bird in Your Pocket?

I had clothes drying on the line, and Anya grabbed them just before sunset. She quickly tossed everything into a basket and brought it into the house. A short time later, I got the clothes out of the basket to fold. I took out a pair of Scott’s jeans and … out flies this little bird! No idea how she got in there, and I’m really glad she didn’t seem to be hurt in any way. We turned off all the interior lights, opened the front door, and turned out the outside light. After a bit of following her around and trying to persuade her to go in the right direction, she swooped and flew outside.