We finally saw the new kittens, well kitten, on the back patio. Anya put a trap out, and the curious little one walked right in. She put another trap out, and we caught the mommy cat too! We’ll get her into the vet shortly, and the barn cats won’t sprout more kittens.
Month: May 2023
NEO4J: Relational Databases
I always found it odd that relational databases didn’t really have relationships as part of the stored data. Technically, they could if you had a view or stored procedure with a bunch of JOIN’s in there. But relational databases stored data about which you could build relationships. And people using the data may not even know about relationships that other people saw within that data. Some of our databases at work have amazing documentation — a hundred meg of PDF files detailing the relationships that the database creator wanted people to use. Other databases? No such luck!
To me, that’s the big advantage of a Graph database — the relationships are stored within the data, so anyone viewing it for the first time doesn’t need to poke around, see where values could be correlated across tables, and create their own JOIN statements to build out the relationship. Recording data in the database means defining those relationships. It would still be good to document a graph model (we have ‘people’ nodes who ‘act in’, ‘cast’, ‘produce’, or ‘direct’ ‘movie’ nodes), but you could figure all of those things out by perusing the database content.
Creamy Tomato Sauce for Pasta
I needed a quick meal last night — so I made a carrot-y, creamy tomato sauce for pasta.
Ingredients
- 1/4 cup olive oil
- 2 carrots, diced into small pieces
- 1/2 large onion, diced into small pieces
- 3-4 cloves of garlic, diced
- 3 oz tomato paste
- 1/3 cup plain Greek yogurt
- salt, pepper, Italian herbs
Method
- Heat olive oil in a pan. Add diced carrots and cook for a few minutes. Add garlic and onions and cook for a few more minutes until the onions become translucent.
- Add salt, pepper, and herbs. Then add tomato paste and stir to incorporate oil. Cook for a few minutes.
- Add pasta cooking water to thin to a reasonable consistency.
- Remove from heat and stir in yogurt. Add salt/pepper/herbs to taste.
NEO4J: Applications
I remember a friend of mine who taught introductory computer programming classes in University. One test question he always used was essentially ‘do something that people can do easily’. The exact details would change year to year, but the important thing is that the instructions simply stated “Sort the following list in alphabetical order”. Not write a program that sorts an arbitrary list into alphabetical order — sort this list. And, sure, you could write a program to do it. A program that would successfully accomplish the goal got an A grade. But so did someone who just took the list, sorted it alphabetically in their head, and wrote the list in alphabetical order. That answer? Would get extra credit. Because knowing when not to program is really important too. From a business standpoint, it’s a waste of money for someone to write a program to perform some easy one-off task that is never going to be done again.
I think about that a lot when I see “new” technologies getting picked up. I call it the “CIO magazine” approach to technology adoption. You see some new thing, have a very basic understanding of how it works, and decide we need to use one of these. And fail to consider if your use case is reasonable or if you’re willing to do the extra work for the “new” thing. The biggest example I experience of “not reasonable” is the prevalence of Java programming. If I am selling software, cross-platform compilation is A Very Good Thing. If I can maintain a single pipeline and a single release that all of my customers can use? Score! Internally developed software, though? We requisition specific platforms. Our Linux servers are not going to be Windows servers next Tuesday. I can write code, compile it for Linux, and be fine even if it doesn’t run under Windows. Because it doesn’t need to run under Windows. The extra work — our internal support groups adopted Agile. Except no one was willing to prioritize the ticket queue — so there’s no prioritized list of work to select from. Everyone has two “issues” for their sprint — “incident support” and “admin time”. A few people might get involved in a specific project and add “dns decom” or “nextgen vpn testing”. But sprint planning is repetitive (I’ve got incident support & admin time, too!), daily standups were a joke (I did tickets yesterday & had a meeting), and techs still didn’t know what ticket was the priority that should be pulled next. Which doesn’t make Java or Agile a bad idea — it just makes them overly complicated for the situation in which they are being used.
I think of all of this when I see Graph databases being implemented — step #0 is does a graph database make sense for my data? What would that mean? It would mean that the data elements are interconnected somehow — if you’d be using a lot of JOIN queries to interact with the stored data, then a graph model might make sense. If you’re just selecting items from individual tables where values are whatever? Then a graph database is a complex way of storing and accessing that data.
Turkey Hatchlings v/s Turkeys in the Mail
The first time we bought baby poultry, we picked them up from a local(ish) hatchery. The chicks hatched overnight, were sorted in the morning, and we picked them up in the afternoon. Happy, healthy chicks. The second time, the hatchery was halfway across the country but offered overnight shipping. That’s not a cheap option, but the birds were still happy and healthy when they arrived. Then we wanted to raise turkeys.
We ordered from a well known hatchery, and the only option was “shipping”. They shipped once a week. And USPS shipping was amazingly slow. So very slow. The USPS employee at the local central depot rang us on Saturday morning to see if we could come pick the birds up because he didn’t think they would survive until they were delivered on what would probably be Tuesday. We did, but only one of the birds survived even though we spent the weekend nursing sick birds.
Last year, we tried again — ordered from another well known hatchery. I couldn’t find a hatchery that offered overnight or two-day shipping. But I was able to find one willing to let me pay a little extra to have additional food added to the shipping box. The chicks arrived, but they were still not super spry.
This year, we hatched our first turkey poults. It’s amazing how much easier it is to get them eating and drinking when you start at day zero! The little guys spent about 12 hours in the incubator drying off, then they spent another 12+ hours sleeping under the heater. Then they were hopping around, investigating everything, and being birds. After sprinkling moistened food on the floor and adding tiny bits of plants (clover and dandelion greens) to the top of the water, the little guys were eating and drinking. And, when would find food or water … all of the other poults rush over to investigate.
Greenhouse Turkeys
We have a greenhouse full of turkeys! It was turkey liberation day at the farm — all of the poults that hatched last week are now in a larger brooder in the greenhouse. They’ve got a larger “turkey toaster” (a plate heater) for the evenings, but they are running around, exploring, and happily eating green leafy things.
Using Graph Database to Track Plant Hybridization
Graph databases are designed to both store data and record relationships between data elements. I wondered if this would be useful in tracking cross-breeding projects – essentially building “family trees” of the entity being cross-bred. The data model would have nodes with the hybrid with notes on it. Relationships for PARENT_M and PARENT_F (male and female parent of the hybrid) would be used to associate nodes.
Graph databases have a concept of pathing – what nodes do we need to traverse to get from A to B – but to create a lineage for the plants, you need to know the starting point. Which is great if you want to play six degrees of separation and find a path between two known people, but not great if I just want to know what the lineage is of Tomato #198. To make pathing possible, I needed to add a common root node to all of heirloom seedstocks – PLANT0
This allows me to take any plant and find the paths from PLANT0 to it
MATCH p=(Tomato0:TOMATO {name: 'PLANT0'})-[*]->(Tomato8:TOMATO{name: 'Tomato0000008'}) RETURN p
And visualize the genetic heritage of the hybrid.
Neo4J — Setting Up and Basic Record Management
Setting up a Neo4J Database
Ostensibly, you can create a new database using “create database somethingorother”. However, that is if you are using the enterprise edition. Running the community edition, you can only run one database. Attempting to create a new database will produce an error indicating Neo.ClientError.Statement.UnsupportedAdministrationCommand
To use a database with a custom name, I need to edit neo4j.conf and set initial.dbms.default_database
Then create a Docker container – I am mapping /data to an external directory to persist my data and /var/lib/neo4j/conf to an external directory to persist configuration
docker run -dit --name neo4j --publish=7474:7474 --publish=7687:7687 --env=NEO4J_AUTH=none --volume=/docker/neo4j/data:/data --volume=/docker/neo4j/conf:/var/lib/neo4j/conf neo4j
Listing the databases using “show databases” will show my custom database name
Switch to our database with the “:use” instruction
Create single nodes
CREATE (:PLANTS {name: ‘Black Krim’, year: 1856, color: ‘deep red’, flavor: ‘sweet’, notes: ‘Heirloom seedstock’})
Note: After I started using my data, I realized that “PLANTS” is a silly label to use since they will all be plants. I recreated all of my data with nodes labeled “TOMATO” so I can also track peppers, daffodils, and any other plants we start hybridizing.
Load of all records:
CREATE(:TOMATO {flavor: "air",notes: "hypothetical",color: "invisible",year: "1",name: "PLANT0"}); CREATE(:TOMATO {flavor: "acidic",notes: "Heirloom seedstock",color: "purple",year: 1890,name: "Cherokee Purple"}); CREATE(:TOMATO {flavor: "sweet",notes: "Heirloom seedstock",color: "bright red",name: "Whittemore"}); CREATE(:TOMATO {flavor: "sweet",notes: "Heirloom seedstock",color: "deep red",year: 1856,name: "Black Krim"}); CREATE(:TOMATO {name: 'Kellogg', color: 'bright red', flavor: 'sweet', year: '1900', notes: 'beautiful and tasty'}); CREATE(:TOMATO {name: 'Brandywine', color: 'bright red', flavor: 'sweet', year: '1900', notes: 'very tasty'}); CREATE(:TOMATO {name: 'Japanese Trifele Black', color: 'dark purple red', flavor: 'sweet', year: '1900', notes: 'nice acidic flavor'}); CREATE(:TOMATO {name: 'Sweet Apertif', color: 'bright red', flavor: 'sweet', year: '1900', notes: 'cherry'}); CREATE(:TOMATO {name: 'Eva Purple', color: 'dark purple red', flavor: 'sweet', year: '1900', notes: 'did not grow well'}); CREATE(:TOMATO {name: 'Mortgage Lifter', color: 'bright red', flavor: 'sweet', year: '1900', notes: 'huge but lacking flavor and lots of bad spots'}); CREATE(:TOMATO {flavor: "sweet",notes: "",color: "deep red",year: "2021",name: "Tomato0000001"}); CREATE(:TOMATO {flavor: "bland",notes: "small tomatoes with little flavor",color: "red",year: "2021",name: "Tomato0000002"}); CREATE(:TOMATO {flavor: "watery",notes: "not much acid",color: "pinkish",year: "2022",name: "Tomato0000003"}); CREATE(:TOMATO {flavor: "sweet",notes: "sweet, slightly acidic",color: "red",year: "2022",name: "Tomato0000004"}) ; CREATE (:TOMATO {name: 'Tomato0000005', color: 'bright red', flavor: 'sweet', year: '2022', notes: 'amazing'}); CREATE (:TOMATO {name: 'Tomato0000006', color: 'bright red', flavor: 'sweet', year: '2023', notes: 'beautiful and tasty but no bigger than parent'}); CREATE (:TOMATO {name: 'Tomato0000007', color: 'bright red', flavor: 'sweet', year: '2023', notes: 'beautiful and tasty but no bigger than parent'}); CREATE (:TOMATO {name: 'Tomato0000008', color: 'bright red', flavor: 'sweet', year: '2023', notes: 'beautiful and tasty, slightly larger than parent'});
Show records with MATCH
The search starts with the verb “MATCH”. In parenthesis, we add the matching rule. This begins with an object name variable – you can have anonymous nodes (no variable names assigned) by omitting this string and just typing the colon. This is followed by the label that we want to match – basically the type of node we are looking for. Then, in curly braces, a filter – in this case, I am looking for nodes where the “name” field has the value “Black Krim”. Finally, there’s a return statement that indicates that we want to output the matched results.
You can include relationships in the query – parenthesis around nodes and square brackets around relationships.
(placeholdername:nodes)-[:RELATIONSHIP_CONNECTION_TYPE]->(anotherplaceholdername:otherNodes)
This is what makes graph databases interesting for tracking hybridization – we can easily produce the lineage of the plants we develop.
Deleting a record
Deleting a record is used in conjunction with match — use the DELETE verb on the collections of objects returned into your variable name. Here, the variable is ‘x’:
MATCH (x:PLANTS{name: 'Black Krim'}) DELETE x
Deleting a record and relationships by ID
When deleting a record, you can include relationship matches:
MATCH (p:PLANTS) where ID(p)=1 OPTIONAL MATCH (p)-[r]-() DELETE r,p
Create a relationship
To create a relationship, we first need to match two objects – here I am finding a plant named PLANT0 and all of the “heirloom seedstock” plants to which I assigned year 1900 – and create parent/child relationships. Since there is both a male and female parent, that is included in the relationship name:
MATCH (a:TOMATO), (b:TOMATO) WHERE a.name = 'PLANT0' AND b.year = '1900' CREATE (a)-[r:PARENT_M]->(b) MATCH (a:TOMATO), (b:TOMATO) WHERE a.name = 'PLANT0' AND b.year = '1900' CREATE (a)-[r:PARENT_F]->(b)
Create records with parent/child relationships
You can create records and relationships in a single command, too:
CREATE p = (:PLANTS {name: 'Black Krim', year: 1856, color: 'deep red', flavor: 'sweet', notes: 'Heirloom seedstock'})-[:PARENT_M]->(:PLANTS {name: 'Tomato0000001', color: 'deep red', flavor: 'sweet', year: '2023', notes: ''})<-[:PARENT_F]-(:PLANTS {name: 'Cherokee Purple', color: 'purple', flavor: 'acidic', year: 1890, notes: 'Heirloom seedstock'}) RETURN p
Viewing Records with Relationships
When you match records, you will also get their relationships:
Bulk importing data
LOAD CSV WITH HEADERS FROM ‘https://www.rushworth.us/lisa/ljr_plant_history.csv’
NEO4J: Basics
When I first encountered the idea of labeled property graph data storage, I thought about the Dirk Gently Holistic Detective Agency series — where a belief in the fundamental interconnectedness of things plays a central role to the plot. Traditional SQL storage stored information and relationships were defined by cross-referencing a field (or more) between tables. The creators of graph databases sought to store the relationships along with data elements.
A labeled property graph database store nodes – the circles below – and define relationships – the lines – between nodes. Relationships are a way to show the interconnectedness of all things — people who purchased items, individuals who acted or directed films. I am looking at graph databases to maintain plant hybridization records — relationships build the family tree.
Nodes can have a label – essentially a classification, here I have “Candidate” nodes and “Election” nodes. A node can have multiple labels — maybe a political party affiliation as well as candidate. Nodes can include relationships to themselves (I’ve met me?)
Relationships have a type – WON or LOST here. Relationships are also directional – an election didn’t win a dude, a dude won an election.
Both nodes and relationships can have properties – additional information about the entity. In this example, each election node stores a year in which the election took place along with the vote totals both popular vote and electoral college results.
Tableau Query — Data Sources and the Workbooks Where They Are Used
I have found Tableau’s views of data sources to be … lacking. To provide a report of data sources, the database type, and where it is being used, I put together a query that locates all data sources (or filters by database type — specifically, I was trying to see who was using Snowflake) and lists the site, project, and workbook using the data source.
-- Data sources and what workbook they are used in
SELECT system_users.email , datasources.id, datasources.name, datasources.created_at, datasources.updated_at, datasources.db_class, datasources.db_name
, datasources.site_id, sites.name AS SiteName, projects.name AS ProjectName, workbooks.name AS WorkbookName
FROM datasources
LEFT OUTER JOIN users ON users.id = datasources.owner_id
LEFT OUTER JOIN system_users ON users.system_user_id = system_users.id
LEFT OUTER JOIN sites ON datasources.site_id = sites.id
LEFT OUTER JOIN projects ON datasources.project_id = projects.id
LEFT OUTER JOIN workbooks ON datasources.parent_workbook_id = workbooks.id
-- WHERE datasources.db_class = 'snowflake'
ORDER BY datasources.created_at;