Photo by Caspar Camille Rubin on Unsplash.

Recently I wanted to go through a bunch of facts from different parts of the infrastructure I maintain. I had already been collecting Ansible facts in Redis to speed things up, and then I noticed that they are actually stored there as JSON. I’ve always had a soft spot for SQLite, so I toyed with the idea of having the Ansible facts in a queryable form. The missing piece? Something to transfer these JSON facts from Redis to SQLite. Let’s script it!

Redis to SQLite

This is the kind of task that Python really shines at:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import json
import redis
import sqlite3


def main(redis_match, db_file, db_table):
    """Grab data from Redis."""
    r = redis.Redis()
    data = r.mget(r.scan_iter(match=redis_match))

    """Prepare table in SQLite."""
    conn = sqlite3.connect(db_file)
    c = conn.cursor()
    c.execute(f"""
      CREATE TABLE IF NOT EXISTS {db_table}
      (host varchar(255) PRIMARY KEY, data json)
      """)
    conn.commit()

    """Import data from Redis to SQLite."""
    for host in data:
        host_json = json.loads(host)
        c.execute(
          f"REPLACE INTO {db_table} values (?, ?)",
          [host_json["ansible_fqdn"], json.dumps(host_json)],
        )
        conn.commit()
    conn.close()


if __name__ == "__main__":
    """
    Pass redis key for matching, SQLite file name and table name for the facts.
    """
    main("af_*", "./facts.db", "facts")

The final line – the call to main() – defines the naming that is then applied, so be sure to edit this part to your needs. You may also need authentication or a different address, port and/or DB number to access Redis in the first place – as I run it locally on my laptop with the default configuration, I use no such thing (there’s a small connection sketch a bit further down). You may be wondering what af_* stands for – Ansible Facts, of course. Here’s how I query Redis directly:

redis-cli keys 'af_*'
1) "af_beastie.cloud.local"
*** IMAGINE MANY MORE, HUNDREDS OF HOSTS BELOW ***
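
That prefix comes from Ansible’s fact caching – that’s how the facts landed in Redis in the first place. A minimal ansible.cfg sketch that would produce keys like these (assumed values, adjust the connection string to your Redis):

[defaults]
fact_caching = redis
# host:port:db – assumed defaults for a local Redis
fact_caching_connection = localhost:6379:0
fact_caching_prefix = af_
fact_caching_timeout = 86400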

The af_ part can be set to anything, including nothing. Be sure to check these docs for more details. OK, so how about getting that JSON I talked about earlier?

redis-cli GET 'af_beastie.cloud.local'
*** LONG JSON OUTPUT WITH ALL THE FACTS ***

You may want to pipe it through python -m json.tool or jq for pretty printing.
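
For instance, reusing the key from above:

redis-cli GET 'af_beastie.cloud.local' | python -m json.tool
*** SAME LONG JSON OUTPUT, JUST PRETTY PRINTED ***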

This way I have redis_match defined as af_*, which will catch all the hosts with this prefix set. That’s precisely what this part of the script does:

"""Grab data from Redis."""
r = redis.Redis()
data = r.mget(r.scan_iter(match=redis_match))
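
As mentioned above, I run Redis locally with the default configuration, so redis.Redis() with no arguments is enough. If yours lives elsewhere or requires authentication, the connection would look more along these lines (a sketch with placeholder values):

r = redis.Redis(
    host="redis.example.com",  # placeholder – point it at your Redis
    port=6379,
    db=0,
    password="secret",  # only if your Redis requires AUTH
)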

All of these hosts are then assigned to data and we can carry on to store these JSON documents in SQLite. The cool thing is that SQLite has built-in support for JSON – I had no idea that was the case until I read this fabulous article, SQLite is not a toy database.
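
As a quick sanity check that the JSON functions are present in your sqlite3 build (they should be there in any reasonably recent one):

select json_extract('{"answer": 42}', '$.answer');
42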

The next step was to create a table for storing the data, but only when it’s not already there (hence CREATE TABLE IF NOT EXISTS), and then the loop kicks in to store or replace the data (hence REPLACE INTO). I’m using the hostname as the unique key (taken from the facts directly via host_json["ansible_fqdn"]) and then I push the entire JSON into the data column. It’s very simple and ultra fast.
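
To have a look at the result, open the database file with the sqlite3 shell (using the defaults from the script above):

sqlite3 ./facts.db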

Once the data is in SQLite, you may query it like so:

select
  json_extract(data, '$.ansible_hostname') as hostname,
  json_extract(data, '$.ansible_default_ipv4.address') as ip,
  json_extract(data, '$.ansible_lsb.description') as os,
  json_extract(data, '$.ansible_kernel') as kernel
from facts limit 4;

test11|10.0.0.11|Ubuntu 21.04|5.11.0-11-generic
test12|10.0.0.12|Ubuntu 18.04.5 LTS|5.4.0-48-generic
test13|10.0.0.13|Ubuntu 18.04.5 LTS|5.4.0-58-generic
test14|10.0.0.14|Ubuntu 18.04.3 LTS|5.4.0-71-generic

Anything that is stored as a fact can be queried. What’s even better is that the sqlite3 shell can also save the query output in different formats – some of them are super cool, like html or markdown! Here’s an example workflow that queries four random hosts and stores the result in a Markdown-formatted file:

.mode markdown
.output table.md
select
  json_extract(data, '$.ansible_hostname') as hostname,
  json_extract(data, '$.ansible_default_ipv4.address') as ip,
  json_extract(data, '$.ansible_lsb.description') as os,
  json_extract(data, '$.ansible_kernel') as kernel
from facts limit 4;

There are many more output modes available – to list them from within the sqlite3 shell, simply type .help mode. Note that different versions ship different modes. The .output table.md line specifies the output file in the current directory that the data will be written to. Here’s what the resulting file looks like:

| hostname |      ip      |         os         |      kernel      |
|----------|--------------|--------------------|------------------|
| test11   | 10.0.0.11    | Ubuntu 20.04.2 LTS | 5.8.0-49-generic |
| test12   | 10.0.0.12    | Ubuntu 18.04.5 LTS | 5.4.0-47-generic |
| test13   | 10.0.0.13    | Ubuntu 18.04.5 LTS | 5.4.0-47-generic |
| test14   | 10.0.0.14    | Ubuntu 18.04.5 LTS | 5.4.0-48-generic |

That’s neat.
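
The same workflow applies to the other modes – to get CSV instead, for example, only the first two lines change:

.mode csv
.output table.csv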

Another nice thing is that there are also some GUI apps available for SQLite. One I really like is DB Browser for SQLite, which I use when preparing queries – it can pretty-print the JSON stored in the data column, which makes finding the right paths much easier.

Alternatives

At some point I realized that storing facts in JSON files is quite common, so I decided to see how much effort it would be to grab the Ansible facts from them instead of Redis. It turned out to be not hard at all.

JSON files to SQLite

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import json
import os
import sqlite3


def main(facts_path, db_file, db_table):
    """Prepare table in SQLite."""
    conn = sqlite3.connect(db_file)
    c = conn.cursor()
    c.execute(f"""
      CREATE TABLE IF NOT EXISTS {db_table}
      (host varchar(255) PRIMARY KEY, data json)
      """)
    conn.commit()

    """Import data from JSON files to SQLite."""
    with os.scandir(facts_path) as path:
        for file in path:
            if file.name.endswith(".json") and file.is_file():
                with open(file.path) as f:
                    host_json = json.load(f)
                    c.execute(
                        f"REPLACE INTO {db_table} values (?, ?)",
                        [host_json["ansible_fqdn"], json.dumps(host_json)],
                    )
                    conn.commit()
    conn.close()


if __name__ == "__main__":
    """
    Pass facts path, SQLite file name and table name.
    """
    main("./facts", "./facts.db", "facts")

The changes are rather slight. Everything related to Redis was ripped out and replaced with os.scandir and a loop over facts_path (the script assumes the files to be imported have a .json extension). The rest is essentially untouched.
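
For illustration, the expected layout is simply a directory of per-host JSON files – the file names themselves don’t matter (only the .json extension is checked), since the hostname is read from the ansible_fqdn fact inside each file:

ls ./facts
beastie.cloud.local.json  test11.json  test12.json  ...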

Import to PostgreSQL

After SQLite I also wanted to check whether any of the “big” DBMSs also support JSON, and sure enough they do. I picked Postgres as I already had the entire stack in place (shoutout to the great Postgres.app).

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import json
import psycopg2
import redis


def main(redis_match, db_table):
    """Grab data from Redis."""
    r = redis.Redis()
    data = r.mget(r.scan_iter(match=redis_match))

    """Prepare table in PostgreSQL."""
    conn = psycopg2.connect(
        database='dbname',
        user='dbuser',
        password='dbpass',
        host='localhost',
        port='5432',
    )
    c = conn.cursor()
    c.execute("""
        CREATE TABLE IF NOT EXISTS {}
        (host varchar(255) PRIMARY KEY, data jsonb)""".format(db_table))
    conn.commit()

    """Import data from Redis to PostgreSQL."""
    for host in data:
        host_json = json.loads(host)
        query = """
            INSERT INTO {} (host, data)
            VALUES (%s, %s)
            ON CONFLICT (host) DO UPDATE SET
            data = EXCLUDED.data;
        """.format(db_table)
        c.execute(query, [host_json["ansible_fqdn"], json.dumps(host_json)])
        conn.commit()
    conn.close()


if __name__ == "__main__":
    """
    Pass redis key for matching and table name for the facts.
    """
    main("af_*", "facts")

Obviously, things like dbname, dbuser, dbpass etc. need to be aligned with your setup. The same goes for the Redis key matching and the name of the table.

Postgres doesn’t provide REPLACE INTO like SQLite does (which is super handy in this use case), but this can be worked around with ON CONFLICT [...] DO UPDATE, so that if an entry with the given primary key already exists, its data gets updated/replaced.

So, to replicate the query that printed hostname, ip, os release and kernel version:

select
  jsonb_path_query(data, '$.ansible_hostname') as hostname,
  jsonb_path_query(data, '$.ansible_default_ipv4.address') as ip,
  jsonb_path_query(data, '$.ansible_lsb.description') as os,
  jsonb_path_query(data, '$.ansible_kernel') as kernel
from facts limit 4;

hostname   |        ip     |          os          |       kernel
-----------+---------------+----------------------+--------------------
 "test11"  | "10.0.0.11"   | "Ubuntu 18.04.5 LTS" | "5.4.0-47-generic"
 "test12"  | "10.0.0.12"   | "Ubuntu 18.04.5 LTS" | "5.4.0-47-generic"
 "test13"  | "10.0.0.13"   | "Ubuntu 18.04.5 LTS" | "5.4.0-48-generic"
 "test14"  | "10.0.0.14"   | "Ubuntu 18.04.4 LTS" | "5.3.0-40-generic"
(4 rows)
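
Note the double quotes around the values – jsonb_path_query returns jsonb, not text. If you prefer plain text, the ->> and #>> operators return text directly, so the same query can be written as:

select
  data->>'ansible_hostname' as hostname,
  data#>>'{ansible_default_ipv4,address}' as ip,
  data#>>'{ansible_lsb,description}' as os,
  data->>'ansible_kernel' as kernel
from facts limit 4;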

Summary

I find it very handy to be able to query all the facts gathered from the entire infrastructure I maintain – especially when I need to create some quick reports, which I can now do with an (almost) ordinary SQL query, not to mention the absolutely awesome possibility of exporting that kind of data to a variety of different formats.

With all that said, there’s a project called Ansible-CMDB and there’s a huge chance that it does even more than what I’m doing here with SQLite – be sure to give it a go. That would be all.