Other articles

Published: Thu 22 February 2024
sql databases sqlite

NULLS NOT DISTINCT in SQLite Unique Indexes

Recently I had the problem that I needed a unique index in an SQLite database, but some fields of my table had null values, which I needed to be considered equal, or NOT DISTINCT. Some database systems allow to specify the behaviour when creating indexes, but SQLite only supports the NULLS DISTINCT behaviour. In this article, we explore two ways to express the NULLS NOT DISTINCT behaviour in SQLite.

Not considering null values, two tuples are equal iff each of their components are equal. (1, 2, 3) is equal to (1, 2, 3). There are usually two ways to view null in databases: the absence of a value, or a value that exists but is not known.

In "absence of a value", we would consider (1, 2, null) to be equal to (1, 2, null), as we actually mean something akin to (1, 2), i.e. NULLS NOT DISTINCT. In "unknown value", we would consider (1, 2, null) to not be equal to (1, 2, null), as null represents an unknown but possibly or probably different value in either case, i.e. NULLS DISTINCT.

In our example we will consider an index on the table t with columns c1, c2, c3, where c1 is never null, but the other two columns can be.
Indexes on Expressions

One possible way (which I ultimately chose) is using indexes on expressions. Per column we choose a value which the column should never normally have. As the columns are untyped in SQlite, this is quite easy. You could choose a very large (or negative) number, or a random string. Again, some value you're fairly certain, your actual data is never going to be equal to. As an example, we choose -1, as my numbers are usually all positive. If you want to prevent accidentally using this value, you may exclude it using a CHECK-constraint.

Now instead of creating our index as
```
CREATE UNIQUE INDEX ON t (c1, c2, c3);
```
we create an index on coerced values:
```
CREATE UNIQUE INDEX ON t (c1, COALESCE(c2, -1), COALESCE(c3, -1));
```
The COALESCE function is variadic and returns its first non-null` argument. This means, our unique index does not compare the ``null values to each other (which are not equal), but our choses non-null representative (-1 in this case).

Problem 1: We do need a value per column that is not allowed. Often times it is very easy to find such a value. You could always use something that would be a type mismatch, or something that is malformed. But sometimes it might not be that easy and you might have to choose a different value for each column. Also, the value will be stored in the index and it should not be very large.

Problem 2: SQLite doesn't actually use this index for queries and operations other than insert. It will not make your SELECT statements faster. So you probably want to create another index, or multiple, depending on your access patterns, that are used for retrieval. Here you can just use the original index without the UNIQUE constraint:
```
CREATE INDEX ON t (c1, c2, c3);
```
Our uniqueness constraint is checked through the other index.
Partial Indexes

The second way is to use partial unique indexes for every null/not null combination. In this way we never allow null values to sneak into our index(es) and we are back to our base case.
```
CREATE UNIQUE INDEX ON t (c1) WHERE c2 IS NULL and c3 IS NULL;
CREATE UNIQUE INDEX ON t (c1, c2) WHERE c2 IS NOT NULL and c3 IS NULL;
CREATE UNIQUE INDEX ON t (c1, c3) WHERE c2 IS NULL and c3 IS NOT NULL;
CREATE UNIQUE INDEX ON t (c1, c2, c3) WHERE c2 IS NOT NULL and c3 IS NOT NULL;
```
These indexes will be used by your SELECT queries and we do not need a particular value per column to trick the index checker.

The problem with this method is obvious though, as we create \(2^n\) indexes for \(n\) nullable columns.

So what you really want depends on your use case. Using partial indexes is probably generally faster and less error-prone, and less memory-intensive. But you do need to manage multiple (exponentially many) indexes. If you're not worried about speed and your queries are already supported by the other indexes, and you only need a simple unique constraint over nullable columns, the indexed expressions might be the variant to use.
Published: Sat 27 June 2020
gnome shell extension oauth python

OAuth in GNOME Shell Extensions

As you might know, I am the maintainer of the GNOME Twitchlive shell extensions. The Twitchlive panel allows you to see whether your favourite Twitch streamers are online or not.

A few months ago, Twitch started to require OAuth authentication for its endpoints and things started to go sideways. Now there are two ways to communicate with Twitch's API:
1. Authenticate your requests with a pre-shared application secret.
2. Obtain a client secret through user authentication and use this secret.
The first variant is only possible for server-to-server applications, as you're not allowed to distribute the application secret to users. So the second variant it is and we need to authenticate the user.

In the OAuth process you open a webpage (of the oauth provider) with a callback url. That webpage contains a login form and if you enter valid credentials, the webpage will redirect you to your given callback url, passing as additional information the authorization token you can use to make a valid API call.

To receive the token you need a webserver that you can redirect to. Even though GNOME's supposed to have a generic OAuth implementation it uses for its online services but it's not generic at all and you can't hook into that. You also can't create a webserver otherwise from within the GNOME Shell. Until a few weeks ago:

I've implemented a small mechanism by starting a python-based webserver through the extension that has just enough capabilities to receive the OAuth token and write it to a file. As far as I know, that's a world first. If you base OAuth of you extension on this work, give me a shout on twitter.

And now to the code which I think I boiled down right to the essentials (you'll also find it on github):

We need to open a browser with the callback within the extension:
```
function trigger_oauth() {
  const url = "https://id.twitch.tv/oauth2/authorize?response_type=token&client_id=" + client_id + "&redirect_uri=http://localhost:8877&scope=";
  GLib.spawn_command_line_async("xdg-open " + url);
  GLib.spawn_sync(null, ["python3", oauth_receiver,  oauth_token_path], null, GLib.SpawnFlags.SEARCH_PATH, null);
}
```
And the python3 reciver code:
```
from http.server import *;
from urllib.parse import *;
import sys;
import os.path;

page = """<html>
<head><title>Twitchlive GNOME Shell extension OAuth</title></head>
<body><script>var tokens=document.location.hash.substring(1);
document.write("<a href=\\"/tokens?" + tokens + "\\"> To finish OAuth-Process click here</a>");
</script></body>
"""

class handler(BaseHTTPRequestHandler):
  def log_requests(self):
      pass

  def do_GET(self):
      print(self.path)
      if self.path == '/':
          # initial call from twitch
          self.send_response(200)
          self.send_header("Content-Type", "text/html")
          self.end_headers()

          self.wfile.write(page.encode())
      elif self.path.startswith('/tokens'):
          # our own call
          code = parse_qs(urlparse(self.path).query)['access_token'][0]
          open(sys.argv[1], 'w').write(code)

          self.send_response(200)
          self.send_header("Content-Type", "text/plain")
          self.end_headers()

          self.wfile.write(b"Thank You. You can close this page.")
          sys.exit(0)
```
Note that the OAuth-Token is passed as the url fragment that doesn't show up in the actual query so you really need a browser to read that value.

And for now ... it works.
Published: Thu 23 January 2020
podcast python xml

Podcast Time Machine

Recently one of my favourite podcasts ended after seven years. I have only been listening to this podcast for two years though. There is five year backlog for me to listen to. But there are many other podcasts I want to listen to and I do like the anticipation of waiting for a new episode.

So I built the Podcast Time Machine (github) where you can enter a podcast url and a delay and you will get a new link where every podcast episode is delayed by that many days and all episodes that would lie in the future are filtered out. So stuff happpening a week in podcast time are happpening in a week of real time.

I am looking forward to my first episode of harmontown, when it comes out after a seven-year delay. Feel free to subscribe to your own delayed podcasts. I do not keep access logs so listen to your favorite stuff again (and again).
Published: Sun 19 January 2020
raspberry pi hardware cluster raspbian

I bought a Cluster Hat

I recently bought and installed a Cluster Hat.

Looks interesting

I don't know what I will do with it. All my plans I had before hinged on Haskell supporting ARMv6, which it doesn't. Let's see what happens in the future. I am currently looking into installing Docker on it and maybe host some service through my VPS from home.

Page 1 / 8 »

Other articles

NULLS NOT DISTINCT in SQLite Unique Indexes

Indexes on Expressions

Partial Indexes

OAuth in GNOME Shell Extensions

Podcast Time Machine

I bought a Cluster Hat

links

social