Let’s talk about locks first.
Postgres Locks
There are different kinds of locks in Postgres, and if I tried to explain all of them here, I might have to turn this blog post into a 300-page book, so we are going to focus on one specific kind of lock today: the advisory lock.
Advisory Locks
So, what’s an advisory lock? For all the definition-oriented people -
Advisory locks in PostgreSQL are user-defined locks that allow you to control concurrent access to resources in a way that is independent of database transactions. Unlike row or table locks, advisory locks are explicitly managed by the application.
But for all normal people, here’s what it is.
Let’s say you have some resources in your database, related in such a way that a change to one resource might impact several others.
How do you handle concurrency in such scenarios? Well, that’s where advisory locks come in. They let you, as the application, control all the concurrency-related stuff exactly the way you want.
Types of Advisory Locks
Session-Level Locks - Persist for the duration of the session and are released when the session ends (or when you release them explicitly).
Transaction-Level Locks - Released automatically when the transaction ends; there is no explicit unlock. Both flavors are sketched below.
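Both flavors are plain functions that you call with an arbitrary bigint key of your choosing - the key 42 below is made up purely for illustration. A minimal sketch:

-- Session-level: held until you release it or the session ends
SELECT pg_advisory_lock(42);    -- acquire (blocks if another session holds key 42)
SELECT pg_advisory_unlock(42);  -- release

-- Transaction-level: no unlock function; released at COMMIT or ROLLBACK
BEGIN;
SELECT pg_advisory_xact_lock(42);
COMMIT;  -- the lock on key 42 is released here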
Now, earlier I always had this thought that once you acquire a lock on an object, you can’t acquire another lock on it.
But it turns out that with transaction-level advisory locks, you can take the same lock on the same key multiple times, as long as it’s within the same transaction.
But if you try to take a lock on the same key from a different transaction, it will block until the holder commits or rolls back (or return false right away, if you use the try variant). Fascinating stuff indeed!
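Here’s a rough sketch of that behavior (the key 123 is arbitrary):

-- Session 1
BEGIN;
SELECT pg_advisory_xact_lock(123);  -- acquired
SELECT pg_advisory_xact_lock(123);  -- fine: the same transaction can stack the lock

-- Session 2, while session 1's transaction is still open
BEGIN;
SELECT pg_try_advisory_xact_lock(123);  -- returns false: someone else holds it
-- (pg_advisory_xact_lock(123) would simply block here instead)
ROLLBACK;

-- Session 1
COMMIT;  -- releases the lock; session 2 can now acquire it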
Moving on, let’s talk about indexes in a database.
Indexes
Let’s assume we have the following table -
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INT,
order_date DATE,
status TEXT
);

And we want to search for all the orders of a particular customer_id. What are the different ways we can search?
The most naive way: go through all the rows of the table, check if customer_id matches, and add the row to our result.
Boring! Who has so much time to loop through all the millions of rows in the database and check each of them? Certainly not the database.
What if we had some kind of mapping to tell us where exactly to look for a given customer_id?
That’s what an index is. It’s kind of a map for the column you create it on, so that when you query an indexed column, the database knows exactly where the matching rows are. And the database is happy and doesn’t have to loop forever.
What I described above is a normal (B-tree) index. It is most useful when we want exact matches (=) or range-based comparisons (>, <, BETWEEN).
CREATE INDEX idx_customer_id ON orders(customer_id);
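You can watch the planner change its strategy with EXPLAIN. The exact plan depends on your table size and statistics, so treat the commented output below as the general shape, not guaranteed text:

-- Before the index, on a big enough table: a sequential scan
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
--   Seq Scan on orders
--     Filter: (customer_id = 42)

-- After creating idx_customer_id: an index scan
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
--   Index Scan using idx_customer_id on orders
--     Index Cond: (customer_id = 42)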
Composite Indexes
Now, let’s assume you want to search for all the orders for a customer_id after a particular date.
Well, we already have customer_id as an index, right? So that should be enough? NO.
The database knows where all those customer_id are, but it will again have to go through all those rows and check if the date is the required one.
We can do better. How about we create a combined index, such that the combination of the two columns tells the database exactly which rows to fetch?
We call this a composite index, and we can create one on customer_id and order_date. This way we can reduce the query time significantly.
CREATE INDEX idx_orders ON orders(customer_id, order_date);
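With this in place, one index lookup narrows down both conditions at once (the id and date below are made up, and the exact plan depends on your data):

EXPLAIN SELECT * FROM orders
WHERE customer_id = 42 AND order_date > '2024-01-01';
--   Index Scan using idx_orders on orders
--     Index Cond: ((customer_id = 42) AND (order_date > '2024-01-01'))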
Partial Indexes
Let’s assume another table -
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name TEXT,
email TEXT UNIQUE,
active BOOLEAN
);

Now, let’s say we have a lot of queries where we fetch only the active users. How do we do that?
Well, from what we saw above, we can create an index on the active column, and whenever we query, the database knows where all the active users are and returns the result.
But notice one thing: we only ever query active users, yet by indexing the whole active column, all the rows get indexed, inactive ones included.
Also, since active doesn’t have a lot of unique values (it can only be true or false), indexing it doesn’t make much sense: it takes more storage, and remember, every index makes writes to the database slower. We can do something better.
Postgres has a concept of partial indexes. These are indexes that only cover the rows matching a given condition.
In our scenario, we can create a partial index on email ONLY for the rows where active is true. Something like this -
CREATE INDEX idx_active_users ON users (email)
WHERE active = true;

This will not only reduce our execution time but also our storage (since we only index the rows where active = true).
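One thing to keep in mind: the planner will only consider a partial index when the query’s WHERE clause implies the index’s condition. For example (the email address is made up):

-- Can use idx_active_users: the filter matches the index condition
SELECT id, name FROM users WHERE email = 'a@example.com' AND active = true;

-- Cannot use it: rows with active = false were never indexed
SELECT id, name FROM users WHERE email = 'a@example.com' AND active = false;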
Wow, that’s some crazy indexes we talked about. Let’s understand one final thing - cardinality, and how it affects our indexes.
Cardinality
Cardinality, put simply, is the number of distinct values in a particular column.
email column: 1M rows → nearly unique per user → high cardinality
city column: 1M rows → 500 unique cities → moderate cardinality
active column: 1M rows → only true or false → low cardinality
Good index candidates are columns with high to moderate cardinality (that’s why indexing the active column above was a bad idea).
For low-cardinality columns, PostgreSQL might ignore the index and scan the whole table instead.
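You don’t have to guess: Postgres keeps its own cardinality estimates in pg_stats (populated by ANALYZE). Assuming the users table from earlier:

SELECT attname, n_distinct
FROM pg_stats
WHERE tablename = 'users' AND attname IN ('email', 'active');
-- n_distinct = -1 means every row is distinct (high cardinality);
-- a small positive number like 2 means low cardinality.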
Let’s look at the following table -
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INT,
order_date DATE,
status TEXT
);

Now let’s look at these 2 queries -
SELECT * FROM orders WHERE customer_id = 42 AND status = 'transit';
SELECT * FROM orders WHERE status = 'transit' AND customer_id = 42;
They look identical, except for the order of the WHERE conditions. But does that impact performance? Turns out, not really.
PostgreSQL doesn’t execute queries based on how you write them. Instead, it uses the best possible execution plan based on indexes, column cardinality, and data distribution.
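You can check this yourself: EXPLAIN prints the same plan for both orderings, whatever that plan happens to be on your data:

EXPLAIN SELECT * FROM orders WHERE customer_id = 42 AND status = 'transit';
EXPLAIN SELECT * FROM orders WHERE status = 'transit' AND customer_id = 42;
-- Both produce an identical plan: the planner normalizes the conditions
-- before picking a strategy.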
So, what affects performance? Indexes (and their ordering)!
- If we have an index on customer_id, PostgreSQL will likely use it, no matter the order in WHERE.
- If we have an index on status, it will use it only if filtering by status = 'transit' significantly reduces the number of rows.
- If both customer_id and status have separate indexes, PostgreSQL may use one or both, depending on which filter is more selective.
- If customer_id has high cardinality (many unique values), it’s likely to be used first because it filters out more rows.
- If status has high cardinality, PostgreSQL might use that instead.
- If neither is highly selective (i.e., neither reduces the number of rows much), PostgreSQL may decide to ignore both indexes and do a sequential scan instead.
- If there’s a composite index (customer_id, status), PostgreSQL will use customer_id first only if it’s in the first position of the index and it is selective enough.
Always remember the golden rule of indexes -
The more selective the first column, the better the index performs.
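To make the golden rule concrete, here’s a hypothetical composite index on our orders table (idx_orders_cust_status is a made-up name):

CREATE INDEX idx_orders_cust_status ON orders(customer_id, status);

-- Can use the index: the filter pins down the leading column
SELECT * FROM orders WHERE customer_id = 42 AND status = 'transit';

-- Usually cannot use it efficiently: the filter skips the leading column
SELECT * FROM orders WHERE status = 'transit';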
Woah, that was a lot of indexes, locks and SQL. But it’s fascinating how small things that might otherwise get overlooked carry so much meaning underneath.
Key Takeaways:
- Use normal indexes for single-column lookups.
- Use composite indexes when filtering multiple columns together.
- Use partial indexes when you only need a subset of data.
- Choose the first column in an index wisely! The more selective, the better the performance.
That’s all for today. Will come back with something more interesting next week.


