Aggregate Functions: Counting, Summing, and Finding Useful Numbers

Welcome back.

In the previous lesson, you learned JOIN.

You learned how to read data from related tables.

Very important.

Now we can ask PostgreSQL questions like:

Show me products with category names.
Show me orders with customer names.
Show me posts with author names.

Excellent.

But sometimes we do not want every row.

Sometimes we want answers.

For example:

How many products do we have?
What is the total value of all products?
What is the average price?
What is the cheapest product?
What is the most expensive product?
How many products are in each category?

This is where aggregate functions help.

Aggregate functions take many rows and return one useful result.

Very polite.

Very productive.

Like a tired accountant with superpowers.

What You Will Learn

In this lesson, you will learn:

what aggregate functions are;
how to use COUNT;
how to use SUM;
how to use AVG;
how to use MIN;
how to use MAX;
the difference between COUNT(*) and COUNT(column);
how NULL affects aggregate functions;
how to use GROUP BY;
how to count rows per group;
how to aggregate data with JOIN;
how to use HAVING;
common mistakes with aggregate queries.

By the end of this lesson, you will be able to ask PostgreSQL useful business questions.

Not just:

Show me everything.

But:

Summarize this mess and tell me what matters.

That is a big upgrade.

Your database becomes less like a storage box.

And more like an assistant that can count.

Which is useful.

Because humans are not always good at counting.

Especially after coffee number four.

Prepare the Database

Open PostgreSQL:

sudo -iu postgres psql

Connect to your database:

\c learning_postgresql

If the database does not exist, create it:

CREATE DATABASE learning_postgresql;

Then connect:

\c learning_postgresql

Now remove old tables if they exist:

DROP TABLE IF EXISTS products;
DROP TABLE IF EXISTS categories;

We drop products first because it depends on categories.

PostgreSQL remembers relationships.

Sometimes better than people.

Create the Tables

Create the categories table:

CREATE TABLE categories (
  id SERIAL PRIMARY KEY,
  name VARCHAR(100) UNIQUE NOT NULL
);

Create the products table:

CREATE TABLE products (
  id SERIAL PRIMARY KEY,
  name VARCHAR(100) NOT NULL,
  price NUMERIC(10, 2) CHECK (price >= 0),
  quantity INTEGER CHECK (quantity >= 0),
  category_id INTEGER REFERENCES categories(id)
);

This table has:

a product name;
a price;
a quantity;
a category.

This is already close to something real.

Not a full shop system.

But enough to make PostgreSQL sweat a little.

In a healthy educational way.

Insert Data

Insert categories:

INSERT INTO categories (name)
VALUES
  ('Electronics'),
  ('Furniture'),
  ('Office'),
  ('Kitchen');

Insert products:

INSERT INTO products (name, price, quantity, category_id)
VALUES
  ('Laptop', 900.00, 5, 1),
  ('Mouse', 25.00, 30, 1),
  ('Keyboard', 70.00, 15, 1),
  ('Desk Chair', 150.00, 8, 2),
  ('Bookshelf', 120.00, 4, 2),
  ('Notebook', 5.00, 100, 3),
  ('Pen', 2.00, 200, 3),
  ('Whiteboard', 80.00, 3, 3);

Check the data:

SELECT * FROM categories;
SELECT * FROM products;

Notice something.

The Kitchen category has no products.

This will be useful later.

Poor Kitchen.

Waiting for a spoon.

What Are Aggregate Functions?

Aggregate functions calculate one result from many rows.

For example, this table has many products.

But we can ask:

How many products are there?

PostgreSQL can return one number.

That is aggregation.

Common aggregate functions are:

COUNT
SUM
AVG
MIN
MAX

They answer questions like:

How many?
How much total?
What is the average?
What is the smallest?
What is the biggest?

This is very useful in dashboards, reports, admin panels, APIs, invoices, statistics, and business tools.

Basically everywhere people want numbers and then argue about them in meetings.

COUNT

COUNT counts rows.

Run:

SELECT COUNT(*)
FROM products;

Result:

count
-----
8

COUNT(*) means:

Count all rows.

It does not care about specific columns.

It just counts rows.

Simple.

Strong.

No drama.

COUNT with an Alias

The result column is called count.

That is okay.

But we can make it clearer:

SELECT COUNT(*) AS total_products
FROM products;

Result:

total_products
--------------
8

Much better.

Aliases make results easier to understand.

Future you will be grateful.

Future you is already tired.

Help him.

COUNT Column

You can also count a specific column:

SELECT COUNT(category_id) AS products_with_category
FROM products;

This counts rows where category_id is not NULL.

In our data, every product has a category.

So the result is:

products_with_category
----------------------
8

But there is an important difference.

COUNT(*) counts all rows.

COUNT(column) counts only rows where that column is not NULL.

This matters a lot.

Because NULL is not zero.

NULL means unknown or missing.

PostgreSQL treats it carefully.

As it should.

NULL is not a number.

It is a small database ghost.

Example with NULL

Insert one product without a category:

INSERT INTO products (name, price, quantity, category_id)
VALUES ('Mystery Box', 50.00, 2, NULL);

Now run:

SELECT COUNT(*) AS total_products
FROM products;

You will get:

total_products
--------------
9

Now run:

SELECT COUNT(category_id) AS products_with_category
FROM products;

You will get:

products_with_category
----------------------
8

Why?

Because Mystery Box has category_id = NULL.

So COUNT(category_id) does not count it.

This is important.

Very important.

Database important.

Not “remember to buy milk” important.

More like “why is my report wrong?” important.

SUM

SUM adds values.

For example, we can calculate the total number of items in stock:

SELECT SUM(quantity) AS total_items
FROM products;

This adds all quantities.

Example result:

total_items
-----------
367

You can also calculate the total value of all stock.

For each product, we need:

price * quantity

Then we sum the result:

SELECT SUM(price * quantity) AS total_stock_value
FROM products;

Example result:

total_stock_value
-----------------
8945.00

This is useful.

Now PostgreSQL is not just storing data.

It is calculating useful information.

Very professional.

Very “I should invoice someone”.

AVG

AVG calculates the average value.

Example:

SELECT AVG(price) AS average_price
FROM products;

PostgreSQL may return many decimal digits.

To make it nicer, use ROUND:

SELECT ROUND(AVG(price), 2) AS average_price
FROM products;

Result example:

average_price
-------------
155.78

The exact number depends on your data.

AVG is useful for questions like:

What is the average product price?
What is the average order total?
What is the average salary?
What is the average chaos level in this project?

PostgreSQL can answer the first three.

The fourth one needs emotional support.

MIN

MIN finds the smallest value.

Example:

SELECT MIN(price) AS cheapest_price
FROM products;

Result:

cheapest_price
--------------
2.00

This tells us the lowest product price.

But it does not tell us which product has that price.

For that, we can use ORDER BY and LIMIT:

SELECT name, price
FROM products
ORDER BY price ASC
LIMIT 1;

Result:

name | price
-----|------
Pen  | 2.00

MIN gives the minimum value.

ORDER BY with LIMIT can show the full row.

Both are useful.

Different tools.

Same toolbox.

MAX

MAX finds the largest value.

Example:

SELECT MAX(price) AS highest_price
FROM products;

Result:

highest_price
-------------
900.00

To find the most expensive product:

SELECT name, price
FROM products
ORDER BY price DESC
LIMIT 1;

Result:

name   | price
-------|------
Laptop | 900.00

Very good.

PostgreSQL has found the expensive beast.

Probably with RGB lights.

Multiple Aggregates in One Query

You can use several aggregate functions in one query.

Example:

SELECT
  COUNT(*) AS total_products,
  SUM(quantity) AS total_items,
  ROUND(AVG(price), 2) AS average_price,
  MIN(price) AS cheapest_price,
  MAX(price) AS highest_price
FROM products;

This gives one summary row.

Example:

total_products | total_items | average_price | cheapest_price | highest_price
---------------|-------------|---------------|----------------|--------------
9              | 367         | 155.78        | 2.00           | 900.00

This is perfect for dashboards.

One query.

Many useful numbers.

PostgreSQL is doing office work without asking for a chair.

Respect.

GROUP BY

So far, aggregate functions gave one result for the whole table.

But often we want results per group.

For example:

How many products are in each category?

For that, we use GROUP BY.

Example:

SELECT
  category_id,
  COUNT(*) AS products_count
FROM products
GROUP BY category_id;

Result example:

category_id | products_count
------------|---------------
1           | 3
2           | 2
3           | 3
            | 1

The empty category means NULL.

That is our Mystery Box.

It has no category.

Very mysterious.

Very box.

GROUP BY with JOIN

category_id is useful for PostgreSQL.

But humans want category names.

So we use JOIN.

SELECT
  c.name AS category_name,
  COUNT(p.id) AS products_count
FROM categories AS c
JOIN products AS p
ON c.id = p.category_id
GROUP BY c.name;

Result:

category_name | products_count
--------------|---------------
Electronics   | 3
Furniture     | 2
Office        | 3

This is much nicer.

But notice something.

Kitchen is missing.

Why?

Because JOIN means INNER JOIN.

Only matching rows appear.

Kitchen has no products.

So it does not appear.

Poor Kitchen again.

GROUP BY with LEFT JOIN

If we want to show all categories, even categories without products, we use LEFT JOIN.

SELECT
  c.name AS category_name,
  COUNT(p.id) AS products_count
FROM categories AS c
LEFT JOIN products AS p
ON c.id = p.category_id
GROUP BY c.name
ORDER BY c.name;

Result:

category_name | products_count
--------------|---------------
Electronics   | 3
Furniture     | 2
Kitchen       | 0
Office        | 3

Now Kitchen appears.

Important detail:

We used:

COUNT(p.id)

Not:

COUNT(*)

Why?

Because with LEFT JOIN, COUNT(*) would count the category row even if there is no product.

For categories without products, COUNT(*) could give 1.

That would be wrong.

COUNT(p.id) counts only real products.

This detail saves reports.

And sometimes jobs.

SUM with GROUP BY

We can calculate stock value per category.

SELECT
  c.name AS category_name,
  SUM(p.price * p.quantity) AS stock_value
FROM categories AS c
LEFT JOIN products AS p
ON c.id = p.category_id
GROUP BY c.name
ORDER BY stock_value DESC;

Result example:

category_name | stock_value
--------------|------------
Electronics   | 6750.00
Furniture     | 1680.00
Office        | 1140.00
Kitchen       |

Kitchen has no products.

So the sum is NULL.

If we want to show 0 instead of NULL, we can use COALESCE.

SELECT
  c.name AS category_name,
  COALESCE(SUM(p.price * p.quantity), 0) AS stock_value
FROM categories AS c
LEFT JOIN products AS p
ON c.id = p.category_id
GROUP BY c.name
ORDER BY stock_value DESC;

Now Kitchen shows 0.

COALESCE returns the first non-null value.

So this:

COALESCE(SUM(p.price * p.quantity), 0)

means:

If the sum exists, show it.
If the sum is NULL, show 0.

Very useful.

Very practical.

Very “please do not break my dashboard”.

AVG with GROUP BY

We can calculate average product price per category:

SELECT
  c.name AS category_name,
  ROUND(AVG(p.price), 2) AS average_price
FROM categories AS c
LEFT JOIN products AS p
ON c.id = p.category_id
GROUP BY c.name
ORDER BY average_price DESC;

Result example:

category_name | average_price
--------------|---------------
Electronics   | 331.67
Furniture     | 135.00
Office        | 29.00
Kitchen       |

Again, Kitchen has no products.

So average price is NULL.

That is correct.

You cannot calculate an average from no values.

PostgreSQL refuses to invent numbers.

Good database.

HAVING

WHERE filters rows before grouping.

HAVING filters groups after grouping.

This is important.

Suppose we want categories with more than two products.

Wrong:

SELECT
  category_id,
  COUNT(*) AS products_count
FROM products
WHERE COUNT(*) > 2
GROUP BY category_id;

This does not work.

Why?

Because WHERE cannot use aggregate results.

Use HAVING:

SELECT
  category_id,
  COUNT(*) AS products_count
FROM products
GROUP BY category_id
HAVING COUNT(*) > 2;

Result:

category_id | products_count
------------|---------------
1           | 3
3           | 3

HAVING filters groups.

Very simple idea.

Very common source of errors.

SQL likes order.

Humans like guessing.

SQL wins.

HAVING with JOIN

Now let us show category names.

SELECT
  c.name AS category_name,
  COUNT(p.id) AS products_count
FROM categories AS c
JOIN products AS p
ON c.id = p.category_id
GROUP BY c.name
HAVING COUNT(p.id) > 2
ORDER BY products_count DESC;

Result:

category_name | products_count
--------------|---------------
Electronics   | 3
Office        | 3

This means:

Show only categories that have more than two products.

This is useful for reports.

For example:

Show active categories.
Show customers with more than five orders.
Show authors with more than ten posts.
Show products sold more than one hundred times.

HAVING is for filtering aggregated results.

Do not forget it.

It waits quietly.

Like a serious SQL librarian.

WHERE and HAVING Together

You can use WHERE and HAVING in the same query.

Example:

SELECT
  c.name AS category_name,
  COUNT(p.id) AS products_count
FROM categories AS c
JOIN products AS p
ON c.id = p.category_id
WHERE p.price >= 10
GROUP BY c.name
HAVING COUNT(p.id) >= 2
ORDER BY products_count DESC;

This means:

First, keep only products with price at least 10.
Then group them by category.
Then show only categories with at least two products.

Order matters.

WHERE works before grouping.

HAVING works after grouping.

Remember this and SQL will hurt you less.

Not never.

Less.

Common Mistakes

Using WHERE Instead of HAVING

Wrong:

SELECT category_id, COUNT(*)
FROM products
WHERE COUNT(*) > 2
GROUP BY category_id;

Correct:

SELECT category_id, COUNT(*)
FROM products
GROUP BY category_id
HAVING COUNT(*) > 2;

Use WHERE for normal rows.

Use HAVING for groups.

Forgetting GROUP BY

Wrong:

SELECT category_id, COUNT(*)
FROM products;

PostgreSQL will complain because category_id is not aggregated and not grouped.

Correct:

SELECT category_id, COUNT(*)
FROM products
GROUP BY category_id;

If you select a normal column together with an aggregate function, you usually need GROUP BY.

PostgreSQL is strict.

But fair.

Mostly.

Counting the Wrong Thing with LEFT JOIN

Wrong for counting products per category:

SELECT
  c.name,
  COUNT(*) AS products_count
FROM categories AS c
LEFT JOIN products AS p
ON c.id = p.category_id
GROUP BY c.name;

This may count a category with no products as 1.

Better:

SELECT
  c.name,
  COUNT(p.id) AS products_count
FROM categories AS c
LEFT JOIN products AS p
ON c.id = p.category_id
GROUP BY c.name;

Count the column from the table you really want to count.

Small detail.

Big difference.

Classic database trap.

Practice

Count all products:

SELECT COUNT(*) AS total_products
FROM products;

Calculate the total number of items:

SELECT SUM(quantity) AS total_items
FROM products;

Calculate the total stock value:

SELECT SUM(price * quantity) AS total_stock_value
FROM products;

Find the average price:

SELECT ROUND(AVG(price), 2) AS average_price
FROM products;

Find the cheapest and most expensive prices:

SELECT
  MIN(price) AS cheapest_price,
  MAX(price) AS highest_price
FROM products;

Count products by category:

SELECT
  c.name AS category_name,
  COUNT(p.id) AS products_count
FROM categories AS c
LEFT JOIN products AS p
ON c.id = p.category_id
GROUP BY c.name
ORDER BY c.name;

Calculate stock value by category:

SELECT
  c.name AS category_name,
  COALESCE(SUM(p.price * p.quantity), 0) AS stock_value
FROM categories AS c
LEFT JOIN products AS p
ON c.id = p.category_id
GROUP BY c.name
ORDER BY stock_value DESC;

Show only categories with more than two products:

SELECT
  c.name AS category_name,
  COUNT(p.id) AS products_count
FROM categories AS c
JOIN products AS p
ON c.id = p.category_id
GROUP BY c.name
HAVING COUNT(p.id) > 2;

Run the queries.

Change the numbers.

Add products.

Delete products.

Watch the results change.

That is the fun part.

Well, database fun.

Different species of fun.

Mini Challenge

Create two tables:

customers
orders

Rules:

one customer can have many orders;
each order belongs to one customer;
customer name is required;
order total must be zero or greater;
some customers may have no orders.

Create tables:

DROP TABLE IF EXISTS orders;
DROP TABLE IF EXISTS customers;

CREATE TABLE customers (
  id SERIAL PRIMARY KEY,
  name VARCHAR(100) NOT NULL
);

CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  total NUMERIC(10, 2) CHECK (total >= 0),
  customer_id INTEGER REFERENCES customers(id)
);

Insert data:

INSERT INTO customers (name)
VALUES
  ('Anna'),
  ('Marco'),
  ('Sofia'),
  ('Luca');

INSERT INTO orders (total, customer_id)
VALUES
  (49.99, 1),
  (120.00, 1),
  (35.50, 2),
  (200.00, 2),
  (15.00, 2),
  (80.00, 3);

Now write queries to:

count all orders;
calculate total sales;
calculate average order total;
find the smallest order;
find the largest order;
show total spent by each customer;
show number of orders by each customer;
show customers with no orders;
show only customers who spent more than 100.

Hint for customers with no orders:

SELECT
  c.name AS customer_name,
  COUNT(o.id) AS orders_count
FROM customers AS c
LEFT JOIN orders AS o
ON c.id = o.customer_id
GROUP BY c.name
HAVING COUNT(o.id) = 0;

This is real-world SQL.

Customers.

Orders.

Totals.

Reports.

Exactly the kind of thing databases are born to do.

Like spreadsheets.

But with more discipline.

And fewer mysterious merged cells.

Summary

Today you learned:

aggregate functions calculate results from many rows;
COUNT(*) counts all rows;
COUNT(column) counts only non-null values;
SUM adds values;
AVG calculates averages;
MIN finds the smallest value;
MAX finds the largest value;
GROUP BY creates groups;
aggregates can be used with JOIN;
LEFT JOIN is useful when you want groups with zero related rows;
COUNT(p.id) is safer than COUNT(*) when counting joined rows;
COALESCE can replace NULL with a better value;
WHERE filters rows before grouping;
HAVING filters groups after grouping.

This is a very important lesson.

Now you can analyze data.

Not just read it.

You can build reports.

You can build dashboards.

You can answer business questions.

You can make PostgreSQL count things while you drink coffee.

That is civilization.

Next Lesson

In the next lesson, we will learn about indexes and basic performance.

We will see why some queries become slow.

We will learn how indexes help PostgreSQL find data faster.

Because when your database has ten rows, everything is fast.

When it has ten million rows, PostgreSQL starts asking serious questions.

And so should you.