The GROUP BY
statement in SQL is used to arrange identical data into groups. This statement is often used with aggregate functions (COUNT
, AVG
, SUM
, MIN
, and MAX
) to perform a calculation on each group of data. GROUP BY
can be a powerful tool for analyzing data and extracting meaningful insights from a database.
Syntax of GROUP BY:
SELECT column_name(s), aggregate_function(column_name)
FROM table_name
WHERE condition
GROUP BY column_name(s);
Consider a table named employees
that contains data about the employees of a company, including their department. To count the number of employees in each department, you would use:
SELECT department, COUNT(*) AS employee_count
FROM employees
GROUP BY department;
In this query:
COUNT(*)
counts all rows in each group.department
is used to group rows with the same department value together.employee_count
is a label for the count column in the output.Aggregate functions perform a calculation on a set of values and return a single value. When used with GROUP BY
, these functions produce a summary for each group.
If you want to find the average salary in each department:
SELECT department, AVG(salary) AS average_salary
FROM employees
GROUP BY department;
This query calculates the average salary (AVG(salary)
) for each department and displays it along with the department name.
To get more insights, such as the total salary expenditure, and the minimum and maximum salaries in each department:
SELECT department, SUM(salary) AS total_salary, MIN(salary) AS min_salary, MAX(salary) AS max_salary
FROM employees
GROUP BY department;
This query shows the total, minimum, and maximum salaries for each department, giving a clear view of the salary distribution within departments.
You can also group by more than one column. For example, to find the average sales by each employee within each region:
SELECT region, employee_id, AVG(sales_amount) AS average_sales
FROM sales
GROUP BY region, employee_id;
This groups the sales data first by region
and then by employee_id
, calculating the average sales for each employee within each region.
GROUP BY
clause is important when grouping by multiple columns. The results are grouped by the first column and then sub-grouped by the next.GROUP BY
, if you need to filter rows based on the group, use the HAVING
clause instead of WHERE
. WHERE
filters rows before they are grouped, and HAVING
filters after groups are formed.To find departments where the average salary is greater than $50,000:
SELECT department, AVG(salary) AS average_salary
FROM employees
GROUP BY department HAVING AVG(salary) > 50000;
This query first groups the employees by department and calculates the average salary per department. Then, it filters out the departments where the average salary is not greater than $50,000.