Aggregate functions.

Group functions return a value based on a group of rows. The exact group of rows is not determined until the query is executed and all rows are fetched. Group functions do not process NULL values and do not return a NULL value, even when NULLs are the only values evaluated. For example, a COUNT or SUM of NULL values will result in 0.

Most of the group functions can be applied either to ALL values or to only the DISTINCT values for the specified expression. When ALL is specified, all non-NULL values are applied to the group function. When DISTINCT is specified, only one of each non-NULL value is is applied to the function.

Using group functions.

Aggregate functions can appear in the SELECT or HAVING clauses of SELECT statements. When used in the SELECT clause, they usually require a GROUP BY clause, as well. If no GROUP BY is specified, the default grouping is the entire grouping set.
Note: group functions cannot appear in the WHERE clause of a SELECT statement.
If you click on an example, you'll see the result of the query.
Function Description Example
AVG([DISTINCT | ALL] x) Returns the statistical mean of the expression x. If neither DISTINCT or ALL is specified, the default is ALL.
SELECT dep, AVG(salary)
  FROM employees
 GROUP BY dep;
CORR(y, x) Takes two arguments: y (the dependent variable) and x (the independent variable) and returns the coefficient of correlation of a set of number pairs. The coefficient of correlation is a measure of strength of the relationship between the two numbers. It is calculated from those (x, y) pairs that are both not NULL using the formula:
COVAR_POP(y, x)
 STDDEV_POP(y) * STDDEV_POP(x)
CORR can return NULL.
SELECT CORR(salary, SYSDATE-doh)
  FROM employees;
COUNT({* | [DISTINCT | ALL]} x) Returns the number of row in the query. If an expression x is given, then option DISTINCT counts all distinct not NULL elements, option ALL counts all not NULL fields (the default is ALL). If the asterisk is given, then all rows in the set are counted, regardless of NULLs.
SELECT COUNT(DISTINCT dep)
  FROM employees;
COVAR_POP(y, x) Returns the population covariance of a set of number pairs,m which can be NULL. The covariance is a measure of how two sets of data vary in the same way, The population covariance is calculated from those (y, x) pairs that are not both NULL using the formula:
SUM(y*x) - 
SUM(y)*SUM(x)
 COUNT(x)
COUNT(x)
SELECT COVAR_POP(salary, SYSDATE-doh)
  FROM employees;
COVAR_SAMP(y, x) Returns the sample covariance of a set of number pairs,m which can be NULL. The covariance is a measure of how two sets of data vary in the same way. The sample covariance is calculated from those (y, x) pairs that are not both NULL using the formula:
SUM(y*x) - 
SUM(y)*SUM(x)
 COUNT(x)
COUNT(x) - 1
SELECT COVAR_SAMP(salary, SYSDATE-doh)
  FROM employees;
CUME_DIST(val_list)
 WITHIN GROUP (
  ORDER BY col_list [ASC|DESC]
  [NULLS {first|last}]
 )
Returns the cumulative distribution of a value in val_list within a distribution in col_list, where val_list is a comma-delimited list of expressions that evaluate to numeric constant values, and col_list is the comma-delimited list of column expressions. The cumulative distribution is a measure of ranking within the ordered group and will be range 0 to 1 (including 1). See also PERCENT_RANK.
SELECT dep, 
       CUME_DIST(15000) WITHIN GROUP 
        (ORDER BY salary DESC)
  FROM employees
 GROUP BY dep;
DENSE_RANK(val_list)
 WITHIN GROUP (
  ORDER BY col_list [ASC|DESC]
  [NULLS {first|last}]
 )
Returns the row's rank within an ordered group. When there are ties, ranks are not skipped. That is, all equal fields are counted as one. See also RANK.
SELECT dep, 
       DENSE_RANK(15000) WITHIN GROUP 
        (ORDER BY salary DESC) dense_rank
  FROM employees
 GROUP BY dep;
GROUP_ID() Returns a numeric identifier that can be used to uniquely identify duplicate groups. For i duplicate groups, GROUP_ID will return values 0 through i-1. This unction take no arguments and requires a GROUP BY clause.  
GROUPING(x) Takes one argument x an expression in the GROUP BY clause of the query. This function is applicable only for queries with GROUP BY clause and a ROOLUP or CUBE clause. The ROLLUP and CUBE clause create summary rows containing NULLs in the grouped expressions. The GROUPING function returns a 1 for these summary rows and a 0 for the non-summary rows. (It's used to distinguish the summary rows from the non-summary rows. It becomes significant when when the data in the values being aggregated may contain NULL values.)
SELECT dep, sex, COUNT(*), 
       GROUPING(dep), GROUPING(sex)
  FROM employees
 GROUP BY CUBE(dep, sex); 
agg_func KEEP (
  DENSE_RANK {FIRST|LAST}
  ORDER BY col_list [ASC|DESC]
  [NULLS] {first|last}
 )
where agg_func is an aggregate function, and col_list is a list of columns to be ordered for the grouping. This function is actually a modifier for one of the other group functions, like MIN or COUNT. The KEEP function returns the first or last row of a sorted group. It's used to avoid the need for self-join, looking for the MIN or MAX.
SELECT dep, 
       MIN(doh) "Earliest", 
       COUNT(ssn) KEEP(DENSE_RANK FIRST 
        ORDER BY doh) "First",
       MAX(doh) "Latest",
       COUNT(ssn) KEEP(DENSE_RANK LAST 
        ORDER BY doh) "Last"
  FROM employees
 GROUP BY dep; 
MAX([DISTINCT | ALL] x) Returns the highest value of the expression x.
  • If x is a date/time datatype, it returns a DATE. For dates, the maximum is the latest date.
  • If x is a numeric datatype, it returns a NUMBER.
  • If x is a character datatype, it returns a VARCHAR2. For character strings, the maximum is the one that sorts highest based on the database character set.
SELECT dep, 
       to_char(MAX(salary), '$999,999.99')
        "Max Salary"
  FROM employees
 GROUP BY dep
 ORDER BY 2 DESC;
MIN([DISTINCT | ALL] x) Returns the lowest value of the expression x.
  • If x is a date/time datatype, it returns a DATE. For dates, the minimum is the earliest date.
  • If x is a numeric datatype, it returns a NUMBER.
  • If x is a character datatype, it returns a VARCHAR2. For character strings, the minimum is the one that sorts lowest based on the database character set.
SELECT dep, MIN(salary)
  FROM employees
 GROUP BY dep
 ORDER BY 2 DESC;
PERCENT_RANK(val_list)
 WITHIN GROUP (
  ORDER BY col_list [ASC|DESC]
  [NULLS {first|last}]
 )
Returns the percent ranking of a value in val_list within a distribution in col_list, where val_list is a comma-delimited list of expressions that evaluate to numeric constant values, and col_list is the comma-delimited list of column expressions. The percent distribution is a measure of ranking within the ordered group and will be range 0 to 1 (including ends). See also CUME_DIST.
The main difference between PERCENT_RANK and CUME_DIST is that PERCENT_RANK will always return a 0 for the first row in any set, while the CUME_DIST cannot return a 0. /* It looks like the CUME_DIST function adds its argument to the list and computes the percentage of how many elements in the modified list is greater/smaller or equal to the argument. The PERCENT_RANK doesn't modify the list. */
SELECT dep, COUNT(*),
       PERCENT_RANK(35000) WITHIN GROUP
        (ORDER BY salary DESC) Rank
  FROM employees
 GROUP BY dep; 
PERCENTILE_CONT(x)
 WITHIN GROUP (
  ORDER BY col_list [ASC|DESC]
 )
where x is a percentile value in the range 0<x<1, and col_list is the sort specification. This function returns the interpolated value that would fall in percentile position x within the sorted group col_list.
The function assumes a continuous distribution and is most useful for obtaining the median value of an ordered group. The median value is defined to be the midpoint in a group of ordered numbers - half of the values are above the median, and half of the values are below the median.
SELECT dep, AVG(salary),
       PERCENTILE_CONT(0.5) WITHIN GROUP
        (ORDER BY salary DESC) "MEDIAN"
  FROM employees
 GROUP BY dep;
PERCENTILE_DISC(x)
 WITHIN GROUP (
  ORDER BY col_list [ASC|DESC]
 )
where x is a percentile value in the range 0<x<1, and col_list is the sort specification. This function returns the smallest cumulative distribution value from the col_list set that is greater than or equal to value x. This function assumes a discrete distribution. Sometimes, data cannot be averaged, in a meaningful way. For example, we cannot average hiring date data, but we can calculate the median date in a group of dates.
SELECT dep, MIN(doh), MAX(doh),
       PERCENTILE_DISC(0.5) WITHIN GROUP
        (ORDER BY doh DESC) "Median"
  FROM employees
 GROUP BY dep;
RANK(val_list)
 WITHIN GROUP (
  ORDER BY col_list [ASC|DESC]
  [NULLS {first|last}]
 )
Returns the row's rank of a comma-delimited list of numeric constant expressions val_list within an ordered group col_list. When there are ties, ranks of equal vale are assigned equal rank, and the number of tied rows is skipped before the next rank is assigned. For example, if there are three items tied for first, the second and third item will be skipped, and the next item will be fourth. See also DENSE_RANK.
SELECT dep, 
       DENSE_RANK(15000) WITHIN GROUP 
        (ORDER BY salary DESC) "DenseRank",
       RANK(15000) WITHIN GROUP 
        (ORDER BY salary DESC) "Rank"
  FROM employees
 GROUP BY dep;
REGR_AVGX(y, x) This linear regression function returns the numeric average x value after eliminating NULL y, x pairs. It first removes y, x pairs that have a NULL value in either y or x, then computes AVG(x). This function can return NULL.
SELECT dep, AVG(salary), 
       REGR_AVGX(bonus, salary)
  FROM employees
 GROUP BY dep;
REGR_AVGY(y, x) This linear regression function returns the numeric average y value after eliminating NULL y, x pairs. It first removes y, x pairs that have a NULL value in either y or x, then computes AVG(y). This function can return NULL.
SELECT dep, AVG(salary), 
       REGR_AVGY(salary, bonus)
  FROM employees
 GROUP BY dep;
REGR_COUNT(y, x) Returns the number of non-NULL y, x pairs. It first removes y, x pairs that have a NULL value in either y or x, then counts the remaining number of pairs. This function can return a 0, but not a NULL.
SELECT dep, COUNT(*),
       REGR_COUNT(mar_status, bonus) "AllKnown"
  FROM employees
 GROUP BY dep;
REGR_INTERCEPT(y, x) Returns the numeric y interception of the linear-regression line. It first removes y, x pairs that have a NULL value in either y or x, then computes the y intercept. This function can return NULL. When the data are fitted to a line, the formula y = A + Bx can be used to represent the data, where A is the intercept and B is the slope. See also REGR_SLOPE.
SELECT dep,
       REGR_INTERCEPT(bonus, salary) "Intercept"
  FROM employees
 GROUP BY dep;
REGR_R2(y, x) Returns the numeric coefficient of determinations, or R2, which can be in the range 0<R2<=1 or NULL. This coefficient is a measure how well the data fits the line, with 1 being a direct linear relationship.
SELECT dep,
       REGR_R2(bonus, salary)
  FROM employees
 GROUP BY dep;
REGR_SLOPE(y, x) Returns the numeric slope of the line of non-NULL y, x pairs using the least-square fit. See the description of REGR_INTERCEPT.
SELECT dep,
       REGR_INTERCEPT(bonus, salary) intercept,
       REGR_SLOPE(bonus, salary) slope,
       REGR_R2(bonus, salary) determcoeff
  FROM employees
 GROUP BY dep;
REGR_SXX(y, x) Returns the sum of the squares of x, which can be NULL. It's calculated as
SUM(x*x) - 
SUM(x)*SUM(x)
 REGR_COUNT(y, x)
SELECT REGR_COUNT(bonus, salary) REGR_COUNT,
       REGR_SXX(bonus, salary) REGR_SXX,
       REGR_SXY(bonus, salary) REGR_SXY,
       REGR_SYY(bonus, salary) REGR_SYY
  FROM employees;
REGR_SXY(y, x) Returns the sum of the products of x and y, which can be NULL. It's calculated as
SUM(y*x) - 
SUM(y)*SUM(x)
 REGR_COUNT(y, x)
REGR_SYY(y, x) Returns the sum of the squares of y, which can be NULL. It's calculated as
SUM(y*y) - 
SUM(y)*SUM(y)
 REGR_COUNT(y, x)
STDDEV([DISTINCT | ALL] x) Returns the numeric standard deviation of the numeric expression x. The standard deviation is calculated as the square root of the variance. STDDEV is very similar to the STDEDV_SAMP function, except STDDEV will return 1 when there is only one row of input, while STDDEV_SAMP will return NULL.
SELECT dep, AVG(salary),
       STDDEV(salary)
  FROM employees
 GROUP BY dep;
STDDEV_POP(x) Returns of the numeric population standard deviation of the numeric expression x. It's calculated as the square root of the population variance VAR_POP.
SELECT dep, 
       STDDEV_POP(salary),
       STDDEV(salary),
       STDDEV_SAMP(salary)	   
  FROM employees
 GROUP BY dep;
STDDEV_SAMP(x) Returns the numeric sample standard deviation of the numeric expression x. The sample standard deviation is calculated as the square root of the sample variance VAR_SAMP.
SUM([DISTINCT | ALL] x) Returns the sum of the numeric expression x.
SELECT dep, SUM(salary)
  FROM employees
 GROUP BY dep;
VAR_POP(x) Returns the numeric population variance of the numeric expression x. The population variance is calculated with the formula
SUM(x*x) - 
SUM(x)*SUM(x)
 COUNT(x)
COUNT(x)
SELECT dep,
       COUNT(*), 
       VAR_POP(salary),
       VAR_SAMP(salary),
       VARIANCE(salary)
  FROM employees
 GROUP BY dep;
VAR_SAMP(x) Returns the numeric sample variance of the numeric expression x. The sample variance is calculated with the formula
SUM(x*x) - 
SUM(x)*SUM(x)
 COUNT(x)
COUNT(x)-1
VARIANCE([DISTINCT | ALL] x) Returns the variance of the numeric expression x. It slightly differs from the VAR_SAMP function. When the number of expressions COUNT(x)=1, VARIANCE returns a 0, whereas VAR_SAMP returns NULL. When COUNT(x)=0, they both return NULL.

The HAVING clause.

It's important to remember that
Group functions cannot be used in WHERE clause, because all conditions in WHERE clause only limit the number of rows extracted from the table. The group functions are applied to sets of already extracted rows.
The correct way to use the group functions in conditions is to use them in the HAVING clause. For example,
SELECT dep, SUM(salary)
  FROM employees
 GROUP BY dep
HAVING SUM(salary) > 300000;   

The CUBE and ROLLUP modifiers.

The CUBE and ROLLUP are modifiers to the GROUP BY clause that allow to create aggregations of aggregates (or summary rows).
SELECT sex, 
       decode( grouping(mar_status), 
               0, nvl(mar_status, 'unknown'), 
               ''
       ) "Status", 
       COUNT(*)
  FROM employees
 GROUP BY ROLLUP(sex, mar_status); 
ROLLUP creates hierarchical aggregation, so the order of the expressions in the ROLLUP clause is significant. The ordering follows the same convention used in the GROUP BY clause - most general to most specific. If we reverse the order in the previous example, we'll get different subtotals.
SELECT sex, 
       decode( grouping(mar_status), 
               0, nvl(mar_status, 'unknown'), 
               ''
       ) "Status", 
       COUNT(*)
  FROM employees
 GROUP BY ROLLUP(mar_status, sex); 

The CUBE modifier produces all possible aggregations, not just those in the hierarchy of columns specified. The number of aggregations created by the CUBE modifier is the number of distinct combinations of data values in all of the columns that appear in the CUBE clause. Since CUBE creates aggregations for all possible combinations of columns, the order of expressions in a CUBE is not significant.

SELECT sex, 
       decode( grouping(mar_status), 
               0, nvl(mar_status, 'unknown'), 
               ''
       ) "Status", 
       COUNT(*)
  FROM employees
 GROUP BY CUBE(sex, mar_status);