Most of the group functions can be applied either to ALL values or to only the DISTINCT values for the specified expression. When ALL is specified, all non-NULL values are applied to the group function. When DISTINCT is specified, only one of each non-NULL value is is applied to the function.
| Function | Description | Example | ||||||
|---|---|---|---|---|---|---|---|---|
| AVG([DISTINCT | ALL] x) | Returns the statistical mean of the expression x. If neither DISTINCT or ALL is specified, the default is ALL. | SELECT dep, AVG(salary) FROM employees GROUP BY dep; |
||||||
| CORR(y, x) | Takes two arguments: y (the dependent variable) and x (the independent variable)
and returns the coefficient of correlation of a set of number pairs. The coefficient of correlation is a
measure of strength of the relationship between the two numbers. It is calculated from those (x, y)
pairs that are both not NULL using the formula:
|
SELECT CORR(salary, SYSDATE-doh) FROM employees; |
||||||
| Returns the number of row in the query. If an expression x is given, then option DISTINCT counts all distinct not NULL elements, option ALL counts all not NULL fields (the default is ALL). If the asterisk is given, then all rows in the set are counted, regardless of NULLs. |
SELECT COUNT(DISTINCT dep) FROM employees; |
|||||||
| COVAR_POP(y, x) | Returns the population covariance of a set of number pairs,m which can be NULL.
The covariance is a measure of how two sets of data vary in the same way, The population covariance
is calculated from those (y, x) pairs that are not both NULL using the formula:
|
SELECT COVAR_POP(salary, SYSDATE-doh) FROM employees; |
||||||
| COVAR_SAMP(y, x) | Returns the sample covariance of a set of number pairs,m which can be NULL.
The covariance is a measure of how two sets of data vary in the same way. The sample covariance
is calculated from those (y, x) pairs that are not both NULL using the formula:
|
SELECT COVAR_SAMP(salary, SYSDATE-doh) FROM employees; |
||||||
|
CUME_DIST(val_list) WITHIN GROUP ( ORDER BY col_list [ASC|DESC] [NULLS {first|last}] ) |
Returns the cumulative distribution of a value in val_list within a distribution in col_list, where val_list is a comma-delimited list of expressions that evaluate to numeric constant values, and col_list is the comma-delimited list of column expressions. The cumulative distribution is a measure of ranking within the ordered group and will be range 0 to 1 (including 1). See also PERCENT_RANK. |
SELECT dep,
CUME_DIST(15000) WITHIN GROUP
(ORDER BY salary DESC)
FROM employees
GROUP BY dep;
|
||||||
|
DENSE_RANK(val_list) WITHIN GROUP ( ORDER BY col_list [ASC|DESC] [NULLS {first|last}] ) |
Returns the row's rank within an ordered group. When there are ties, ranks are not skipped. That is, all equal fields are counted as one. See also RANK. |
SELECT dep,
DENSE_RANK(15000) WITHIN GROUP
(ORDER BY salary DESC) dense_rank
FROM employees
GROUP BY dep;
|
||||||
| GROUP_ID() | Returns a numeric identifier that can be used to uniquely identify duplicate groups. For i duplicate groups, GROUP_ID will return values 0 through i-1. This unction take no arguments and requires a GROUP BY clause. | |||||||
| GROUPING(x) | Takes one argument x an expression in the GROUP BY clause of the query. This function is applicable only for queries with GROUP BY clause and a ROOLUP or CUBE clause. The ROLLUP and CUBE clause create summary rows containing NULLs in the grouped expressions. The GROUPING function returns a 1 for these summary rows and a 0 for the non-summary rows. (It's used to distinguish the summary rows from the non-summary rows. It becomes significant when when the data in the values being aggregated may contain NULL values.) |
SELECT dep, sex, COUNT(*),
GROUPING(dep), GROUPING(sex)
FROM employees
GROUP BY CUBE(dep, sex); |
||||||
| agg_func KEEP ( DENSE_RANK {FIRST|LAST} ORDER BY col_list [ASC|DESC] [NULLS] {first|last} ) |
where agg_func is an aggregate function, and col_list is a list of columns to be ordered for the grouping. This function is actually a modifier for one of the other group functions, like MIN or COUNT. The KEEP function returns the first or last row of a sorted group. It's used to avoid the need for self-join, looking for the MIN or MAX. |
SELECT dep,
MIN(doh) "Earliest",
COUNT(ssn) KEEP(DENSE_RANK FIRST
ORDER BY doh) "First",
MAX(doh) "Latest",
COUNT(ssn) KEEP(DENSE_RANK LAST
ORDER BY doh) "Last"
FROM employees
GROUP BY dep;
|
||||||
| MAX([DISTINCT | ALL] x) | Returns the highest value of the expression x.
|
SELECT dep,
to_char(MAX(salary), '$999,999.99')
"Max Salary"
FROM employees
GROUP BY dep
ORDER BY 2 DESC; |
||||||
| MIN([DISTINCT | ALL] x) | Returns the lowest value of the expression x.
|
SELECT dep, MIN(salary) FROM employees GROUP BY dep ORDER BY 2 DESC; |
||||||
|
PERCENT_RANK(val_list) WITHIN GROUP ( ORDER BY col_list [ASC|DESC] [NULLS {first|last}] ) |
Returns the percent ranking of a value in val_list within a distribution in col_list,
where val_list is a comma-delimited list of expressions that evaluate to numeric constant values, and
col_list is the comma-delimited list of column expressions. The percent distribution is a measure
of ranking within the ordered group and will be range 0 to 1 (including ends).
See also CUME_DIST. The main difference between PERCENT_RANK and CUME_DIST is that PERCENT_RANK will always return a 0 for the first row in any set, while the CUME_DIST cannot return a 0. /* It looks like the CUME_DIST function adds its argument to the list and computes the percentage of how many elements in the modified list is greater/smaller or equal to the argument. The PERCENT_RANK doesn't modify the list. */ |
SELECT dep, COUNT(*),
PERCENT_RANK(35000) WITHIN GROUP
(ORDER BY salary DESC) Rank
FROM employees
GROUP BY dep; |
||||||
| PERCENTILE_CONT(x) WITHIN GROUP ( ORDER BY col_list [ASC|DESC] ) |
where x is a percentile value in the range 0<x<1, and col_list is
the sort specification. This function returns the interpolated value that would fall in percentile position
x within the sorted group col_list. The function assumes a continuous distribution and is most useful for obtaining the median value of an ordered group. The median value is defined to be the midpoint in a group of ordered numbers - half of the values are above the median, and half of the values are below the median. |
SELECT dep, AVG(salary),
PERCENTILE_CONT(0.5) WITHIN GROUP
(ORDER BY salary DESC) "MEDIAN"
FROM employees
GROUP BY dep; |
||||||
| PERCENTILE_DISC(x) WITHIN GROUP ( ORDER BY col_list [ASC|DESC] ) |
where x is a percentile value in the range 0<x<1, and col_list is the sort specification. This function returns the smallest cumulative distribution value from the col_list set that is greater than or equal to value x. This function assumes a discrete distribution. Sometimes, data cannot be averaged, in a meaningful way. For example, we cannot average hiring date data, but we can calculate the median date in a group of dates. | SELECT dep, MIN(doh), MAX(doh),
PERCENTILE_DISC(0.5) WITHIN GROUP
(ORDER BY doh DESC) "Median"
FROM employees
GROUP BY dep; |
||||||
|
RANK(val_list) WITHIN GROUP ( ORDER BY col_list [ASC|DESC] [NULLS {first|last}] ) |
Returns the row's rank of a comma-delimited list of numeric constant expressions val_list within an ordered group col_list. When there are ties, ranks of equal vale are assigned equal rank, and the number of tied rows is skipped before the next rank is assigned. For example, if there are three items tied for first, the second and third item will be skipped, and the next item will be fourth. See also DENSE_RANK. |
SELECT dep,
DENSE_RANK(15000) WITHIN GROUP
(ORDER BY salary DESC) "DenseRank",
RANK(15000) WITHIN GROUP
(ORDER BY salary DESC) "Rank"
FROM employees
GROUP BY dep;
|
||||||
| REGR_AVGX(y, x) | This linear regression function returns the numeric average x value after eliminating NULL y, x pairs. It first removes y, x pairs that have a NULL value in either y or x, then computes AVG(x). This function can return NULL. |
SELECT dep, AVG(salary),
REGR_AVGX(bonus, salary)
FROM employees
GROUP BY dep; |
||||||
| REGR_AVGY(y, x) | This linear regression function returns the numeric average y value after eliminating NULL y, x pairs. It first removes y, x pairs that have a NULL value in either y or x, then computes AVG(y). This function can return NULL. | SELECT dep, AVG(salary),
REGR_AVGY(salary, bonus)
FROM employees
GROUP BY dep; |
||||||
| REGR_COUNT(y, x) | Returns the number of non-NULL y, x pairs. It first removes y, x pairs that have a NULL value in either y or x, then counts the remaining number of pairs. This function can return a 0, but not a NULL. | SELECT dep, COUNT(*),
REGR_COUNT(mar_status, bonus) "AllKnown"
FROM employees
GROUP BY dep; |
||||||
| REGR_INTERCEPT(y, x) | Returns the numeric y interception of the linear-regression line. It first removes y, x pairs that have a NULL value in either y or x, then computes the y intercept. This function can return NULL. When the data are fitted to a line, the formula y = A + Bx can be used to represent the data, where A is the intercept and B is the slope. See also REGR_SLOPE. | SELECT dep,
REGR_INTERCEPT(bonus, salary) "Intercept"
FROM employees
GROUP BY dep; |
||||||
| REGR_R2(y, x) | Returns the numeric coefficient of determinations, or R2, which can be in the range 0<R2<=1 or NULL. This coefficient is a measure how well the data fits the line, with 1 being a direct linear relationship. | SELECT dep,
REGR_R2(bonus, salary)
FROM employees
GROUP BY dep; |
||||||
| REGR_SLOPE(y, x) | Returns the numeric slope of the line of non-NULL y, x pairs using the least-square fit. See the description of REGR_INTERCEPT. | SELECT dep,
REGR_INTERCEPT(bonus, salary) intercept,
REGR_SLOPE(bonus, salary) slope,
REGR_R2(bonus, salary) determcoeff
FROM employees
GROUP BY dep; |
||||||
| REGR_SXX(y, x) | Returns the sum of the squares of x, which can be NULL. It's calculated
as
|
SELECT REGR_COUNT(bonus, salary) REGR_COUNT,
REGR_SXX(bonus, salary) REGR_SXX,
REGR_SXY(bonus, salary) REGR_SXY,
REGR_SYY(bonus, salary) REGR_SYY
FROM employees;
|
||||||
| REGR_SXY(y, x) | Returns the sum of the products of x and y, which can be NULL. It's calculated
as
|
|||||||
| REGR_SYY(y, x) | Returns the sum of the squares of y, which can be NULL. It's calculated
as
|
|||||||
| STDDEV([DISTINCT | ALL] x) | Returns the numeric standard deviation of the numeric expression x. The standard deviation is calculated as the square root of the variance. STDDEV is very similar to the STDEDV_SAMP function, except STDDEV will return 1 when there is only one row of input, while STDDEV_SAMP will return NULL. | SELECT dep, AVG(salary),
STDDEV(salary)
FROM employees
GROUP BY dep; |
||||||
| STDDEV_POP(x) | Returns of the numeric population standard deviation of the numeric expression x. It's calculated as the square root of the population variance VAR_POP. | SELECT dep,
STDDEV_POP(salary),
STDDEV(salary),
STDDEV_SAMP(salary)
FROM employees
GROUP BY dep; |
||||||
| STDDEV_SAMP(x) | Returns the numeric sample standard deviation of the numeric expression x. The sample standard deviation is calculated as the square root of the sample variance VAR_SAMP. | |||||||
| SUM([DISTINCT | ALL] x) | Returns the sum of the numeric expression x. | SELECT dep, SUM(salary) FROM employees GROUP BY dep; |
||||||
| VAR_POP(x) | Returns the numeric population variance of the numeric expression x.
The population variance is calculated with the formula
|
SELECT dep,
COUNT(*),
VAR_POP(salary),
VAR_SAMP(salary),
VARIANCE(salary)
FROM employees
GROUP BY dep;
|
||||||
| VAR_SAMP(x) | Returns the numeric sample variance of the numeric expression x.
The sample variance is calculated with the formula
|
|||||||
| VARIANCE([DISTINCT | ALL] x) | Returns the variance of the numeric expression x. It slightly differs from the VAR_SAMP function. When the number of expressions COUNT(x)=1, VARIANCE returns a 0, whereas VAR_SAMP returns NULL. When COUNT(x)=0, they both return NULL. | |||||||
SELECT dep, SUM(salary) FROM employees GROUP BY dep HAVING SUM(salary) > 300000;
SELECT sex,
decode( grouping(mar_status),
0, nvl(mar_status, 'unknown'),
''
) "Status",
COUNT(*)
FROM employees
GROUP BY ROLLUP(sex, mar_status);
ROLLUP creates hierarchical aggregation, so the order of the expressions in the ROLLUP
clause is significant. The ordering follows the same convention used in the GROUP BY clause -
most general to most specific. If we reverse the order in the previous example, we'll get different
subtotals.
SELECT sex,
decode( grouping(mar_status),
0, nvl(mar_status, 'unknown'),
''
) "Status",
COUNT(*)
FROM employees
GROUP BY ROLLUP(mar_status, sex);
The CUBE modifier produces all possible aggregations, not just those in the hierarchy of columns specified. The number of aggregations created by the CUBE modifier is the number of distinct combinations of data values in all of the columns that appear in the CUBE clause. Since CUBE creates aggregations for all possible combinations of columns, the order of expressions in a CUBE is not significant.
SELECT sex,
decode( grouping(mar_status),
0, nvl(mar_status, 'unknown'),
''
) "Status",
COUNT(*)
FROM employees
GROUP BY CUBE(sex, mar_status);