Next: BagReduce and BagPO Up: Aggregation Previous: Aggregation

# Min, Max, Sum, Count, Avg

In XSB we use HiLog and tables to support aggregation. In HiLog one can manipulate predicates, or the names of sets. So we can construct a set, or bag really, that contains all the salaries in our example of the simple employee relation:

```    :- hilog salaries.
salaries(Sal) :- employee(_Name,_Dept,Sal).
```
The symbol `salaries` is the name of a unary predicate that is true of all salaries, or rather is the name of a bag of all salaries. It is a bag since it may contain the same salary multiple times. XSB provides a predicate `bagSum` which can be used to sum up the elements in a named bag. So given the definition of the HiLog predicate `salaries/1` above, we can get the sum of all the salaries with:
```    :- bagSum(salaries,TotalSals).
```
The first argument to `bagSum` is the name of a bag, and the second is bound to the sum of the elements in the bag.

We can also do a ``group by'' to get total salaries within departments as follows. We define a parameterized predicate, `sals(Dept)`, to be the bag of salaries of employees in department `Dept`, as follows:

```    sals(Dept)(Sal) :- employee(_Name,Dept,Sal).
```
This rule says that `Sal` is in the bag named `sals(Dept)` if there is an employee with some name who works in department `Dept` and has salary `Sal`.

Now with this definition, we can define a predicate, `deptPayroll/2`, that associates with each department the sum of all the salaries of employees in that department:

```    deptPayroll(Dept,Payroll) :- bagSum(sals(Dept),Payroll).
```

XSB provides analogous aggregate operators, `bagMin/2`, `bagMax/2`, `bagCount/2`, `bagAvg/2`, to compute the minimum, maximum, count, and average, of a bag, respectively.

As an interesting example of aggregation and recursion, consider the following problem. Say our university is considering instituting a policy that guarantees that no supervisor shall make less than anyone he or she supervises. Since they do not want to lower anyone's salary, to initiate such a policy, they will have to give raises to supervisors who make less than one of their employees. And this may cascade up the chain of command. We want to write a program that will calculate how much they will have to spend initially to start this new program. The following program does this:

```% inlcude needed predicates
:- import bagMax/2, bagSum/2 from aggregs.
:- hilog maximum, sum, raise.
maximum(X,Y,Z) :- X>=Y -> Z=X; Z=Y.
sum(X,Y,Z) :- Z is X+Y.

% The total cost is the sum of the raises.
totcost(Cost) :- bagSum(raise,Cost).

% A raise is the max of the posible new salaries (own and
% subordinates' salaries) minus the old salary.
raise(Raise) :-
bagMax(possNewSal(Emp),NSal),
emp(Emp,_,OSal),
Raise is NSal-OSal.

% A possible new salary is either one's old salary or the max of the
% possible new salaries of one's immediate subordinates.
possNewSal(Emp)(Sal) :- emp(Emp,_,Sal).
possNewSal(Emp)(Sal) :-
dept(Dept,Emp), emp(Sub,Dept,_),
bagMax(possNewSal(Sub),Sal).

% dept(Dept,Mgr): department data
dept(univ,provost).
dept(ceas,deanCEAS).
dept(cs,chairCS).
dept(ee,chairEE).

% emp(Name,Dept,Salary):
emp(provost,univ,87000).
emp(deanCEAS,univ,91000).
emp(chairCS,ceas,95000).
emp(chairEE,ceas,93000).
emp(prof1CS,cs,65000).
emp(prof2CS,cs,97000).
emp(prof1EE,ee,90000).
emp(prof2EE,ee,94000).
```

Here is the execution of this program for this data:

```| ?- [raises].
[Compiling ./raises]
% Specialising partially instantiated calls to apply/3
% Specialising partially instantiated calls to apply/2
[raises compiled, cpu time used: 1.669 seconds]

yes
| ?- totcost(C).

C = 19000;

no
| ?-
```
And indeed, it would cost \$19,000 to upgrade everyone's salary appropriately.

We can combine aggregation with dynamic programming (which is actually what is happening to some extent in the previous example) to nice effect.

A variation on the knapsack problem discussed above in the section on dynamic programming uses aggregation to find an optimal knapsack packing. (This example taken most recently from JLP 12(4) 92, Clocksin, Logic Programming specification and exectuion of dynamic-programming problems. He refers to Sedgewick, Algorithms A-W 88.) Recall that in the earlier knapsack problem we were given a set of integers and we try to see if, given a target integer, we can choose a subset of the given integers that sum to the target integer. The version we consider here is a bit more complicated, and perhaps more realistic. Here we have a set of kinds of items; each item has a value and a size (both are integers.) We are given a knapsack of a given capacity and we want to pack the knapsack with the items that maximize the value of the total. We assume that there is an unlimited supply of items of each kind.

This problem can be formulated as follows:

```:- import bagMax/2 from aggregs.
:- hilog maximum.
maximum(X,Y,Z) :- X>=Y -> Z=X; Z=Y.

:- table cap/2.
% cap(Size,Cap) if capacity of knapsick of size Size is Cap.
cap(I,Cap) :- I >= 0, bagMax(small_cap(I),Cap).

% small_cap(BigSize)(BigCap) if there is some item with ISize and IVal
% such that the capacity of a knapsack of size (BigSize-ISize) has
% capacity (BigCap-Ival).
small_cap(BigSize)(BigCap) :-
item(ISize,IVal),
SmallSize is BigSize-ISize,
cap(SmallSize,SmallCap),
BigCap is IVal+SmallCap.
% every knapsack (>=0) has capacity of 0.
small_cap(BigSize)(0) :- BigSize >= 0.
```
Here the tabling of `cap/2` is not necessary, since the call to `bagMax/2` is tabled automatically.

A simple example of executing this program is:

```%Data:
item(10,18).
item(8,14).
item(6,10).
item(4,6).
item(2,2).

| ?- [aggreg].
[Compiling ./aggreg]
[aggreg compiled, cpu time used: 0.861 seconds]

yes
| ?- cap(48,C).

C = 86;

no
| ?- cap(49,C).

C = 86;

no
| ?-
```
And we can see that indeed to fill a knapsack of size 48, one should take four of item 10 (for total value 72), and one of item 8 (with value 14), giving us a total value of 86.

Another problem that combines dynamic programming and aggregation is the problem of finding the way to associate a matrix chain product to minimize the cost of computing the product.

```:- import bagMin/2 from aggregs.
:- hilog minimum.
minimum(X,Y,Z) :- X=<Y -> Z=X; Z=Y.

% mult_costI,J,C) if C is the cost of the cheapest way to compute the
% product M_I x M_{I+1} x ... x M_J.
mult_cost(I,I,0).
mult_cost(I,J,C) :- I<J, bagMin(factor(I,J),C).

% factor(I,J) is true of costs obtained by computing the product of
% matrices between I and J by factoring the chain at any point between
% I and J and assuming optimal costs for the two factors.
factor(I,J)(C) :-
I1 is I-1,
J1 is J-1,
between(I,K,J1),
mult_cost(I,K,C1),
K1 is K+1,
mult_cost(K1,J,C2),
r(I1,Ri1), r(K,Rk), r(J,Rj),
C is C1+C2+Ri1*Rk*Rj.

between(X,X,_).
between(X,Y,Z) :- X < Z, X1 is X+1, between(X1,Y,Z).

% r(I,N) if N is the number of rows in the I-1st matrix.  (The last is
% the number of columns in the last matrix.)
r(0,5).
r(1,3).
r(2,6).
r(3,9).
r(4,7).
r(5,2).
```

Next: BagReduce and BagPO Up: Aggregation Previous: Aggregation
David S. Warren
1999-07-31