Andy Glew's comp-arch.net wiki, http://semipublic.comp-arch.net

If you are reading this elsewhere, e.g. at site waboba.info, it is an unauthorized copy, and probably a malware site.
comp-arch.net wiki on hold from October 17, 2011

Bit matrix multiply (BMM)

From CompArch

Jump to: navigation, search



Typically, bit matrix multiply multiplies an integer scalar, e.g. 64-bits, by a bit matrix. I.e. a bit-vector X bit-matrix multiply. Although, as we will see below, the use of the term vector can become confusing.

Contents

Pedantic Names

I suppose that, strictly speaking, we should distinguish names for:

BVMM - Bit Vector Matrix Multiply
e.g. 64 bit integer "vector" * 64 x 64 matrix
e.g. 8 bit vector * 8x8 matrix
BMMM - Bit Matrix Matrix Multiply
e.g. 2 64 bit integer vectors, interpreted as 8x8 bit matrices, multiplied together
e.g. 2 64 element 64 bit vector register or memory operands, interpreted as 64x64 bit matrices
Transpose versions of the above

Operand Size and Format

E.g. a 64x64 bit matrix. (512 bytes!) That's pretty big! It necessarily requires a special operand, e.g. a special 64x64 register, or a Big Memory Operand, since few instruction sets have so many integer registers. Vector instruction sets may have 64 element vectors, each element 64 bits.

Because of the size of the matrix operand, sometimes operations with smaller operands that fit within a typical register are continued. E.g. 64 bits = 8x8 bits. Since this only consumes 8 bits of the other input, that other input may be considered as a SIMD vector: 64 bits = a vector of 8 8b elements, each of which is multiplied by the 64=8x8 matrix.

a full 64x[64x64] BMM can be synthesized with several such operations, and some bit manipulations.

Note: this vector of 8 x 8b X 8x8 matrix is similar to, but not quite the same, as multiplying two 8x8 matrices.

Note: since multiplying two 8x8 matrices involves adjacent (row) bits in one operand, but non-adjacent (column) bits in the other, it may be more natural (less wiring) to provide the operation A*B-transpose - i.e. transpose matrix multiplication.

Form of Matrix Multiplication

One of the most natural forms is "conventional" AND-OR BMM:

for(i=0;i<63;i++) {
    for(tmp=0, j=0;j<63;j++) {
        tmp |= a.bit[i] * b[i][j];
    }
    result.bit[i] = tmp; 
}

Another much loved form is AND-XOR (also called AND parity).

Yet another form is AND-POPCNT

for(i=0;i<63;i++) {
    for(tmp=0, j=0;j<63;j++) {
        tmp += a.bit[i] * b[i][j];
    }
    result[i] = tmp; 
}

although here the result is not 64 bits, but at least 64*6 bits (typically 64*8).


Uses

The AND-OR form is a fairly general and standard way of expressing logic functions - sum-of-products. As is, for that matter, the OR-AND form - product-of-sums. Other forms may also allow expression of all logic functions, although they may be less familiar.

The AND-OR form can be used to accomplish arbitrary permutations, using matrices that have only a single bit per row and column. As can, for that matter, the AND-XOR form, although it may be necessary to invert after the permutation.

The AND-XOR form can be used for information theoretic calculations, as can POPCNT.

The AND-XOR form can be used as a fairly general form of XOR based hash function, as described by Vandierendonck and De Bosschere in "XOR-Based Hash Functions".

Hardware

The AND-OR form is obviously easy to do with wired logic, and not so bad even with limited fan-in.

The XOR form is more expensive, requiring an XOR tree. POPCNT still more so, typically requiring an 8:3 counter (of which the XOR form needs only the lowest bit).

Because 8 levels of XOR can be hard, may motivate wider flatter formulations.


References

When writing this, I found the recent publication (2008):

It is nice to see this "lore" of (super)computer architecture leaking into academia.
Personal tools
No more shadowing