Andy Glew's comp-arch.net wiki, http://semipublic.comp-arch.net
If you are reading this elsewhere, e.g. at site waboba.info, it is an unauthorized copy, and probably a malware site.
comp-arch.net wiki on hold from October 17, 2011
Content Addressable Memory (CAM)
From CompArch
CAMs are a powerful concept often desired by computer architects, and often deprecated by circuit designers.
Whereas a RAM is indexed by the number of an entry, a CAM is indexed by something within an entry. Usually not the whole entry, usually a field within an entry.
Examples of structures that are, at least conceptually, CAMs:
- Translation Lookaside Buffer (TLB): indexed by virtual address, returns physical address and permissions
- Cache: indexed by virtual or physical address, contains instruction or data
Contents |
CAM array structure
CAMs are usually, but not always, implemented as arrays. Specifically, an array of memory elements, although CAMs also include more logic than RAM arrays.
Pitch problems TBD
Split into separate arrays: CAM tag array and SRAM data array.
See Andy Glew's Method of Abstract Array Design
CAM Pseudocode
A CAM can usually be described with pseudocode as
for every entry E in CAM array
if match(E,Input.address) then select E
The CAM match function may take various forms:
match(E,Input.address) ::== E.address = Input.address
- Masked Encoded CAM, a generalization of Ternary CAM
match(E,Input.address) ::== E.address & E.mask = Input.address & E.mask
or
match(E,Input.address) ::== E.address = Input.address & E.mask
- Decoded CAM, also known as a Mask CAM or Bitmask CAM
match(E,Input.address) ::== 0 != (E.address & Input.address)
- Range CAM, encoded form
match(E,Input.address) ::== E.address_lo <= Input.address < E.address_hi
- Interval CAM, encoded form
match(E,Input.address) ::== [E.lo,E.hi) INTERSECT [Input.lo, Input.hi)
- Bitmask CAMs lend themselves to a different implementation of range or interval CAMs.
for every entry E in CAM array
if match(E,Input.address) then assert E.matched
for all entries E that have E.matched asserted
select one and only one according to some prioritization rule
Typical prioritization rules may be
- lowest physical entry number E
- entry E that has the lowest value of some other field, such as a timestamp
Are CAMs prohibitively expensive?
For many years it has been a truism that CAMs are prohibitively expensive. This is not always true.
It is always worth knowing just how much bigger a CAM decoder is than a RAM decoder. Often 2X-4X, and up.
Furthermore, RAM decoders can share much logic, essentially via common subexpression elimination at the gate level. CAM decoders cannot share such logic in general, if the addresses that are to be CAMmed are completely independent of each other.
Similarly CAMs may be more power hungry, because every bit is potentially active at all times.
However, after these basic differences, CAMs have often earned a bad rap because of the prioritization logic in Priority CAMs - sometimes called a picker circuit. Non-priority CAMs can be significantly less expensive that priority CAMs.
Duality between CAMs and Software Data structures
CAMs are a hardware data structure.
In hardware, CAMs may be quite simple and efficient - if you neglect things like power and circuit area. At least we know how to build hardware CAMs, no matter what their cost.
In software, there is no equivalent software data structure. At least not on a single threaded Von Neumann machine. Arguably, one can build a CAM out of a MIMD parallel machine, with a separate processor dedicated to each CAM comparison, but we are not yet at a point where that is a reasonable approach.
Moreover, even hardware CAMs are, more and more, deprecated. So the duality described here also applies to hardware - alternative ways of implementing a hardware CAM.
However, there are certain well-known dualities between hardware CAMs and software data structures:
- Equality CAMs often map to software hash tables, with whatever convenient collision structure
- Ternary CAMs or Masked CAMs map to the sort of data structures that I used for mkIrecog - usually a multilevel tree dispatched on the largest high information content fields that can be found.
- Range CAMs map to ... TBD
- Priority CAMs map to ... TBD
Encoded versus Decoded CAMs
An encoded CAM (Content Addressable Memory) is indexed by an encoded number N. A decoded CAM is indexed by the corresponding decoded number, 1<<N.
I.e.
Encoded CAM:
for every entry E of the CAM array: if E.address = Input.address then select entry
Decoded CAM:
for every entry E of the CAM array: if E.decoded_address_bitmask & Input.decoded_address_bitmask then select entry
Where the Input.decoded_address_bitmask might be prepared by
Input.decoded_address_bitmask = 1 << Input.encoded_address, i.e. all bits 0 except for the single bit corresponding to the encoded address
i.e. it might be a one-hot.
However, a decoded CAM might also be indexed by multiple bits in the mask. I.e. it is a little bit easier to select multiple entries for output with an decoded CAM, whereas with an encoded CAM you would have to have several encoded addresses provided (although a ternary CAM provides a limited degree of flexibility in matching by allowing some bits to be "don't care").
(In either case, the CAM may need a policy as to what happens when multiple entries in a CAM are selected - does it act like a Priority CAM or are the multiple selected entries combined by a logic function such as AND or OR?)
---
- Decoded CAM has slightly simpler logic per decoded address bit
- Encoded CAM requires what amounts to an XOR at every address bit for every entry
As mentioned above
- Decoded CAM or bitmask CAM permits multiple entries to be addressed
- whereas an encoded CAM might require multiple encoded addresses
Most importantly
- number of address bits scales more slowly for encoded CAMs than for decoded CAMs.
Overall, decoded CAMs are fine for small addresses. Encoded CAMs are more efficient for large addresses.
Priority CAMs versus Non-Priority CAMs
TBD.
May already have been discussed above.
CAM examples
- usually N-way associative or direct mapped
- of note: original StrongARM, with multiple fully associative banks. Rarely seen.
- often fully associative
- sometimes Masked Encoded CAMs, to support multiple TLB page sizes
- often a variant of Priority CAM, since a load must receive the data from the youngest store older than itself matching its address.
- sometimes a Range CAM or Interval CAM
- BTPs are good examples for CAM versus non-CAM data structures - good examples of many different possibilities
- BTPs, including BTBs and indirect predictors may be CAMs and non-CAMs.
- BDPs, branch direction predictors, are nearly always non-CAM tables, although occasionally tag bits are used.
- See also CAM-indexed versus tagged arrays.
- detecting that writeback of a result enables dependent instructions is often done by an Encoded CAM or Decoded CAM, matching on renamed physical register numbers or result number (See Decoded versus Encoded CAMs for Instruction Scheduling.)
- Decoded CAMs are at the heart of bit matrix schedulers
- in older machines, e.g. HPSm, without register renaming, these were Priority CAMs.
- picking instructions to schedule out of ready instructions involves prioritization, but in schedulers the CAMming to mark operands as ready is usually quite separate from the picking
- Bypass detection
- ...
- Predictors, such as branch predictors and other types
- nearly any predictor can be originally imagined as a CAM indexed by history and/or location, e.g. branch from address. See Andy Glew's method for understanding branch predictors
- however, TnT predictors are often so simple - basically 2-bit counters that it is more efficient to build many of them, in a simple hashed, direct mapped, structure, than it is to build a CAM for them. In fact, often these are untagged hashed direct mapped structures. However, a few bits of tag occasionally help such direction predictors.
- I have worked on branch predictors with variable length history using mask CAMs - however, it is quite difficult to make these area efficient. The algorithms work, but the hardware is expensive. Nevertheless, a good thinking tool.
- Branch identifiers and branch target buffers more often benefit from tags. The usual tradeoffs for associativity apply.
- Glew opinion: untagged hashed direct mapped may be fastest, if the speed of an L0 BTB is needed. (But usually an L0 BTB is not needed techniques for tolerating predicted taken branch latency.) Tagged direct mapped or highly associative structures may be next fastest - an L1 BTB, if you will. L2 BTBs may use more general techniques, such as N-way associativity.
- often fully associative
---
- Content Addressable Memory (CAM)
- CAM (Content Addressable Memory)