Andy Glew's comp-arch.net wiki, http://semipublic.comp-arch.net

If you are reading this elsewhere, e.g. at site waboba.info, it is an unauthorized copy, and probably a malware site.
comp-arch.net wiki on hold from October 17, 2011

Topics for CompArch wiki

From CompArch

Jump to: navigation, search

There are currently 455 articles in this wiki. (Some may be support, infrastructure, or administrivia.)

Contents

What should be on this wiki

Topics for CompArch wiki

This is not expected to be a full or final list. Rather, it is hoped to be just a start - topics that I want to write essays on.

See What Kind of Topics for CompArch Wiki?

See Bad, Good, and Middling Ideas in Computer Architecture

See Ideas As To Where To Get CompArch Wiki Topics


See Wanted Pages, with discussion


WikiWish: I wish that this was a clickable, folding/unfolding, list. (TBD: choose one of the many, many, examples of such code.) Currently this is an odd mix of sections and lists. I really want it to be a TopicTree.

Categories for CompArch wiki

In addition to the topics, listed in something like a Table of contents (TOC) below, I will also try to take advantage of mediawiki's [1] to organize the pages.

The mediawiki categories are NOT a substitute for a Table of contents (TOC) or TopicTree. They are just convenient, an add-on, more easily maintained in some ways. They are not a complete tree: they are only provided for topics that experience has shown have many subtopic pages.

The categories are arranged as follows;

Manual

Often enough new categories not above may be created - see Special:Categories.

Topic Related Categories


Parallel programming and synchronization often includes aspects of both ISA and uarch:

Corresponding topic pages

Parallel programming and synchronization often includes aspects of both ISA and uarch:


Incoming Topics

Topics that I have not yet organized into the tree. Typically these are topics that I have sketched out in my head, possibly even completely written, but which I have not yet had the time to enter in yet.

See TBD.

See also Vocabulary to-do list.

The Vocabulary to-do list is more a list of items. Whereas this Incoming Topics list is usually a list of mini-essays or discussions

May 24, 2013

April 17, 2012

April 3, 2012

Page tables

Workloads and Benchmarks

Old


  • Unusual Operations
delt - Sun SX pixel processor. (1995). "Add four delta values cumulatively giving 4n-long vector."
I think this amounts to r0=s0+d0, r1=s1+d0+d1, r2=s2+d0+d1+d2, r3=s3+d0+d1+d2+d3
I.e. sum prefix of the delta vector, and a vector-vector add, i.e. vadd(s+sumprefix(d))
Q what is it used for?
Or, sumprefix(s+d)
plot - "Bresenham interpolation to vector"
  • Intel genX plane, line, blend


What is Computer Architecture?

See Computer Architecture: Different Levels of Abstraction

Laws of Computer Architecture

There are few fundamental laws in computer architecture. The most famous of such laws are observational or predictive.

Moore's Law

Not really fundamental.

Originally presented in "Cramming more components onto integrated circuits" (Electronics, Volume 38, Number 8, April 19, 1965) as "The complexity for minimum component costs has in- creased at a rate of roughly a factor of two per year" where complexity is measured by component count.

Usually phrased as a law of exponential growth in computing capabilities (of silicon), e.g. the number of transistors doubles every year (or 18 months, or...), e.g. the density of transistors increases similarly, e.g. the speed of transistors (or processors) increases similarly.

Of late, circa 2000, various aspects of Moore's Law(s) have fallen away.


See wikipedia page.

TBD: ref

Pollack's Law aka Square Root Law or N-th Root Laws in Computer Architecture

Empirical: speedup is proportional to the square root of the number of devices.

Some possible theoretical basis in terms of communications latency: if N processors are packed into a volume proportional to N in a space of dimension D, if there is a fixed maximum speed of communication (e.g. the speed of light), then the diameter in time is proportional to the D-th root of N.

Maximum speedup might then generously be supposed to be N/D-th-root-of-N, or N^(1-1/D). Sublinear, but not that bad.

In the 2D layouts common in VLSI, this is a square-root law.

Amdahl's Law

Gene Amdahl, Validity of the Single Processor Approach to Achieving Large Scale Computing Capabilities, AFSIPS Conf Proc, 30(8):483-485, 1967

For fixed work, if the fraction of serial work is s and hence parallel work is (1-s), then speedup is

 1/ ( (1-s)/N + s )

and, as N->infinity, speedup is at most 1/s

This law is often applied to fractional improvements other than speedup through parallelism such as power or cost or speedup through other methods. E.g., if an improvement of components that are responsible for 60% of energy consumption (cf. parallel portion, s=0.4) makes those components ten times more energy efficient (N=10), then the improvement factor is 2.17 (1/ (0.6/10 + 0.4) or a mere 117% improvement from a ten-fold improvement in a substantial majority of the system). The formula is a basic check on expected system improvement from improvements in portions of the system.

Gustafson's Law

J. Gustafson, Reevaluating Amdahl's Law, CACM 31(5):532-533, 1988.

Observing that we seldom keep the amount of work fixed as we scale up parallelism, but instead increase the problem size.

Speedup = (np + s) / (p+s)

-> n*( 1/(1+s/p) )

as n-> infinity


(Note that incorporating a term for communications cost limits this to sublinear, as in N-th Root Laws in Computer Architecture:


Speedup = (np + s) / (p+s + c*D-th-root-of-N )

-> N^(1-1/D)

as n-> infinity

Instruction Set Architecture





Register Architecture

Basic Instruction Concepts

  • Weakly defined behavior ("Here there be dragons"; data cache block allocate can have such--not deterministically zeroing unwritten bytes--, x86's BSR leaves the destination register and all but the zero flag undefined for Intel implementations if the source operand is zero while AMD defines that the destination register will remain unchanged if the source register value is zero.)
    • Undefined behavior (This can include requiring a reset for execution to continue.)
    • Unpredictable behavior (Sometimes defined as "equivalent in effect to any arbitrary sequence of instructions"; as such it will not represent a security hole for unprivileged software; often further constrained)
    • Implementation-specific behavior (Implementation defined behavior might be distinct in indicating that the architecture guarantees that an implementation will define a consistent behavior rather than saying that an implementation is permitted to define the behavior.)
    • Subarchitecture defined behavior (Useful for a broader range of implementations)
    • Unstable behavior (MIPS: "software may depend on the fact that a sampling of an UNSTABLE value results in a legal transient value that was correct at some point in time prior to the sampling.")


Basic Flow Control Instructions

Funky control-flow stuff:

Basic Data Types

  • Addresses / Pointers
  • Offsets

Less common

  • strings - usually strings are arrays of small byte-sized integers, counted or terminated. Possibly lists.
  • FP16
  • FP80, FP128
  • Complex Numbers - usually FP, sometimes integers

Basic Integer Instructions

Basic Floating Point Instructions

Basic System ISA

Basic Memory Reference Instructions

SIMD and vector ISAs

Special Instructions

  • ABS absolute value, and related instructions such as SAD, sum of absolute difference

String and Block Memory Operations

Extending an ISA

Extending an instruction set architecture is one of the fundamental tests of an ISA.

  1. Increasing Register Width: Overlaid vs. Extended
  2. Increasing Register Count

See API or instruction to decode an instruction

Instruction Set Groups

Commonly Seen:

Less Common:

Cross Cutting:

Instruction Set Extensions, a Rough History

This section presents recent significant instruction set extensions in an approximately historical order and context for major current companies.

TBD: extend far enough back into the past to see the wheel of reincarnation.

TBD: make predictions about instruction set futures.

Instruction Formats

Predicates

  • Predicate field in instruction

Data Types



Less standard:

ISA Examples

See Instruction Set Architectures, Lists of. See also Example Machines.

Notable but not typical

Microarchitecture

Modern Microarchitecture Examples

Let us walk through a modern microarchitecture. This provides us a place to hang certain topics.

or M - Map
or ROB - ReOrder Buffer

Hanging Topics Off the Pipeline

Now that we have generic names for such pipestages, we can hang some topics, ranging from simple to advanced, off them.

TBD: many, many, topics need to be hung up above.

TBD: what I really need is one such tree object for the pipeline, with multiple views - one view with little detail, one view with more. I like such "show the overview", "show the detail", "now zoom in". Automatically maintained to prevent inconsistencies. Unfortunately, this is yet another WikiWish, something I hope to add to this wiki in my copious spare time. Might be able to approximate by using one of Mediawiki's automatically generated TOCs.


BP - Branch Prediction

Branch Predictor Subcomponents

Branch prediction history

IF - Instruction Fetch

Trace BTB

ID - Instruction Decode


RR - Register Rename or M - Map

RD - Register Read

S - Schedule


EX - Execute

MEM - Memory Access, usually Data Cache Access


It is surprisingly challenging to maintain the ability to retire (commit) 1 store per cycle - let alone more. You cannot fall into the trap of initiating the store at retirement, and then waiting until it is a confirmed hit (in a writeback cache - let alone in store-through) before initiating the next.

Instruction Window

Pipelining

Parallelism


Predictors and Caches

Two standard techniques in computer architecture are predictors and caches. By this point in time these can be considered well known and obvious.

See the topic page for a design tree by me, preserved by Mark Smotherman.

See also Taxonomy of Patterns for Prefetchers

Varieties of Cache

VLSI Layout Issues

Arrays

One of the most common hardware data structure classes is the array. Specifically, array of memory elements, although arrays of logic and arrays of processing elements are also of interest.



I'm not sure that I like the term "RAM", Random Access Memory. Something like Fixed Address Memory might be more descriptive, since over the years such memories have become less and less randomly accessible


There are two forks in the discussion of CAM arrays.
  1. True CAMs - all of the flavors of CAMs
  2. The more basic question of how to implement a simple non-priority CAM - aka a cache with tags and data


Schedulers

One of the most common hardware functions is scheduling, picking, or prioritization. Not just in OOO schedulers. The scoreboards of in-order processors are really schedulers. But also in memory schedulers, selecting what memory access to handle next, etc.

Caches

Varieties of Cache

Cache Terminology

Memory Ordering

  1. Strong Ordering or Sequential Consistency
  2. Weak Consistency
    1. Eventual Consistency
  3. Intermediate Consistency
    1. Processor Consistency
    2. TSO
    3. Causal Consistency
  1. Message Passing Ordering
  1. Fence Instructions

Memory Optimizations

See also Memory coalescing

Synchronization









Interconnect

  1. Busses
  2. Interconnection Networks
    1. Rings
    2. Meshes
    3. Hierarchies
      1. Fat Trees, etc.

Virtual Memory

High Speed Design

Memory

Memory Alignment

Operating System and Other Privilege Architecture

Virtual Machines

Security

Controlling Bugs That Lead to Security Vulnerabilities

Security Architectures

Stack Protection

Since stack clobbering, buffer overflows on mixed data/return address stacks, is a common source of security problems, there have been many band-aid proposals that fix just this aspect of computer (in)security, including:

as well as some that particularly help this, although are of more general use

Fine Grain versus Coarse Grain Security


Small Security Friendly Features

Program Integrity

Coprocessors

Performance Measurement and Debugging Features

TBD


Embedded systems concerns

  • The four zeros of embedded systems (Nick Tredennick, "The Death of the DSP" [2000])
    • Zero cost (e.g., high volume systems for which cost optimizations can have a significant total impact)
    • Zero power (motivated by size, mobility, lack of serviceability, cost, etc. which limits battery capacity or encourages use of energy harvesting which is typically limiting in capability)
    • Zero delay (primarily high performance requirements given other constraints; real-time constraints)
    • Zero volume (?? systems with extreme demands on reliability, performance; space-rated systems might fit this category??)
  • Code density (This can influence cost, form factor, and power/energy use)
  • Tamper resistance (For security features or to protect trade secrets)
  • Scratchpad memory (primarily useful for real-time guarantees but power/energy and security concerns)
  • Long part lifetime (when redesign is expensive, especially with certification requirements)
  • ADC and DAC (to support sensors, motor control, etc.)
  • Memory Protection Unit (extremely low cost and predictable timing)
  • Watchdog timers (systems cannot rely on external detection and handling of soft failure)
  • Reliability (systems are often not serviceable and have high availability requirements)
    • Tolerance of temperature extremes and/or variability
    • Tolerance of Radio frequency interference
    • Tolerance of variability in power supply, ground, and signal voltages
      • Low voltage detection (prepare for safe shutdown, signal imminent failure, etc.)
      • Tolerance of electrostatic discharge
  • Integration (reduces costs, increases reliability, reduces size, reduces power)
    • Integrated persistent storage
    • Volatile memory (distinct from scratchpad memory mainly by lack of interface to additional external memory)
  • Special purpose interfaces
  • Support for higher voltage interfaces (for legacy systems or to reduce component count [cost, reliability, etc.])
  • Form factor considerations
  • Power use and Energy efficiency (e.g., radioisotope power supply can be more power limited than energy limited, chemical storage cells are more energy limited, similar issues can apply for energy harvesting)
    • Heat dissipation issues (lack of serviceability, size/weight constraints, environmental isolation, cost, etc. can constrain the thermal solutions that can be applied)

Reliability, Availability, and Serviceability (RAS)

  • Types of Errors
    • Fault (A state persistence or transition that violates the design intent; effectively errors at the implementation level which may or may not be exposed at the architectural level--e.g., a soft error in a branch direction predictor or in a register that will be overwritten before being read again.)
    • Error (A fault which becomes exposed at the architectural level.)
    • Temporary errors
      • Transient errors (typically not repeatable, usually associated with external causes such as radiation, EMI, power supply fluctuation)
      • Intermittent errors (related to operating conditions, theoretically repeatable but often involving complex interaction of conditions; can be an early warning of more persistent failure)
      • An intermittent condition can increase the vulnerability to transient errors, so the distinction between transient errors and intermittent errors may be less clear cut.
    • Persistent errors or Hardware failures
      • There may not be a clear cut point at which intermittent errors are considered hardware failures. (This may be somewhat analogous to the distinction between defects and process variation in manufacturing; some extremes of variation are clearly defects but a component with significant variation may still be usable even if voltage, frequency, endurance--rf. BubbleWrap Many-Core--, or other factors must be adjusted.)
  • Timing of error detection and error handling (Failing early can sometimes reduce the impact of failure or facilitate recovery; failing late can eliminate some errors and simplify attribution of the error)
  • Reliability through circuit design (This can include both avoiding faults like more resilient latches and increasing the probability of detection and possibly correction by higher level mechanisms--bit steering might be considered to be at this level.)
  • Reliability for Memory Arrays: ECC and Redundancy
    • Hashing address with data to detect addressing errors using data ECC
  • Reliability for Communication Channels: CRC
    • Channels with high error rates or high retransmission penalties benefit from ECC
  • Reliability for Combinatorial and Irregular Logic

Power and Energy


Code Density

  • Variable length instructions
    • parcel granularity
    • boundaries and packing rules (e.g., CDC 6600's 15-bit and 30-bit instructions in 60-bit word bundles, M32R's 32-bit bundles with one or two instructions)
  • Short instructions
  • Variable work instruction encoding (increasing work per instruction)
    • Optimizing call/return is particularly attractive as a common operation and a means of reducing the cost of procedural abstraction
  • Microcode call instruction (provides target and register operands which are moved to specific registers that are used in the microcode sequence and values of specific registers are moved back to the operand registers)
  • Echo instruction (a call with an implicit return after N instructions [or bytes])
  • Software methods (the importance of considering optimized code when considering hardware mechanisms to improve a metric)
    • Procedural abstraction
    • Path length versus local or global instruction count (e.g., increasing the path length for superscalar execution can reduce local instruction count, increasing local instruction count and path length can provide additional opportunities for procedural abstraction by making code sequences more general)

Integration

Advanced Topics

"Conventional" Alternative or Advanced Computer Architecture

See Andy Glew's Pet Topics in Computer Architecture for stuff that is especially interesting to me. Also BS: Bluesky, Brainstorming, Bullshit

Topics listed here are "conventional", in that many people have proposed them, often time and again.

The following are similarly often proposed topics, but these are topics I am more sympathetic to.

Less often proposed

Why I am more sympathetic to complex arithmetic than I am to interval arithmetic

Andy Glew's Pet Topics in Computer Architecture

Hey, this is my wiki - why can't I indulge my pet topics?

.


Asynchronous Logic

BS: Bluesky, Brainstorming, Bullshit

The boundary between an advanced topic and BS is variable.

Tools in the Toolkit

Hardware data structures



Tricks and hacks:

=

Example Machines

Try as I might to extract general principles, it always is good to be familiar with particular machines - particularly those particularly interesting for their innovation, but although those important just because they sell a lot.


Early era MPPs

Graphics and GPUs


Recent, circa 2010:

TBD

comp.arch post 8/12/2012, ISO important Example Machines

quite a few replies.

This list of examples is something I would like to have a public wiki for.

Topics for the Professional Computer Architect - and Wannabe

Design Principles and Rules of Thumb

Wannabe, Hobbyists, Amateurs and Students

Many more people are wannabe computer architects than get paid to it. Heck: I was a wannabe before it became my job, and I still am!

Many people have job descriptions that say computer architect, but are not.

Reading List for Computer Architecture

I am often asked for a recommended reading list. Might as well record it.

Terminology, Vocabulary, and Taxonomy




Personal tools
No more shadowing