Andy Glew's comp-arch.net wiki, http://semipublic.comp-arch.net
If you are reading this elsewhere, e.g. at site waboba.info, it is an unauthorized copy, and probably a malware site.
comp-arch.net wiki on hold from October 17, 2011
Topics for CompArch wiki
From CompArch
There are currently 455 articles in this wiki. (Some may be support, infrastructure, or administrivia.)
What should be on this wiki
Topics for CompArch wiki
This is not expected to be a full or final list. Rather, it is hoped to be just a start - topics that I want to write essays on.
See What Kind of Topics for CompArch Wiki?
See Bad, Good, and Middling Ideas in Computer Architecture
See Ideas As To Where To Get CompArch Wiki Topics
See Wanted Pages, with discussion
WikiWish: I wish that this was a clickable, folding/unfolding, list. (TBD: choose one of the many, many, examples of such code.) Currently this is an odd mix of sections and lists. I really want it to be a TopicTree.
Categories for CompArch wiki
In addition to the topics, listed in something like a Table of contents (TOC) below, I will also try to take advantage of mediawiki's [1] to organize the pages.
The mediawiki categories are NOT a substitute for a Table of contents (TOC) or TopicTree. They are just convenient, an add-on, more easily maintained in some ways. They are not a complete tree: they are only provided for topics that experience has shown have many subtopic pages.
The categories are arranged as follows;
- Special:UncategorizedCategories gives the top level categories - although some may have themselves as their own category
Manual
- Topic Related
- Single Topic
- Topic discussions, comparisons, opinions - Category:Discussion, Comparison, and Opinion
- Blue sky, brainstorming, b***t - Category:BS
- Category:Terminology
Often enough new categories not above may be created - see Special:Categories.
Topic Related Categories
- Category:non-Computer Architecture - some non-computer architecture topics will inevitably arise
- Category:Computer Architecture - the main topic of this wiki
- Category:Macroarchitecture - that which is externally visible
- Category:Instruction set architecture (ISA) - the contrast between programmer and machine
- Category:Vector ISA
- Category:System ISA - operating systems, virtual machines, interrupts, etc.
- Category:Funky Instructions
- other aspects of macroarchitecture include Pin Architecture, Interconnect Architecture
- Category:Instruction set architecture (ISA) - the contrast between programmer and machine
- Category:Microarchitecture
- Category:Macroarchitecture - that which is externally visible
Parallel programming and synchronization often includes aspects of both ISA and uarch:
- Category:Parallel
- Category:Multiprocessing
- Category:Multithreading
- Category:Synchronization - often instructions, sometimes hardware
Corresponding topic pages
- Computer Architecture - the main topic of this wiki
- Macroarchitecture - that which is externally visible
- Instruction set architecture (ISA) - the contract between programmer and machine
- Vector ISA
- System ISA - operating systems, virtual machines, interrupts, etc.
- Funky Instructions
- other aspects of macroarchitecture include Pin Architecture, Interconnect Architecture
- Instruction set architecture (ISA) - the contract between programmer and machine
- Macroarchitecture - that which is externally visible
Parallel programming and synchronization often includes aspects of both ISA and uarch:
- Category:Parallel
- Category:Multiprocessing
- Category:Multithreading
- Category:Synchronization - often instructions, sometimes hardware
- non-Computer Architecture - some non-computer architecture topics will inevitably arise
Incoming Topics
Topics that I have not yet organized into the tree. Typically these are topics that I have sketched out in my head, possibly even completely written, but which I have not yet had the time to enter in yet.
See TBD.
See also Vocabulary to-do list.
The Vocabulary to-do list is more a list of items. Whereas this Incoming Topics list is usually a list of mini-essays or discussions
May 24, 2013
April 17, 2012
- Speculative execution I can't believe I have gone this long without defining it.
April 3, 2012
Workloads and Benchmarks
Old
- TLB topics
- Why saying "You need N-way associativity in the TLB to guarantee forward progress" is bogus and reflective of an in-order mindset
- Large pages versus large page tables
- Multilevel TLBs
- Sharing the L_n TLB between multiple processors
- Not just multilevel TLBs: different functionality
- Routing and certain pagefaults in close in TLBs
- Rearrangement of physical memory in far-out TLBs, e.g. near the memory controllers
- Unusual Operations
- delt - Sun SX pixel processor. (1995). "Add four delta values cumulatively giving 4n-long vector."
- I think this amounts to r0=s0+d0, r1=s1+d0+d1, r2=s2+d0+d1+d2, r3=s3+d0+d1+d2+d3
- I.e. sum prefix of the delta vector, and a vector-vector add, i.e. vadd(s+sumprefix(d))
- Q what is it used for?
- Or, sumprefix(s+d)
- plot - "Bresenham interpolation to vector"
- Intel genX plane, line, blend
What is Computer Architecture?
See Computer Architecture: Different Levels of Abstraction
Laws of Computer Architecture
There are few fundamental laws in computer architecture. The most famous of such laws are observational or predictive.
Moore's Law
Not really fundamental.
Originally presented in "Cramming more components onto integrated circuits" (Electronics, Volume 38, Number 8, April 19, 1965) as "The complexity for minimum component costs has in- creased at a rate of roughly a factor of two per year" where complexity is measured by component count.
Usually phrased as a law of exponential growth in computing capabilities (of silicon), e.g. the number of transistors doubles every year (or 18 months, or...), e.g. the density of transistors increases similarly, e.g. the speed of transistors (or processors) increases similarly.
Of late, circa 2000, various aspects of Moore's Law(s) have fallen away.
See wikipedia page.
TBD: ref
Pollack's Law aka Square Root Law or N-th Root Laws in Computer Architecture
Empirical: speedup is proportional to the square root of the number of devices.
Some possible theoretical basis in terms of communications latency: if N processors are packed into a volume proportional to N in a space of dimension D, if there is a fixed maximum speed of communication (e.g. the speed of light), then the diameter in time is proportional to the D-th root of N.
Maximum speedup might then generously be supposed to be N/D-th-root-of-N, or N^(1-1/D). Sublinear, but not that bad.
In the 2D layouts common in VLSI, this is a square-root law.
Amdahl's Law
Gene Amdahl, Validity of the Single Processor Approach to Achieving Large Scale Computing Capabilities, AFSIPS Conf Proc, 30(8):483-485, 1967
For fixed work, if the fraction of serial work is s and hence parallel work is (1-s), then speedup is
1/ ( (1-s)/N + s )
and, as N->infinity, speedup is at most 1/s
This law is often applied to fractional improvements other than speedup through parallelism such as power or cost or speedup through other methods. E.g., if an improvement of components that are responsible for 60% of energy consumption (cf. parallel portion, s=0.4) makes those components ten times more energy efficient (N=10), then the improvement factor is 2.17 (1/ (0.6/10 + 0.4) or a mere 117% improvement from a ten-fold improvement in a substantial majority of the system). The formula is a basic check on expected system improvement from improvements in portions of the system.
Gustafson's Law
J. Gustafson, Reevaluating Amdahl's Law, CACM 31(5):532-533, 1988.
Observing that we seldom keep the amount of work fixed as we scale up parallelism, but instead increase the problem size.
Speedup = (np + s) / (p+s)
-> n*( 1/(1+s/p) )
as n-> infinity
(Note that incorporating a term for communications cost limits this to sublinear, as in N-th Root Laws in Computer Architecture:
Speedup = (np + s) / (p+s + c*D-th-root-of-N )
-> N^(1-1/D)
as n-> infinity
Instruction Set Architecture
Register Architecture
Basic Instruction Concepts
- Number of Operands in an Instruction
- The usual terms (which I dislike):
- 3 operand - usually 2 input, 1 output
- 2 operand - usually 1 input/output, 1 input
- 4 operand - usually 1 output, 3 input
- 1 operand - usually 1 output or 1 input/output
- Instructions with more than 2 inputs, 1 output
- The usual terms (which I dislike):
- Special Instructions vs. Synthesized
- NOP
- MOV
- CLR (Clear Register)
- Faulting Prefetch
- Error Reporting
- Weakly defined behavior ("Here there be dragons"; data cache block allocate can have such--not deterministically zeroing unwritten bytes--, x86's BSR leaves the destination register and all but the zero flag undefined for Intel implementations if the source operand is zero while AMD defines that the destination register will remain unchanged if the source register value is zero.)
- Undefined behavior (This can include requiring a reset for execution to continue.)
- Unpredictable behavior (Sometimes defined as "equivalent in effect to any arbitrary sequence of instructions"; as such it will not represent a security hole for unprivileged software; often further constrained)
- Implementation-specific behavior (Implementation defined behavior might be distinct in indicating that the architecture guarantees that an implementation will define a consistent behavior rather than saying that an implementation is permitted to define the behavior.)
- Subarchitecture defined behavior (Useful for a broader range of implementations)
- Unstable behavior (MIPS: "software may depend on the fact that a sampling of an UNSTABLE value results in a legal transient value that was correct at some point in time prior to the sampling.")
Basic Flow Control Instructions
- Absolute Jump
- Conditional Relative Branches
- Indirect Jump
- Call/Return Instructions
- Return and adjust stack pointer (MIPS16, microMIPS; fits two register read, one register write constraint when Instruction Pointer/Program Counter is not a GPR, suited to the common case of releasing a small constant-sized stack frame when a frame pointer is not used)
- Where to put the return address
- Save the return address at a fixed offset from the stack pointer, possibly incrementing the stack pointer (x86)
- Save the return address to a special link register (e.g., Power)
- Save the return address to a dedicated GPR (MIPS, SPARC, ARM)
- Save the return address to a specified GPR (Alpha)
- Save the return address onto a separate return address stack (stack processors: Chuck Moore's c18 and Bernd Paysan's b16)
- RISCifying CALL/RETurn
- Jump and Link
- Background Call/Return Stuff
- Position Independent Code
- Call/Return Trivia
- Advanced Call/Return Stuff
Funky control-flow stuff:
- Come-From Instruction
- Branch-on-random (Instruction proposed by Edward Lee and Craig Zilles to reduce the overhead of sampling-based program instrumentation.)
- Jump using table Jump and link using table Conditional jump using table (merging the table look-up instruction with an indirect jump, indirect jump and link, or conditional indirect jump)
- Conditional indirect branch (TBD should this be called 'branch' or 'jump'?)
Basic Data Types
- Integer, fixed point
- Floating point (FP) - 32bit, 64bit
- Addresses / Pointers
- Offsets
Less common
- strings - usually strings are arrays of small byte-sized integers, counted or terminated. Possibly lists.
- FP16
- FP80, FP128
- Complex Numbers - usually FP, sometimes integers
Basic Integer Instructions
- Shift Instructions
- Special Integer Instructions
- Divide by 2^n
- Saturating Arithmetic
- Fixed Point
- Population count
- Find First Bit
- high, low, signed high (signed low)
Basic Floating Point Instructions
Basic System ISA
Basic Memory Reference Instructions
- Load and Store instructions
- Load-op
- Load-op-store or Operate-on-memory instructions
- Addressing modes
SIMD and vector ISAs
Special Instructions
- Hint instructions and/or hint flavors of non-hint instructions
- ABS absolute value, and related instructions such as SAD, sum of absolute difference
- Special Control Flow
String and Block Memory Operations
- Find special byte in register
- Block Copy Instructions
- BitBlt and Compositing
- Compress and Crypto
- XML instructions - just say no!!
Extending an ISA
Extending an instruction set architecture is one of the fundamental tests of an ISA.
See API or instruction to decode an instruction
Instruction Set Groups
Commonly Seen:
- DSP Instructions
- Video Instructions
- Vector Instructions
- Typical Embedded Instructions
- Parallel Processing Instructions
- Cryptographic Instructions
- Coprocessor Instructions
- Message Passing Instructions
- Cache Control Instructions
- Cache Prefetch Instructions
- Parallel Programming and Synchronization Instructions
Less Common:
- Reconfigurable Logic Instructions
- Logarithmic Number System Instructions
- Fixed Point Arithmetic Instructions
- Block Floating Point Instructions
- Micro-optimization primitive instructions
Cross Cutting:
- Hint instructions and/or hint flavors of non-hint instructions
Instruction Set Extensions, a Rough History
This section presents recent significant instruction set extensions in an approximately historical order and context for major current companies.
TBD: extend far enough back into the past to see the wheel of reincarnation.
TBD: make predictions about instruction set futures.
Instruction Formats
- Fixed Length Instructions
- Standard RISC 32 or 64 bit instructions
- 16-bit instruction RISC
- Long instructions: 128 bit
- Funky Fixed Width Instructions like 24, 40, or 42 bits
- Variable Length Instructions
- Why x86 is easier to decode than VAX or Motorola 68K
- ARM Thumb and other RISC compression
- Generic Instruction Set Compression
- Immediates as a primary motivation for variable length instructions
- Heidi Pan's Heads and Tails (fixed width packet with fixed width instruction portions packed left to right and variable length additions packed right to left)
- fixed width packet filled with instructions of two lengths (CDC 6600, M32R)
- VLIW: multiple operations per instruction packet
- Fixed width VLIW
- Variable Length VLIW
- Parallel vs. sequential Semantics in VLIW
- Non-adjacent Instruction Formats
- e.g. ATI R700 - control flow program controlling separate, non-adjacent, program "clauses" for exec, data fetch, etc.
Predicates
- Predicate field in instruction
- SKIP instructions
Data Types
Less standard:
ISA Examples
See Instruction Set Architectures, Lists of. See also Example Machines.
- RISC
- MIPS
- SPARC
- IBM Power
- ARM
- DEC Alpha
- Hitachi SuperH
- Defunct: AMD 29K
- Defunct: Motorola 88K
- Defunct: Intel i960 ([[2]] seems to indicate that it is still offered)
- Defunct: Intel i860
- CISC
- DSP
- VLIW/EPIC
- Intel Itanium
- Defunct: Intel i860
- Uncategorized
Notable but not typical
Microarchitecture
Modern Microarchitecture Examples
Let us walk through a modern microarchitecture. This provides us a place to hang certain topics.
- Classic RISC Five Stage Pipeline
- IF - Instruction Fetch
- ID - Instruction Decode
- EX - Execute
- MEM - Memory Access, usually Data Cache Access
- WB - Writeback
- Glew Opinion: the classic 5-stage pipeline should have 7-stages
- IF - Instruction Fetch
- ID - Instruction Decode
- RD - Register Read
- EX - Execute
- MEM - Memory Access, usually Data Cache Access
- RW - Register Write
- Typical OOO Pipeline
- BP - Branch Prediction
- IF - Instruction Fetch
- ID - Instruction Decode
- RR - Register Rename
- or M - Map
- RD - Register Read
- S - Schedule
- EX - Execute
- MEM - Memory Access, usually Data Cache Access
- RW - Register Write
- IW - Instruction Window
- or ROB - ReOrder Buffer
Hanging Topics Off the Pipeline
Now that we have generic names for such pipestages, we can hang some topics, ranging from simple to advanced, off them.
TBD: many, many, topics need to be hung up above.
TBD: what I really need is one such tree object for the pipeline, with multiple views - one view with little detail, one view with more. I like such "show the overview", "show the detail", "now zoom in". Automatically maintained to prevent inconsistencies. Unfortunately, this is yet another WikiWish, something I hope to add to this wiki in my copious spare time. Might be able to approximate by using one of Mediawiki's automatically generated TOCs.
BP - Branch Prediction
Branch Predictor Subcomponents
- Branch direction predictor (BDP), branch identifier (BID)
- Branch target buffer (BTB), Branch target predictor, Indirect Branch Predictor
- Return predictor (Often referred to as a Return address stack (RAS), return stack predictor, or return predictor stack)
- Multilevel Branch Predictors
- Trace BTB, Unrolled BTB
- Branch Predictors for Trace Caches, trace predictors
- taken/not-taken branch prediction history (TNT), stew
- Managing branch prediction history: copying short history versus pointing to large history
- Local versus global branch history
- TNT versus path history
IF - Instruction Fetch
- Trace Cache, particularly Instruction Trace caches
ID - Instruction Decode
RR - Register Rename or M - Map
- Register Renamer Port Reduction
- Register Renamer Dependency Comparison Circuitry
- Multithreaded Register Renaming
RD - Register Read
- Register File Port Reduction
- Read after schedule versus read before schedule
- Register file cache
- Incomplete Bypass Network
S - Schedule
EX - Execute
MEM - Memory Access, usually Data Cache Access
- Techniques for reducing memory access latency
- Techniques for reducing memory access energy
- Techniques for increasing memory access bandwidth
- Sum Addressed Memory
- Memory registerification and memory renaming
- Semantically Specialized Caches
- Todd Austin's Knapsack
- Stack caches
- Pointer caches / object caches / item caches
- Data trace caches
- It is surprisingly challenging to maintain the ability to retire (commit) 1 store per cycle - let alone more. You cannot fall into the trap of initiating the store at retirement, and then waiting until it is a confirmed hit (in a writeback cache - let alone in store-through) before initiating the next.
Instruction Window
Pipelining
- In-order
- In-order Pipelined
- Speculative Execution
- Out-of-Order
Parallelism
- Conventional MP
- SMT, SoEMT, and other hardware threading
- Thread Forking
- GPUs, SIMT, Coherent Threading
Predictors and Caches
Two standard techniques in computer architecture are predictors and caches. By this point in time these can be considered well known and obvious.
See the topic page for a design tree by me, preserved by Mark Smotherman.
See also Taxonomy of Patterns for Prefetchers
VLSI Layout Issues
Arrays
One of the most common hardware data structure classes is the array. Specifically, array of memory elements, although arrays of logic and arrays of processing elements are also of interest.
- I'm not sure that I like the term "RAM", Random Access Memory. Something like Fixed Address Memory might be more descriptive, since over the years such memories have become less and less randomly accessible
- There are two forks in the discussion of CAM arrays.
- True CAMs - all of the flavors of CAMs
- The more basic question of how to implement a simple non-priority CAM - aka a cache with tags and data
- Separate Tag and Data Arrays - or Combined
- Direct mapped
- Fully Associative - comparisons done at each entry
- N-way Associative
- Tag-sequential - read N tags, compare, and select data way to read
- In array tag comparison
Schedulers
One of the most common hardware functions is scheduling, picking, or prioritization. Not just in OOO schedulers. The scoreboards of in-order processors are really schedulers. But also in memory schedulers, selecting what memory access to handle next, etc.
- In-order
- Out-of-order
- Scheduler Elements and Toolkits
Caches
- Cache structure
- Cache indexing and Cache tagging
- Fully associative caches
- see also CAMs, and the terminology discussion CAMs versus caches
- N-way associative caches
- Cache banks
- Skewed associativity
- Sectored caches
Memory Ordering
- Strong Ordering or Sequential Consistency
- Weak Consistency
- Intermediate Consistency
Memory Optimizations
See also Memory coalescing
Synchronization
Interconnect
Virtual Memory
- TLBs
- Page table Structure
- Conventional Tree Structured Page Tables
- Hashed Page Tables
- Folded Virtual Memory
- Special Virtual Memory Issues
- Virtual Memory Offshoots
- Single Address Space like IBM AS400
- The Protection Lookaside Buffer
- Virtual Memory Issues
- Fine Grain Protection
High Speed Design
- Why I prefer to say Speed rather than High Frequency
- Increasing Speed of Transistors
- Increasing Speed of Logic Design
- Microarchitecture for High Speed
- Arguments Against High Speed Design
Memory
Memory Alignment
- DRAM Alignment
- Burst / Block Transfers and Caches
- Aligned Cache Accesses
- Alignment Network Complexity
- Why Vectors Increase Frequency of Misalignment
- Instruction Set Support for Misalignment
- Scatter/Gather Support in the Memory Subsystem
- Alignment and Memory Ordering
Operating System and Other Privilege Architecture
- Privilege Levels
Virtual Machines
Security
Controlling Bugs That Lead to Security Vulnerabilities
- Hardware and/or Software
- Software Mainly
Security Architectures
- The Ubiquitous User/Kernel
- Why More Rings Have Failed - And Keep Getting Reinvented
- ACLs vs Capabilities
- Secure Up-calls
- Memory Protection Unit (MPU) - Memory protection without address translation
- Page groups - PA-RISC and Itanium; can support distinct memory protection within the same address space
- User/Kernel versus Virtual Machines
- Secure Execution Modes versus Virtual Machines
- Memory keys (Tags associated with physical memory; IBM mainframe's early protection mechanism--not sure of name.)
Stack Protection
- Since stack clobbering, buffer overflows on mixed data/return address stacks, is a common source of security problems, there have been many band-aid proposals that fix just this aspect of computer (in)security, including:
as well as some that particularly help this, although are of more general use
Fine Grain versus Coarse Grain Security
Small Security Friendly Features
- Stack Direction
- Separate stacks for return addresses and stack frame data
- e.g. Itanium
- Stack defined by SW, not ISA
- Instruction encodings to minimize executing data as code
Program Integrity
Coprocessors
- Coprocessors and Virtual Memory
- Access to Page Tables
- IOMMU
- Coprocessors and TLBs, including TLB consistency
Performance Measurement and Debugging Features
Embedded systems concerns
- The four zeros of embedded systems (Nick Tredennick, "The Death of the DSP" [2000])
- Zero cost (e.g., high volume systems for which cost optimizations can have a significant total impact)
- Zero power (motivated by size, mobility, lack of serviceability, cost, etc. which limits battery capacity or encourages use of energy harvesting which is typically limiting in capability)
- Zero delay (primarily high performance requirements given other constraints; real-time constraints)
- Zero volume (?? systems with extreme demands on reliability, performance; space-rated systems might fit this category??)
- Code density (This can influence cost, form factor, and power/energy use)
- Tamper resistance (For security features or to protect trade secrets)
- Scratchpad memory (primarily useful for real-time guarantees but power/energy and security concerns)
- Long part lifetime (when redesign is expensive, especially with certification requirements)
- ADC and DAC (to support sensors, motor control, etc.)
- Memory Protection Unit (extremely low cost and predictable timing)
- Watchdog timers (systems cannot rely on external detection and handling of soft failure)
- Reliability (systems are often not serviceable and have high availability requirements)
- Tolerance of temperature extremes and/or variability
- Tolerance of Radio frequency interference
- Tolerance of variability in power supply, ground, and signal voltages
- Low voltage detection (prepare for safe shutdown, signal imminent failure, etc.)
- Tolerance of electrostatic discharge
- Integration (reduces costs, increases reliability, reduces size, reduces power)
- Integrated persistent storage
- Volatile memory (distinct from scratchpad memory mainly by lack of interface to additional external memory)
- Special purpose interfaces
- Support for higher voltage interfaces (for legacy systems or to reduce component count [cost, reliability, etc.])
- Form factor considerations
- Power use and Energy efficiency (e.g., radioisotope power supply can be more power limited than energy limited, chemical storage cells are more energy limited, similar issues can apply for energy harvesting)
- Heat dissipation issues (lack of serviceability, size/weight constraints, environmental isolation, cost, etc. can constrain the thermal solutions that can be applied)
Reliability, Availability, and Serviceability (RAS)
- Types of Errors
- Fault (A state persistence or transition that violates the design intent; effectively errors at the implementation level which may or may not be exposed at the architectural level--e.g., a soft error in a branch direction predictor or in a register that will be overwritten before being read again.)
- Architectural Vulnerability Factor (The likelihood that a fault will generate an error.)
- Error (A fault which becomes exposed at the architectural level.)
- Temporary errors
- Transient errors (typically not repeatable, usually associated with external causes such as radiation, EMI, power supply fluctuation)
- Intermittent errors (related to operating conditions, theoretically repeatable but often involving complex interaction of conditions; can be an early warning of more persistent failure)
- An intermittent condition can increase the vulnerability to transient errors, so the distinction between transient errors and intermittent errors may be less clear cut.
- Persistent errors or Hardware failures
- There may not be a clear cut point at which intermittent errors are considered hardware failures. (This may be somewhat analogous to the distinction between defects and process variation in manufacturing; some extremes of variation are clearly defects but a component with significant variation may still be usable even if voltage, frequency, endurance--rf. BubbleWrap Many-Core--, or other factors must be adjusted.)
- Fault (A state persistence or transition that violates the design intent; effectively errors at the implementation level which may or may not be exposed at the architectural level--e.g., a soft error in a branch direction predictor or in a register that will be overwritten before being read again.)
- Timing of error detection and error handling (Failing early can sometimes reduce the impact of failure or facilitate recovery; failing late can eliminate some errors and simplify attribution of the error)
- Reliability through circuit design (This can include both avoiding faults like more resilient latches and increasing the probability of detection and possibly correction by higher level mechanisms--bit steering might be considered to be at this level.)
- Reliability for Memory Arrays: ECC and Redundancy
- Hashing address with data to detect addressing errors using data ECC
- Reliability for Communication Channels: CRC
- Channels with high error rates or high retransmission penalties benefit from ECC
- Reliability for Combinatorial and Irregular Logic
Power and Energy
- Spatial Aspects of Power Management: Domains
- this naturally causes me to wonder of there is any use to Temporal Aspects of Power Management: Modes. Hmm, seems that there is, and, in fact, power management modes came first.
Code Density
- Variable length instructions
- parcel granularity
- boundaries and packing rules (e.g., CDC 6600's 15-bit and 30-bit instructions in 60-bit word bundles, M32R's 32-bit bundles with one or two instructions)
- Short instructions
- 16-bit instruction RISC
- 24-bit fixed-length instruction ISA (XTensa without VLE extension)
- Variable work instruction encoding (increasing work per instruction)
- Optimizing call/return is particularly attractive as a common operation and a means of reducing the cost of procedural abstraction
- Register stack
- Load/store multiple registers
- Return and adjust stack pointer (actually still two register sources and one register destination operation if immediate adjust is used)
- Optimizing call/return is particularly attractive as a common operation and a means of reducing the cost of procedural abstraction
- Microcode call instruction (provides target and register operands which are moved to specific registers that are used in the microcode sequence and values of specific registers are moved back to the operand registers)
- Echo instruction (a call with an implicit return after N instructions [or bytes])
- Software methods (the importance of considering optimized code when considering hardware mechanisms to improve a metric)
- Procedural abstraction
- Path length versus local or global instruction count (e.g., increasing the path length for superscalar execution can reduce local instruction count, increasing local instruction count and path length can provide additional opportunities for procedural abstraction by making code sequences more general)
Integration
- integration is obvious:
- integration is obvious (except when it is not).
- Levels of integration
- Chip-level integration
- Wafer scale integration
- Chip stacking or 3D integration
- Package level integration
- Chip carriers such as IBM's ceramic thermal modules
- Board level integration
- miniature boards carrying several chips
- Blade level integration
- Mesa-level integration
- Integrating logic and memory
Advanced Topics
"Conventional" Alternative or Advanced Computer Architecture
See Andy Glew's Pet Topics in Computer Architecture for stuff that is especially interesting to me. Also BS: Bluesky, Brainstorming, Bullshit
Topics listed here are "conventional", in that many people have proposed them, often time and again.
The following are similarly often proposed topics, but these are topics I am more sympathetic to.
- Complex arithmetic
- Matrix-matrix and matrix-vector operations
- LNS (logarithmic number systems)
- Generic multi-component arithmetic:
- Complex arithmetic - very sympathic
- Rational numbers - less sympathetic
- Interval arithmetic - less sympathetic
Less often proposed
Why I am more sympathetic to complex arithmetic than I am to interval arithmetic
Andy Glew's Pet Topics in Computer Architecture
Hey, this is my wiki - why can't I indulge my pet topics?
- Speculative Multithreading (SpMT)
- Log-based computer architecture
- Multilevel Branch Predictors
- Dynamic Instruction Rewriting
- Symbolic Memory Disambiguation
- Bit Munging Instructions
- MLP Memory Level Parallelism
- Capability Architecture
- Large Instruction Windows
- Coherent Threading
- Eliminating Unnecessary Read for Ownerships
- Block Memory Operation Agenda
- Lightweight Threading
- Poor Man's ECC
- Multicluster Multithreading (MCMT)
- Alphabet Soup: a Collection of Microarchitectures
.
- Take every optimization in the Dragon compiler book and consider it in hardware
Asynchronous Logic
BS: Bluesky, Brainstorming, Bullshit
The boundary between an advanced topic and BS is variable.
- BS: A Scatter/Gather based Memory System and Interconnection Fabric
- BS: Bitmask Coherency for Writeback Caches
- Approximate and stochastic computation (particularly use of analog computation)
Tools in the Toolkit
Tricks and hacks:
=
Example Machines
Try as I might to extract general principles, it always is good to be familiar with particular machines - particularly those particularly interesting for their innovation, but although those important just because they sell a lot.
- Manchester MU-5
- IBM mainframes: IBM 360 to IBM z-series
- IBM Stretch .. (shipped about 16 instances)
- IBM 360-91 OOO
- IBM 360-85 caches
- Burroughs computers
- Xerox Dragon and Alto
- LISP machines
- IBM minicomputers - IBM AS400 etc.
- Japanese supercomputers, such as Hitachi SX-1
- MIT computers
- commercialized
- Intel microprocessors
- Intel iAPX 432 - important failure of object oriented computer architecture
- Intel i860
- Intel i960
- Intel x86:
- Intel Itanium VLIW, not a great success
- Intel Larrabee and LRBNI
- Intel research machines - Borkhar's group
- IBM and Motorola PowerPC
- RISCs
- MIPS both academic and commercial
- Berkeley RISC-1, SOAR, SPUR
- SPARC
- ARM ...
- ARM7TDMI - breakthrough? when Texas Instruments licensed the ARM7-TDMI, which was designed into the Nokia 6110.
- also DEC StrongARM
- ARM Cortex A8 in order
- ARM Cortex A9 ooo
Early era MPPs
Graphics and GPUs
Recent, circa 2010:
TBD
comp.arch post 8/12/2012, ISO important Example Machines
quite a few replies.
This list of examples is something I would like to have a public wiki for.
Topics for the Professional Computer Architect - and Wannabe
Design Principles and Rules of Thumb
- Design Goals
- Economic factors in computer architecture
- virtuous cycles, vicious cycles, network effects
- Non-recurring expenses (NRE) and incremental costs
- Risk management (Capital intensive area with rapid rate of change)
Wannabe, Hobbyists, Amateurs and Students
Many more people are wannabe computer architects than get paid to it. Heck: I was a wannabe before it became my job, and I still am!
Many people have job descriptions that say computer architect, but are not.
Reading List for Computer Architecture
I am often asked for a recommended reading list. Might as well record it.
Terminology, Vocabulary, and Taxonomy