Elements of Cache Design in Computer Architecture

By Balaram Shiwakoti

Cache design was one of those topics that really confused me during my loksewa preparation. There are so many elements to consider, and each decision affects performance differently. Let me break down what I've learned about the key elements of cache design.

Introduction to Cache Design

Cache memory is a small, fast storage located between the CPU and main memory. The goal is to keep frequently accessed data close to the processor to reduce memory access time. This confused me for weeks until it finally clicked, and it showed up in my loksewa exam too.

Think of cache like your desk drawer - you keep the items you use most often (pens, calculator) in the drawer for quick access, while less frequently used items stay in the filing cabinet (main memory).

Key Elements of Cache Design

1. Cache Size

What it affects: The amount of data that can be stored in cache.

This kept me up at night during my loksewa prep.

Trade-offs:

  • Larger cache: higher hit rate, but slower access time and higher cost
  • Smaller cache: faster access and lower cost, but lower hit rate

Typical sizes:

  • L1 Cache: 32KB - 64KB
  • L2 Cache: 256KB - 1MB
  • L3 Cache: 8MB - 32MB

Here's how I understand it: It's like choosing the size of your desk drawer - bigger means you can keep more items handy, but it takes longer to find what you need.

2. Block Size (Line Size)

What it is: The amount of data transferred between cache and main memory in one operation.

Impact:

  • Larger blocks: better spatial locality, fewer misses for sequential access
  • Smaller blocks: less wasted space, faster transfer time

This was my weak point during prep.

Common sizes: 32, 64, 128 bytes

Spatial locality principle: If you access one memory location, you're likely to access nearby locations soon.
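To make this concrete, here's a tiny Python sketch of my own (the 64-byte block size is just an assumption for illustration) showing why sequential access benefits from larger blocks - consecutive addresses fall into the same block:

BLOCK_SIZE = 64  # bytes per cache block (assumed for this example)

def block_number(address):
    # Which memory block does this byte address fall in?
    return address // BLOCK_SIZE

# Sequential access: 256 consecutive byte addresses touch only 4
# distinct blocks, so after one miss per block the rest are hits.
blocks_touched = {block_number(addr) for addr in range(256)}
print(sorted(blocks_touched))  # [0, 1, 2, 3]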

3. Mapping Techniques

This was the trickiest part for me during preparation. There are three main approaches:

Direct Mapping

How it works: Each memory block maps to exactly one cache line.

Formula: Cache line = (Memory block number) mod (Number of cache lines)

Advantages:

  • Simple and fast
  • Low hardware cost
  • Easy to implement

Disadvantages:

  • High conflict misses
  • Poor performance if two frequently used blocks map to the same line

Example:

Memory blocks: 0, 1, 2, 3, 4, 5, 6, 7...
Cache lines:   0, 1, 2, 3, 0, 1, 2, 3...
(for 4-line cache)
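A short Python sketch of my own reproduces this table and shows where conflict misses come from:

NUM_LINES = 4  # a tiny direct-mapped cache

def cache_line(block):
    # Each memory block maps to exactly one cache line.
    return block % NUM_LINES

for block in range(8):
    print(f"memory block {block} -> cache line {cache_line(block)}")
# Blocks 0 and 4 both map to line 0, so alternating between them
# evicts the other every time - a conflict miss.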

Fully Associative Mapping

How it works: Any memory block can be placed in any cache line.

Advantages:

  • Lowest miss rate

  • No conflict misses
  • Maximum flexibility

Disadvantages:

  • Complex hardware (need to search all lines)
  • Expensive
  • Slower access time

Best for: Small caches where search time isn't critical
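Here's a minimal sketch of my own of why the lookup is expensive - with no restriction on placement, every line's tag has to be compared:

class FullyAssociativeCache:
    def __init__(self, num_lines):
        self.tags = [None] * num_lines  # any block can sit in any line

    def lookup(self, tag):
        # In software this is a linear scan; real hardware compares
        # all tags in parallel, which is the costly comparator logic.
        return tag in self.tags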

Set-Associative Mapping

How it works: Compromise between direct and fully associative. Cache is divided into sets, and each memory block maps to a specific set but can be placed in any line within that set.

Types:

  • 2-way set associative: 2 lines per set
  • 4-way set associative: 4 lines per set
  • n-way set associative: n lines per set

Formula: Set number = (Memory block number) mod (Number of sets)

Advantages:

  • Good balance of performance and cost
  • Reduces conflict misses compared to direct mapping
  • Reasonable hardware complexity

Most common: 2-way and 4-way set associative
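A small Python sketch of my own (4 sets, 2 ways - arbitrary numbers for illustration) shows how this fixes the direct-mapping collision above:

NUM_SETS = 4  # cache divided into 4 sets
WAYS = 2      # 2 lines per set (2-way set associative)

def set_number(block):
    # A block maps to one set but can occupy either line within it.
    return block % NUM_SETS

# Blocks 0 and 4 still map to the same set, but unlike direct
# mapping they can coexist there, one per way - no conflict miss.
print(set_number(0), set_number(4))  # 0 0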

4. Replacement Policies

When cache is full and we need to bring in new data, which block should we replace?

Least Recently Used (LRU)

Principle: Replace the block that hasn't been used for the longest time.

Advantages:

  • Good performance for most programs

  • Exploits temporal locality well

Disadvantages:

  • Complex to implement (need to track access order)
  • Hardware overhead
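A compact way to see LRU behavior is Python's OrderedDict - a software sketch of my own, since real caches track recency in hardware:

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # insertion order = recency order

    def access(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)   # hit: mark most recent
            return "hit"
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # evict least recent
        self.blocks[block] = True
        return "miss"

cache = LRUCache(2)
print([cache.access(b) for b in [1, 2, 1, 3, 2]])
# ['miss', 'miss', 'hit', 'miss', 'miss'] - accessing 3 evicted
# block 2, which was the least recently used at that point.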

First In First Out (FIFO)

Principle: Replace the oldest block in the cache.

Advantages:

  • Simple to implement

  • Low hardware overhead

Disadvantages:

  • Doesn't consider actual usage patterns
  • May replace frequently used data

Random

Principle: Replace a randomly selected block.

Advantages:

  • Very simple to implement
  • No hardware overhead for tracking

Disadvantages:

  • Unpredictable performance
  • May replace frequently used data

Least Frequently Used (LFU)

Principle: Replace the block that has been accessed least frequently.

Use case: Good for applications with stable access patterns

5. Write Policies

What happens when the CPU writes data to cache?

Write-Through

How it works: Write to both cache and main memory simultaneously.

Advantages:

  • Main memory always has current data
  • Simple to implement
  • Good for multiprocessor systems

Don't overthink this one.

Disadvantages:

  • Slower write operations
  • Higher memory traffic

Write-Back (Write-Behind)

How it works: Write only to cache initially. Write to main memory only when the block is replaced.

Advantages:

  • Faster write operations
  • Reduced memory traffic
  • Better performance for multiple writes to the same location

Disadvantages:

  • More complex implementation

  • Need dirty bit to track modified blocks
  • Risk of data loss if cache fails

Write Allocation Policies

Write-Allocate: On write miss, bring the block into cache then write.

No-Write-Allocate: On write miss, write directly to main memory without bringing block to cache.
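To tie the write policies together, here's a simplified sketch of my own of write-back with a dirty bit (reads and mapping are ignored to keep it short):

class WriteBackCache:
    def __init__(self):
        self.data = {}   # block -> value held in cache
        self.dirty = {}  # block -> modified since it was loaded?

    def write(self, block, value):
        self.data[block] = value
        self.dirty[block] = True  # memory is now stale for this block

    def evict(self, block, memory):
        if self.dirty.get(block):
            memory[block] = self.data[block]  # write back only now
        self.data.pop(block, None)
        self.dirty.pop(block, None)

memory = {}
cache = WriteBackCache()
cache.write(7, "new")    # fast: main memory is not touched
print(memory)            # {} - memory still stale
cache.evict(7, memory)
print(memory)            # {7: 'new'} - written back on replacement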

6. Cache Levels (Hierarchy)

Multi-level Cache Design

L1 Cache (Primary):

  • Closest to the CPU
  • Smallest and fastest
  • Often split into instruction and data caches

L2 Cache (Secondary):

  • Larger than L1, slower than L1
  • Usually unified (instructions and data together)

L3 Cache (Tertiary):

  • Largest, shared among multiple cores

  • Last level before main memory

Inclusion policies:

  • Inclusive: L2 contains everything in L1
  • Exclusive: L2 and L1 contain different data
  • Non-inclusive: No specific relationship

Performance Metrics

Hit Rate and Miss Rate

Hit Rate = (Number of hits) / (Total accesses)

Miss Rate = 1 - Hit Rate

Average Memory Access Time (AMAT)

AMAT = Hit Time + (Miss Rate × Miss Penalty)

I made flashcards for this - that's how I remembered it.

Example:

  • Hit time = 1 cycle
  • Miss rate = 5%
  • Miss penalty = 100 cycles

AMAT = 1 + (0.05 × 100) = 6 cycles
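As a sanity check, a tiny Python helper of my own reproduces this example (and the exam variant further below):

def amat(hit_time, miss_rate, miss_penalty):
    # Average Memory Access Time = hit time + miss rate * miss penalty
    return hit_time + miss_rate * miss_penalty

print(amat(1, 0.05, 100))  # 6.0 cycles (the example above)
print(amat(2, 0.03, 50))   # 3.5 (in ns, a common exam variant)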

Tips That Worked for Me

  • Mapping techniques: Direct (simple, conflicts), Fully Associative (flexible, expensive), Set-Associative (balanced)
  • Write policies: Write-through (safe, slow), Write-back (fast, complex)
  • Replacement: LRU is most common and effective

Remember: Cache design is all about trade-offs. This is one of those topics that trips up a lot of loksewa candidates.

What They Actually Ask in Exams

During my exam prep, I noticed these questions keep showing up:

  1. "What is the main advantage of set-associative mapping over direct mapping?"

    • Answer: Reduces conflict misses while maintaining reasonable hardware complexity
    • Tip: Focus on the "balance" aspect
  2. "In write-back policy, when is data written to main memory?" sort of - Answer: When the cache block is replaced or explicitly flushed

    • Tip: Remember the "dirty bit" concept
  3. "What does LRU replacement policy stand for and how does it work?"

    • Answer: Least Recently Used - replaces the block that hasn't been accessed for the longest time
    • Tip: This is very common - know it well
  4. "Calculate AMAT given hit time=2ns, miss rate=3%, miss penalty=50ns"

    • Answer: AMAT = 2 + (0.03 × 50) = 3.5ns
    • Tip: Practice these calculations
  5. "What is the difference between write-through and write-back?"

    • Answer: Write-through writes to both cache and memory; write-back writes only to cache initially
    • Tip: Know the trade-offs of each

Pro tip from my experience: Cache design questions often involve trade-offs. When in doubt, think about what each design choice optimizes for - speed, cost, or simplicity.