| CS698Y-Modern | Memory | Systems | |---------------|--------|---------| | m , 11 | | | Tutorial 1 10<sup>th</sup> August, 2017 Time Limit: 60 Minutes | Name: | | |---------|--| | rvaine. | | Roll No.: \_\_\_\_\_ - 1. You are trying to appreciate how important the principle of locality is in justifying the use of a cache memory, so you experiment with a computer having an L1 data cache and a main memory (you exclusively focus on data accesses). The latencies (in CPU cycles) of the different kinds of accesses are as follows: cache hit, 1 cycle; cache miss, 105 cycles; main memory access with cache disabled, 100 cycles. - (a) When you run a program with an overall miss rate of 5%, what will the average memory access time (in CPU cycles) be? (b) Next, you run a program specifically designed to produce completely random data addresses with no locality. Toward that end, you use an array of size 256 MB (all of it fits in the main memory). Accesses to random elements of this array are continuously made (using a uniform random number generator to generate the elements indices). If your data cache size is 64 KB, what will the average memory access time be? (c) If you compare the result obtained in part (b) with the main memory access time when the cache is disabled, what can you conclude about the role of the principle of locality in justifying the use of cache memory? (d) You observed that a cache hit produces a gain of 99 cycles (1 cycle vs. 100), but it produces a loss of 5 cycles in the case of a miss (105 cycles vs. 100). In the general case, we can express these two quantities as G (gain) and L (loss). Using these two quantities (G and L), identify the highest miss rate after which the cache use would be disadvantageous. Memory access time with no cache = Topp with cache = Ton miss vale = m. Ton = $$(1-m)$$ (Topp-G) + m (Topp+L) Cache becomes useless, when Topp(= $(1-m)$ ) (Topp-G) + m (Topp+L) m $7/(G)$ $7/(G)$ $7/(G)$ $7/(G)$ $104$ 2. Increasing a cache's associativity (with all other parameters kept constant), statistically reduces the miss rate. However, there can be pathological cases where increasing a cache's associativity would increase the miss rate for a particular workload. Consider the case of direct mapped compared to a two-way set associative cache of equal size. Assume that the set associative cache uses the LRU replacement policy. To simplify, assume that the block size is one word. Now construct a trace of word accesses that would produce more misses in the two-way associative cache. (Hint: Focus on constructing a trace of accesses that are exclusively directed to a single set of the two-way set associative cache, such that the same trace would exclusively access two blocks in the direct-mapped cache.) Trace ob $$(T_1, T_2, T_3)^{\infty}$$ ## 3. Memory Hierarchy. (a) Assume that we have a 32-bit processor (with 32-bit words) and that this processor is byte-addressed (i.e. addresses specify bytes). Suppose that it has a 512-byte cache that is two-way set-associative, has 4-word cache lines, and uses LRU replacement. Split the 32-bit address into "tag", "index", and "cache-line offset" pieces. Which address bits comprise each piece? (b) How many sets does this cache have? Explain. 16 (c) Below is a series of memory read references set to the cache from part (a). Assume that the cache is initially empty and classify each memory references as a hit or a miss. Identify each miss as either compulsory, conflict, or capacity. One example is shown. Hint: start by splitting the address into components. Show your work. | Address | $\mathrm{Hit/Miss?}$ | Miss Type? | |---------|----------------------|------------| | 0x300 | Miss | Compulsory | | 0x1BC | Miss | Compulsory | | 0x206 | )1 | 77 | | 0x109 | 1/ | 77 | | 0x308 | n | conflict | | 0x1A1 | 1) | Compulsory | | 0x1B1 | Het | , | | 0x2AE | Miss | compulson | | 0x3B2 | n | 11 | | 0x10C | Hrt | | | 0x205 | Miss | conflict | | 0x301 | h | 11 | | 0x3AE | )/ | Conflict | | 0x1A8 | 11 | Conflict | | 0x3A1 | Hrt | | | 0x1BA | Hit | | (d) Calculate the miss rate and hit rate. 4. Cache organization: Your company has an application that must be run as fast as possible. The hardware division of your company has come up with three separate first-level cache configurations: Machine I: Direct-mapped with one-word blocks Machine II: Direct-mapped with four-word blocks Machine III: Two-way set associative with four-word blocks For these machines, the cache fill penalty is 4 cycles + 1 cycle for each word. You did some experiments and measured the following instruction mix for the application: Branch: 16%, Load: 15%, Store: 10%, FloatInsts: 20%, Integer: 39% Further, through a hardware cache monitor, you measured the following miss rates: Machine I: Instruction miss rate: 4%; Data miss rate: 20% Machine II: Instruction miss rate: 2%; Data miss rate: 16% Machine III: Instruction miss rate: 1.5%; Data miss rate: 14% Finally, the total CPI measured with Machine I is 3.0. (a) Which of these machines spends the most time waiting for memory? Justify your answer: M1: $$5 * (0.04*1) + (0.1 * 0.15) * 0.2) = 0.45$$ M2: $8 * ((0.02*1) + (0.1 * 0.15) * 0.16) = 0.48$ M3: $8 * ((0.015*1) + (0.1 * 0.15) * 0.14) = 0.40$ $5 = \text{fall penalty for 1 wood blows}$ $8 = " " A wood blows$ So M2 (b) suppose we increase the associativity of Cache III (keeping the block-size constant). In this new configuration, the instruction miss rate goes to zero. Further, we measure a total CPI of 2.71. What is the data miss rate?