Computer Architecture and Organization 5th Edition Chapter 1.12 Solutions

1.1

Personal computer.
Server.
Supercomputer.
Embedded computer.

1.2

Idea from other field	Idea from computer architecture
a	Performance via Pipelining
b	Dependability via Redundancy
c	Performance via Prediction
d	Make the Common Case Fast
e	Hierarchy of Memories
f	Performance via Parallelism
g	Design for Moore's Law
h	Use Abstraction to Simplify Design

1.3

A special kind of program called a compiler reads the high-level source code and translates it into a program in assembly language.
Another program called an assembler transforms the program in assembly language into a program in machine language, which is what a computer understands and can execute directly.

Some compilers "cut out the middleman" and produce machine code directly.

1.4

$\mathrm {Minimum\ frame\ buffer\ size} ={\frac {\mathrm {bytes} }{\mathrm {pixel} }}\times {\frac {\mathrm {pixels} }{\mathrm {frame} }}={\frac {3\ \mathrm {bytes} }{\mathrm {pixel} }}\times {\frac {1280\times 1024\ \mathrm {pixels} }{\mathrm {frame} }}=3932160{\frac {\mathrm {bytes} }{\mathrm {frame} }}$
$\mathrm {Transmission\ time/frame} ={\frac {\mathrm {Frame\ size} }{\mathrm {Transmission\ rate} }}={\frac {3932160\times 8\ \mathrm {bits} }{100\times 10^{6}{\frac {\mathrm {bits} }{\mathrm {second} }}}}=0.3145728\ \mathrm {s}$
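The two results above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the constant and variable names are illustrative, and the values (3 bytes per pixel, a 1280 × 1024 frame, a 100 Mbit/s link) are the ones used in the formulas above.

```python
# Values taken from the solution above; all names are illustrative.
BYTES_PER_PIXEL = 3
WIDTH, HEIGHT = 1280, 1024
LINK_RATE_BPS = 100 * 10**6  # 100 Mbit/s, in bits per second

# Minimum frame buffer size: one frame at 3 bytes per pixel.
frame_bytes = BYTES_PER_PIXEL * WIDTH * HEIGHT

# Time to send one frame: convert bytes to bits, divide by the link rate.
transmission_s = frame_bytes * 8 / LINK_RATE_BPS

print(f"Frame buffer size: {frame_bytes} bytes")
print(f"Transmission time: {transmission_s:.7f} s")
```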

1.5

a

$\mathrm {IPS} ={\frac {\mathrm {Instructions} }{\mathrm {Second} }}={\frac {\mathrm {Instructions} }{\mathrm {Clock\ cycle} }}\times {\frac {\mathrm {Clock\ cycles} }{\mathrm {Second} }}={\frac {\mathrm {Clock\ rate} }{\mathrm {CPI} }}$

Processor	Instructions per second
1	$\mathrm {IPS_{1}} ={\frac {3.0\ \mathrm {GHz} }{1.5\ \mathrm {CPI} }}=2\times 10^{9}$
2	$\mathrm {IPS_{2}} ={\frac {2.5\ \mathrm {GHz} }{1.0\ \mathrm {CPI} }}=2.5\times 10^{9}$
3	$\mathrm {IPS_{3}} ={\frac {4.0\ \mathrm {GHz} }{2.2\ \mathrm {CPI} }}\approx 1.82\times 10^{9}$

Thus, processor 2 has the highest performance in instructions per second.
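As a quick check of the table above, the following sketch computes instructions per second as clock rate divided by CPI for each processor. The dictionary layout and the function name `instructions_per_second` are assumptions made for illustration; the clock rates and CPI values are those used in the table.

```python
# (clock rate in Hz, CPI) for each processor, as given in the table above.
processors = {
    1: (3.0e9, 1.5),
    2: (2.5e9, 1.0),
    3: (4.0e9, 2.2),
}


def instructions_per_second(clock_rate_hz, cpi):
    """Instructions per second = clock rate / CPI."""
    return clock_rate_hz / cpi


ips = {p: instructions_per_second(rate, cpi) for p, (rate, cpi) in processors.items()}
fastest = max(ips, key=ips.get)

for p, value in ips.items():
    print(f"Processor {p}: {value:.3e} instructions/s")
print(f"Highest IPS: processor {fastest}")
```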

b

Processor	Number of cycles	Number of instructions
1	$(3.0\ \mathrm {GHz} )\times (10\ \mathrm {s} )=3\times 10^{10}$	$(2\times 10^{9}\ \mathrm {IPS} )\times (10\ \mathrm {s} )=2\times 10^{10}$
2	$(2.5\ \mathrm {GHz} )\times (10\ \mathrm {s} )=2.5\times 10^{10}$	$(2.5\times 10^{9}\ \mathrm {IPS} )\times (10\ \mathrm {s} )=2.5\times 10^{10}$
3	$(4.0\ \mathrm {GHz} )\times (10\ \mathrm {s} )=4\times 10^{10}$	$(1.82\times 10^{9}\ \mathrm {IPS} )\times (10\ \mathrm {s} )=1.82\times 10^{10}$
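The cycle and instruction counts in the table can be reproduced with the sketch below, assuming the 10-second run time used above. The names are illustrative. Note that the table uses the rounded rate of $1.82\times 10^{9}$ for processor 3, whereas the code divides the cycle count by the CPI directly, so its result for processor 3 is slightly smaller.

```python
RUN_TIME_S = 10  # run time used in the table above

# (clock rate in Hz, CPI) for each processor.
processors = {
    1: (3.0e9, 1.5),
    2: (2.5e9, 1.0),
    3: (4.0e9, 2.2),
}

cycles = {}
instructions = {}
for p, (clock_rate_hz, cpi) in processors.items():
    # Cycles = clock rate x time; instructions = cycles / CPI.
    cycles[p] = clock_rate_hz * RUN_TIME_S
    instructions[p] = cycles[p] / cpi
    print(f"Processor {p}: {cycles[p]:.2e} cycles, {instructions[p]:.3e} instructions")
```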

c

Let $I$ be the number of instructions executed. With the CPI increased by 20%, a 30% reduction in execution time can be expressed by the following formula.

${\frac {\mathrm {Execution\ time_{new}} }{\mathrm {Execution\ time_{old}} }}={\frac {I\times (1.2\times \mathrm {CPI} )\times \mathrm {Clock\ cycle\ time_{new}} }{I\times \mathrm {CPI} \times \mathrm {Clock\ cycle\ time_{old}} }}={\frac {1.2\times \mathrm {Clock\ rate_{old}} }{\mathrm {Clock\ rate_{new}} }}=0.7$

Thus, $\mathrm {Clock\ rate_{new}} ={\frac {1.2}{0.7}}\times \mathrm {Clock\ rate_{old}} \approx 1.71\times \mathrm {Clock\ rate_{old}}$, which represents an increase of about 71% in clock rate.
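The ratio can be verified numerically. In this sketch the two factors (1.2 for the CPI and 0.7 for the execution time) come from the formula above; the variable names are illustrative.

```python
CPI_FACTOR = 1.2   # CPI_new / CPI_old (20% increase)
TIME_FACTOR = 0.7  # time_new / time_old (30% reduction)

# From time = I x CPI / clock rate with I unchanged:
# TIME_FACTOR = CPI_FACTOR / clock_rate_factor
clock_rate_factor = CPI_FACTOR / TIME_FACTOR
increase_percent = (clock_rate_factor - 1) * 100

print(f"Clock rate must scale by {clock_rate_factor:.4f}")
print(f"Increase: {increase_percent:.1f}%")
```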

1.6

To determine which implementation of the hypothetical instruction set architecture is faster, we need the execution time of the program on each processor. The execution time can be calculated as follows:

$\mathrm {CPU\ time} ={\frac {\mathrm {CPU\ clock\ cycles} }{\mathrm {Clock\ rate} }}$

Since we know the clock rate of each processor, we only need to find how many clock cycles each processor takes to execute the program. This number is given by:

$\mathrm {CPU\ clock\ cycles} =\sum _{i=1}^{n}(\mathrm {CPI} _{i}\times C_{i})$

In the above formula, $\mathrm {CPI} _{i}$ and $C_{i}$ are the CPI and instruction count, respectively, for each instruction class (A, B, C or D). From the problem description, we know that the program executes $10^{6}\times 10\%=10^{5}$ instructions of class A, $10^{6}\times 20\%=2\times 10^{5}$ instructions of class B, $10^{6}\times 50\%=5\times 10^{5}$ instructions of class C and $10^{6}\times 20\%=2\times 10^{5}$ instructions of class D.

Thus, for processor P1 we have:

$\mathrm {CPU\ clock\ cycles_{P1}} =(1\times 10^{5})+(2\times 2\times 10^{5})+(3\times 5\times 10^{5})+(3\times 2\times 10^{5})=2.6\times 10^{6}$

And for processor P2 we have:

$\mathrm {CPU\ clock\ cycles_{P2}} =(2\times 10^{5})+(2\times 2\times 10^{5})+(2\times 5\times 10^{5})+(2\times 2\times 10^{5})=2\times 10^{6}$

Hence, the execution times for each processor are:

$\mathrm {CPU\ time_{P1}} ={\frac {2.6\times 10^{6}}{2.5\ \mathrm {GHz} }}=1.04\ \mathrm {ms}$

$\mathrm {CPU\ time_{P2}} ={\frac {2\times 10^{6}}{3\ \mathrm {GHz} }}\approx 0.667\ \mathrm {ms}$

Therefore, processor P2 is faster.
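The whole calculation for this exercise can be sketched in Python as below. The instruction mix, per-class CPI values, and clock rates are those used in the solution above; the data layout and the helper name `cpu_clock_cycles` are assumptions made for illustration.

```python
INSTRUCTION_COUNT = 10**6

# Instruction mix in percent per class, as in the solution above.
mix_percent = {"A": 10, "B": 20, "C": 50, "D": 20}

# CPI per instruction class and clock rate (Hz) for each processor.
cpi = {
    "P1": {"A": 1, "B": 2, "C": 3, "D": 3},
    "P2": {"A": 2, "B": 2, "C": 2, "D": 2},
}
clock_rate_hz = {"P1": 2.5e9, "P2": 3.0e9}


def cpu_clock_cycles(processor):
    """Sum of CPI_i x C_i over all instruction classes."""
    return sum(
        cpi[processor][cls] * (INSTRUCTION_COUNT * pct // 100)
        for cls, pct in mix_percent.items()
    )


cycles = {p: cpu_clock_cycles(p) for p in cpi}
cpu_time_s = {p: cycles[p] / clock_rate_hz[p] for p in cpi}
faster = min(cpu_time_s, key=cpu_time_s.get)

for p in cpi:
    print(f"{p}: {cycles[p]} cycles, {cpu_time_s[p] * 1e3:.4f} ms")
print(f"Faster processor: {faster}")
```

Keeping the mix as integer percentages keeps the cycle counts exact integers, so only the final division by the clock rate involves floating point.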