- Well, if there are 10^11 stars in the galaxy, each iteration
would involve approximately 10^22 basic calculations. We multiply
this by an estimated 10 floating point operations per calculation,
and by 100 iterations, we get 10^25 floating point operations total.
If we divide this by 500,000,000 (5*10^8) floating point operations
per second, we get 2*10^16 seconds, or about 630 million
years.
- Let's look at a few examples. If n = 3, the torus has
a diameter (maxium least distance between nodes) of 2. If n
= 4, the torus has a diameter of 4. If n = 5, the torus has
a diameter of 4. If n = 6, the torus has a diameter of 6.
It appears that for odd n, the diameter is d=(n+1)/2,
and for even n, the diameter is d=n/2.
- If the tree is balanced, i.e., the depth of the tree is equal
on all branches from the root node, and the depth of the tree is n, then
the diameter of the network is is the distance from the deepest
node on one branch off the root to the deepest node on
another branch off the root. That is, d=2*n.
- The maximum distance in any dimension is n-1, if n is the
number of nodes in that dimension of the mesh. The maximum distance
for the d-dimensional mesh, then, is the the sum of nodes in each
dimension minus the number of dimensions. If n is the
number of elements in each of d dimensions, then the
diameter = (n-1)*d.
- First, we note that a five-dimensional hypercube which uses
the e-cube routing algorithm is a binary
hypercube, that is, the cube is 2x2x2x2x2. Therefore each node
is represented by an address comprised of 5 base-2 digits. The
address of node 7, then, is 00111 (7=4+2+1), and node 22 is 10110
(22=16+4+2). By taking the exclusive OR between the two
addresses, we get 10001 for the routing bits. In other
words, the two addresses differ only in the first and
last dimension of the hypercube. To obtain the route, we XOR the
left-most non-zero routing bit with the corresponding bit of the
current node to obtain the next node, set that bit in the routing
bits to zero, and repeat the process until all of the routing
bits are zero. The route, then would be from node 7 (00111) to
node 23 (10111) to to node 22 (10110).
- figure tbd. Since there are 4 processors in each
of 4 dimensions, we think of the hypercube as a 4x4x4x4 array,
so each node may be thought of as having an address comprised
of 4 base-4 digits. For example, if the "near" corner has an
address of 0000, the "far" corner would have an address of 3333.
Two adjacent nodes near the "middle" might have addresses of
1212 and 1112. The total number of nodes would be 4*4*4*4=256.
-
First, note that a 9 x 9 mesh requires an 8-dimensional
binary hypercube in order to implement the approach used
on page 17 in the text. To represent the row/column addresses
of the mesh, we require the number 0 through 8, which in
binary representation requires 4 bits. So, four bits for the
row address, and 4 bits for the column address add up to an
8-bit address.
Second, note that the row addresses and
column addresses are not your standard binary numbering;
the numbers use something called a Gray encoding which
has the property that adjacent numbers differ by only one
bit.
8 1100
7 0100
6 0101
5 0111
4 0110
3 0010
2 0011
1 0001
0 0000
0000 0001 0011 0010 0110 0111 0101 0100 1100
0 1 2 3 4 5 6 7 8
You may find it interesting to note that this embedded mesh is
not a torus. This technique of embedding a mesh in a
hypercube only works for toruses of size 2x2,
4x4, 4x2, 8x8, 8x4, 8x2, 16x16, 16x8, 16x4, 16x2, etc.
Of course, it will work fine for meshes of any size, provided
the sum of the number of bits required for the row and column
addresses is less than or equal to the dimension of the hypercube.
- Actually, this is a lot of questions. Here are a few
things to note.
The average distance in a mesh is r/2 + c/2,
where r is the number of rows, and c
is the number of columns. Acutally, we expect a somewhat
smaller average since the nodes near the middle of the
mesh are closer on average to more other nodes than
are nodes near the edge of the mesh. (?)
In a torus, the average distance is r/4 + c/4,
where r is the number of rows, and c is the number
of columns.
The average distance in a hypercube is d/2, where
d is the number of dimensions. On average, we
expect two nodes to differ in only 1/2 of the bits in
their respective addresses.
-
Let us assume that for this problem, the algorithm in question
is a sorting algorithm in which each node holds a single data
point and that upon completion of the algorithm, the data
points have migrated amongst the nodes such that they apear
in a left-reading order in the tree. In the following, let
h be the height of the tree.
The communication lower bound for a complete binary tree, based on
distance, depend on the height of the tree. In the worst case,
the tree is exactly out of order. In other words, the bottom
level on the right needs to trade places with the bottom level
on the left, and the second level on the right needs to trade
places with the second level on the left, etc. In this case,
the number of hops required to complete the transfer is
2*2^h*2*h + 2*2^(h-1)*2*(h-1) + ... + 2.
The communication lower bound for a complete binary tree, based
on the bisection width, depends on the height of the tree. In
the worst case, all of the data in nodes in the left half of
the tree need to be moved to the right half, and all the data
in the nodes on the right half need to move to the left.
Since the bisection width of a tree is 1, and the number of
nodes on each side of the tree is 2^h - 1 hops, the communication
lower bound is 2*(2^h - 1) hops. If on average only 1/2 of the
data need to migrate from one half of the tree to the other, the
communication lower bound is 2^h - 1 hops.
-
If the numbers start out un-sorted in a mesh that is
sqrt(n) x sqrt(n) in size, then a typical number will
have to move half-way across the mesh in each of two
dimensions during the sorting process, that is,
2 * sqrt(n) / 2 = sqrt(n).
In terms of bisection width, the mesh has a bisection
width of sqrt(n). If half the data needs to cross
the midline of the mesh during the sorting process,
the communication lower bound may be calculated by
n/2/sqrt(n) = sqrt(n)/2.
-
Suppose, for the sake of argument, that the program takes
100 machine seconds to run. Then clearly, one machine
will execute the 10% of the sequential code in a total
of 10 seconds, and all ten machines will execute the 90%
of parallelizable code for 9 seconds. During the execution
of the program then, 200 MFLOPS for ten seconds, plus
2000 MFLOPS for nine seconds = 20,000 MFLOP seconds.
If we then divide this by the total execution time of
19 seconds we get an
average performance of 1053 MFLOPS, or slightly better
than 50% of capacity.
-
Hmm. First thing to notice, is that 2*n*log(n^0.5) =
2*n*0.5*log(n) = n*log(n). Second, the n in the number
of processors is different from the n in the formula
for the number of steps required for the sequential
algorithm. For this reason, we will use p instead
of n to denote the number of processors.
According to the definition, a cost-optimal
parallel algorithm is one in which the cost on a multiprocessor
is proportional to the cost on a sequential processor.
In other words, Cp = k*Cs, where k is some constant,
Cp is the parallel algoritm cost, and Cs is the sequential
algoritm cost.
According to the definition of speedup, we define S(p^2)
to be the speedup using p^2 processors:
S(p^2) = n*log(n) / x, where x is the number of steps
with p^2 processors.
Note that Cp = Ts * p^2 / S(p^2). Then Cp = Ts * p^2 / (n*log(n)/x).
Note that Cs = Ts * 1 / 1 = Ts.
Using the proportionality equation expressed above, we get:
Ts * p^2 * x / (n*log(n)) = k * Ts. Solving for x, we get x =
k * n * log(n) / p^2.
The parallel algorithm, then, will be cost-optimal if it
requires k*n*log(n)/p^2 or fewer steps.
(???)