      AII(1:n*n,1:3*n) = AI(1:n*n,1:3*n,i)
      WI(1:3*n,1:3*n+1,i) = matmul(transpose(AII(1:n*n,1:3*n)),&
                                   ZI(1:n*n,1:3*n+1,i))
   end do
   call mpi_barrier(mpi_comm_world,ierr)
!  Gather each processor's WI contributions to the root.
   call mpi_gather(WI(1,1,bn),3*n*(3*n+1)*(en-bn+1),mpi_real,&
                   WI,3*n*(3*n+1)*(en-bn+1),mpi_real,0,&
                   mpi_comm_world,ierr)
   if (my_rank.eq.0) then
      Ahat(1:3*n,1:3*n) = Ahat(1:3*n,1:3*n)-&
                          WI(1:3*n,1:3*n,1)-WI(1:3*n,1:3*n,2)-&
                          WI(1:3*n,1:3*n,3)-WI(1:3*n,1:3*n,4)
      dhat(1:3*n) = dhat(1:3*n)-&
                    WI(1:3*n,1+3*n,1)-WI(1:3*n,1+3*n,2)-&
                    WI(1:3*n,1+3*n,3)-WI(1:3*n,1+3*n,4)
!     Solve the Schur complement system via GE.
      call gespd(Ahat(1:3*n,1:3*n),dhat(1:3*n),xO(1:3*n),3*n,1)
   end if
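!  The reduced problem solved above is the Schur complement system
!     (Ahat - sum_i WI(:,1:3*n,i)) xO = dhat - sum_i WI(:,3*n+1,i),
!  where, assuming ZI(:,:,i) holds A_i^{-1}[A_{i0} d_i] from the
!  preceding step, WI(:,:,i) = A_{i0}^T ZI(:,:,i) is processor i's
!  contribution to both the interface matrix and the right side.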
   call mpi_bcast(xO,3*n,mpi_real,0,mpi_comm_world,ierr)
!  Concurrently solve for the big blocks.
   do i = bn,en
!     Update the right side for the interface values, then solve the
!     i-th big block with cgssor3 (SSOR preconditioned conjugate gradient).
      dI(1:n*n,i) = AI(1:n*n,3*n+1,i)-&
                    matmul(AI(1:n*n,1:3*n,i),xO(1:3*n))
!     call gespd(AA,dI(1:n*n,i),xI(1:n*n,i),n*n,1)
      call cgssor3(dI(1:n*n,i),&
                   xI(1:n*n,i),n*n,1,n)
   end do
   call mpi_barrier(mpi_comm_world,ierr)
!  Gather the big block solutions to the root.
   call mpi_gather(xI(1,bn),n*n*(en-bn+1),mpi_real,&
                   xI,n*n*(en-bn+1),mpi_real,0,&
                   mpi_comm_world,ierr)
   call mpi_barrier(mpi_comm_world,ierr)
   if (my_rank.eq.0) then
      t1 = timef()
      print*, t1
      print*, xO(n/2),xO(n+n/2),xO(2*n+n/2)
      print*, xI(n*n/2,1),xI(n*n/2,2),&
              xI(n*n/2,3),xI(n*n/2,4)
   end if
   call mpi_finalize(ierr)
end program
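An aside on the two mpi_gather calls: they pass overlapping send and receive buffers on the root, which the MPI standard disallows; the portable fix is for the root to pass MPI_IN_PLACE as its send buffer. The following is a minimal, self-contained sketch of that pattern, where the array x, the slice size m, and the four-processor limit are illustrative assumptions, not part of the book's code.

program gather_in_place
   implicit none
   include 'mpif.h'
   integer, parameter :: m = 4       ! entries owned by each processor (assumed)
   real    :: x(16)                  ! room for up to 4 processors times m entries
   integer :: my_rank, p, ierr
   call mpi_init(ierr)
   call mpi_comm_rank(mpi_comm_world,my_rank,ierr)
   call mpi_comm_size(mpi_comm_world,p,ierr)
!  Each processor fills only its own slice of x.
   x(my_rank*m+1:my_rank*m+m) = real(my_rank)
   if (my_rank.eq.0) then
!     The root contributes its slice in place and receives the rest.
      call mpi_gather(mpi_in_place,m,mpi_real,&
                      x,m,mpi_real,0,mpi_comm_world,ierr)
   else
!     Other processors send their slice; the receive arguments are ignored.
      call mpi_gather(x(my_rank*m+1),m,mpi_real,&
                      x,m,mpi_real,0,mpi_comm_world,ierr)
   end if
   if (my_rank.eq.0) print*, x(1:p*m)
   call mpi_finalize(ierr)
end program

With this form the root's own values need not be re-sent, and the gather is correct even though the send and receive regions share storage.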
The code was run for 1, 2 and 4 processors with both the gespd() and cgssor3() subroutines for the four large solves, one $n^2 \times n^2$ block for each $1 \le i \le 4$. The computation times using gespd() were about 14 to 20 times longer than those using cgssor3().
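As a hedged aside, the following standard operation counts, which are estimates and not figures from the text, make the gap unsurprising: for an $N \times N$ system with $N = n^2$,

\[
\text{dense Gaussian elimination} \approx \tfrac{1}{3}N^{3} \ \text{flops},
\qquad
\text{one conjugate gradient iteration} = O(N) \ \text{flops},
\]

so a preconditioned iterative method such as cgssor3() does far less work per big block whenever the number of iterations stays small relative to $N$.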