线性变换方程包含最基本的线性方程组、线性矩阵方程, 如Lyapunov矩阵方程, Sylvester矩阵方程等.这些方程(组) 的求解问题广泛来源于控制理论[1]、图像恢复[2]、迁移理论中Riccati微分方程的数值解[3]、模型降阶[4-6]、线性系统的稳定性分析[7]等许多科学领域.
目前, 求解大型线性变换方程通常采用迭代法, 如ADI方法[8](Alternating Direction Implicit) 及Krylov子空间法[9-13]等.ADI方法的不便之处在于确定最优参数; 共轭梯度法[14](conjugate gradient method, 简称CG法) 具有存储量小、适合并行计算等优点, 但它仅适用于系数矩阵为对称正定的情形; 变形共轭梯度法[15](modified conjugate gradient method, 简称MCG法) 的收敛速度与系数矩阵的条件数密切相关, 条件数越小, 收敛速度越快, 因此MCG算法不适用于条件数较大的情形; SYMMLQ算法[16]适合求解系数矩阵为对称这一情形, 其易于实现并行计算且收敛速度快.但是该算法的缺点是在并行实现过程中, 需要两次全归约, 由此引起的大规模全局通讯使得算法的可扩放性比较差[17-18].
基于对上述算法的研究, 本文首先对SYMMLQ算法实现并行运算, 然后将该并行算法进行改进, 以减少并行计算时间, 提高并行效率.文中涉及到的所有矩阵都是实矩阵, 定义同阶矩阵A与B的内积为[A, B]=tr (ATB), 由此导出矩阵的Frobenius范数
文献[16]给出了求解对称情形下线性方程组的SYMMLQ算法, 文中将该算法推广到求解线性对称变换方程, 以拓宽其使用范围.
设线性变换T是n维欧式空间V上的对称变换, 即 X, Y∈V, 有[T(X), Y]=[X, T(Y)].线性对称变换方程为
$ T\left( \mathit{\boldsymbol{X}} \right) = \mathit{\boldsymbol{F}}, $ | (1) |
其中X∈V是未知的元素, F∈V是已知元素.
根据文献[12]的Lanczos过程, 推广到线性对称变换方程,可得到如下关系式
$ T\left( {{\mathit{\boldsymbol{Q}}^{\left( 1 \right)}},{\mathit{\boldsymbol{Q}}^{\left( 2 \right)}}, \cdots ,{\mathit{\boldsymbol{Q}}^{\left( k \right)}}} \right) = \left( {{\mathit{\boldsymbol{Q}}^{\left( 1 \right)}},{\mathit{\boldsymbol{Q}}^{\left( 2 \right)}}, \cdots ,{\mathit{\boldsymbol{Q}}^{\left( k \right)}}} \right){\mathit{\boldsymbol{T}}_k} + {b_k}{\mathit{\boldsymbol{Q}}^{\left( {k + 1} \right)}}\mathit{\boldsymbol{e}}_k^{\rm{T}}, $ |
其中Q(1), Q(2), …, Q(k)表示由Lanczos正交过程得到的n维欧式空间V中相互正交的单位元素(i, j=1, …, k).ekT表示k维空间第k个向量, 即ekT=(0, 0, …, 0, 1).Tk是一个k×k的对称三对角矩阵, 即
$ {\mathit{\boldsymbol{T}}_k} = \left( {\begin{array}{*{20}{c}} {{a_1}}&{{b_1}}&{}&{}&{}\\ {{b_1}}&{{a_2}}&{{b_2}}&{}&{}\\ {}&{{b_2}}& \ddots & \ddots &{}\\ {}&{}& \ddots & \ddots &{{b_{k - 1}}}\\ {}&{}&{}&{{b_{k - 1}}}&{{a_k}} \end{array}} \right). $ |
考虑方程(1) 的求解, 令
$ {\mathit{\boldsymbol{X}}^{\left( k \right)}} = {\mathit{\boldsymbol{X}}^{\left( 0 \right)}} + \left( {{\mathit{\boldsymbol{Q}}^{\left( 1 \right)}},{\mathit{\boldsymbol{Q}}^{\left( 2 \right)}}, \cdots ,{\mathit{\boldsymbol{Q}}^{\left( k \right)}}} \right){\mathit{\boldsymbol{f}}^{\left( k \right)}}, $ |
其中f(k)是k×1列向量.取f(k)使得R(k)=F-T(X(k)) 正交于Q(1), Q(2), …, Q(k), 即
$ \left[ {{\mathit{\boldsymbol{Q}}^{\left( i \right)}},{\mathit{\boldsymbol{R}}^{\left( k \right)}}} \right] = 0\left( {i = 1, \cdots ,k} \right). $ | (2) |
因为
$ \begin{array}{*{20}{c}} {{\mathit{\boldsymbol{R}}^{\left( k \right)}} = {\mathit{\boldsymbol{R}}^{\left( 0 \right)}} - T\left( {{\mathit{\boldsymbol{Q}}^{\left( 1 \right)}},{\mathit{\boldsymbol{Q}}^{\left( 2 \right)}}, \cdots ,{\mathit{\boldsymbol{Q}}^{\left( k \right)}}} \right){\mathit{\boldsymbol{f}}^{\left( k \right)}}}\\ { = {\mathit{\boldsymbol{R}}^{\left( 0 \right)}} - \left( {{\mathit{\boldsymbol{Q}}^{\left( 1 \right)}},{\mathit{\boldsymbol{Q}}^{\left( 2 \right)}}, \cdots ,{\mathit{\boldsymbol{Q}}^{\left( k \right)}}} \right){\mathit{\boldsymbol{T}}_k}{\mathit{\boldsymbol{f}}^{\left( k \right)}} - {b_k}{\mathit{\boldsymbol{Q}}^{\left( {k + 1} \right)}}\mathit{\boldsymbol{f}}_k^{\left( k \right)},} \end{array} $ | (3) |
将式(3) 左右两端分别与Q(i)(i=1, …, k) 做内积, 由式(2) 和(3), 可得
$ {\left\| {{\mathit{\boldsymbol{R}}^{\left( 0 \right)}}} \right\|_\mathit{\boldsymbol{F}}}{\mathit{\boldsymbol{e}}_1} - {\mathit{\boldsymbol{T}}_k}{\mathit{\boldsymbol{f}}^{\left( k \right)}} = 0. $ |
这样f(k)是方程
因为对称三对角矩阵Tk可能奇异, 因而对它采用LQ分解, 该过程类似于文献[12].于是, 得到求解线性对称变换方程(1) 的SYMMLQ算法(算法1):
(1) 任意给定初始元素X(0)∈V, 置k:=1, 计算
$ \begin{array}{*{20}{c}} {{\mathit{\boldsymbol{R}}^{\left( 0 \right)}} = \mathit{\boldsymbol{F}} - T\left( {{\mathit{\boldsymbol{X}}^{\left( 0 \right)}}} \right),{\mathit{\boldsymbol{Q}}^{\left( 1 \right)}} = \frac{{{\mathit{\boldsymbol{R}}^{\left( 0 \right)}}}}{{\left\| {{\mathit{\boldsymbol{R}}^{\left( 0 \right)}}} \right\|}},{a_1} = \left[ {T\left( {{\mathit{\boldsymbol{Q}}^{\left( 1 \right)}}} \right),{\mathit{\boldsymbol{Q}}^{\left( 1 \right)}}} \right],{{\tilde r}_1} = {a_1},}\\ {{b_1} = \left\| {T\left( {{\mathit{\boldsymbol{Q}}^{\left( 1 \right)}}} \right) - {a_1}{\mathit{\boldsymbol{Q}}^{\left( 1 \right)}}} \right\|,{{\tilde \xi }_1} = \frac{{\left\| {{\mathit{\boldsymbol{R}}^{\left( 0 \right)}}} \right\|}}{{{{\tilde r}_1}}},{{\mathit{\boldsymbol{\tilde W}}}^{\left( 1 \right)}} = {\mathit{\boldsymbol{Q}}^{\left( 1 \right)}},{\mathit{\boldsymbol{X}}^{\left( 1 \right)}} = {\mathit{\boldsymbol{X}}^{\left( 0 \right)}} + {{\tilde \xi }_1}{{\mathit{\boldsymbol{\tilde W}}}^{\left( 1 \right)}};} \end{array} $ |
(2) 置k:=2, 计算
$ \begin{array}{l} {\mathit{\boldsymbol{Q}}^{\left( 2 \right)}} = \frac{{\left( {T\left( {{\mathit{\boldsymbol{Q}}^{\left( 1 \right)}}} \right) - {a_1}{\mathit{\boldsymbol{Q}}^{\left( 1 \right)}}} \right)}}{{{b_1}}},{a_2} = \left[ {T\left( {{\mathit{\boldsymbol{Q}}^{\left( 2 \right)}}} \right),{\mathit{\boldsymbol{Q}}^{\left( 2 \right)}}} \right],\\ {b_2} = \left\| {T\left( {{\mathit{\boldsymbol{Q}}^{\left( 2 \right)}}} \right) - {a_2}{\mathit{\boldsymbol{Q}}^{\left( 2 \right)}} - {b_1}{\mathit{\boldsymbol{Q}}^{\left( 1 \right)}}} \right\|,{c_1} = \frac{{{a_1}}}{{{{\left( {a_1^2 + b_1^2} \right)}^{1/2}}}},{s_1} = \frac{{{b_1}}}{{{{\left( {a_1^2 + b_1^2} \right)}^{1/2}}}},\\ {\delta _2} = {b_1}{c_1} + {a_2}{s_1},{\varepsilon _3} = {b_2}{s_1},{r_1} = {\left( {a_1^2 + b_1^2} \right)^{1/2}},{\xi _1} = \left\| {{\mathit{\boldsymbol{r}}_0}} \right\|/{r_1},\\ {{\tilde r}_2} = {b_1}{s_1} - {a_2}{c_1}{r_2} = {\left( {\tilde r_2^2 + b_2^2} \right)^{1/2}},{{\tilde \delta }_3} = - {b_2}{c_1},{{\tilde \xi }_2} = - {\delta _2}{\xi _1}/{{\tilde r}_2},{\xi _2} = - \frac{{{\delta _2}{\xi _1}}}{{{r_2}}},\\ {\mathit{\boldsymbol{W}}^{\left( 1 \right)}} = {c_1}{{\mathit{\boldsymbol{\tilde W}}}^{\left( 1 \right)}} + {s_1}{\mathit{\boldsymbol{Q}}^{\left( 2 \right)}},{{\mathit{\boldsymbol{\tilde W}}}^{\left( 2 \right)}} = {s_1}{{\mathit{\boldsymbol{\tilde W}}}^{\left( 1 \right)}} - {c_1}{\mathit{\boldsymbol{Q}}^{\left( 2 \right)}},\\ {\mathit{\boldsymbol{Y}}^{\left( 1 \right)}} = {\mathit{\boldsymbol{X}}^{\left( 0 \right)}} + {\xi _1}{\mathit{\boldsymbol{W}}^{\left( 1 \right)}},{\mathit{\boldsymbol{X}}^{\left( 2 \right)}} = {\mathit{\boldsymbol{Y}}^{\left( 1 \right)}} + {{\tilde \xi }_2}{\mathit{\boldsymbol{W}}^{\left( 2 \right)}}; \end{array} $ |
(3) 计算ΔX(k-1)=X(k-1)-X(k-2), 若‖ΔX(k-1)‖ < ε停止计算; 否则, 计算
$ \begin{array}{l} {\mathit{\boldsymbol{Q}}^{\left( k \right)}} = \frac{{\left( {T\left( {{\mathit{\boldsymbol{Q}}^{\left( {k - 1} \right)}}} \right) - {a_{k - 1}}{\mathit{\boldsymbol{Q}}^{\left( {k - 1} \right)}} - {b_{k - 2}}{\mathit{\boldsymbol{Q}}^{\left( {k - 2} \right)}}} \right)}}{{{b_{k - 1}}}},{a_k} = \left[ {T\left( {{\mathit{\boldsymbol{Q}}^{\left( k \right)}}} \right),{\mathit{\boldsymbol{Q}}^{\left( k \right)}}} \right],\\ {b_k} = \left\| {T\left( {{\mathit{\boldsymbol{Q}}^{\left( k \right)}}} \right) - {a_k}{\mathit{\boldsymbol{Q}}^{\left( k \right)}} - {b_{k - 1}}{\mathit{\boldsymbol{Q}}^{\left( {k - 1} \right)}}} \right\|,{c_{k - 1}} = \frac{{{{\tilde r}_{k - 1}}}}{{{{\left( {\tilde r_{k - 1}^2 + b_{k - 1}^2} \right)}^{1/2}}}},{s_{k - 1}} = \frac{{{b_{k - 1}}}}{{{{\left( {\tilde r_{k - 1}^2 + b_{k - 1}^2} \right)}^{1/2}}}},\\ {\delta _k} = {{\tilde \delta }_k}{c_{k - 1}} + {a_k}{s_{k - 1}},{\varepsilon _{k + 1}} = {b_k}{s_{k - 1}},{{\tilde r}_k} = {{\tilde \delta }_k}{s_{k - 1}} - {a_k}{c_{k - 1}},{r_k} = {\left( {\tilde r_k^2 + b_k^2} \right)^{1/2}},\\ {{\tilde \delta }_{k + 1}} = - {b_k}{c_{k - 1}},{{\tilde \xi }_k} = - \left( {{\varepsilon _k}{\xi _{k - 2}} + {\delta _k}{\xi _{k - 1}}} \right)/{{\tilde r}_k},{\xi _k} = - \left( {{\varepsilon _k}{\xi _{k - 2}} + {\delta _k}{\xi _{k - 1}}} \right)/{r_k},\\ {\mathit{\boldsymbol{W}}^{\left( {k - 1} \right)}} = {c_{k - 1}}{{\mathit{\boldsymbol{\tilde W}}}^{\left( {k - 1} \right)}} + {s_{k - 1}}{\mathit{\boldsymbol{Q}}^{\left( k \right)}},{{\mathit{\boldsymbol{\tilde W}}}^{\left( k \right)}} = {s_{k - 1}}{{\mathit{\boldsymbol{\tilde W}}}^{\left( {k - 1} \right)}} - {c_{k - 1}}{\mathit{\boldsymbol{Q}}^{\left( k \right)}},\\ {\mathit{\boldsymbol{Y}}^{\left( {k - 1} \right)}} = {\mathit{\boldsymbol{Y}}^{\left( {k - 2} \right)}} + {\xi _{k - 1}}{\mathit{\boldsymbol{W}}^{\left( {k - 1} \right)}},{\mathit{\boldsymbol{X}}^{\left( k \right)}} = {\mathit{\boldsymbol{Y}}^{\left( {k - 1} \right)}} + {{\tilde \xi }_k}{{\mathit{\boldsymbol{\tilde W}}}^{\left( k \right)}}; \end{array} $ |
(4) 置k:=k+1, 转(3).
1.2 SYMMLQ算法的收敛性分析定理1 按照1.1中推导过程所求得的近似解X(k)有‖R(k)‖F=|bkfk(k)|, 其中fk(k)是f(k)的第k个分量.
证明 由于
$ \begin{array}{l} {\mathit{\boldsymbol{R}}^{\left( k \right)}} = {\mathit{\boldsymbol{R}}^{\left( 0 \right)}} - T\left( {{\mathit{\boldsymbol{Q}}^{\left( 1 \right)}},{\mathit{\boldsymbol{Q}}^{\left( 2 \right)}}, \cdots ,{\mathit{\boldsymbol{Q}}^{\left( k \right)}}} \right)\mathit{\boldsymbol{f}}\left( k \right) = \\ \;\;\;\;\;\;\;\;\;\left( {{\mathit{\boldsymbol{Q}}^{\left( 1 \right)}},{\mathit{\boldsymbol{Q}}^{\left( 2 \right)}}, \cdots ,{\mathit{\boldsymbol{Q}}^{\left( k \right)}}} \right)\left( {{{\left\| {{\mathit{\boldsymbol{R}}^{\left( 0 \right)}}} \right\|}_F}{\mathit{\boldsymbol{e}}_1} - {\mathit{\boldsymbol{T}}_k}{\mathit{\boldsymbol{f}}^{\left( k \right)}}} \right) - {b_k}{\mathit{\boldsymbol{Q}}^{\left( {k + 1} \right)}}\mathit{\boldsymbol{f}}_k^{\left( k \right)}, \end{array} $ |
因为Q(k+1)正交于Q(1), Q(2), …, Q(k), 所以
$ \left\| {{\mathit{\boldsymbol{R}}^{\left( k \right)}}} \right\|_F^2 = {\left\| {{{\left\| {{\mathit{\boldsymbol{R}}^{\left( 0 \right)}}} \right\|}_F}{\mathit{\boldsymbol{e}}_1} - {\mathit{\boldsymbol{T}}_k}{\mathit{\boldsymbol{f}}^{\left( k \right)}}} \right\|^2} + b_k^2\mathit{\boldsymbol{f}}_k^{\left( k \right)2}. $ |
而第一项为0, 故‖R(k)‖F=|bkfk(k)|.证毕.
因此当|bkfk(k)|很小时, 近似解较好地接近真解.
定理2 残量‖R(k)‖满足[R(k), R(j)]=0, 其中j=1, 2, …, k-1.
证明 由上述定理1的证明过程可知
$ \begin{array}{*{20}{c}} {{\mathit{\boldsymbol{R}}^{\left( k \right)}} = - {b_k}{\mathit{\boldsymbol{Q}}^{\left( {k + 1} \right)}}\mathit{\boldsymbol{f}}_k^{\left( k \right)},{\mathit{\boldsymbol{R}}^{\left( j \right)}} = - {b_{jk}}{\mathit{\boldsymbol{Q}}^{\left( {j + 1} \right)}}\mathit{\boldsymbol{f}}_j^{\left( j \right)},}\\ {\left[ {{\mathit{\boldsymbol{Q}}^{\left( {k + 1} \right)}},{\mathit{\boldsymbol{Q}}^{\left( {j + 1} \right)}}} \right] = 0\left( {j = 1,2, \cdots ,k - 1} \right),} \end{array} $ |
所以[R(k), R(j)]=0.证毕.
定理3 设方程(1) 相容, 则对任意初始元素X(0)∈V, 算法1至多需n步有R(n)=0, 即至多需n步算法收敛(n为欧式空间的维数).
证明 n维欧式空间V中两两正交的非零元素至多为n个, 又由定理2的结论可知对任意初始元素X(0)∈V所得到的残量R(0)与R(1), …, R(n-1), R(n)相互正交, 因此算法1至多需n步有R(n)=0, 即结论成立.证毕.
2 SYMMLQ算法的并行实现针对线性空间V=Rn×n上的对称变换T(X)=AX+XB, 即方程(1) 形式如下:
$ \mathit{\boldsymbol{AX}} + \mathit{\boldsymbol{XB}} = \mathit{\boldsymbol{F}}. $ | (4) |
其中A, B, F∈Rn×n, 并且A=AT, B=BT.下面给出求解方程(4) 的SYMMLQ算法的具体并行实现过程.设处理机台数为p, p整除n, 即n=pl, pi(i=1, 2, …, p) 表示第i台处理机.记
$ \begin{align} & \mathit{\boldsymbol{A}}={{\left( \mathit{\boldsymbol{A}}_{1}^{T},\mathit{\boldsymbol{A}}_{2}^{T},\cdots ,\mathit{\boldsymbol{A}}_{p}^{T} \right)}^{\rm{T}}},\mathit{\boldsymbol{B}}={{\left( \mathit{\boldsymbol{B}}_{1}^{T},\mathit{\boldsymbol{B}}_{2}^{T},\cdots ,\mathit{\boldsymbol{B}}_{p}^{T} \right)}^{\rm{T}}},\mathit{\boldsymbol{F}}={{\left( \mathit{\boldsymbol{F}}_{1}^{T},\mathit{\boldsymbol{F}}_{2}^{T},\cdots ,\mathit{\boldsymbol{F}}_{p}^{T} \right)}^{\rm{T}}}, \\ & {{\mathit{\boldsymbol{X}}}^{\left( k \right)}}={{\left( {{\left( \mathit{\boldsymbol{X}}_{1}^{\left( k \right)} \right)}^{\rm{T}}},{{\left( \mathit{\boldsymbol{X}}_{2}^{\left( k \right)} \right)}^{\rm{T}}},\cdots ,{{\left( \mathit{\boldsymbol{X}}_{p}^{\left( k \right)} \right)}^{\rm{T}}} \right)}^{\rm{T}}},{{\mathit{\boldsymbol{R}}}^{\left( k \right)}}= \\ & {{\left( {{\left( \mathit{\boldsymbol{R}}_{1}^{\left( k \right)} \right)}^{\rm{T}}},{{\left( \mathit{\boldsymbol{R}}_{2}^{\left( k \right)} \right)}^{\rm{T}}},\cdots ,{{\left( \mathit{\boldsymbol{R}}_{p}^{\left( k \right)} \right)}^{\rm{T}}} \right)}^{\rm{T}}}. \\ \end{align} $ |
类似地, 矩阵变量Q(k), W(k),
注 算法涉及到的所有矩阵乘法皆采用行行划分矩阵乘法的并行算法[17-18], 文中所涉及的矩阵乘法并行计算不再描述.
2.1 未改进的SYMMLQ算法的并行实现(1) 存储方式:将Ai, Bi, Fi, Xi(0)均存储在pi(i=1, 2, …, p) 处理机中.
(2) 计算过程:
①在处理机pi中, 计算Ri(0)=AiX(0)+Xi(0)B; 再计算内积[Ri(0), Ri(0)], 全归约后得到[R(0), R(0)], 然后计算Qi(1)=Ri(0)/‖R(0)‖;
②在处理机pi中, 计算Gi(1)=AiQ(1)+Qi(1)B, 再计算内积[Gi(1), Qi(1)], 全归约后得到a1=[G(1), Q(1)],计算Hi(1)=Gi(1)-a1Qi(1), 计算内积[Hi(1), Hi(1)], 全归约后得到[H(1), H(1)], 计算
③在pi中, 计算
④在各进程中计算变量c1, s1, δ2, ε3, r1, ξ1,
⑤在I=(I1T, I2T, …, IpT)T中, 计算变量ξ2以及
(3) 循环过程:
步骤1 计算
步骤2 在pi中, 计算
步骤3 在各进程计算变量
步骤4 在pi中, 计算变量
$ \begin{array}{*{20}{c}} {\mathit{\boldsymbol{W}}_i^{\left( {k - 1} \right)} = {c_{k - 1}}\mathit{\boldsymbol{\tilde W}}_i^{\left( {k - 1} \right)} + {s_{k - 1}}\mathit{\boldsymbol{Q}}_i^{\left( {k} \right)},\mathit{\boldsymbol{\tilde W}}_i^{\left( k \right)} = {s_{k - 1}}\mathit{\boldsymbol{\tilde W}}_i^{\left( {k - 1} \right)} - {c_{k - 1}}\mathit{\boldsymbol{Q}}_i^{\left( {k} \right)},}\\ {\mathit{\boldsymbol{Y}}_i^{\left( {k - 1} \right)} = \mathit{\boldsymbol{Y}}_i^{\left( {k - 2} \right)} + {\xi _{k - 1}}\mathit{\boldsymbol{W}}_i^{\left( {k - 1} \right)},\mathit{\boldsymbol{X}}_i^{\left( k \right)} = \mathit{\boldsymbol{Y}}_i^{\left( {k - 1} \right)} + {{\tilde \xi }_k}\mathit{\boldsymbol{\tilde W}}_i^{\left( k \right)};} \end{array} $ |
步骤5 置k:=k+1, 返回步骤1.
2.2 改进的SYMMLQ算法的并行实现在SYMMLQ算法的并行实现过程中, 每迭代一次需要计算
$ {a_k} = \left[ {T\left( {{\mathit{\boldsymbol{Q}}^{\left( k \right)}}} \right),{\mathit{\boldsymbol{Q}}^{\left( k \right)}}} \right].{b_k} = \left\| {T\left( {{\mathit{\boldsymbol{Q}}^{\left( k \right)}}} \right) - {a_k}{\mathit{\boldsymbol{Q}}^{\left( k \right)}} - {b_{k - 1}}{\mathit{\boldsymbol{Q}}^{\left( {k - 1} \right)}}} \right\|, $ |
显然需要2次全归约.为了保证改进的SYMMLQ算法的结构稳定性, 且基本不影响原算法的计算精度, 在经过多次尝试后, 对计算步骤进行重新安排, 将上述计算步骤调整为如下形式:
$ \begin{align} & {{a}_{k}}=\left[ T\left( {{\mathit{\boldsymbol{Q}}}^{\left( k \right)}} \right),{{\mathit{\boldsymbol{Q}}}^{\left( k \right)}} \right],{{\mathit{\boldsymbol{e}}}_{k}}={{\left\| T\left( {{\mathit{\boldsymbol{Q}}}^{\left( k \right)}} \right)-{{a}_{k-1}}{{\mathit{\boldsymbol{Q}}}^{\left( k \right)}}-{{b}_{k-1}}{{\mathit{\boldsymbol{Q}}}^{\left( k-1 \right)}} \right\|}^{2}},{{b}_{k}}= \\ & \sqrt{{{\mathit{\boldsymbol{e}}}_{k}}-{{\left( {{a}_{k}}-{{a}_{k-1}} \right)}^{2}}}. \\ \end{align} $ |
这样, ak, ek的计算只需要1次全归约, 从而达到减少通讯的目的.
调整步骤后, 得到了改进的SYMMLQ算法.虽然这个过程需增加一些计算量, 但相对很小.同时, 改进的算法结构保持不变, 所以编写程序代码时只需调整原算法相对应的循环过程步骤2即可, 其余步骤保持不变.具体实现过程如下:
步骤2 在pi中, 计算
在HPZ80工作站集群并行机上求解3个算例, 分别采用SYMMLQ算法与本文提出的改进的SYMMLQ算法求解, 并对计算结果进行比较.在迭代计算过程中, 所有的初始矩阵X(0)为零矩阵, 终止条件ε=10-6.
例1 Poisson方程第一边值问题
$ \left\{ \begin{array}{l} \Delta u = \frac{{{\partial ^2}u}}{{\partial {x^2}}} + \frac{{{\partial ^2}u}}{{\partial {y^2}}} = - f\left( {x,y} \right),\;\;\;\;0 < x,y < 1\\ u\left| {_{\mathit{\Gamma }{\rm{ = }}0}} \right. \end{array} \right.,f\left( {x,y} \right) = x + y. $ |
其中, Γ为单位正方形区域的边界, 取步长h=1/1201.采用中心差分格式, 可将本题转化为求解矩阵方程AX+XB=F的问题.设矩阵A=(aij)n×n, B=A, F=(fij)n×n,
$ {a_{ij}} = \left\{ \begin{array}{l} 2,i = j,\\ - 1,\left| {i - j} \right| = 1,{f_{ij}} = {h^3}\left( {i + j} \right).\\ 0,\;\;其他. \end{array} \right. $ |
当n=1 200时, 计算结果见表 1.
p | 本文算法 | SYMMLQ算法 | |||||||
10 | 15 | 20 | 25 | 10 | 15 | 20 | 25 | ||
T/s | 878.14 | 622.79 | 482.49 | 423.20 | 1 045.40 | 749.39 | 580.78 | 503.81 | |
K | 2 122 | 2 122 | 2 122 | 2 122 | 2 122 | 2 122 | 2 122 | 2 122 | |
E | - | 0.94 | 0.91 | 0.83 | - | 0.93 | 0.90 | 0.83 | |
E1 | 1.00 | 0.94 | 0.91 | 0.83 | 0.84 | 0.78 | 0.75 | 0.69 | |
Δ | 9.90×10-7 | 9.90×10-7 | 9.90×10-7 | 9.90×10-7 | 9.13×10-7 | 9.13×10-7 | 9.13×10-7 | 9.10×10-7 |
例2 在挡土墙的弹性力学分析中, 由于其结构的形状和受力、约束情况具有一定的特点, 所以将应力分量之间的关系近似表示为平衡偏微分方程,即
$ \frac{{{\partial ^2}{u_1}}}{{\partial {x^2}}} + \frac{{{\partial ^2}{u_2}}}{{\partial {y^2}}} + 2\frac{{{\partial ^2}{u_3}}}{{\partial x\partial y}} = f\left( {x,y} \right). $ |
其中, 假设正应力u1,u2与切应力u3平衡, 即u1=u2=u3=u, 并且f(x, y)=x+y, 应力边界条件为u|Γ=0, Γ为受力边界.于是, 平衡偏微分方程转化为
$ \frac{{{\partial ^2}u}}{{\partial {x^2}}} + \frac{{{\partial ^2}u}}{{\partial {y^2}}} + 2\frac{{{\partial ^2}u}}{{\partial x\partial y}} = x + y. $ |
取步长h=1/1 001, 采用中心差分格式以及一些近似处理, 可将本题转化为求解Sylvester矩阵方程AX+XB=F的问题.设矩阵A=(aij)n×n, B=(bij)n×n, F=(fij)n×n,
$ {a_{ij}} = \left\{ \begin{array}{l} - 4,\;\;\;\;\;\;i = j,\\ 4,\;\;\;\;\;\;\;\;\left| {i = j} \right| = 1,\\ 1,\;\;\;\;\;\;\;\;\left| {i = j} \right| = 2,\\ 0,\;\;\;\;\;\;\;\;其他. \end{array} \right.\;\;{b_{ij}} = \left\{ \begin{array}{l} - 8,\;\;\;\;\;\;i = j,\\ 2,\;\;\;\;\;\;\;\;\left| {i = j} \right| = 1,\\ - 1,\;\;\;\;\;\;\left| {i = j} \right| = 2,\\ 0,\;\;\;\;\;\;\;\;其他. \end{array} \right.{f_{ij}} = 3{h^3}\left( {i + j} \right), $ |
计算结果见表 2.
p | 本文算法 | SYMMLQ算法 | |||||||
10 | 15 | 20 | 25 | 10 | 15 | 20 | 25 | ||
T/s | 767.72 | 619.13 | 511.81 | 473.90 | 1 096.74 | 943.92 | 802.49 | 751.19 | |
K | 53 912 | 53 912 | 53 912 | 53 912 | 53 920 | 53 920 | 53 920 | 53 920 | |
E | - | 0.93 | 0.90 | 0.81 | - | 0.87 | 0.82 | 0.73 | |
E1 | 1.00 | 0.93 | 0.90 | 0.81 | 0.70 | 0.61 | 0.57 | 0.51 | |
Δ | 9.34×10-7 | 9.34×10-7 | 9.34×10-7 | 9.34×10-7 | 9.21×10-7 | 9.21×10-7 | 9.21×10-7 | 9.21×10-7 |
例3 设矩阵A=(aij)n×n, B=(bij)n×n, 其中
$ {a_{ij}} = \left\{ \begin{array}{l} 1,9,\;\;\;\;\;i = j,\\ - 1,\;\;\;\;\;\;\left| {i = j} \right| = 1,\\ 0,\;\;\;\;\;\;\;\;其他. \end{array} \right.\;\;\;{b_{ij}} = \left\{ \begin{array}{l} 1.8,\;\;\;\;\;i = j,\\ - 1,\;\;\;\;\;\left| {i = j} \right| = 1,\\ 0,\;\;\;\;\;\;\;其他. \end{array} \right. $ |
求解Sylvester矩阵方程AX+XB=F.在此取n=1 500, F为任意给定的矩阵.计算结果见表 3.
p | 本文算法 | SYMMLQ算法 | |||||||
10 | 15 | 20 | 25 | 10 | 15 | 20 | 25 | ||
T/s | 432.35 | 380.09 | 351.50 | 338.43 | 635.81 | 565.16 | 536.55 | 526.55 | |
K | 21 680 | 21 680 | 21 680 | 21 680 | 21 682 | 21 682 | 21 682 | 21 682 | |
E | - | 0.91 | 0.82 | 0.73 | - | 0.90 | 0.79 | 0.69 | |
E1 | 1.00 | 0.91 | 0.82 | 0.73 | 0.68 | 0.61 | 0.53 | 0.46 | |
Δ | 8.25×10-7 | 8.25×10-7 | 8.25×10-7 | 8.25×10-7 | 9.44×10-7 | 9.44×10-7 | 9.44×10-7 | 9.44×10-7 |
表 1~3中p表示处理机台数, T/s表示时间, K表示迭代次数, E表示相对并行效率, E1表示绝对并行效率, Δ表示误差‖R(k)‖.由于一台处理机的计算时间较长, 因此采用多台处理机的计算时间, 并且使用多台处理机中最小的计算时间作为基准来计算加速比和并行效率.
3.2 结果分析(1) 改进的SYMMLQ算法比未改进的算法减少了并行计算时间, 这是由于改进后的算法在循环迭代中减少了全归约的次数, 从而缩短了全局通讯所引起的时间消耗.
(2) 改进的SYMMLQ算法的迭代次数与原SYMMLQ算法的迭代次数基本相同, 说明改进的算法没有破坏原有SYMMLQ算法的结构与数值稳定性, 而且由表中误差数据可以看出本文改进的算法对计算的精度影响不大.
(3) 表 3数据表明当计算规模一定时, 处理机增加到30台以上, 并行效率会急剧下降, 这是由于整体内积的计算与全收集而引起的大规模通讯使得消耗增加, 进一步说明本文算法与原算法的可扩放性比较差, 因此为保证较高的并行效率应采用适当数量的处理机.
致谢: 感谢西北工业大学高性能计算研究与发展中心的大力支持![1] | DATTA B. Numerical methods for linear control systems[M]. Berlin: Elsevier Academic Press, 2004. |
[2] | CALVETTI D, REICHEL L. Application of ADI iterative methods to the restoration of noisy images[J]. Siam J Matrix Anal Appl, 1996, 17(1): 165-186 DOI:10.1137/S0895479894273687 |
[3] | ENRIGHT W H. Improving the efficiency of matrix operations in the numerical solution of stiff ordinary differential equations[J]. ACM Trans Math Software, 1978, 4(2): 127-136 DOI:10.1145/355780.355784 |
[4] | ANTOULAS A C. Approximation of large-scale dynamical systems (Advances in design and control)[M]. Philadelphia: Society of Industnal and Applied Mathematics, 2005. |
[5] | BAUR U, BENNER P. Cross-gramian based model reduction for data-sparse systems[J]. Electron Trans Numer Anal, 2008, 31(1): 256-270 |
[6] | SORENSEN D, ANTOULAS A. The sylvester equation and approximate balanced reduction[J]. Linear Algebra Appl, 2002, 351-352(2): 671-700 |
[7] | HALANAY A, RǍSVAN V. Applications of Lyapunov methods in stability[M]. London: Kluwer Academic Publishers, 1993. |
[8] | PEACEMAN D W, RACHFORD H H. The numerical solution of parabolic and elliptic differential equations[J]. J Soc Ind Appl Math, 1995, 3(1): 28-41 |
[9] |
GOLUBG H, LOANC F.
矩阵计算[M]. 北京: 人民邮电出版社, 2011.
GOLUB G H, LOAN C F. Matrix calculations[M]. Beijing: Posts and Telecom Press, 2011. |
[10] |
吴建平, 王正华, 李晓梅.
稀疏线性方程组的高效求解与并行算法[M]. 长沙: 湖南科学技术出版社, 2004: 269-273.
WU Jianping, WANG Zhenghua, LI Xiaomei. The efficient solution and parallel algorithm of spare linear equations[M]. Changsha: Science and Technology Press of Hunan, 2004: 269-273. |
[11] | XIE Yajun, HUANG Na, MA Changfeng. Iterative method to solve the generalized coupled Sylvester-transpose linear matrix equations over reflexive or anti-reflexive matrix[J]. Computers and Mathematics with Applications, 2014, 67(11): 2071-2084 DOI:10.1016/j.camwa.2014.04.012 |
[12] |
蒋尔雄.
矩阵计算[M]. 北京: 科学出版社, 2008.
JIANG Erxiong. Matrix calculations[M]. Beijing: Science Press, 2008. |
[13] |
李大明.
数值线性代数[M]. 北京: 清华大学出版社, 2008.
LI Daming. Numerical linear algebra[M]. Beijing: Tsinghua University Press, 2008. |
[14] |
张凯院, 徐仲.
数值代数[M]. 2版. 北京: 科学出版社, 2010.
ZHANG Kaiyuan, XU Zhong. Numerical algebra[M]. 2ed. Beijing: Scince Press, 2010. |
[15] |
曹方颖, 吕全义. 预处理变形共轭梯度法并行求解矩阵的Moore-Penrose逆[J].
纺织高校基础科学学报, 2013, 26(1): 137-142 CAO Fangyin, LYU Quanyi. A parallel preconditioned modified conjugate gradient method of Moore-Penrose inverse of matrices[J]. Basic Sciences Journal of Textile University, 2013, 26(1): 137-142 |
[16] | PAIGN C C, SAUNDERS M A. Solution of sparse indefinite system of linear equations[J]. Siam J Number Anal, 1975, 12(4): 617-629 DOI:10.1137/0712047 |
[17] |
李晓梅, 吴建平.
数值并行算法与软件[M]. 北京: 科学出版社, 2007.
LI Xiaomei, WU Jianping. Numerical parallel algorithm and software[M]. Beijing: Science Press, 2007. |
[18] |
李建江.
并行计算机及编程基础[M]. 北京: 清华大学出版社, 2011.
LI Jianjiang. Parallel computers and programming base[M]. Beijing: Tsinghua University Press, 2011. |