◇◇New Threads (www.xys.org)(xys3.dxiong.com)(www.xysforum.org)(xys2.dropin.org)◇◇

How Professor Hu Changjun of the University of Science and Technology Beijing Plagiarized Others' Work

The paper by Changjun Hu (Professor 胡长军),
"OpenMP Extensions for Irregular Parallel Applications on Clusters",
was presented at the 3rd International Workshop on OpenMP (IWOMP 2007) and
published in Lecture Notes in Computer Science (LNCS), Volume 4935:
http://www.springerlink.com/content/y46605g77724g8m8/

The Hu paper plagiarizes throughout from the following papers:

"Symbolic Communication Set Generation for Irregular Parallel
Applications" by Guo, Pan and Liu, The Journal of Supercomputing,
vol. 25, 2003, pp. 199-214 [GuoPan paper]
"Effective OpenMP Extensions for Irregular Applications on Cluster
Environments" by Guo, Cao, Chang, Li and Liu [GuoCao paper]
"Communication Generation for Aligned and Cyclic(K) Distributions
Using Integer Lattice" by Tseng and Gaudiot [Tseng paper]
"Optimizing Irregular Shared-Memory Applications for
Distributed-Memory Systems" by Basumallik and Eigenmann [Basumallik paper]

Instances of plagiarism are everywhere; a few examples follow:

1. Hu paper, Section 1
Sparse and unstructured computations are widely used in scientific and 
engineering applications.  This means that the data arrays are indexed 
either through the value in other arrays, which are called indirection 
array, or through non-affine subscripts.  Indirect/nonlinear indexing 
causes the data access pattern to be highly irregular.  Such a problem 
is called irregular problem.

Copied from the GuoCao paper
Abstract: Sparse and unstructured computations are widely used in 
Scientific and  Engineering Applications.
Section 1:
This means that the data arrays are indexed either through the value 
in other arrays, which are called indirection arrays/index arrays, or 
through non-affine subscripts.  The use of indirect/nonlinear indexing 
causes the data access patterns ... to be highly irregular.  Such a 
problem is called irregular problem.

2. Hu paper, Section 1
If the array subscript expressions are nonlinear form, which appear in 
some irregular parallel applications, the performance of total 
execution may not be improved using the techniques mentioned above.
 
Copied from
GuoPan paper, Section 1:
However, if the array subscript expressions are not of the linear form 
- called nonlinear - which appears in some irregular parallel 
applications - the above mentioned techniques cannot be applied in 
this situation.
GuoCao paper, Section 2:
However, if irregular loops are not parceled ..., the performance of 
total  execution may not be improved ...

3. Hu paper, Section 1, Fig. 1
/* a perfectly nested loop phi */
    DO i_1 = 1, N, S_1
      DO i_2 = L_2(i_1), U_2(i_1), S_2(i_1)
        ...
        DO i_n = L_n(i_1, ..., i_n-1), U_n(i_1, ..., i_n-1), S_n(i_1, ..., i_n-1)
          A[f(i_1, ..., i_n)] = F(B[g(i_1, ..., i_n)]);
        ENDDO
        ...
      ENDDO
ENDDO

Copied from the GuoPan paper, Section 2:
Given a perfectly nested loop L as shown in the following.
   L_1: DO i_1 = X_1, Y_1, Z_1
          ...
   L_n:   DO i_n = X_n, Y_n, Z_n
   S:       A(f(i_1, i_2, ..., i_n)) = F(B(g(i_1, i_2, ..., i_n)));
          ENDDO
          ...
        ENDDO

4. Hu paper, Section 1
For the sake of simplicity, we assume that the data array A and B have 
only one dimension.  In the loop, the array access functions (f and g), 
the lower and upper bound (L, U) and stride (S) may be arbitrary 
symbolic expressions made up of loop-invariant variables and loop 
indices of enclosing loops.General parallel   compiling techniques can 
not be applied to these kinds of irregular applications, because there 
is no affine relationship between the array global addresses of LHS 
(Left Hand Side) and RHS (Right Hand Side).

Copied from the GuoPan paper
Section 2:
For the sake of simplicity, we will assume that the referenced array A 
and B have only one dimension.  The array access function (f and g), 
the loop's lower and upper bounds (X_i, Y_i) and stride (Z_i) may be 
arbitrary symbolic expressions made up of loop-invariant variables and 
loop indices of enclosing loops.
 
Section 1:
General affine communication set generation techniques cannot be 
applied to these kinds of irregular applications because there is no 
affine relationship between the array global addresses of LHS and RHS.

5. Hu paper, Section 2
In traditional OpenMP specification, there are four scheduling 
policies available: static scheduling, dynamic scheduling, guided 
scheduling, and runtime scheduling.  In order to reduce communication 
overhead and achieve load balance, we extend irregular scheduling to 
OpenMP.  This scheduling follows owner-compute rule, where each 
iteration will be executed by the processor which own the left hand 
side array reference of the assignment for that iteration.

Copied from the GuoCao paper, Section 2:
There are four scheduling policies available in OpenMP: static 
scheduling,dynamic scheduling, guided scheduling, and runtime 
scheduling.  In order to achieve load balance for irregular loops, it 
is better to select dynamic or guided sceduling ... the chunk parcel 
follows the owner-compute rule.  ... each iteration will be executed 
by the processor which owns the left hand side array reference of the 
assignment for that iteration.

6. Hu paper, Section 2
In this example, the compiler will treat the loop as a partial 
ordered,i.e. some iterations are executed in ordered way while some 
other may be executed in parallel.

Copied from the GuoCao paper, Section 3:
... in this case the compiler will treat the loop as partially 
ordered,that is,  some iterations are executed sequentially while 
others may be executed in parallel.

7. Hu paper, Section 3
The sending of messages can be performed individually for each remote 
processor as soon as each packing is completed, instead of waiting for 
all message for all processors to be packed.  The nonlocal iterations 
can also be performed based on each received message, instead of 
waiting for all messages to be received, because the nonlocal 
iterations are split into groups based on the sending processors.

Copied from the Tseng paper, Section 3.1:
The sending of messages can be performed individually for each remote 
processor as soon as each packing is completed, instead of waiting for 
all messages for all processors to be packed.  ... The nonlocal 
iterations can also be performed based on each received message, 
instead of waiting for all messages to be received.  This is because 
the nonlocal iterations are split into groups based on the sending 
processors.

8. Hu paper, Section 5
In this section, we will present our transformation scheme relies on 
deducing the monotonicity of irregular accesses at compile time.
 
Copied from the Basumallik paper, Section 1:
The techniques proposed ... in previous work ... relied on deducing 
certain properties (such as monotonicity) of irregular accesses at 
compile-time.

(XYS20090329)
