globalBarrier() causes random deadlocks in JTP mode for non-tiny ranges
The following code:
public static class SumKernel extends Kernel {
    double[] a;
    double[] b;
    double[] c;

    public void execute(double[] a, double[] b, double[] c) {
        this.a = a;
        this.b = b;
        this.c = c;
        execute(Range.create(a.length));
    }

    @Override
    public void run() {
        int i = getGlobalId();
        c[i] = a[i] + b[i];
        globalBarrier();
    }
}

public static void main(String[] args) throws InterruptedException {
    var random = new Random();
    int len = 1_000_000;
    double[] a = new double[len];
    double[] b = new double[len];
    for (int i = 0; i < len; i++) {
        a[i] = random.nextDouble();
        b[i] = random.nextDouble();
    }
    for (int i = 0; i < 20; i++) {
        double[] c = new double[len];
        var thread = new Thread(() -> {
            var kernel = new SumKernel();
            kernel.setExecutionMode(EXECUTION_MODE.JTP);
            kernel.execute(a, b, c);
            kernel.dispose();
        });
        System.out.println();
        System.out.println("starting...");
        var start = System.currentTimeMillis();
        thread.start();
        thread.join(20_000L);
        if (!thread.isAlive()) {
            System.out.println("finished in " + (System.currentTimeMillis() - start));
            continue;
        }
        System.out.println("still running after 20s, interrupting...");
        thread.interrupt();
        thread.join(2_000L);
        if (!thread.isAlive()) {
            System.out.println("interrupted successful.");
            continue;
        }
        System.out.println("still running after interrupt, stopping...");
        thread.stop();
        thread.join(1_000L);
        if (thread.isAlive()) {
            System.out.println("still running after stop.");
        } else {
            System.out.println("stop successful.");
        }
    }
}
gives me output similar to the following:
starting... WARNING: Aparapi is running on an untested OpenCL platform version: OpenCL 3.0 finished in 5096
starting... finished in 5069
starting... finished in 4945
starting... still running after 20s, interrupting... still running after interrupt, stopping... stop successful.
starting... finished in 5278
starting... still running after 20s, interrupting... still running after interrupt, stopping... stop successful.
starting... finished in 5286
starting... finished in 5318
starting... finished in 5637
starting... finished in 6113
starting... finished in 6939
starting... finished in 7115
starting... finished in 7830
starting... finished in 6700
starting... finished in 6603
starting... finished in 6662
starting... still running after 20s, interrupting... still running after interrupt, stopping... stop successful.
starting... finished in 6820
starting... finished in 6923
starting... finished in 7167
It works perfectly fine on GPU (without the kernel.setExecutionMode(EXECUTION_MODE.JTP); line) or without the globalBarrier(); line.
My environment:
Aparapi 3.0.0
OpenJDK 11 (11.0.11+9-0ubuntu2~18.04)
Ubuntu 18.04
Intel(R) Core(TM) i7-7560U CPU (2 physical + 2 virtual cores)
During successful runs, all 4 cores are uniformly utilized at about 50%. Deadlocked runs start the same way: all 4 cores are utilized at about 50% for about 5-7 seconds, and then CPU usage drops to virtually 0.
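
Since CPU usage dropping to zero suggests the worker threads are parked or waiting rather than spinning, a thread dump taken during a hung run might show where they are stuck. Below is a minimal sketch of a helper for that, using only the standard java.lang.management API. The class name ThreadDumper and the method dump() are hypothetical names introduced here for illustration; they are not part of Aparapi.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical diagnostic helper (not part of Aparapi): collects the name,
// state, awaited lock (if any) and stack frames of every live JVM thread.
public class ThreadDumper {

    public static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // false, false: do not request locked monitor / synchronizer details,
        // which keeps the call usable on any JVM
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState());
            if (info.getLockName() != null) {
                // Object the thread is blocked or waiting on, if any
                sb.append(" on ").append(info.getLockName());
            }
            sb.append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump());
    }
}
```

In the reproducer above, this could be called as System.out.println(ThreadDumper.dump()); right after the "still running after 20s" message and before thread.interrupt(). Running jstack against the process ID from another terminal should give equivalent information.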