diff --git a/doc/DeviceProposal.md b/doc/DeviceProposal.md
new file mode 100644
index 0000000000000000000000000000000000000000..cb91759b5e4a08f8e380941dcba5c0c1829b4b76
--- /dev/null
+++ b/doc/DeviceProposal.md
@@ -0,0 +1,65 @@
+#DeviceProposal
+*How we might use the extension mechanism devices for general Kernel execution. Updated May 9, 2012 by frost.g...@gmail.com*
+
+At present the first GPU or CPU device (depending on the Kernel.ExecutionMode value) is chosen at execution time. This makes it easy to execute simple Kernels, but is problematic when using advanced features (barriers, local memory) or when sizing buffers appropriately for the target device. I propose that we add APIs to allow the developer to specify exactly which device to target.
+
+In the extension proposal branch we needed to expose a Device class for binding arbitrary OpenCL to a Java interface. I suggest we also use this class to query device information useful for allocating suitably sized global and local buffers, and for dispatching Kernels to specific devices.
+
+The general pattern would be that we ask Aparapi to give us a Device, probably via a Device factory method.
+
+Something like:-
+
+    Device device = Device.best();
+We would also offer other useful factory methods `getBestGPU(), getFirstCPU(), getJavaMultiThread(), getJavaSequential()` as well as a method to get all devices so that developers can filter them themselves.
+
+Note that as well as real OpenCL devices we also expose 'pseudo' devices such as JavaMultiThread and Sequential. We might also allow pseudo devices to group multiple devices. So getAllGPUDevices() might return a pseudo device for executing across devices.
+
+    Device chosen=null;
+    for (Device device: devices.getAll()){
+       if (device.getVendor().contains("AMD") && device.isGPU()){
+          chosen = device;
+          break;
+       }
+    }
+
+A Device can be queried `(isGPU(), isOpenCL(), isGroup(), isJava(), getOpenCLPlatform(), getMaxMemory(), getLocalSizes())` and may need to be cast to specific types.
+
+This would allow us to configure buffers.
+
+    Device device = Device.best();
+    if (device instanceof OpenCLDevice){
+       OpenCLDevice openCLDevice = (OpenCLDevice)device;
+       char input[] = new char[openCLDevice.getMaxMemory()/4];
+    }
+We can also use the Device as a factory for creating Ranges.
+
+    Range range = device.createRange2D(width, height);
+This allows the Range to be created with knowledge of the underlying device. So for example `device.createRange3D(1024, 1024, 1024, 16, 16, 16)` will fail if the device does not allow a local size of (16x16x16).
+
+A range created using `device.createRangeXX()` would also capture the device that created it, as if we had written
+
+    Range range = device.createRange2D(width, height);
+    // implied range.setDevice(device);
+This basically means that the Range is locked to the device that created it.
+
+So when we have a Kernel.
+
+    Kernel kernel = new Kernel(){
+       @Override public void run(){
+          ...
+       }
+    };
+And we then use
+
+    Device device = Device.firstGPU();
+    final char input[] = new char[((OpenCLDevice)device).getMaxMemory()/4];
+    Kernel kernel = new Kernel(){
+       @Override public void run(){
+          // uses input[];
+       }
+    };
+    Range range = device.createRange2D(1024, 1024);
+    kernel.execute(range);
+We have forced execution on the first GPU. Java fallback would still be possible (should we forbid this?).
+
+    kernel.execute(Device.firstGPU().createRange2D(width, height));
diff --git a/doc/NewOpenCLBinding.md b/doc/NewOpenCLBinding.md
new file mode 100644
index 0000000000000000000000000000000000000000..32e5f4347b94d3e6b300543873850158350bdfde
--- /dev/null
+++ b/doc/NewOpenCLBinding.md
@@ -0,0 +1,51 @@
+#NewOpenCLBinding
+*How to use new OpenCL binding mechanism. Updated Mar 6, 2012 by frost.g...@gmail.com*
+As a step towards the extension mechanism I needed a way to easily bind OpenCL to an interface.
+
+Here is what I have come up with. We will use the 'Square' example.
+
+You first define an interface with OpenCL annotations.
+
+    interface Squarer extends OpenCL<Squarer>{
+       @Kernel("{\n"//
+       + "  const size_t id = get_global_id(0);\n"//
+       + "  out[id] = in[id]*in[id];\n"//
+       + "}\n")//
+       public Squarer square(//
+          Range _range,//
+          @GlobalReadOnly("in") float[] in,//
+          @GlobalWriteOnly("out") float[] out);
+    }
+
+This describes the API we wish to bind to a set of kernel entrypoints (here we only have one, but we could have many). Then you 'realize' the interface by asking a device to create an implementation of the interface. Device is a new Aparapi class which represents a GPU or CPU OpenCL device. So here we are asking for the first (default) GPU device to realize the interface.
+
+    Squarer squarer = Device.firstGPU(Squarer.class);
+Now you can call the implementation directly with a Range.
+
+    squarer.square(Range.create(in.length), in, out);
+I think that we will have the easiest OpenCL binding out there...
+
+Following some conversations/suggestions online http://a-hackers-craic.blogspot.com/2012/03/aparapi.html we could also offer the ability to provide the OpenCL source from a file or URL using interface-level annotations.
+
+So we could allow.
+
+    @OpenCL.Resource("squarer.cl")
+    interface Squarer extends OpenCL<Squarer>{
+       public Squarer square(//
+          Range _range,//
+          @GlobalReadOnly("in") float[] in,//
+          @GlobalWriteOnly("out") float[] out);
+    }
+Or if the text is on-hand at compile time in a single constant string
+
+    @OpenCL.Source("... opencl text here")
+    interface Squarer extends OpenCL<Squarer>{
+       public Squarer square(//
+          Range _range,//
+          @GlobalReadOnly("in") float[] in,//
+          @GlobalWriteOnly("out") float[] out);
+    }
+Finally, to allow for the creation of dynamic OpenCL (good for FFTs of various radices).
+
+    String openclSource = ...;
+    Squarer squarer = Device.firstGPU(Squarer.class, openclSource);
diff --git a/doc/README.md b/doc/README.md
index caf04b34ac8b7b98a5b37717007aedbdf35820b4..75e2686865dc7d9ca8ba524d9f218bdbe474f947 100644
--- a/doc/README.md
+++ b/doc/README.md
@@ -27,8 +27,8 @@ APARAPI Documentation
 | [ExplicitBufferHandling](ExplicitBufferHandling.md) | How to minimize buffer transfers |
 | [AparapiPatterns](AparapiPatterns.md) | Examples and code fragments to demonstrate Aparapi fetaures. |
 | [ProfilingKernelsFromEclipse](ProfilingKernelsFromEclipse.md) | Profiling Kernels with AMD profiler in Eclipse (Indigo) |
-| DeviceProposal | How we might use the extension mechanism devices for general Kernel execution.|
-| NewOpenCLBinding | How to use new OpenCL binding mechanism. |
+| [DeviceProposal](DeviceProposal.md) | How we might use the extension mechanism devices for general Kernel execution.|
+| [NewOpenCLBinding](NewOpenCLBinding.md) | How to use new OpenCL binding mechanism. |
 | AparapiExtensionProposal | A proposed aparapi extension mechanism. |
 | UsingConstantMemory | How to make use of constant memory in a Kernel |
 | UsingLocalMemory | How to make use of local memory in a Kernel |