Improve Range API to bind a Range instance with Device and Kernel, fix work group size computation for JTP and OpenCL (refs #5)
- Provide a new API that binds a Range instance to the actual OpenCL device and to the Kernel to which the range pertains
- Fix Range bug when determining the work group size/local size for a 1D Range on OpenCL
- Fix Range computation for JTP over 1D ranges with no specified local size
- Improve Profiling code to handle cases where Kernel compilation is decoupled from its execution
- Update all unit tests for the new API and the new Profiling behavior
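
For reviewers unfamiliar with the work group size issue, here is a minimal sketch of one common way to derive a local size for a 1D range when none is specified. It relies on the OpenCL 1.x requirement that the global size be evenly divisible by the local size. The class `LocalSizeSketch` and the method `pickLocalSize` are hypothetical names used only for illustration; this is not the code changed in this commit, and whether the fix follows this exact rule is an assumption.

```java
// Hypothetical illustration only; not the implementation touched by this commit.
public final class LocalSizeSketch {

    private LocalSizeSketch() {
    }

    /**
     * Returns the largest local size that divides globalSize evenly
     * and does not exceed maxWorkGroupSize (the device limit).
     */
    public static int pickLocalSize(int globalSize, int maxWorkGroupSize) {
        if (globalSize <= 0 || maxWorkGroupSize <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        // Start from the largest size the device allows and walk down
        // until the candidate divides the global size without remainder.
        int candidate = Math.min(globalSize, maxWorkGroupSize);
        while (globalSize % candidate != 0) {
            candidate--;
        }
        return candidate;
    }

    public static void main(String[] args) {
        System.out.println(pickLocalSize(1024, 256)); // 256
        System.out.println(pickLocalSize(1000, 256)); // 250
        System.out.println(pickLocalSize(17, 8));     // 1 (17 is prime)
    }
}
```

The loop always terminates because 1 divides every positive global size. A prime global size therefore degrades to a local size of 1, which is valid but slow, so padding the global size may be preferable in practice.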