The Erlang API for OpenCL.
OpenCL (Open Computing Language) is an open royalty-free standard for general purpose parallel programming across CPUs, GPUs and other processors, giving software developers portable and efficient access to the power of these heterogeneous processing platforms.
OpenCL supports a wide range of applications, ranging from embedded and consumer software to HPC solutions, through a low-level, high-performance, portable abstraction. By creating an efficient, close-to-the-metal programming interface, OpenCL will form the foundation layer of a parallel computing ecosystem of platform-independent tools, middleware and applications.
OpenCL consists of an API for coordinating parallel computation across heterogeneous processors; and a cross-platform programming language with a well-specified computation environment. The OpenCL standard:
The specification is divided into a core specification that any OpenCL compliant implementation must support; a handheld/embedded profile which relaxes the OpenCL compliance requirements for handheld and embedded devices; and a set of optional extensions that are likely to move into the core specification in later revisions of the OpenCL specification.
The documentation is re-used with the following copyright:
Copyright © 2007-2009 The Khronos Group Inc. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and/or associated documentation files (the "Materials"), to deal in the Materials without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Materials, and to permit persons to whom the Materials are furnished to do so, subject to the condition that this copyright notice and permission notice shall be included in all copies or substantial portions of the Materials.
cl_addressing_mode() = {none | clamp_to_edge | clamp | repeat}
cl_char() = integer()
cl_context() = {{object, 3, non_neg_integer()}}
cl_context_info() = {{reference_count, cl_uint()}, {devices, [cl_device()]}, {properties, [cl_int()]}}
cl_context_info_key() = {reference_count | devices | properties}
cl_device_id() = {{object, 2, non_neg_integer()}}
cl_device_info() = {cl_device_info_key(), term()}
cl_device_info_key() = {type | vendor_id | max_compute_units | max_work_item_dimensions | max_work_group_size | max_work_item_sizes | preferred_vector_width_char | preferred_vector_width_short | preferred_vector_width_int | preferred_vector_width_long | preferred_vector_width_float | preferred_vector_width_double | max_clock_frequency | address_bits | max_read_image_args | max_write_image_args | max_mem_alloc_size | image2d_max_width | image2d_max_height | image3d_max_width | image3d_max_height | image3d_max_depth | image_support | max_parameter_size | max_samplers | mem_base_addr_align | min_data_type_align_size | single_fp_config | global_mem_cache_type | global_mem_cacheline_size | global_mem_cache_size | global_mem_size | max_constant_buffer_size | max_constant_args | local_mem_type | local_mem_size | error_correction_support | profiling_timer_resolution | endian_little | available | compiler_available | execution_capabilities | queue_properties | name | vendor | driver_version | profile | version | extensions | platform}
cl_device_type() = {gpu | cpu | accelerator | all | default}
cl_device_types() = {cl_device_type() | [cl_device_type()]}
cl_double() = float()
cl_error() = {device_not_found | device_not_available | compiler_not_available | mem_object_allocation_failure | out_of_resources | out_of_host_memory | profiling_info_not_available | mem_copy_overlap | image_format_mismatch | image_format_not_supported | build_program_failure | map_failure | invalid_value | invalid_device_type | invalid_platform | invalid_device | invalid_context | invalid_queue_properties | invalid_command_queue | invalid_host_ptr | invalid_mem_object | invalid_image_format_descriptor | invalid_image_size | invalid_sampler | invalid_binary | invalid_build_options | invalid_program | invalid_program_executable | invalid_kernel_name | invalid_kernel_definition | invalid_kernel | invalid_arg_index | invalid_arg_value | invalid_arg_size | invalid_kernel_args | invalid_work_dimension | invalid_work_group_size | invalid_work_item_size | invalid_global_offset | invalid_event_wait_list | invalid_event | invalid_operation | invalid_gl_object | invalid_buffer_size | invalid_mip_level | unknown}
cl_event() = {{object, 9, non_neg_integer()}}
cl_filter_mode() = {nearest | linear}
cl_float() = float()
cl_half() = float()
cl_int() = integer()
cl_kernel() = {{object, 8, non_neg_integer()}}
cl_kernel_arg() = integer() | float() | binary()
cl_long() = integer()
cl_mem() = {{object, 5, non_neg_integer()}}
cl_mem_flag() = {read_write | write_only | read_only | use_host_ptr | alloc_host_ptr | copy_host_ptr}
cl_platform_id() = {{object, 1, non_neg_integer()}}
cl_platform_info() = {profile, string()} | {version, string()} | {name, string()} | {vendor, string()} | {extensions, string()}
cl_platform_info_key() = profile | version | name | vendor | extensions
cl_program() = {{object, 7, non_neg_integer()}}
cl_queue() = {{object, 4, non_neg_integer()}}
cl_queue_property() = {out_of_order_exec_mode_enable | profiling_enabled}
cl_sampler() = {{object, 6, non_neg_integer()}}
cl_short() = integer()
cl_uchar() = non_neg_integer()
cl_uint() = non_neg_integer()
cl_ulong() = non_neg_integer()
cl_ushort() = non_neg_integer()
start_arg() = {{debug, boolean()}}
async_build_program/3 | |
async_finish/1 | |
async_flush/1 | |
async_wait_for_event/1 | Initiate an asynchronous wait operation. |
build_program/3 | Builds (compiles and links) a program executable from the program source or binary. |
context_info/0 | List context info queries. |
create_buffer/3 | Equivalent to create_buffer(Context, Flags, Size, <<>>). |
create_buffer/4 | Creates a buffer object. |
create_context/1 | Creates an OpenCL context. |
create_context_from_type/1 | Create an OpenCL context from a device type that identifies the specific device(s) to use. |
create_image/5 | |
create_image2d/7 | |
create_image3d/9 | |
create_kernel/2 | Creates a kernel object. |
create_kernels_in_program/1 | Creates kernel objects for all kernel functions in a program object. |
create_program_with_binary/3 | Creates a program object for a context, and loads specified binary data into the program object. |
create_program_with_source/2 | Creates a program object for a context, and loads the source code specified by the text strings in the strings array into the program object. |
create_queue/3 | Create a command-queue on a specific device. |
create_sampler/4 | Creates a sampler object. |
device_info/0 | Return a list of possible device info queries. |
device_info_10/1 | |
device_info_11/1 | |
device_info_12/1 | |
enqueue_barrier/1 | A synchronization point that enqueues a barrier operation. |
enqueue_barrier_with_wait_list/2 | |
enqueue_copy_buffer_to_image/7 | |
enqueue_copy_image/6 | |
enqueue_copy_image_to_buffer/7 | |
enqueue_map_buffer/6 | |
enqueue_map_image/6 | |
enqueue_marker/1 | Enqueues a marker command. |
enqueue_marker_with_wait_list/2 | |
enqueue_nd_range_kernel/5 | Enqueues a command to execute a kernel on a device. |
enqueue_nd_range_kernel/6 | |
enqueue_read_buffer/5 | Enqueue commands to read from a buffer object to host memory. |
enqueue_read_image/7 | |
enqueue_task/3 | Enqueues a command to execute a kernel on a device. |
enqueue_task/4 | |
enqueue_unmap_mem_object/3 | |
enqueue_wait_for_events/2 | Enqueues a wait for a specific event or a list of events to complete before any future commands queued in the command-queue are executed. |
enqueue_write_buffer/6 | Enqueue commands to write to a buffer object from host memory. |
enqueue_write_buffer/7 | |
enqueue_write_image/8 | |
enqueue_write_image/9 | |
event_info/0 | Returns all possible event_info items. |
finish/1 | Blocks until all previously queued OpenCL commands in a command-queue are issued to the associated device and have completed. |
flush/1 | Issues all previously queued OpenCL commands in a command-queue to the device associated with the command-queue. |
get_context_info/1 | Get all context info. |
get_context_info/2 | Query information about a context. |
get_device_ids/0 | Equivalent to get_device_ids(0, all). |
get_device_ids/2 | Obtain the list of devices available on a platform. |
get_device_info/1 | Get all device info. |
get_device_info/2 | Get information about an OpenCL device. |
get_event_info/1 | Returns all specific information about the event object. |
get_event_info/2 | Returns specific information about the event object. |
get_image_info/1 | |
get_image_info/2 | |
get_kernel_info/1 | Returns all information about the kernel object. |
get_kernel_info/2 | Returns specific information about the kernel object. |
get_kernel_workgroup_info/2 | Returns all information about the kernel object that may be specific to a device. |
get_kernel_workgroup_info/3 | Returns specific information about the kernel object that may be specific to a device. |
get_mem_object_info/1 | Used to get all information that is common to all memory objects (buffer and image objects). |
get_mem_object_info/2 | Used to get |
get_platform_ids/0 | Obtain the list of platforms available. |
get_platform_info/1 | Get all information about the OpenCL platform. |
get_platform_info/2 | Get specific information about the OpenCL platform. |
get_program_build_info/2 | Returns all build information for each device in the program object. |
get_program_build_info/3 | Returns specific build information for each device in the program object. |
get_program_info/1 | Returns all information about the program object. |
get_program_info/2 | Returns specific information about the program object. |
get_queue_info/1 | Returns all queue info. |
get_queue_info/2 | Return the specified queue info. |
get_sampler_info/1 | Returns all information about the sampler object. |
get_sampler_info/2 | Returns |
get_supported_image_formats/3 | Returns a list of image formats [{Order,Type}]. |
image_info/0 | |
kernel_info/0 | |
kernel_workgroup_info/0 | |
mem_object_info/0 | Returns a list of the possible mem info keys. |
noop/0 | Run a no operation towards the NIF object. |
nowait_enqueue_nd_range_kernel/5 | |
nowait_enqueue_task/3 | |
nowait_enqueue_write_buffer/6 | |
nowait_enqueue_write_image/8 | |
platform_info/0 | Returns a list of the possible platform info keys. |
program_build_info/0 | |
program_info/0 | |
queue_info/0 | Returns the list of possible queue info items. |
release_context/1 | Decrement the context reference count. |
release_event/1 | Decrements the event reference count. |
release_kernel/1 | Decrements the kernel reference count. |
release_mem_object/1 | Decrements the memory object reference count. |
release_program/1 | Decrements the program reference count. |
release_queue/1 | Decrements the command_queue reference count. |
release_sampler/1 | Decrements the sampler reference count. |
retain_context/1 | Increment the context reference count. |
retain_event/1 | Increments the event reference count. |
retain_kernel/1 | Increments the kernel reference count. |
retain_mem_object/1 | Increments the memory object reference count. |
retain_program/1 | Increments the program reference count. |
retain_queue/1 | Increments the command_queue reference count. |
retain_sampler/1 | Increments the sampler reference count. |
sampler_info/0 | |
set_kernel_arg/3 | Used to set the argument value for a specific argument of a kernel. |
set_kernel_arg_size/3 | clErlang special to set kernel arg with size only (local mem etc). |
set_queue_property/3 | Deprecated; this function has been removed. |
start/0 | Start the OpenCL application. |
start/1 | Start the OpenCL application. |
stop/0 | Stop the OpenCL application. |
unload_compiler/0 | Allows the implementation to release the resources allocated by the OpenCL compiler. |
unload_platform_compiler/1 | |
versions/0 | Run a no operation towards the NIF object. |
wait/1 | |
wait/2 | Waits for commands identified by event objects to complete. |
wait_for_event/1 | Equivalent to wait(Event, infinity). |
async_build_program(Program, DeviceList, Options) -> any()
async_finish(Queue) -> any()
async_flush(Queue) -> any()
async_wait_for_event(Event::cl_event()) -> {ok, reference()} | {error, cl_error()}
Initiate an asynchronous wait operation.
Generates a wait operation that runs without blocking the caller. A reference is returned that can be used to match the message sent when the event has completed or resulted in an error. The message has the form {cl_event, Ref, Result}, where Ref is the reference that was returned from the call and Result is one of binary() | 'complete' | {error, cl_error()}.
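The message protocol described above can be wrapped in a small helper. The following is a minimal sketch, not part of the cl module: the module name, the function await/2 and the {error, timeout} result are hypothetical and exist only for illustration.

```erlang
-module(cl_async_wait_example).
-export([await/2]).

%% Start an asynchronous wait on Event and block in a selective receive
%% until the matching {cl_event, Ref, Result} message arrives.
%% Timeout is in milliseconds, or the atom infinity.
await(Event, Timeout) ->
    case cl:async_wait_for_event(Event) of
        {ok, Ref} ->
            receive
                {cl_event, Ref, {error, Reason}} -> {error, Reason};
                {cl_event, Ref, Result}          -> {ok, Result}
            after Timeout ->
                %% hypothetical result chosen for this sketch
                {error, timeout}
            end;
        {error, _} = Error ->
            Error
    end.
```

If the helper times out, the {cl_event, Ref, Result} message may still arrive later and stay in the caller's mailbox, so a long-lived process should be prepared to discard it.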
build_program(Program::cl_program(), DeviceList::[cl_device_id()], Options::string()) -> ok | {error, cl_error()}
Builds (compiles and links) a program executable from the program source or binary.
OpenCL allows program executables to be built using the source or the binary.
The build options are categorized as pre-processor options, options for math intrinsics, options that control optimization and miscellaneous options. This specification defines a standard set of options that must be supported by an OpenCL compiler when building program executables online or offline. These may be extended by a set of vendor- or platform-specific options.
These options control the OpenCL preprocessor, which is run on each program source before actual compilation. -D options are processed in the order they are given in the options argument to build_program/3.
-D name
Predefine name as a macro, with definition 1.
-D name=definition
The contents of definition are tokenized and processed as if they appeared during translation phase three in a #define directive. In particular, the definition will be truncated by embedded newline characters.
-I dir
Add the directory dir to the list of directories to be searched for header files.
-cl-single-precision-constant
Treat double precision floating-point constant as single precision constant.
-cl-denorms-are-zero
This option controls how single precision and double precision denormalized numbers are handled. If specified as a build option, the single precision denormalized numbers may be flushed to zero and if the optional extension for double precision is supported, double precision denormalized numbers may also be flushed to zero. This is intended to be a performance hint and the OpenCL compiler can choose not to flush denorms to zero if the device supports single precision (or double precision) denormalized numbers.
This option is ignored for single precision numbers if the device does not support single precision denormalized numbers i.e. CL_FP_DENORM bit is not set in CL_DEVICE_SINGLE_FP_CONFIG.
This option is ignored for double precision numbers if the device does not support double precision or if it does support double precision but CL_FP_DENORM bit is not set in CL_DEVICE_DOUBLE_FP_CONFIG.
This flag only applies for scalar and vector single precision floating-point variables and computations on these floating-point variables inside a program. It does not apply to reading from or writing to image objects.
-cl-opt-disable
This option disables all optimizations. Optimizations are enabled by default.
-cl-strict-aliasing
This option allows the compiler to assume the strictest aliasing rules.
The following options control compiler behavior regarding floating-point arithmetic. These options trade off between performance and correctness and must be specifically enabled. These options are not turned on by default since they can result in incorrect output for programs which depend on an exact implementation of IEEE 754 rules/specifications for math functions.
-cl-mad-enable
Allow a * b + c to be replaced by a mad. The mad computes a * b + c with reduced accuracy. For example, some OpenCL devices implement mad as truncate the result of a * b before adding it to c.
-cl-no-signed-zeros
Allow optimizations for floating-point arithmetic that ignore the signedness of zero. IEEE 754 arithmetic specifies the behavior of distinct +0.0 and -0.0 values, which then prohibits simplification of expressions such as x+0.0 or 0.0*x (even with -cl-finite-math-only). This option implies that the sign of a zero result isn't significant.
-cl-unsafe-math-optimizations
Allow optimizations for floating-point arithmetic that (a) assume that arguments and results are valid, (b) may violate IEEE 754 standard and (c) may violate the OpenCL numerical compliance requirements as defined in section 7.4 for single-precision floating-point, section 9.3.9 for double-precision floating-point, and edge case behavior in section 7.5. This option includes the -cl-no-signed-zeros and -cl-mad-enable options.
-cl-finite-math-only
Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or ±infinity. This option may violate the OpenCL numerical compliance requirements defined in section 7.4 for single-precision floating-point, section 9.3.9 for double-precision floating-point, and edge case behavior in section 7.5.
-cl-fast-relaxed-math
Sets the optimization options -cl-finite-math-only and -cl-unsafe-math-optimizations. This allows optimizations for floating-point arithmetic that may violate the IEEE 754 standard and the OpenCL numerical compliance requirements defined in the specification in section 7.4 for single-precision floating-point, section 9.3.9 for double-precision floating-point, and edge case behavior in section 7.5. This option causes the preprocessor macro __FAST_RELAXED_MATH__ to be defined in the OpenCL program.
-w
Inhibit all warning messages.
-Werror
Make all warnings into errors.
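Build options are passed to build_program/3 as a single space-separated string. The sketch below combines a preprocessor definition with two standard OpenCL options; the module name, the function build/3 and the macro name WIDTH are hypothetical and only illustrate the call shape.

```erlang
-module(cl_build_options_example).
-export([build/3]).

%% Compile Source for Devices with a macro definition, fused
%% multiply-add enabled, and warnings treated as errors.
%% WIDTH is a hypothetical macro name used only in this sketch.
build(Context, Devices, Source) ->
    Options = "-D WIDTH=64 -cl-mad-enable -Werror",
    {ok, Program} = cl:create_program_with_source(Context, Source),
    case cl:build_program(Program, Devices, Options) of
        ok              -> {ok, Program};
        {error, Reason} -> {error, Reason}
    end.
```

Because -cl-mad-enable trades accuracy for speed, it is probably best left out of the options string for kernels whose results are compared bit for bit against a reference implementation.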
context_info() -> [cl_context_info_key()]
List context info queries.
create_buffer(Context::cl_context(), Flags::cl_mem_flags(), Size::non_neg_integer()) -> {ok, cl_mem()} | {error, cl_error()}
Equivalent to create_buffer(Context, Flags, Size, <<>>).
create_buffer(Context::cl_context(), Flags::[cl_mem_flag()], Size::non_neg_integer(), Data::binary()) -> {ok, cl_mem()} | {error, cl_error()}
Creates a buffer object.
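As an illustration of the two arities, the following sketch creates an input buffer initialised from host data and an output buffer of the same size. The module and function names are hypothetical, and it assumes that the flag atoms map directly onto the corresponding CL_MEM_* flags.

```erlang
-module(cl_buffer_example).
-export([float_buffers/2]).

%% Pack a list of numbers as native-endian 32-bit floats, copy them into
%% a read-only device buffer, and allocate a write-only buffer of equal size.
float_buffers(Context, Floats) ->
    Data = << <<F:32/float-native>> || F <- Floats >>,
    Size = byte_size(Data),
    %% assumption: copy_host_ptr asks OpenCL to copy Data at creation time
    {ok, In}  = cl:create_buffer(Context, [read_only, copy_host_ptr], Size, Data),
    {ok, Out} = cl:create_buffer(Context, [write_only], Size),
    {ok, In, Out}.
```

Size is given in bytes, so a list of N floats yields buffers of 4 * N bytes.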
create_context(DeviceList::[cl_device_id()]) -> {ok, cl_context()} | {error, cl_error()}
Creates an OpenCL context.
An OpenCL context is created with one or more devices. Contexts are used by the OpenCL runtime for managing objects such as command-queues, memory, program and kernel objects and for executing kernels on one or more devices specified in the context.
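A context is typically built from the devices of one platform. The following sketch assumes that the cl application is already started and that get_platform_ids/0 returns {ok, [cl_platform_id()]}, which this section does not state explicitly; the module and function names are hypothetical.

```erlang
-module(cl_context_example).
-export([first_platform_context/0]).

%% Create a context that spans every device of the first platform.
%% Any {error, cl_error()} reply surfaces as a badmatch in this sketch.
first_platform_context() ->
    %% assumption: get_platform_ids/0 returns {ok, [cl_platform_id()]}
    {ok, [Platform | _]} = cl:get_platform_ids(),
    {ok, Devices} = cl:get_device_ids(Platform, all),
    {ok, Context} = cl:create_context(Devices),
    {ok, Context, Devices}.
```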
NOTE: create_context/1 and create_context_from_type/1 perform an implicit retain. This is very helpful for 3rd party libraries, which typically get a context passed to them by the application. However, it is possible that the application may delete the context without informing the library. Allowing functions to attach to (i.e. retain) and release a context solves the problem of a context being used by a library no longer being valid.
create_context_from_type(Type::cl_device_types()) -> {ok, cl_context()} | {error, cl_error()}
Create an OpenCL context from a device type that identifies the specific device(s) to use.
NOTE: create_context_from_type/1 may return all or a subset of the actual physical devices present in the platform and that match device_type.
create_context/1 and create_context_from_type/1 perform an implicit retain. This is very helpful for 3rd party libraries, which typically get a context passed to them by the application. However, it is possible that the application may delete the context without informing the library. Allowing functions to attach to (i.e. retain) and release a context solves the problem of a context being used by a library no longer being valid.
create_image(Context, MemFlags, ImageFormat, ImageDesc, Data) -> any()
create_image2d(Context, MemFlags, ImageFormat, Width, Height, Pitch, Data) -> any()
create_image3d(Context, MemFlags, ImageFormat, Width, Height, Depth, RowPitch, SlicePitch, Data) -> any()
create_kernel(Program::cl_program(), Name::string()) -> {ok, cl_kernel()} | {error, cl_error()}
Creates a kernel object.
A kernel is a function declared in a program. A kernel is identified by the __kernel qualifier applied to any function in a program. A kernel object encapsulates the specific __kernel function declared in a program and the argument values to be used when executing this __kernel function.
create_kernels_in_program(Program::cl_program()) -> {ok, [cl_kernel()]} | {error, cl_error()}
Creates kernel objects for all kernel functions in a program object.
Creates kernel objects for all kernel functions in program. Kernel objects are not created for any __kernel functions in program that do not have the same function definition across all devices for which a program executable has been successfully built.
create_program_with_binary(Context::cl_context(), DeviceList::[cl_device_id()], BinaryList::[binary()]) -> {ok, cl_program()} | {error, cl_error()}
Creates a program object for a context, and loads specified binary data into the program object.
OpenCL allows applications to create a program object using the program source or binary and build appropriate program executables. This allows applications to determine whether they want to use the pre-built offline binary or load and compile the program source and use the executable compiled/linked online as the program executable. This can be very useful as it allows applications to load and build program executables online on its first instance for appropriate OpenCL devices in the system. These executables can now be queried and cached by the application. Future instances of the application launching will no longer need to compile and build the program executables. The cached executables can be read and loaded by the application, which can help significantly reduce the application initialization time.
The binaries and the device list can be obtained by calling:
{ok,P} = cl:create_program_with_source(Context,Source),
ok = cl:build_program(P, DeviceList, Options),
{ok,DeviceList} = cl:get_program_info(P, devices),
{ok,BinaryList} = cl:get_program_info(P, binaries).
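The caching scheme described above can be sketched as a pair of helpers that store the binaries on disk and reload them on a later run. The module name, the function names and the file format (term_to_binary of the binary list) are assumptions made for this example, not part of the cl module.

```erlang
-module(cl_binary_cache_example).
-export([save/2, load/3]).

%% Write the device binaries of an already built Program to File.
save(Program, File) ->
    {ok, Binaries} = cl:get_program_info(Program, binaries),
    file:write_file(File, term_to_binary(Binaries)).

%% Recreate and build a program from the cached binaries. DeviceList must
%% name the same devices, in the same order, as when the cache was written.
load(Context, DeviceList, File) ->
    {ok, Bin} = file:read_file(File),
    Binaries = binary_to_term(Bin),
    {ok, Program} = cl:create_program_with_binary(Context, DeviceList, Binaries),
    ok = cl:build_program(Program, DeviceList, ""),
    {ok, Program}.
```

A cached binary is tied to the device and driver that produced it, so an application would normally fall back to create_program_with_source/2 when load/3 fails.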
create_program_with_source(Context::cl_context(), Source::iodata()) -> {ok, cl_program()} | {error, cl_error()}
Creates a program object for a context, and loads the source code specified by the text strings in the strings array into the program object.
The devices associated with the program object are the devices associated with context.
create_queue(Context::cl_context(), Device::cl_device_id(), Properties::[cl_queue_property()]) -> {ok, cl_queue()} | {error, cl_error()}
Create a command-queue on a specific device.
The OpenCL functions that are submitted to a command-queue are enqueued in the order the calls are made but can be configured to execute in-order or out-of-order. The properties argument in clCreateCommandQueue can be used to specify the execution order.
If the 'out_of_order_exec_mode_enable' property of a command-queue is not set, the commands enqueued to a command-queue execute in order. For example, if an application calls clEnqueueNDRangeKernel to execute kernel A followed by a clEnqueueNDRangeKernel to execute kernel B, the application can assume that kernel A finishes first and then kernel B is executed. If the memory objects output by kernel A are inputs to kernel B then kernel B will see the correct data in memory objects produced by execution of kernel A. If the 'out_of_order_exec_mode_enable' property of a command-queue is set, then there is no guarantee that kernel A will finish before kernel B starts execution.
Applications can configure the commands enqueued to a command-queue to execute out-of-order by setting the 'out_of_order_exec_mode_enable' property of the command-queue. This can be specified when the command-queue is created or can be changed dynamically using clCreateCommandQueue. In out-of-order execution mode there is no guarantee that the enqueued commands will finish execution in the order they were queued. As there is no guarantee that kernels will be executed in order, i.e. based on when the clEnqueueNDRangeKernel calls are made within a command-queue, it is therefore possible that an earlier clEnqueueNDRangeKernel call to execute kernel A identified by event A may execute and/or finish later than a clEnqueueNDRangeKernel call to execute kernel B which was called by the application at a later point in time. To guarantee a specific order of execution of kernels, a wait on a particular event (in this case event A) can be used. The wait for event A can be specified in the event_wait_list argument to clEnqueueNDRangeKernel for kernel B.
In addition, a wait for events or a barrier command can be enqueued to the command-queue. The wait for events command ensures that previously enqueued commands identified by the list of events to wait for have finished before the next batch of commands is executed. The barrier command ensures that all previously enqueued commands in a command-queue have finished execution before the next batch of commands is executed.
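In the Erlang API the event_wait_list corresponds to the WaitList argument. The sketch below forces kernel B to run after kernel A on a queue that may execute out of order; the module and function names are hypothetical, and the work sizes are left to the caller.

```erlang
-module(cl_ordering_example).
-export([run_a_then_b/5]).

%% Enqueue KernelA and KernelB so that B starts only after A has completed,
%% even if Queue was created with out_of_order_exec_mode_enable.
run_a_then_b(Queue, KernelA, KernelB, Global, Local) ->
    {ok, EventA} = cl:enqueue_nd_range_kernel(Queue, KernelA, Global, Local, []),
    %% the event of A in the wait list of B establishes the ordering
    {ok, EventB} = cl:enqueue_nd_range_kernel(Queue, KernelB, Global, Local, [EventA]),
    ok = cl:flush(Queue),
    cl:wait_for_event(EventB).
```

On an in-order queue the wait list is redundant but harmless, so the same code works for both queue configurations.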
Similarly, commands to read, write, copy or map memory objects that are enqueued after clEnqueueNDRangeKernel, clEnqueueTask or clEnqueueNativeKernel commands are not guaranteed to wait for kernels scheduled for execution to have completed (if the 'out_of_order_exec_mode_enable' property is set). To ensure correct ordering of commands, the event object returned by clEnqueueNDRangeKernel, clEnqueueTask or clEnqueueNativeKernel can be used to enqueue a wait for event or a barrier command can be enqueued that must complete before reads or writes to the memory object(s) occur.
create_sampler(Context::cl_context(), Normalized::boolean(), AddressingMode::cl_addressing_mode(), FilterMode::cl_filter_mode()) -> {ok, cl_sampler()} | {error, cl_error()}
Creates a sampler object.
A sampler object describes how to sample an image when the image is read in the kernel. The built-in functions to read from an image in a kernel take a sampler as an argument. The sampler arguments to the image read function can be sampler objects created using OpenCL functions and passed as argument values to the kernel or can be samplers declared inside a kernel. In this section we discuss how sampler objects are created using OpenCL functions.
device_info() -> [cl_device_info_key()]
Return a list of possible device info queries.
See also: get_device_info/2.
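The key list returned by device_info/0 can drive get_device_info/2 directly. A minimal sketch, with hypothetical module and function names:

```erlang
-module(cl_device_info_example).
-export([describe/1]).

%% Return [{Key, Value}] for every info key the device answers.
%% Keys whose query returns {error, _} are silently skipped.
describe(Device) ->
    [{Key, Value} || Key <- cl:device_info(),
                     {ok, Value} <- [cl:get_device_info(Device, Key)]].
```

For example, proplists:get_value(name, cl_device_info_example:describe(Device)) would pick the device name out of the result.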
device_info_10(L) -> any()
device_info_11(L) -> any()
device_info_12(L) -> any()
enqueue_barrier(Queue::cl_queue()) -> ok | {error, cl_error()}
A synchronization point that enqueues a barrier operation.
enqueue_barrier/1 is a synchronization point that ensures that all queued commands in command_queue have finished execution before the next batch of commands can begin execution.
enqueue_barrier_with_wait_list(Queue::cl_queue(), WaitList::[cl_event()]) -> {ok, cl_event()} | {error, cl_error()}
enqueue_copy_buffer_to_image(Queue, SrcBuffer, DstImage, SrcOffset, DstOrigin, Region, WaitList) -> any()
enqueue_copy_image(Queue, SrcImage, DstImage, Origin, Region, WaitList) -> any()
enqueue_copy_image_to_buffer(Queue, SrcImage, DstBuffer, Origin, Region, DstOffset, WaitList) -> any()
enqueue_map_buffer(Queue, Buffer, MapFlags, Offset, Size, WaitList) -> any()
enqueue_map_image(Queue, Image, MapFlags, Origin, Region, WaitList) -> any()
enqueue_marker(Queue::cl_queue()) -> {ok, cl_event()} | {error, cl_error()}
Enqueues a marker command.
Enqueues a marker command to command_queue. The marker command returns an event which can be used to queue a wait on this marker event, i.e. wait for all commands queued before the marker command to complete.
enqueue_marker_with_wait_list(Queue::cl_queue(), WaitList::[cl_event()]) -> {ok, cl_event()} | {error, cl_error()}
enqueue_nd_range_kernel(Queue::cl_queue(), Kernel::cl_kernel(), Global::[non_neg_integer()], Local::[non_neg_integer()], WaitList::[cl_event()]) -> {ok, cl_event()} | {error, cl_error()}
Enqueues a command to execute a kernel on a device.
Work-group instances are executed in parallel across multiple compute units or concurrently on the same compute unit.
Each work-item is uniquely identified by a global identifier. The global ID, which can be read inside the kernel, is computed using the value given by global_work_size and global_work_offset. In OpenCL 1.0, the starting global ID is always (0, 0, ... 0). In addition, a work-item is also identified within a work-group by a unique local ID. The local ID, which can also be read by the kernel, is computed using the value given by local_work_size. The starting local ID is always (0, 0, ... 0).
enqueue_nd_range_kernel(Queue, Kernel, Global, Local, WaitList, WantEvent) -> any()
enqueue_read_buffer(Queue::cl_queue(), Buffer::cl_mem(), Offset::non_neg_integer(), Size::non_neg_integer(), WaitList::[cl_event()]) -> {ok, cl_event()} | {error, cl_error()}
Enqueue commands to read from a buffer object to host memory.
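A write followed by a read of the same region can be sketched as below. The module and function names are hypothetical, and the sketch assumes that waiting on a read event yields the bytes that were read, in line with the binary() result listed for async_wait_for_event/1.

```erlang
-module(cl_roundtrip_example).
-export([roundtrip/3]).

%% Write Data to the start of Buffer, then read the same number of bytes back.
%% The read waits on the write event, so the ordering holds on any queue.
roundtrip(Queue, Buffer, Data) when is_binary(Data) ->
    Size = byte_size(Data),
    {ok, WriteEvent} = cl:enqueue_write_buffer(Queue, Buffer, 0, Size, Data, []),
    {ok, ReadEvent}  = cl:enqueue_read_buffer(Queue, Buffer, 0, Size, [WriteEvent]),
    ok = cl:flush(Queue),
    %% assumption: the result of the wait carries the bytes that were read
    cl:wait_for_event(ReadEvent).
```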
Calling enqueue_read_buffer to read a region of the buffer object with the Buffer argument value set to host_ptr + offset, where host_ptr is a pointer to the memory region specified when the buffer object being read is created with CL_MEM_USE_HOST_PTR, must meet the following requirements in order to avoid undefined behavior:
enqueue_read_image(Queue, Image, Origin, Region, RowPitch, SlicePitch, WaitList) -> any()
enqueue_task(Queue::cl_queue(), Kernel::cl_kernel(), WaitList::[cl_event()]) -> {ok, cl_event()} | {error, cl_error()}
Enqueues a command to execute a kernel on a device.
The kernel is executed using a single work-item.
See also: enqueue_nd_range_kernel/5.
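A single-work-item launch can be wrapped as follows. This is a minimal sketch with hypothetical module and function names; the kernel arguments are assumed to have been set beforehand.

```erlang
-module(cl_task_example).
-export([run_once/2]).

%% Run Kernel once as a single work-item and wait for it to complete.
run_once(Queue, Kernel) ->
    {ok, Event} = cl:enqueue_task(Queue, Kernel, []),
    ok = cl:flush(Queue),
    cl:wait_for_event(Event).
```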
enqueue_task(Queue, Kernel, WaitList, WantEvent) -> any()
enqueue_unmap_mem_object(Queue, Mem, WaitList) -> any()
enqueue_wait_for_events(Queue::cl_queue(), WaitList::[cl_event()]) -> ok | {error, cl_error()}
Enqueues a wait for a specific event or a list of events to complete before any future commands queued in the command-queue are executed.
The context associated with events in WaitList and Queue must be the same.
enqueue_write_buffer(Queue::cl_queue(), Buffer::cl_mem(), Offset::non_neg_integer(), Size::non_neg_integer(), Data::binary(), WaitList::[cl_event()]) -> {ok, cl_event()} | {error, cl_error()}
Enqueue commands to write to a buffer object from host memory.
Calling enqueue_write_buffer to update the latest bits in a region of the buffer object with the Buffer argument value set to host_ptr + offset, where host_ptr is a pointer to the memory region specified when the buffer object being written is created with CL_MEM_USE_HOST_PTR, must meet the following requirements in order to avoid undefined behavior:
The host memory region given by (host_ptr + offset, cb) contains the latest bits when the enqueued write command begins execution.
enqueue_write_buffer(Queue, Buffer, Offset, Size, Data, WaitList, WantEvent) -> any()
enqueue_write_image(Queue, Image, Origin, Region, RowPitch, SlicePitch, Data, WaitList) -> any()
enqueue_write_image(Queue, Image, Origin, Region, RowPitch, SlicePitch, Data, WaitList, WantEvent) -> any()
event_info() -> any()
Returns all possible event_info items.
finish(Queue::cl_queue()) -> ok | {error, cl_error()}
Blocks until all previously queued OpenCL commands in a command-queue are issued to the associated device and have completed.
finish does not return until all queued commands in command_queue have been processed and completed. clFinish is also a synchronization point.
flush(Queue::cl_queue()) -> ok | {error, cl_error()}
Issues all previously queued OpenCL commands in a command-queue to the device associated with the command-queue.
flush only guarantees that all queued commands to command_queue get issued to the appropriate device. There is no guarantee that they will be complete after clFlush returns.
get_context_info(Context::cl_context()) -> {ok, [cl_context_info()]} | {error, cl_error()}
Get all context info.
See also: get_context_info/2.
get_context_info(Context::cl_context(), Info::cl_context_info_key()) -> {ok, term()} | {error, cl_error()}
Query information about a context.
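A library that is handed a context by the application can recover the devices with a single key query. The sketch below creates a queue on the first device of the context; the module and function names are hypothetical.

```erlang
-module(cl_context_info_example).
-export([queue_on_first_device/1]).

%% Look up the devices of Context and create an in-order command-queue
%% on the first one. The result is whatever create_queue/3 returns.
queue_on_first_device(Context) ->
    {ok, [Device | _]} = cl:get_context_info(Context, devices),
    cl:create_queue(Context, Device, []).
```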
get_device_ids() -> {ok, [cl_device_id()]} | {error, cl_error()}
Equivalent to get_device_ids(0, all).
get_device_ids(Platform::cl_platform_id(), Type::cl_device_types()) -> {ok, [cl_device_id()]} | {error, cl_error()}
Obtain the list of devices available on a platform.
get_device_ids/2 may return all or a subset of the actual physical devices present in the platform and that match device_type.
The application can query specific capabilities of the OpenCL device(s) returned by get_device_ids/2. This can be used by the application to determine which device(s) to use.
get_device_info(Device) -> {ok, [cl_device_info()]} | {error, cl_error()}
Get all device info.
See also: get_device_info/2.
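A minimal sketch of how get_platform_ids/0, get_device_ids/2 and get_device_info/1 might be combined to list every device. The module and function names are hypothetical. The device type atom all is taken from the get_device_ids(0, all) equivalence stated above, and treating an error as an empty list is a design choice of this sketch, not something the library prescribes.

```erlang
-module(cl_enum_example).
-export([all_devices/0, ok_or_empty/1]).

%% Treat an {error, _} result as an empty list so that one failing
%% platform does not abort the whole enumeration.
ok_or_empty({ok, List}) when is_list(List) -> List;
ok_or_empty({error, _}) -> [].

%% Return [{Platform, Device, InfoResult}] for every device on every
%% platform. InfoResult is the unmodified return value of
%% cl:get_device_info/1, i.e. {ok, Info} or {error, Reason}.
all_devices() ->
    [{P, D, cl:get_device_info(D)}
     || P <- ok_or_empty(cl:get_platform_ids()),
        D <- ok_or_empty(cl:get_device_ids(P, all))].
```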
get_device_info(DevID::cl_device_id(), Info::cl_device_info_key()) -> {ok, term()} | {error, cl_error()}
Get information about an OpenCL device.
The OpenCL device type. Currently supported values are one of or a combination of: CL_DEVICE_TYPE_CPU, CL_DEVICE_TYPE_GPU, CL_DEVICE_TYPE_ACCELERATOR, or CL_DEVICE_TYPE_DEFAULT.
A unique device vendor identifier. An example of a unique device identifier could be the PCIe ID.
The number of parallel compute cores on the OpenCL device. The minimum value is 1.
Maximum dimensions that specify the global and local work-item IDs used by the data parallel execution model (see enqueue_nd_range_kernel/5). The minimum value is 3.
Maximum number of work-items in a work-group executing a kernel using the data parallel execution model (see enqueue_nd_range_kernel/5). The minimum value is 1.
Maximum number of work-items that can be specified in each dimension of the work-group to enqueue_nd_range_kernel/5. Returns n entries, where n is the value returned by the query for CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS. The minimum value is (1, 1, 1).
Preferred native vector width size for built-in scalar types that can be put into vectors. The vector width is defined as the number of scalar elements that can be stored in the vector.
If the
Maximum configured clock frequency of the device in MHz.
Max number of simultaneous image objects that can be read by a kernel. The minimum value is 128 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE.
Max number of simultaneous image objects that can be written to by a kernel. The minimum value is 8 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE.
Max size of memory object allocation in bytes. The minimum value is max (1/4th of CL_DEVICE_GLOBAL_MEM_SIZE, 128*1024*1024)
Max width of 2D image in pixels. The minimum value is 8192 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE.
Max height of 2D image in pixels. The minimum value is 8192 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE.
Max width of 3D image in pixels. The minimum value is 2048 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE.
Max height of 3D image in pixels. The minimum value is 2048 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE.
Max depth of 3D image in pixels. The minimum value is 2048 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE.
Is CL_TRUE if images are supported by the OpenCL device and CL_FALSE otherwise.
Max size in bytes of the arguments that can be passed to a kernel. The minimum value is 256.
Maximum number of samplers that can be used in a kernel. The minimum value is 16 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE.
Describes the alignment in bits of the base address of any allocated memory object.
The smallest alignment in bytes which can be used for any data type.
Describes single precision floating-point capability of the device. This is a bit-field that describes one or more of the following values:
CL_FP_DENORM - denorms are supported
CL_FP_INF_NAN - INF and quiet NaNs are supported
CL_FP_ROUND_TO_NEAREST - round to nearest even rounding mode supported
CL_FP_ROUND_TO_ZERO - round to zero rounding mode supported
CL_FP_ROUND_TO_INF - round to +ve and -ve infinity rounding modes supported
CL_FP_FMA - IEEE754-2008 fused multiply-add is supported
The mandated minimum floating-point capability is CL_FP_ROUND_TO_NEAREST | CL_FP_INF_NAN.
Return type: cl_device_mem_cache_type
Type of global memory cache supported. Valid values are: CL_NONE, CL_READ_ONLY_CACHE, and CL_READ_WRITE_CACHE.
Size of global memory cache line in bytes.
Size of global memory cache in bytes.
Size of global device memory in bytes.
Max size in bytes of a constant buffer allocation. The minimum value is 64 KB.
Max number of arguments
declared with the
Type of local memory supported. This can be set to CL_LOCAL implying dedicated local memory storage such as SRAM, or CL_GLOBAL.
Size of local memory arena in bytes. The minimum value is 16 KB.
Describes the resolution of device timer. This is measured in nanoseconds.
Return type: cl_device_exec_capabilities
Describes the execution capabilities of the device. This is a bit-field that describes one or more of the following values:
CL_EXEC_KERNEL - The OpenCL device can execute OpenCL kernels.
CL_EXEC_NATIVE_KERNEL - The OpenCL device can execute native kernels.
The mandated minimum capability is CL_EXEC_KERNEL.
Describes the command-queue properties supported by the device. This is a bit-field that describes one or more of the following values:
'out_of_order_exec_mode_enable'
'profiling_enable'
These properties are described in the table for create_queue/3. The mandated minimum capability is 'profiling_enable'.
Device name string.
Vendor name string.
OpenCL software driver version string
OpenCL profile string. Returns the profile name supported by the device (see note). The profile name returned can be one of the following strings:
FULL_PROFILE - if the device supports the OpenCL specification (functionality defined as part of the core specification and does not require any extensions to be supported).
EMBEDDED_PROFILE - if the device supports the OpenCL embedded profile.
OpenCL version string.
Returns a space separated list of extension names (the extension names themselves do not contain any spaces).
The platform associated with this device.
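Two of the device properties listed above lend themselves to small pure helpers, sketched here under stated assumptions. The module and function names are hypothetical. has_extension/2 assumes the extensions value arrives as an Erlang string (a character list), which the text above does not confirm, and it uses string:lexemes/2, which requires OTP 20 or later.

```erlang
-module(cl_devinfo_example).
-export([min_max_alloc/1, has_extension/2]).

%% Smallest value a conforming device may report as its maximum
%% memory object allocation size, in bytes:
%% max(1/4 of the global memory size, 128*1024*1024).
min_max_alloc(GlobalMemSize) when is_integer(GlobalMemSize), GlobalMemSize >= 0 ->
    max(GlobalMemSize div 4, 128 * 1024 * 1024).

%% The extensions value is a space separated list of names, so split
%% it on spaces and look for an exact match of Name.
has_extension(Extensions, Name) when is_list(Extensions), is_list(Name) ->
    lists:member(Name, string:lexemes(Extensions, " ")).
```

Matching on whole names rather than substrings avoids treating a prefix of an extension name as a hit.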
get_event_info(Event) -> any()
Returns all specific information about the event object.
get_event_info(Event, Info) -> any()
Returns specific information about the event object.
get_image_info(Mem) -> any()
get_image_info(Mem, Info) -> any()
get_kernel_info(Kernel) -> any()
Returns all information about the kernel object.
get_kernel_info(Kernel, Info) -> any()
Returns specific information about the kernel object.
get_kernel_workgroup_info(Kernel, Device) -> any()
Returns all information about the kernel object that may be specific to a device.
get_kernel_workgroup_info(Kernel, Device, Info) -> any()
Returns specific information about the kernel object that may be specific to a device.
get_mem_object_info(Mem::cl_mem()) -> {ok, term()} | {error, cl_error()}
Used to get all information that is common to all memory objects (buffer and image objects).
get_mem_object_info(Mem::cl_mem(), InfoType::cl_mem_info_key()) -> {ok, term()} | {error, cl_error()}
Used to get
get_platform_ids() -> {ok, [cl_platform_id()]} | {error, cl_error()}
Obtain the list of platforms available.
get_platform_info(Platform::cl_platform_id()) -> {ok, [cl_platform_info()]} | {error, cl_error()}
Get all information about the OpenCL platform.
See also: get_platform_info/2.
get_platform_info(Platform::cl_platform_id(), Info::cl_platform_info_key()) -> {ok, term()} | {error, cl_error()}
Get specific information about the OpenCL platform.
OpenCL profile string. Returns the profile name supported by the implementation. The profile name returned can be one of the following strings:
FULL_PROFILE - if the implementation supports the OpenCL specification (functionality defined as part of the core specification and does not require any extensions to be supported).
EMBEDDED_PROFILE - if the implementation supports the OpenCL embedded profile. The embedded profile is defined to be a subset for each version of OpenCL.
get_program_build_info(Program, Device) -> any()
Returns all build information for each device in the program object.
get_program_build_info(Program, Device, Info) -> any()
Returns specific build information for each device in the program object.
get_program_info(Program) -> any()
Returns all information about the program object.
get_program_info(Program, Info) -> any()
Returns specific information about the program object.
get_queue_info(Queue) -> [queue_info_keys()]
Returns all queue info.
get_queue_info(Queue, Info) -> {ok, term()}
Returns the specified queue info.
get_sampler_info(Sampler::cl_sampler()) -> {ok, term()} | {error, cl_error()}
Returns all information about the sampler object.
See also: get_sampler_info/2.
get_sampler_info(Sampler::cl_sampler(), InfoType::cl_sampler_info_type()) -> {ok, term()} | {error, cl_error()}
Returns
get_supported_image_formats(Context, Flags, ImageType) -> any()
Returns a list of supported image formats as [{Order, Type}].
image_info() -> any()
kernel_info() -> any()
kernel_workgroup_info() -> any()
mem_object_info() -> [cl_mem_info_keys()]
Returns a list of the possible mem info keys.
noop() -> ok | {error, cl_error()}
Runs a no-op against the NIF object. This call can be used to measure the call overhead to the NIF object.
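One way to measure that overhead is to time many calls and take the mean. The sketch below does so with timer:tc/1 from the standard library; the module name and avg_call_us/2 are hypothetical, and the figure it returns includes the cost of the Erlang loop itself.

```erlang
-module(cl_overhead_example).
-export([avg_call_us/2]).

%% Call Fun N times and return the mean wall-clock time per call in
%% microseconds, as measured by timer:tc/1.
avg_call_us(Fun, N) when is_function(Fun, 0), is_integer(N), N > 0 ->
    {Micros, ok} = timer:tc(fun() -> repeat(Fun, N) end),
    Micros / N.

%% Tail-recursive loop; the result of each call is discarded.
repeat(_Fun, 0) -> ok;
repeat(Fun, N) -> _ = Fun(), repeat(Fun, N - 1).
```

With the cl application started, a call such as cl_overhead_example:avg_call_us(fun cl:noop/0, 100000) would give an estimate of the per-call cost.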
nowait_enqueue_nd_range_kernel(Queue::cl_queue(), Kernel::cl_kernel(), Global::[non_neg_integer()], Local::[non_neg_integer()], WaitList::[cl_event()]) -> ok | {error, cl_error()}
nowait_enqueue_task(Queue::cl_queue(), Kernel::cl_kernel(), WaitList::[cl_event()]) -> ok | {error, cl_error()}
nowait_enqueue_write_buffer(Queue::cl_queue(), Buffer::cl_mem(), Offset::non_neg_integer(), Size::non_neg_integer(), Data::binary(), WaitList::[cl_event()]) -> ok | {error, cl_error()}
nowait_enqueue_write_image(Queue, Image, Origin, Region, RowPitch, SlicePitch, Data, WaitList) -> any()
platform_info() -> [cl_platform_info_keys()]
Returns a list of the possible platform info keys.
program_build_info() -> any()
program_info() -> any()
queue_info() -> [queue_info_keys()]
Returns the list of possible queue info items.
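The key-list functions such as queue_info/0 can be paired with the per-key getters such as get_queue_info/2. The following is a sketch of that pattern, not how the library itself gathers all info; the module name and collect/2 are hypothetical.

```erlang
-module(cl_info_example).
-export([collect/2, queue_info/1]).

%% Pair every key with the value returned by Getter(Key). Keys whose
%% query fails are kept as {Key, {error, Reason}} so that failures
%% stay visible to the caller.
collect(Keys, Getter) when is_list(Keys), is_function(Getter, 1) ->
    [case Getter(K) of
         {ok, Value} -> {K, Value};
         {error, _} = Error -> {K, Error}
     end || K <- Keys].

%% Query each key reported by cl:queue_info/0 for the given queue.
queue_info(Queue) ->
    collect(cl:queue_info(), fun(K) -> cl:get_queue_info(Queue, K) end).
```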
release_context(Context::cl_context()) -> ok | {error, cl_error()}
Decrement the context reference count.
After the context reference count becomes zero and all the objects attached to context (such as memory objects, command-queues) are released, the context is deleted.
release_event(Event::cl_event()) -> ok | {error, cl_error()}
Decrements the event reference count.
The event object is deleted once the reference count becomes zero, the specific command identified by this event has completed (or terminated) and there are no commands in the command-queues of a context that require a wait for this event to complete.
release_kernel(Context::cl_kernel()) -> ok | {error, cl_error()}
Decrements the kernel reference count.
release_mem_object(Mem::cl_mem()) -> ok | {error, cl_error()}
Decrements the memory object reference count.
After the memobj reference count becomes zero and commands queued for execution on a command-queue(s) that use memobj have finished, the memory object is deleted.
release_program(Program::cl_program()) -> ok | {error, cl_error()}
Decrements the program reference count.
The program object is deleted after all kernel objects associated with program have been deleted and the program reference count becomes zero.
release_queue(Queue::cl_queue()) -> ok | {error, cl_error()}
Decrements the command_queue reference count.
After the command_queue reference count becomes zero and all commands queued to command_queue have finished (e.g., kernel executions, memory object updates, etc.), the command-queue is deleted.
release_sampler(Sampler::cl_sampler()) -> ok | {error, cl_error()}
Decrements the sampler reference count.
The sampler object is deleted after the reference count becomes zero and commands queued for execution on a command-queue(s) that use sampler have finished.
retain_context(Context::cl_context()) -> ok | {error, cl_error()}
Increment the context reference count.
See also: create_context.
retain_event(Event::cl_event()) -> ok | {error, cl_error()}
Increments the event reference count. NOTE: The OpenCL commands that return an event perform an implicit retain.
retain_kernel(Context::cl_kernel()) -> ok | {error, cl_error()}
Increments the kernel reference count.
retain_mem_object(Mem::cl_mem()) -> ok | {error, cl_error()}
Increments the memory object reference count.
retain_program(Program::cl_program()) -> ok | {error, cl_error()}
Increments the program reference count.
retain_queue(Queue::cl_queue()) -> ok | {error, cl_error()}
Increments the command_queue reference count.
create_queue/3 performs an implicit retain. This is very helpful for 3rd party libraries, which typically get a command-queue passed to them by the application. However, it is possible that the application may delete the command-queue without informing the library. Allowing functions to attach to (i.e. retain) and release a command-queue solves the problem of a command-queue being used by a library no longer being valid.
retain_sampler(Sampler::cl_sampler()) -> ok | {error, cl_error()}
Increments the sampler reference count.
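The retain and release functions above come in pairs. The library scenario described for retain_queue/1 could be sketched as a helper that retains an object, runs some work, and always releases afterwards. The module name and with_retained/4 are hypothetical. Whether explicit retain and release calls are needed at all, given that the Erlang handles may be garbage collected by the binding, is not addressed by this sketch.

```erlang
-module(cl_retain_example).
-export([with_retained/4, finish_retained/1]).

%% Retain Object, run Work(Object), and release the object again even
%% if Work raises an exception. If the retain fails, Work is not run
%% and the error is returned unchanged.
with_retained(Retain, Release, Object, Work) ->
    case Retain(Object) of
        ok ->
            try
                Work(Object)
            after
                Release(Object)
            end;
        {error, _} = Error ->
            Error
    end.

%% Example use: hold a reference to Queue while waiting for it to drain.
finish_retained(Queue) ->
    with_retained(fun cl:retain_queue/1, fun cl:release_queue/1,
                  Queue, fun cl:finish/1).
```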
sampler_info() -> any()
set_kernel_arg(Kernel::cl_kernel(), Index::non_neg_integer(), Argument::cl_kernel_arg()) -> ok | {error, cl_error()}
Used to set the argument value for a specific argument of a kernel.
For now set_kernel_arg handles integers and floats directly. To set any other type, pass a binary of the form <<Foo:Bar/native...>>; use the macros defined in cl.hrl to get the encoding right (except for padding).
A kernel object does not update the reference count for objects such as memory or sampler objects specified as argument values by set_kernel_arg/3. Users may not rely on a kernel object to retain objects specified as argument values to the kernel.
Implementations shall not allow cl_kernel objects to hold reference counts to cl_kernel arguments, because no mechanism is provided for the user to tell the kernel to release that ownership right. If the kernel holds ownership rights on kernel args, that would make it impossible for the user to tell with certainty when it is safe to release user allocated resources associated with OpenCL objects such as the cl_mem backing store used with CL_MEM_USE_HOST_PTR.
set_kernel_arg_size(Kernel::cl_kernel(), Index::non_neg_integer(), Size::non_neg_integer()) -> ok | {error, cl_error()}
A clErlang-specific call that sets a kernel argument by size only (for example, for local memory).
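To illustrate the binary form of set_kernel_arg/3 mentioned above, here is a minimal sketch that encodes a four-component float vector by hand. The module and function names are hypothetical, and the layout (four consecutive 32-bit native-endian floats, no padding) is an assumption about what a float4 kernel parameter expects. The macros in cl.hrl are not used here because their names do not appear in this text.

```erlang
-module(cl_arg_example).
-export([float4/4, set_float4/6]).

%% Encode four numbers as four consecutive 32-bit native-endian
%% floats, 16 bytes in total (assumed float4 layout).
float4(X, Y, Z, W) ->
    <<X:32/native-float, Y:32/native-float,
      Z:32/native-float, W:32/native-float>>.

%% Pass the encoded vector as kernel argument number Index.
set_float4(Kernel, Index, X, Y, Z, W) ->
    cl:set_kernel_arg(Kernel, Index, float4(X, Y, Z, W)).
```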
set_queue_property(Queue::cl_queue(), Properties::[cl_queue_property()], Enable::bool()) -> ok | {error, cl_error()}
This function is deprecated and has been removed.
start() -> ok | {error, term()}
Equivalent to start([]).
Start the OpenCL application
start(Args::[start_arg()]) -> ok | {error, term()}
Start the OpenCL application
stop() -> ok | {error, term()}
Equivalent to application:stop(cl).
Stop the OpenCL application
unload_compiler() -> ok | {error, cl_error()}
Allows the implementation to release the resources allocated by the OpenCL compiler.
This is a hint from the application and does not guarantee that the compiler will not be used in the future or that the compiler will actually be unloaded by the implementation. Calls to build_program/3 after unload_compiler/0 will reload the compiler, if necessary, to build the appropriate program executable.
unload_platform_compiler(Platform::cl_platform_id()) -> ok | {error, cl_error()}
versions() -> [{Major::integer(), Minor::integer()}]
Returns a list of {Major, Minor} version tuples.
wait(Event::cl_event) -> {ok, completed} | {ok, Binary} | {error, cl_error()}
wait(Event::cl_event, Timeout::timeout()) -> {ok, completed} | {ok, Binary} | {error, cl_error()} | {error, timeout}
Waits for commands identified by event objects to complete.
Waits for commands identified by event objects in event_list to complete. A command is considered complete if its execution status is CL_COMPLETE or a negative value.
wait_for_event(Event::cl_event) -> {ok, completed} | {ok, Binary} | {error, cl_error()}
Equivalent to wait(Event, infinity).
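The return shapes of wait/2 listed above can be handled with a single pattern match. The sketch below maps each shape to a simpler tag; the module name, classify/1, await/2 and the tag atoms are all hypothetical. What the Binary in {ok, Binary} contains is not stated in this text, so it is passed through untouched.

```erlang
-module(cl_wait_example).
-export([classify/1, await/2]).

%% Map each documented return shape of wait/2 to a tagged result.
%% The {error, timeout} clause must come before the generic error
%% clause so that timeouts are reported separately.
classify({ok, completed}) -> done;
classify({ok, Bin}) when is_binary(Bin) -> {data, Bin};
classify({error, timeout}) -> timeout;
classify({error, Reason}) -> {failed, Reason}.

%% Wait for Event for at most Timeout (milliseconds or infinity).
await(Event, Timeout) ->
    classify(cl:wait(Event, Timeout)).
```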
Generated by EDoc, Oct 8 2019, 16:14:18.