Dim3 grid calculation
8/17/2023

We'll be delving into the details of the organization, resource assignment, synchronization, and scheduling of threads in a grid.

CUDA Thread Organization

All CUDA threads in a grid execute the same kernel function, and they rely on special variables to distinguish themselves from each other and to identify the portion of the data they should process. Think of the kernel function as specifying the C statements that each individual thread executes at runtime.

All threads in a block share the same block index, accessed via blockIdx. Additionally, each thread in a block has a unique index, accessed via threadIdx. Thus, inside this two-level hierarchy, a thread has a unique pair of coordinates (blockIdx, threadIdx).

The execution configuration parameters (ECPs) in a kernel launch specify the grid size gridDim (i.e., the number of blocks in a grid) and the block size blockDim (i.e., the number of threads in a block). In general, a grid is a 3D array of blocks, and each block is a 3D array of threads. We can choose to use fewer dimensions by setting unused dimensions to 1. At kernel launch, we specify 2 parameters enclosed within triple angle brackets <<< >>>. The first ECP specifies the dimensions of the grid in number of blocks. The second ECP specifies the dimensions of each block in number of threads. Each such parameter is of type dim3, which is a C struct with three unsigned integer fields: x, y, and z.

For 1D and 2D grids and blocks, the unused dimensions should be set to 1 for clarity. However, for convenience, CUDA C lets us use plain variables or direct mathematical expressions to specify ECPs for 1D grids. For example, suppose we would like to launch our vector addition kernel vecAdd with a set number of threads per block equal to 256. The number of blocks (i.e., gridDim) will vary with the size n of the input vectors so that the grid will have enough threads to cover all vector elements. We can do this in 2 ways, either with an explicit dim3 or with a plain integer:

dim3 gridDim(ceil(n / (float)threadsPerBlock), 1, 1);

int blocksPerGrid = ceil(n / (float)threadsPerBlock);

Hardware constraints impose limits on the dimensions of the grid and the blocks.

gridDim: the y and z dimensions can range from 1 to 65,535. On current GPUs the x dimension can be as large as 2^31 - 1, while older GPUs capped it at 65,535 as well.
blockDim: maximum of 1,024 threads per block, i.e., the product of all dimensions cannot exceed 1,024.

The choice of 1D, 2D, or 3D thread organization is usually based on the nature of the data. For example, grayscale images are a 2D array of pixels (H, W) while RGB images are a 3D array of pixels (H, W, C). It is often convenient to use a 2D grid that consists of 2D blocks to process the pixels in a picture. For example, consider an image of size 76x62 (76 pixels wide, 62 pixels high). Assume that we decided to use a 16x16 block, with 16 threads in the x direction and 16 threads in the y direction. To process such an image, we would need ceil(76 / 16) = 5 blocks in the x dimension and ceil(62 / 16) = 4 blocks in the y dimension. This results in 5*4=20 blocks for a total of 20*16*16=5,120 threads. However, there are only 76*62=4,712 pixels in the image, so the kernel needs an if statement to keep the extra 408 threads from doing any work.

Here's some code to launch a kernel to process an image of height H and width W. Note that the number of rows corresponds to the height (the y direction) and that the number of columns corresponds to the width (the x direction). In this example, we assume for simplicity that the dimensions of the blocks are fixed at 16x16.

dim3 dimGrid(ceil(W / 16.0), ceil(H / 16.0), 1);
dim3 dimBlock(16, 16, 1);

Now that we've seen how to map the 2D grids and threads to the image, let's write a kernel that scales each value in an image by 2. First, let's examine how we can compute the global coordinates of the pink thread in the grid figure, where each block is 4x4 threads. The pink thread belongs to the block at row 1, column 2 of the grid, and inside that block it sits at row 1, column 2. The row of this thread can be calculated using the y dimension and the column can be calculated using the x dimension.

row: The 1 block row above us accounts for 4 threads, plus an extra 1 since we're at row 1 inside the block, which gives a global row of 1*4 + 1 = 5. Generalizing from this example, we can write row = (blockIdx.y * blockDim.y) + threadIdx.y.

col: The 2 block columns to our left account for 4*2=8 threads, plus an extra 2 since we're at col 2 inside the block, which gives a global column of 2*4 + 2 = 10. Generalizing from this example, we can write col = (blockIdx.x * blockDim.x) + threadIdx.x.
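To make the 1D launch configuration for vecAdd concrete, here is a minimal sketch of both ways of passing the ECPs: explicit dim3 values and plain integers. The vector length n = 1000, the helper blocksFor, the host variable names dimGrid and dimBlock, and the use of unified memory via cudaMallocManaged are assumptions made for this example rather than details from the post. The kernel body is the usual one-element-per-thread vector addition.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One thread per element; threads whose index falls past n do nothing.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}

// Hypothetical helper: integer ceiling of n / threadsPerBlock.
int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

int main() {
    const int n = 1000;               // assumed vector length
    const int threadsPerBlock = 256;  // block size used in the text
    const size_t bytes = n * sizeof(float);

    // Unified memory is an assumption that keeps the sketch short.
    float *a = nullptr, *b = nullptr, *c = nullptr;
    if (cudaMallocManaged(&a, bytes) != cudaSuccess ||
        cudaMallocManaged(&b, bytes) != cudaSuccess ||
        cudaMallocManaged(&c, bytes) != cudaSuccess) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }
    for (int i = 0; i < n; ++i) {
        a[i] = (float)i;
        b[i] = 2.0f * i;
    }

    // Way 1: explicit dim3 values, unused dimensions set to 1.
    dim3 dimGrid(blocksFor(n, threadsPerBlock), 1, 1);
    dim3 dimBlock(threadsPerBlock, 1, 1);
    vecAdd<<<dimGrid, dimBlock>>>(a, b, c, n);

    // Way 2: plain integers, which are converted to 1D dim3 values.
    int blocksPerGrid = blocksFor(n, threadsPerBlock);
    vecAdd<<<blocksPerGrid, threadsPerBlock>>>(a, b, c, n);

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Every element should equal i + 2i = 3i.
    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (c[i] != 3.0f * i) ++mismatches;
    }
    printf("blocks = %d, mismatches = %d\n", blocksPerGrid, mismatches);

    cudaFree(a);
    cudaFree(b);
    cudaFree(c);
    return mismatches == 0 ? 0 : 1;
}
```

The sketch uses integer ceiling division instead of ceil on a float, which gives the same block count for positive sizes without a floating-point round trip. Both launches describe the same grid, so the second one simply recomputes the same result.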
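The kernel that scales each value in an image by 2 could look something like the sketch below. It computes row and col from the y and x dimensions, guards against the extra threads in the partially filled edge blocks, and is launched with a 2D grid of 16x16 blocks. The kernel name scaleImage, the helper ceilDiv, the row-major float layout, the 76x62 test image, and the use of cudaMallocManaged are all assumptions made for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread doubles one pixel of an H x W image
// stored in row-major order.
__global__ void scaleImage(const float *in, float *out, int H, int W) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;  // x dimension -> column
    int row = blockIdx.y * blockDim.y + threadIdx.y;  // y dimension -> row
    // Threads that fall outside the image do no work.
    if (row < H && col < W) {
        int idx = row * W + col;
        out[idx] = 2.0f * in[idx];
    }
}

// Hypothetical helper: integer ceiling of a / b for positive values.
int ceilDiv(int a, int b) {
    return (a + b - 1) / b;
}

int main() {
    const int W = 76, H = 62;  // image size from the worked example
    const int tile = 16;       // fixed 16x16 blocks
    const size_t bytes = (size_t)W * H * sizeof(float);

    // Unified memory is an assumption that keeps the sketch short.
    float *in = nullptr, *out = nullptr;
    if (cudaMallocManaged(&in, bytes) != cudaSuccess ||
        cudaMallocManaged(&out, bytes) != cudaSuccess) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }
    for (int i = 0; i < W * H; ++i) {
        in[i] = (float)i;
    }

    // Width maps to x (columns) and height maps to y (rows).
    dim3 dimBlock(tile, tile, 1);
    dim3 dimGrid(ceilDiv(W, tile), ceilDiv(H, tile), 1);
    scaleImage<<<dimGrid, dimBlock>>>(in, out, H, W);

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Every pixel should be exactly twice its input value.
    int mismatches = 0;
    for (int i = 0; i < W * H; ++i) {
        if (out[i] != 2.0f * in[i]) ++mismatches;
    }
    printf("grid = %u x %u blocks, mismatches = %d\n",
           dimGrid.x, dimGrid.y, mismatches);

    cudaFree(in);
    cudaFree(out);
    return mismatches == 0 ? 0 : 1;
}
```

Without the bounds check, the threads beyond column 75 or row 61 would read and write past the end of the arrays, which is why the if statement matters whenever the image size is not a multiple of the block size.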