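Project Git repository: https://gitee.com/yaksue/yaksue-graphics
So far the project does one thing: it fills the window with a fixed color (the clear command). Doing so takes roughly the following steps: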
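- Create the "primary" objects, such as the Device, that only need to be created once at startup.
- Create the swap chain and its related objects.
- Bind the swap chain's buffers to the pipeline (as the RenderTarget).
- Create the command-queue objects (only for modern graphics APIs such as D3D12 and Vulkan).
- Create the objects needed for synchronization (only for modern graphics APIs such as D3D12 and Vulkan).

In the current project, the amount of code each graphics API needs for these steps varies widely. OpenGL needs so little that it is not discussed here. Vulkan needs the most, so I will use Vulkan as the backbone for walking through the initialization code (and add more comments to the code along the way).
Throughout the discussion I will try to compare against whatever plays the same role in the other graphics APIs. Where no comparison is made, that does not mean the other APIs lack an equivalent; the project's code for those APIs may simply rely on defaults.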
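As the list above shows, every step amounts to creating objects of some kind, so I will organize the walkthrough around the Vulkan objects involved.
(Reference material comes mainly from the official Vulkan tutorial, the Vulkan specification, and the "DX12 Dragon Book". Passages taken from these sources are reproduced verbatim in English to preserve their authority.)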
"All of the Vulkan functions, enumerations and structs are defined in the vulkan.h header, which is included in the Vulkan SDK developed by LunarG."
"Functions have a lower case vk prefix, types like enumerations and structs have a Vk prefix and enumeration values have a VK_ prefix."
"We need two more components to actually render to a window: a window surface (VkSurfaceKHR) and a swap chain (VkSwapchainKHR). Note the KHR postfix, which means that these objects are part of a Vulkan extension. The Vulkan API itself is completely platform agnostic, which is why we need to use the standardized WSI (Window System Interface) extension to interact with the window manager." (The Vulkan specification expands WSI as Window System Integration.)
"The API heavily uses structs to provide parameters to functions. For example, object creation generally follows this pattern:"
VkXXXCreateInfo createInfo{};
createInfo.sType = VK_STRUCTURE_TYPE_XXX_CREATE_INFO;
createInfo.pNext = nullptr;
createInfo.foo = ...;
createInfo.bar = ...;

VkXXX object;
if (vkCreateXXX(&createInfo, nullptr, &object) != VK_SUCCESS) {
    std::cerr << "failed to create object" << std::endl;
    return false;
}

"Many structures in Vulkan require you to explicitly specify the type of structure in the sType member. The pNext member can point to an extension structure and will always be nullptr in this tutorial. Functions that create or destroy an object will have a VkAllocationCallbacks parameter that allows you to use a custom allocator for driver memory, which will also be left nullptr in this tutorial."
The "driver memory" here is host (CPU-side) memory that the implementation allocates on the application's behalf.
"Almost all functions return a VkResult that is either VK_SUCCESS or an error code. The specification describes which error codes each function can return and what they mean."
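The project's code wraps most Vulkan calls in a ThrowIfFailed helper whose definition is not shown in this article. The following is only a minimal sketch of what such a helper might look like. The `Result` enum is a stand-in for VkResult so that the snippet compiles without the Vulkan SDK, and `throwIfFailed` is a hypothetical name.

```cpp
#include <stdexcept>
#include <string>

// Stand-in for VkResult (assumption: the real enum has many more values).
enum Result { SUCCESS = 0, ERROR_OUT_OF_HOST_MEMORY = -1, ERROR_INITIALIZATION_FAILED = -3 };

// Hypothetical helper: turn any non-success code into an exception,
// so call sites do not need an if-statement after every call.
inline void throwIfFailed(Result result)
{
    if (result != SUCCESS)
        throw std::runtime_error("Vulkan call failed with code " +
                                 std::to_string(static_cast<int>(result)));
}
```

A call site then reads `throwIfFailed(createSomething(...));`, which matches the shape of the ThrowIfFailed(vkCreateXXX(...)) calls used throughout the project.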
In D3D11 and D3D12 the object that must be created first is called the Device. Vulkan also has a VkDevice, but it means something different and it is not the first object to be created.
"The device represents a display adapter. Usually, the display adapter is a physical piece of 3D hardware (e.g., graphics card); however, a system can also have a software display adapter that emulates 3D hardware functionality (e.g., the WARP adapter). The Direct3D 12 device is used to check feature support, and create all other Direct3D interface objects like resources, views, and command lists."
"The very first thing you need to do is initialize the Vulkan library by creating an instance. The instance is the connection between your application and the Vulkan library and creating it involves specifying some details about your application to the driver."
"There is no global state in Vulkan and all per-application state is stored in a VkInstance object. Creating a VkInstance object initializes the Vulkan library and allows the application to pass information about itself to the implementation."
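The check calls vkEnumerateInstanceLayerProperties to list every supported layer and verifies that all of the validationLayers we need are among them.
On my machine, for example, 13 layers are reported as supported, and the layer we want,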
// validation layers we want to use:
const std::vector<const char*> validationLayers = {
    "VK_LAYER_KHRONOS_validation"
};

is among them, so the check passes.
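The body of checkValidationLayerSupport is not shown in this article. As a rough sketch of the comparison it has to perform, the snippet below checks that every requested name appears in an available list. `LayerProperties` is a stand-in for VkLayerProperties so the code compiles without the Vulkan SDK, and `allLayersSupported` is a hypothetical name.

```cpp
#include <cstring>
#include <vector>

// Stand-in for VkLayerProperties (assumption: only the name field matters here).
struct LayerProperties { char layerName[256]; };

// Returns true only if every requested layer name appears in the available list.
// The names are C strings, so they are compared with strcmp rather than ==,
// which would only compare pointers.
inline bool allLayersSupported(const std::vector<const char*>& requested,
                               const std::vector<LayerProperties>& available)
{
    for (const char* name : requested)
    {
        bool found = false;
        for (const LayerProperties& layer : available)
        {
            if (std::strcmp(name, layer.layerName) == 0) { found = true; break; }
        }
        if (!found)
            return false;
    }
    return true;
}
```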
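Two groups of extensions are required:
- The extensions GLFW needs, which are window related.
- The extension the validation layers need, whose name is given by the macro VK_EXT_DEBUG_REPORT_EXTENSION_NAME.

The CreateInfo then does little more than specify the validation layers and the extensions: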
// first check whether the requested validation layers are supported
if (enableValidationLayers && !checkValidationLayerSupport())
    throw std::runtime_error("validation layers requested, but not available!");

//----------------------------------------------------------------------------------------------------
// The very first step is to initialize the Vulkan library by creating an instance.
// The instance connects the application to the Vulkan library; creating it involves
// passing some details about the application to the driver.
//----------------------------------------------------------------------------------------------------

// [Instance CreateInfo]
VkInstanceCreateInfo createInfo = {};
createInfo.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;

// enable the validation layers if requested
if (enableValidationLayers)
{
    createInfo.enabledLayerCount = static_cast<uint32_t>(validationLayers.size());
    createInfo.ppEnabledLayerNames = validationLayers.data();
}
else
    createInfo.enabledLayerCount = 0;

// specify the required extensions
auto extensions = getRequiredExtensions();
createInfo.enabledExtensionCount = static_cast<uint32_t>(extensions.size());
createInfo.ppEnabledExtensionNames = extensions.data();

// create the instance
ThrowIfFailed(vkCreateInstance(&createInfo, nullptr, Instance.replace()));

"Since Vulkan is a platform agnostic API, it can not interface directly with the window system on its own. To establish the connection between Vulkan and the window system to present results to the screen, we need to use the WSI (Window System Integration) extensions. In this chapter we'll discuss the first one, which is VK_KHR_surface. It exposes a VkSurfaceKHR object that represents an abstract type of surface to present rendered images to. The surface in our program will be backed by the window that we've already opened with GLFW."
glfwCreateWindowSurface(Instance, window, nullptr, Surface.replace());

A VkPhysicalDevice corresponds to one graphics card; vkEnumeratePhysicalDevices lists every card in the machine. The isDeviceSuitable function then decides whether a given card is suitable:
// local function: is this device suitable?
auto isDeviceSuitable = [this](VkPhysicalDevice Device)
{
    // all required queue families must be found
    if (!findQueueFamilies(Device).isComplete())
        return false;

    // all required extensions must be supported
    if (!checkDeviceExtensionSupport(Device))
        return false;

    // the swap chain support must be adequate
    SwapChainSupportDetails detail = querySwapChainSupport(Device);
    bool swapChainAdequate = !detail.formats.empty() && !detail.presentModes.empty();
    if (!swapChainAdequate)
        return false;

    // every check passed
    return true;
};

It performs three main checks: queue families, device extensions, and swap chain support. They are covered in turn below.
"Almost every operation in Vulkan, anything from drawing to uploading textures, requires commands to be submitted to a queue. There are different types of queues that originate from different queue families and each family of queues allows only a subset of commands. For example, there could be a queue family that only allows processing of compute commands or one that only allows memory transfer related commands. We need to check which queue families are supported by the device and which one of these supports the commands that we want to use. For that purpose we'll add a new function findQueueFamilies that looks for all the queue families we need."
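The findQueueFamilies function detects which queue families a graphics card supports and returns the indices of the ones we need.
QueueFamilyIndices is a custom struct used as the function's return value. It records whether each wanted queue family was found and, if so, its index: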
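struct QueueFamilyIndices
{
    int graphicsFamily = -1; // queue family that supports graphics commands
    int presentFamily = -1;  // queue family that supports present commands

    // both were found if neither is still -1
    bool isComplete() const
    {
        return (graphicsFamily >= 0) && (presentFamily >= 0);
    }
};

Inside findQueueFamilies, the first step is to query how many queue families the card supports.
It then iterates over every family and checks whether that family supports the graphics and present operations we want: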
for (int i = 0; i < queueFamilyCount; i++)
{
    // does this family support graphics commands?
    if ((queueFamilies[i].queueCount > 0) && (queueFamilies[i].queueFlags & VK_QUEUE_GRAPHICS_BIT))
        result.graphicsFamily = i;

    // does this family support present commands?
    VkBool32 presentSupport = false;
    vkGetPhysicalDeviceSurfaceSupportKHR(Device, i, Surface, &presentSupport);
    if ((queueFamilies[i].queueCount > 0) && presentSupport)
        result.presentFamily = i;

    // stop once both have been found
    if (result.isComplete())
        break;
}

For graphics commands, it checks whether the family has the VK_QUEUE_GRAPHICS_BIT flag. For present commands, it queries vkGetPhysicalDeviceSurfaceSupportKHR.

The extensions we want the graphics card to support are:
const std::vector<const char*> deviceExtensions = {
    VK_KHR_SWAPCHAIN_EXTENSION_NAME
};

VK_KHR_SWAPCHAIN_EXTENSION_NAME is a macro defined in vulkan_core.h; its current value is "VK_KHR_swapchain".
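The function that checks whether the card supports the wanted extensions (checkDeviceExtensionSupport) first calls vkEnumerateDeviceExtensionProperties to get every supported extension, 94 in total on my machine, and then verifies that each wanted extension can be found among them.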
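The body of that function is not reproduced in this article. One common way to write the "are all wanted names present" test is sketched below, using plain strings so it compiles without the Vulkan SDK. `allExtensionsSupported` is a hypothetical name, and the real function would fill `supported` from the extensionName field of each VkExtensionProperties.

```cpp
#include <set>
#include <string>
#include <vector>

// Returns true if every required extension name is in the supported list.
// Approach: start from the required set and erase each supported name;
// whatever is left over is unsupported.
inline bool allExtensionsSupported(const std::vector<std::string>& supported,
                                   const std::vector<std::string>& required)
{
    std::set<std::string> missing(required.begin(), required.end());
    for (const std::string& name : supported)
        missing.erase(name);
    return missing.empty();
}
```

Erasing from a set keeps the check linear in the number of supported extensions, which matters little for one required extension but scales if more are added later.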
"Just checking if a swap chain is available is not sufficient, because it may not actually be compatible with our window surface. Creating a swap chain also involves a lot more settings than instance and device creation, so we need to query for some more details before we're able to proceed. There are basically three kinds of properties we need to check:"
- Basic surface capabilities (min/max number of images in swap chain, min/max width and height of images)
- Surface formats (pixel format, color space)
- Available presentation modes

A struct is defined to hold this information:
struct SwapChainSupportDetails
{
    VkSurfaceCapabilitiesKHR capabilities;      // basic surface capabilities (min/max image count, min/max image size)
    std::vector<VkSurfaceFormatKHR> formats;    // surface formats (pixel format, color space)
    std::vector<VkPresentModeKHR> presentModes; // available present modes
};

querySwapChainSupport is responsible for querying this information:
vkGetPhysicalDeviceSurfaceCapabilitiesKHR retrieves the basic surface capabilities, vkGetPhysicalDeviceSurfaceFormatsKHR retrieves the surface formats, and vkGetPhysicalDeviceSurfacePresentModesKHR retrieves the present modes.
All three take a VkPhysicalDevice and a VkSurfaceKHR as parameters.
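The two list-returning queries (formats and present modes) follow Vulkan's usual two-call convention: call once with a null array to learn the count, then call again to fill the array. The sketch below captures that convention in a small generic helper. `enumerateAll` is a hypothetical name and is not part of the project or of Vulkan.

```cpp
#include <cstdint>
#include <vector>

// Generic helper for the "ask for the count, then ask for the data" pattern.
// `query` is any callable of the form query(uint32_t* count, T* data).
template <typename T, typename Query>
std::vector<T> enumerateAll(Query query)
{
    uint32_t count = 0;
    query(&count, static_cast<T*>(nullptr)); // first call: only the count is written
    std::vector<T> items(count);
    if (count > 0)
        query(&count, items.data());         // second call: the array is filled
    items.resize(count);                     // the count may shrink between the calls
    return items;
}
```

With the real API, the callable would be a small lambda that forwards to, for example, vkGetPhysicalDeviceSurfaceFormatsKHR(device, surface, count, data) and T would be VkSurfaceFormatKHR.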
The "logical Device"
"Device objects represent logical connections to physical devices. Each device exposes a number of queue families each having one or more queues. All queues in a queue family support the same operations. A Vulkan application will first query for all physical devices in a system. Each physical device can then be queried for its capabilities, including its queue and queue family properties. Once an acceptable physical device is identified, an application will create a corresponding logical device. An application must create a separate logical device for each physical device it will use. The created logical device is then the primary interface to the physical device."
Creating it mostly comes down to using the information queried earlier:
// find the families that hold the graphics and present commands:
QueueFamilyIndices indices = findQueueFamilies(physicalDevice);
int graphicsFamilyIndex = indices.graphicsFamily;
int presentFamilyIndex = indices.presentFamily;

// [DeviceQueue CreateInfo]
std::vector<VkDeviceQueueCreateInfo> queueCreateInfos;
// the queues to create (a set, so a family shared by both roles is listed only once):
std::set<int> uniqueQueueFamilies = { graphicsFamilyIndex, presentFamilyIndex };
// priority: "Vulkan lets you assign priorities to queues to influence the scheduling of
// command buffer execution using floating point numbers between 0.0 and 1.0."
float queuePriority = 1.0f;
for (int queueFamily : uniqueQueueFamilies)
{
    VkDeviceQueueCreateInfo queueCreateInfo = {};
    queueCreateInfo.sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO;
    queueCreateInfo.queueFamilyIndex = queueFamily;
    queueCreateInfo.queueCount = 1;
    queueCreateInfo.pQueuePriorities = &queuePriority;
    queueCreateInfos.push_back(queueCreateInfo);
}

// device features (none enabled for now)
VkPhysicalDeviceFeatures deviceFeatures = {};

// [Device CreateInfo]
VkDeviceCreateInfo createInfo = {};
createInfo.sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO;
// queues:
createInfo.pQueueCreateInfos = queueCreateInfos.data();
createInfo.queueCreateInfoCount = static_cast<uint32_t>(queueCreateInfos.size());
// device features:
createInfo.pEnabledFeatures = &deviceFeatures;
// device extensions:
createInfo.enabledExtensionCount = static_cast<uint32_t>(deviceExtensions.size());
createInfo.ppEnabledExtensionNames = deviceExtensions.data();
// validation layers:
if (enableValidationLayers)
{
    createInfo.enabledLayerCount = static_cast<uint32_t>(validationLayers.size());
    createInfo.ppEnabledLayerNames = validationLayers.data();
}
else
    createInfo.enabledLayerCount = 0;

// create
ThrowIfFailed(vkCreateDevice(physicalDevice, &createInfo, nullptr, Device.replace()));

"Creating a logical device also creates the queues associated with that device."
We still need handles through which to refer to these queues:
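// creating the logical Device also created its queues; retrieve their handles:
vkGetDeviceQueue(Device, graphicsFamilyIndex, 0, &GraphicsQueue);
vkGetDeviceQueue(Device, presentFamilyIndex, 0, &PresentQueue);

The "swap chain" is a concept shared by all of these APIs, but in the current project Vulkan needs comparatively more code to create one.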
In D3D12 the swap chain settings are specified in a DXGI_SWAP_CHAIN_DESC1:
DXGI_SWAP_CHAIN_DESC1 swapChainDesc = {};
swapChainDesc.BufferCount = SwapChainBufferCount;
swapChainDesc.Width = WindowWidth;
swapChainDesc.Height = WindowHeight;
swapChainDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
swapChainDesc.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT;
swapChainDesc.SwapEffect = DXGI_SWAP_EFFECT_FLIP_DISCARD;
swapChainDesc.SampleDesc.Count = 1;

For Vulkan, [1.3.3] produced a SwapChainSupportDetails holding the settings that the current graphics card and Surface support. For each of these settings there is an "ideal" value in mind, so some logic is needed to pick the preferred option from what is available.
"Each VkSurfaceFormatKHR entry contains a format and a colorSpace member. The format member specifies the color channels and types. For example, VK_FORMAT_B8G8R8A8_SRGB means that we store the B, G, R and alpha channels in that order with an 8 bit unsigned integer for a total of 32 bits per pixel. The colorSpace member indicates if the SRGB color space is supported or not using the VK_COLOR_SPACE_SRGB_NONLINEAR_KHR flag. For the color space we'll use SRGB if it is available, because it results in more accurate perceived colors."
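auto chooseSwapSurfaceFormat = [](const std::vector<VkSurfaceFormatKHR>& availableFormats) -> VkSurfaceFormatKHR
{
    // a single VK_FORMAT_UNDEFINED entry means the surface has no preferred format
    if (availableFormats.size() == 1 && availableFormats[0].format == VK_FORMAT_UNDEFINED)
        return { VK_FORMAT_B8G8R8A8_UNORM, VK_COLOR_SPACE_SRGB_NONLINEAR_KHR };

    for (const auto& availableFormat : availableFormats)
    {
        // prefer the sRGB color space; note that the pixel format requested here
        // is B8G8R8A8_UNORM rather than the B8G8R8A8_SRGB format mentioned in the quote
        if (availableFormat.format == VK_FORMAT_B8G8R8A8_UNORM && availableFormat.colorSpace == VK_COLOR_SPACE_SRGB_NONLINEAR_KHR)
            return availableFormat;
    }

    // otherwise fall back to the first available format
    return availableFormats[0];
};

There are four present modes: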
- "VK_PRESENT_MODE_IMMEDIATE_KHR: Images submitted by your application are transferred to the screen right away, which may result in tearing."
- "VK_PRESENT_MODE_FIFO_KHR: The swap chain is a queue where the display takes an image from the front of the queue when the display is refreshed and the program inserts rendered images at the back of the queue. If the queue is full then the program has to wait. This is most similar to vertical sync as found in modern games. The moment that the display is refreshed is known as "vertical blank"."
- "VK_PRESENT_MODE_FIFO_RELAXED_KHR: This mode only differs from the previous one if the application is late and the queue was empty at the last vertical blank. Instead of waiting for the next vertical blank, the image is transferred right away when it finally arrives. This may result in visible tearing."
- "VK_PRESENT_MODE_MAILBOX_KHR: This is another variation of the second mode. Instead of blocking the application when the queue is full, the images that are already queued are simply replaced with the newer ones. This mode can be used to implement triple buffering, which allows you to avoid tearing with significantly less latency issues than standard vertical sync that uses double buffering."

Of these, only VK_PRESENT_MODE_FIFO_KHR is guaranteed to be supported. Even so, VK_PRESENT_MODE_MAILBOX_KHR is preferred whenever it is available:
auto chooseSwapPresentMode = [](const std::vector<VkPresentModeKHR>& availablePresentModes) -> VkPresentModeKHR
{
    // prefer VK_PRESENT_MODE_MAILBOX_KHR when it is available
    for (const auto& availablePresentMode : availablePresentModes)
        if (availablePresentMode == VK_PRESENT_MODE_MAILBOX_KHR)
            return availablePresentMode;

    // VK_PRESENT_MODE_FIFO_KHR is guaranteed to be supported
    return VK_PRESENT_MODE_FIFO_KHR;
};

"Vulkan tells us to match the resolution of the window by setting the width and height in the currentExtent member. However, some window managers do allow us to differ here and this is indicated by setting the width and height in currentExtent to a special value: the maximum value of uint32_t."
auto chooseSwapExtent = [=](const VkSurfaceCapabilitiesKHR& capabilities) -> VkExtent2D
{
    // the maximum uint32_t value means we may pick an extent that differs from the window resolution:
    if (capabilities.currentExtent.width == std::numeric_limits<uint32_t>::max())
    {
        // get the window size:
        int width, height;
        glfwGetWindowSize(window, &width, &height);

        // clamp it between the minimum and maximum extents
        VkExtent2D actualExtent = { static_cast<uint32_t>(width), static_cast<uint32_t>(height) };
        actualExtent.width = std::max(capabilities.minImageExtent.width, std::min(capabilities.maxImageExtent.width, actualExtent.width));
        actualExtent.height = std::max(capabilities.minImageExtent.height, std::min(capabilities.maxImageExtent.height, actualExtent.height));
        return actualExtent;
    }
    else
        // Vulkan wants the extent to match the window, so return it unchanged
        return capabilities.currentExtent;
};

Debugging shows that on my machine Vulkan does want the extent to match the window resolution.
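The surface capabilities also bound the number of images in the swap chain, a setting this article does not show being chosen. A common policy, which is an assumption here and not necessarily what the project does, is to request one image more than the minimum while respecting the maximum. `SurfaceCapabilities` below is a stand-in for the two relevant VkSurfaceCapabilitiesKHR fields, and `chooseImageCount` is a hypothetical name.

```cpp
#include <cstdint>

// Stand-in for the two VkSurfaceCapabilitiesKHR fields used here.
struct SurfaceCapabilities { uint32_t minImageCount; uint32_t maxImageCount; };

// Request one image more than the minimum, without exceeding the maximum.
// A maxImageCount of 0 means the surface imposes no upper limit.
inline uint32_t chooseImageCount(const SurfaceCapabilities& caps)
{
    uint32_t imageCount = caps.minImageCount + 1;
    if (caps.maxImageCount > 0 && imageCount > caps.maxImageCount)
        imageCount = caps.maxImageCount;
    return imageCount;
}
```

Asking for one extra image reduces the chance that the application has to wait on the driver before it can acquire the next image to render to.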
One part of swap chain creation that needs attention is how queues are handled.
"We need to specify how to handle swap chain images that will be used across multiple queue families. That will be the case in our application if the graphics queue family is different from the presentation queue. We'll be drawing on the images in the swap chain from the graphics queue and then submitting them on the presentation queue. There are two ways to handle images that are accessed from multiple queues:"
- "VK_SHARING_MODE_EXCLUSIVE: An image is owned by one queue family at a time and ownership must be explicitly transferred before using it in another queue family. This option offers the best performance."
- "VK_SHARING_MODE_CONCURRENT: Images can be used across multiple queue families without explicit ownership transfers."

"If the queue families differ, then we'll be using the concurrent mode in this tutorial to avoid having to do the ownership chapters, because these involve some concepts that are better explained at a later time. Concurrent mode requires you to specify in advance between which queue families ownership will be shared using the queueFamilyIndexCount and pQueueFamilyIndices parameters. If the graphics queue family and presentation queue family are the same, which will be the case on most hardware, then we should stick to exclusive mode, because concurrent mode requires you to specify at least two distinct queue families."
Debugging shows that my hardware falls into the "most hardware" case.
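The decision described above can be sketched as a small function. The types are stand-ins for VkSharingMode and for the queueFamilyIndexCount / pQueueFamilyIndices fields of the swap chain CreateInfo, so the snippet compiles without the Vulkan SDK; `SharingConfig` and `chooseSharing` are hypothetical names.

```cpp
#include <cstdint>
#include <vector>

// Stand-ins for VkSharingMode and the related swap chain CreateInfo fields.
enum class SharingMode { Exclusive, Concurrent };
struct SharingConfig
{
    SharingMode mode;
    std::vector<uint32_t> queueFamilyIndices; // only meaningful in Concurrent mode
};

// Different families: share the images concurrently between both of them.
// Same family: exclusive mode, and no family list is needed.
inline SharingConfig chooseSharing(uint32_t graphicsFamily, uint32_t presentFamily)
{
    if (graphicsFamily != presentFamily)
        return { SharingMode::Concurrent, { graphicsFamily, presentFamily } };
    return { SharingMode::Exclusive, {} };
}
```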
"Images represent multidimensional - up to 3 - arrays of data which can be used for various purposes (e.g. attachments, textures), by binding them to a graphics or compute pipeline via descriptor sets, or by directly specifying them as parameters to certain commands."
An image is held in a VkImage handle and can be created with vkCreateImage.
Here, however, the swap chain's images are obtained through vkGetSwapchainImagesKHR.
// get the Images that belong to the swap chain:
vkGetSwapchainImagesKHR(Device, SwapChain, &imageCount, nullptr);
swapChainImages.resize(imageCount);
vkGetSwapchainImagesKHR(Device, SwapChain, &imageCount, swapChainImages.data());

"Image objects are not directly accessed by pipeline shaders for reading or writing image data. Instead, image views representing contiguous ranges of the image subresources and containing additional metadata are used for that purpose. Views must be created on images of compatible types, and must represent a valid subset of image subresources."
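D3D12 is very similar in this respect: section 4.1.6, "Resources and Descriptors", of the "DX12 Dragon Book" introduces GPU resources and Descriptors. A Descriptor and a View are similar concepts.
The project defines a function, createImageView, that creates an ImageView for an Image.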
void VulkanInterface::createImageView(VkImage image, VkFormat format, VkImageAspectFlags aspectFlags, VDeleter<VkImageView>& imageView)
{
    VkImageViewCreateInfo viewInfo = {};
    viewInfo.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO;
    viewInfo.image = image;
    viewInfo.viewType = VK_IMAGE_VIEW_TYPE_2D;
    viewInfo.format = format;
    viewInfo.subresourceRange.aspectMask = aspectFlags;
    viewInfo.subresourceRange.baseMipLevel = 0;
    viewInfo.subresourceRange.levelCount = 1;
    viewInfo.subresourceRange.baseArrayLayer = 0;
    viewInfo.subresourceRange.layerCount = 1;

    ThrowIfFailed(vkCreateImageView(Device, &viewInfo, nullptr, imageView.replace()));
}

For details on VkImageViewCreateInfo, see its entry in the specification.
An ImageView is then created for each of the swap chain's Images:
swapChainImageViews.resize(swapChainImages.size(), VDeleter<VkImageView>{Device, vkDestroyImageView});
for (uint32_t i = 0; i < swapChainImages.size(); i++)
    createImageView(swapChainImages[i], swapChainImageFormat, VK_IMAGE_ASPECT_COLOR_BIT, swapChainImageViews[i]);

In D3D11 and D3D12 the swap chain's buffer is designated directly as the RenderTarget. Vulkan has no RenderTarget term and uses other concepts to achieve a similar result.
In D3D11, the buffer is obtained from the swap chain and used to create an ID3D11RenderTargetView, which is then set as the RenderTarget of the Output-Merger Stage.
// get the back buffer from the swap chain
ID3D11Texture2D* pBackBuffer = NULL;
hr = SwapChain->GetBuffer(0, __uuidof(ID3D11Texture2D), (LPVOID*)&pBackBuffer);
if (FAILED(hr))
    return false;

// create a render target view
hr = Device->CreateRenderTargetView(pBackBuffer, NULL, &RenderTargetView);
pBackBuffer->Release();
if (FAILED(hr))
    return false;

// set the RenderTarget on the Output-Merger Stage
ImmediateContext->OMSetRenderTargets(1, &RenderTargetView, NULL);

D3D12 works much like D3D11: the swap chain's GetBuffer method returns the buffer (an ID3D12Resource), and CreateRenderTargetView is then called on it. The difference is that D3D12 has the concept of a Descriptor for GPU resources, which plays the role of a "View"; what used to be a RenderTargetView object now lives in an RTVHeap (a DescriptorHeap).
It is then used when recording commands such as Clear:
void D3D12Interface::CmdClear(float r, float g, float b, float a)
{
    CD3DX12_CPU_DESCRIPTOR_HANDLE rtvHandle(RTVHeap->GetCPUDescriptorHandleForHeapStart(), CurrentBackBufferIndex, RTVDescriptorSize);

    // Record commands.
    const float clearColor[] = { r, g, b, a };
    CommandList->ClearRenderTargetView(rtvHandle, clearColor, 0, nullptr);
}

"A render pass represents a collection of attachments, subpasses, and dependencies between the subpasses, and describes how the attachments are used over the course of the subpasses. The use of a render pass in a command buffer is a render pass instance."
"An attachment description describes the properties of an attachment including its format, sample count, and how its contents are treated at the beginning and end of each render pass instance."
Currently color is the only attachment; depth will later become a second one.
// color attachment:
VkAttachmentDescription colorAttachment = {};
colorAttachment.format = swapChainImageFormat;                     // must match the swap chain's format
colorAttachment.samples = VK_SAMPLE_COUNT_1_BIT;                   // no multisampling, so 1
colorAttachment.loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;              // what to do with the data before rendering
colorAttachment.storeOp = VK_ATTACHMENT_STORE_OP_STORE;            // what to do with the data after rendering
colorAttachment.stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;   // stencil data (not relevant yet)
colorAttachment.stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE; // stencil data (not relevant yet)
colorAttachment.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;         // Image layout before rendering
colorAttachment.finalLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;     // layout the Image transitions to after rendering

// reference to the color attachment
VkAttachmentReference colorAttachmentRef = {};
colorAttachmentRef.attachment = 0;                                    // index of the color attachment
colorAttachmentRef.layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL; // Vulkan transitions the Image to this layout when the subpass starts

"A single render pass can consist of multiple subpasses. Subpasses are subsequent rendering operations that depend on the contents of framebuffers in previous passes, for example a sequence of post-processing effects that are applied one after another. If you group these rendering operations into one render pass, then Vulkan is able to reorder the operations and conserve memory bandwidth for possibly better performance."
// describe a subpass:
VkSubpassDescription subpass = {};
subpass.pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS; // this is a graphics subpass (as opposed to a compute one)
// the color attachment used by this subpass:
subpass.colorAttachmentCount = 1;
subpass.pColorAttachments = &colorAttachmentRef;

"Remember that the subpasses in a render pass automatically take care of image layout transitions. These transitions are controlled by subpass dependencies, which specify memory and execution dependencies between subpasses. We have only a single subpass right now, but the operations right before and right after this subpass also count as implicit "subpasses"."
// dependency between subpasses
VkSubpassDependency dependency = {};
dependency.srcSubpass = VK_SUBPASS_EXTERNAL; // first subpass of the dependency; VK_SUBPASS_EXTERNAL is the implicit subpass before ours
dependency.dstSubpass = 0;                   // second subpass of the dependency; 0 is our subpass
// The next two fields specify the operations to wait on and the stages in which these operations occur.
// We need to wait for the swap chain to finish reading from the image before we can access it.
// This can be accomplished by waiting on the color attachment output stage itself.
dependency.srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
dependency.srcAccessMask = 0;
// The operations that should wait on this are in the color attachment stage and involve the writing of the color attachment.
// These settings will prevent the transition from happening until it's actually necessary (and allowed):
// when we want to start writing colors to it.
dependency.dstStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
dependency.dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_READ_BIT | VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;

The CreateInfo then assembles everything above:
// attachment array:
std::array<VkAttachmentDescription, 1/*2*/> attachments = { colorAttachment/*, depthAttachment*/ };

// [RenderPass CreateInfo]
VkRenderPassCreateInfo renderPassInfo = {};
renderPassInfo.sType = VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO;
renderPassInfo.attachmentCount = static_cast<uint32_t>(attachments.size());
renderPassInfo.pAttachments = attachments.data();
renderPassInfo.subpassCount = 1;
renderPassInfo.pSubpasses = &subpass;
renderPassInfo.dependencyCount = 1;
renderPassInfo.pDependencies = &dependency;

ThrowIfFailed(vkCreateRenderPass(Device, &renderPassInfo, nullptr, renderPass.replace()));

"The attachments specified during render pass creation are bound by wrapping them into a VkFramebuffer object. A framebuffer object references all of the VkImageView objects that represent the attachments."
Here there is only a color attachment, and its VkImageView is the swap chain's ImageView. (A depth attachment will come later, and with it the depth buffer's ImageView.)
swapChainFramebuffers.resize(swapChainImageViews.size(), VDeleter<VkFramebuffer>{Device, vkDestroyFramebuffer});
for (size_t i = 0; i < swapChainImageViews.size(); i++)
{
    std::array<VkImageView, 1/*2*/> attachments = {
        swapChainImageViews[i]
        //, depthImageView
    };

    // [Framebuffer CreateInfo]
    VkFramebufferCreateInfo framebufferInfo = {};
    framebufferInfo.sType = VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO;
    framebufferInfo.renderPass = renderPass;
    framebufferInfo.attachmentCount = static_cast<uint32_t>(attachments.size());
    framebufferInfo.pAttachments = attachments.data();
    framebufferInfo.width = swapChainExtent.width;
    framebufferInfo.height = swapChainExtent.height;
    framebufferInfo.layers = 1;

    ThrowIfFailed(vkCreateFramebuffer(Device, &framebufferInfo, nullptr, swapChainFramebuffers[i].replace()));
}

When vkCmdBeginRenderPass records the command that begins a RenderPass, it takes a VkRenderPassBeginInfo, which is where the RenderPass and Framebuffer created above are passed in:
VkRenderPassBeginInfo renderPassInfo = {};
renderPassInfo.sType = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO;
renderPassInfo.renderPass = renderPass;
renderPassInfo.framebuffer = swapChainFramebuffers[CurrentCommandListIndex];

"Commands in Vulkan, like drawing operations and memory transfers, are not executed directly using function calls. You have to record all of the operations you want to perform in command buffer objects. The advantage of this is that all of the hard work of setting up the drawing commands can be done in advance and in multiple threads. After that, you just have to tell Vulkan to execute the commands in the main loop."
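D3D12 and Vulkan are highly similar in this area, so I will discuss them side by side.
First, "commands" occupy memory, and an object is needed to manage the allocation of that memory.
In D3D12 it is ID3D12CommandAllocator: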
Device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT, IID_PPV_ARGS(&CommandAllocator));

In Vulkan it is VkCommandPool:
// [CommandPool CreateInfo]
VkCommandPoolCreateInfo poolInfo = {};
poolInfo.sType = VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO;
poolInfo.queueFamilyIndex = findQueueFamilies(physicalDevice).graphicsFamily;
poolInfo.flags = VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT;

ThrowIfFailed(vkCreateCommandPool(Device, &poolInfo, nullptr, CommandPool.replace()));

In D3D12, commands are held in an ID3D12GraphicsCommandList object:
Device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT, CommandAllocator.Get(), nullptr, IID_PPV_ARGS(&CommandList));

In Vulkan, commands are held in a VkCommandBuffer:
CommandBuffers.resize(swapChainFramebuffers.size());

VkCommandBufferAllocateInfo allocInfo = {};
allocInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO;
allocInfo.commandPool = CommandPool;
allocInfo.level = VK_COMMAND_BUFFER_LEVEL_PRIMARY;
allocInfo.commandBufferCount = (uint32_t)CommandBuffers.size();

ThrowIfFailed(vkAllocateCommandBuffers(Device, &allocInfo, CommandBuffers.data()));

In both APIs, creating the command container requires the memory-managing object created in the previous step.
As for recording, both APIs require certain functions to be called when recording starts and when it ends.
While recording, D3D12 calls member functions of ID3D12GraphicsCommandList:
CommandList->ClearRenderTargetView(rtvHandle, clearColor, 0, nullptr);

Vulkan instead calls a family of functions prefixed with vkCmd, each of which takes the VkCommandBuffer as an argument.
// Vulkan does provide dedicated clear commands (vkCmdClearColorImage, vkCmdClearAttachments),
// but this project does not use them: the clear color is supplied when a RenderPass begins
// (the attachment's loadOp is VK_ATTACHMENT_LOAD_OP_CLEAR).
// So, for now, a RenderPass is begun and immediately ended to mimic a ClearColor call.
VkRenderPassBeginInfo renderPassInfo = {};
renderPassInfo.sType = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO;
// the RenderPass:
renderPassInfo.renderPass = renderPass;
// the Framebuffer:
renderPassInfo.framebuffer = swapChainFramebuffers[CurrentCommandListIndex];

renderPassInfo.renderArea.offset = { 0, 0 };
renderPassInfo.renderArea.extent = swapChainExtent;

std::array<VkClearValue, 1/*2*/> clearValues = {};
clearValues[0].color = { r, g, b, a };
//clearValues[1].depthStencil = { 1.0f, 0 };

renderPassInfo.clearValueCount = static_cast<uint32_t>(clearValues.size());
renderPassInfo.pClearValues = clearValues.data();

vkCmdBeginRenderPass(CommandBuffers[CurrentCommandListIndex], &renderPassInfo, VK_SUBPASS_CONTENTS_INLINE);
vkCmdEndRenderPass(CommandBuffers[CurrentCommandListIndex]);

Executing commands requires a "queue".
In D3D12, an ID3D12CommandQueue is created:
D3D12_COMMAND_QUEUE_DESC queueDesc = {};
queueDesc.Flags = D3D12_COMMAND_QUEUE_FLAG_NONE;
queueDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;

Device->CreateCommandQueue(&queueDesc, IID_PPV_ARGS(&CommandQueue));

In Vulkan, the queues are created together with the logical device (VkDevice), after which their VkQueue handles can be retrieved:
vkGetDeviceQueue(Device, graphicsFamilyIndex, 0, &GraphicsQueue);
vkGetDeviceQueue(Device, presentFamilyIndex, 0, &PresentQueue);

Submitting the recorded commands to the queue, in D3D12:
ID3D12CommandList* ppCommandLists[] = { CommandList.Get() };
CommandQueue->ExecuteCommandLists(_countof(ppCommandLists), ppCommandLists);

Vulkan:
// get the index of the Image for the next frame
vkAcquireNextImageKHR(Device, SwapChain, std::numeric_limits<uint64_t>::max(), imageAvailableSemaphore, VK_NULL_HANDLE, &CurrentBackBufferIndex);

// [Submit Info]
VkSubmitInfo submitInfo = {};
submitInfo.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;

// semaphores to wait on:
VkSemaphore waitSemaphores[] = { imageAvailableSemaphore };
VkPipelineStageFlags waitStages[] = { VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT };
submitInfo.waitSemaphoreCount = 1;
submitInfo.pWaitSemaphores = waitSemaphores;
submitInfo.pWaitDstStageMask = waitStages;

// commands:
submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers = &CommandBuffers[CurrentBackBufferIndex];

// semaphores signaled when execution finishes:
VkSemaphore signalSemaphores[] = { renderFinishedSemaphore };
submitInfo.signalSemaphoreCount = 1;
submitInfo.pSignalSemaphores = signalSemaphores;

// submit to the queue
ThrowIfFailed(vkQueueSubmit(GraphicsQueue, 1, &submitInfo, VK_NULL_HANDLE));

Parallelism makes a program more efficient, but there are still situations that require "synchronization" (even though one tries to avoid it, because it breaks parallelism).
There are two processors running in parallel, the CPU and the GPU, and each of them runs multiple threads in parallel internally, which makes synchronization fairly complicated. Three cases have to be considered: CPU with CPU, CPU with GPU, and GPU with GPU.
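In the current project, both the D3D12 code and the Vulkan code contain synchronization, but they synchronize different things. The D3D12 Fence synchronizes the CPU with the GPU, while the Vulkan Semaphore synchronizes the drawing commands with the present commands. The current Vulkan code does also synchronize the CPU with the GPU, through vkDeviceWaitIdle.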
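To make the CPU-GPU case more concrete, here is a deliberately simplified, single-threaded toy model of fence-style synchronization. It is not the project's code and not the D3D12 API: `ToyQueue` and `waitForFence` are hypothetical names, the "GPU" is just a list of closures executed in order, and a real GPU runs concurrently while the CPU blocks on an event instead of driving the queue itself. What the sketch does preserve is the key property of a fence: its value changes only after all work submitted before the signal has completed.

```cpp
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Toy stand-in for a GPU queue: work items run strictly in submission order.
class ToyQueue
{
public:
    void submit(std::function<void()> work) { pending.push_back(std::move(work)); }

    // Similar in spirit to ID3D12CommandQueue::Signal: the fence value is
    // updated only after everything submitted before it has executed.
    void signal(uint64_t& fenceValue, uint64_t value)
    {
        pending.push_back([&fenceValue, value] { fenceValue = value; });
    }

    // Execute one pending item; returns false when the queue is empty.
    bool step()
    {
        if (pending.empty()) return false;
        std::function<void()> work = std::move(pending.front());
        pending.pop_front();
        work();
        return true;
    }

private:
    std::deque<std::function<void()>> pending;
};

// "CPU waits for GPU": keep the queue running until the fence reaches `value`.
inline void waitForFence(ToyQueue& queue, const uint64_t& fenceValue, uint64_t value)
{
    while (fenceValue < value && queue.step()) {}
}
```

Waiting on a specific fence value, instead of waiting for the whole device to go idle, lets work submitted after the signal stay in flight, which is one reason a full-device wait is usually the more expensive option.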
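I do not plan to go deeper into this topic for now. First, my own understanding of it is still shallow. Second, the current CPU-GPU synchronization (the D3D12 Fence and Vulkan's vkDeviceWaitIdle) does not appear to be optimal for performance in either case. I hope to devote a separate discussion to it once I understand it better.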