port from perforce

2026-04-18 22:31:51 +02:00
commit 8d0ab5b7cc
8409 changed files with 3972376 additions and 0 deletions
--- a/ruins64k/tools/NvPerfUtility/CREDITS.md
+++ b/ruins64k/tools/NvPerfUtility/CREDITS.md
@@ -0,0 +1,2 @@
+## Attributions / Licenses
+- Vulkan and the Vulkan logo are trademarks of the [Khronos Group Inc.](http://www.khronos.org)
--- a/ruins64k/tools/NvPerfUtility/LICENSE
+++ b/ruins64k/tools/NvPerfUtility/LICENSE
@@ -0,0 +1,53 @@
+Apache License
+
+Version 2.0, January 2004
+
+http://www.apache.org/licenses/
+
+TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+1. Definitions.
+
+"License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.
+
+"Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.
+
+"Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.
+
+"You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.
+
+"Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.
+
+"Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.
+
+"Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).
+
+"Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.
+
+"Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."
+
+"Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.
+
+2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.
+
+3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.
+
+4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:
+
+You must give any other recipients of the Work or Derivative Works a copy of this License; and
+You must cause any modified files to carry prominent notices stating that You changed the files; and
+You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and
+If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.
+
+You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.
+5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.
+
+6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.
+
+7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.
+
+8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.
+
+9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.
+
+END OF TERMS AND CONDITIONS
--- a/ruins64k/tools/NvPerfUtility/build/NvPerfSDK.props
+++ b/ruins64k/tools/NvPerfUtility/build/NvPerfSDK.props
@@ -0,0 +1,28 @@
+<?xml version="1.0" encoding="utf-8"?>
+<Project ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
+    <ImportGroup Label="PropertySheets" />
+    <PropertyGroup Label="UserMacros">
+        <_Relative_NvPerf_host_dll>bin/x64/nvperf_host.dll</_Relative_NvPerf_host_dll>
+        <_Possible_NvPerf_Dir_0>$([MSBuild]::NormalizePath('$(MSBuildThisFileDirectory)../../NvPerf/'))</_Possible_NvPerf_Dir_0>
+        <_Possible_NvPerf_Dir_1>$([MSBuild]::NormalizePath('$(MSBuildThisFileDirectory)../../../NvPerf/'))</_Possible_NvPerf_Dir_1>
+        <NvPerfSdkPath></NvPerfSdkPath>
+        <NvPerfSdkPath Condition="'$(NvPerfSdkPath)'=='' And Exists('$(_Possible_NvPerf_Dir_0)/$(_Relative_NvPerf_host_dll)')">$(_Possible_NvPerf_Dir_0)</NvPerfSdkPath>
+        <NvPerfSdkPath Condition="'$(NvPerfSdkPath)'=='' And Exists('$(_Possible_NvPerf_Dir_1)/$(_Relative_NvPerf_host_dll)')">$(_Possible_NvPerf_Dir_1)</NvPerfSdkPath>
+        <NvPerfSdkPath Condition="'$(NvPerfSdkPath)'=='' And '$(NVPERF_SDK_PATH)' != '' And Exists('$(NVPERF_SDK_PATH)/$(_Relative_NvPerf_host_dll)')">$(NVPERF_SDK_PATH)</NvPerfSdkPath>
+        <NvPerfSdkPath Condition="'$(NvPerfSdkPath)'!=''">$([MSBuild]::NormalizePath($(NvPerfSdkPath)))</NvPerfSdkPath>
+        <NvPerfUtilityPath>$([MSBuild]::NormalizePath('$(MSBuildThisFileDirectory)../../NvPerfUtility/'))</NvPerfUtilityPath>
+    </PropertyGroup>
+    <ItemDefinitionGroup />
+    <ItemGroup />
+    <Target Name="PrintNvPerfLocation" BeforeTargets="ClCompile">
+        <Message
+            Condition="'$(NvPerfSdkPath)'!=''"
+            Text="NvPerf SDK found: NvPerfSdkPath = $(NvPerfSdkPath)" />
+        <Error
+            Condition="'$(NvPerfSdkPath)'==''"
+            Text="NvPerf SDK could not be found; please unzip the SDK into one of the following locations:
+    $(_Possible_NvPerf_Dir_0)
+    $(_Possible_NvPerf_Dir_1)
+  or set environment variable NVPERF_SDK_PATH" />
+    </Target>
+</Project>
--- a/ruins64k/tools/NvPerfUtility/build/NvPerfUtility.vcxproj
+++ b/ruins64k/tools/NvPerfUtility/build/NvPerfUtility.vcxproj
@@ -0,0 +1,118 @@
+<?xml version="1.0" encoding="utf-8"?>
+<Project DefaultTargets="Build" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
+  <ItemGroup Label="ProjectConfigurations">
+    <ProjectConfiguration Include="Debug|x64">
+      <Configuration>Debug</Configuration>
+      <Platform>x64</Platform>
+    </ProjectConfiguration>
+    <ProjectConfiguration Include="Release|x64">
+      <Configuration>Release</Configuration>
+      <Platform>x64</Platform>
+    </ProjectConfiguration>
+  </ItemGroup>
+  <PropertyGroup Label="Globals">
+    <VCProjectVersion>16.0</VCProjectVersion>
+    <Keyword>Win32Proj</Keyword>
+    <ProjectGuid>{ea22d2ac-ebf7-43e4-adb7-0f320c46692e}</ProjectGuid>
+    <RootNamespace>NvPerfUtility</RootNamespace>
+    <WindowsTargetPlatformVersion>10.0</WindowsTargetPlatformVersion>
+  </PropertyGroup>
+  <Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />
+  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="Configuration">
+    <ConfigurationType>Utility</ConfigurationType>
+    <UseDebugLibraries>true</UseDebugLibraries>
+    <PlatformToolset>v142</PlatformToolset>
+    <CharacterSet>Unicode</CharacterSet>
+  </PropertyGroup>
+  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="Configuration">
+    <ConfigurationType>Utility</ConfigurationType>
+    <UseDebugLibraries>false</UseDebugLibraries>
+    <PlatformToolset>v142</PlatformToolset>
+    <WholeProgramOptimization>true</WholeProgramOptimization>
+    <CharacterSet>Unicode</CharacterSet>
+  </PropertyGroup>
+  <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
+  <ImportGroup Label="ExtensionSettings">
+  </ImportGroup>
+  <ImportGroup Label="Shared">
+  </ImportGroup>
+  <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
+    <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
+  </ImportGroup>
+  <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
+    <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
+  </ImportGroup>
+  <Import Project="NvPerfSDK.props" />
+  <PropertyGroup Label="UserMacros" />
+  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
+    <LinkIncremental>true</LinkIncremental>
+    <IncludePath>$(NvPerfUtilityPath)/include;$(NvPerfSdkPath)/include;$(IncludePath)</IncludePath>
+  </PropertyGroup>
+  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
+    <LinkIncremental>false</LinkIncremental>
+    <IncludePath>$(NvPerfUtilityPath)/include;$(NvPerfSdkPath)/include;$(IncludePath)</IncludePath>
+  </PropertyGroup>
+  <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
+    <ClCompile>
+      <WarningLevel>Level3</WarningLevel>
+      <SDLCheck>true</SDLCheck>
+      <PreprocessorDefinitions>_DEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
+      <ConformanceMode>true</ConformanceMode>
+    </ClCompile>
+    <Link>
+      <SubSystem>Console</SubSystem>
+      <GenerateDebugInformation>true</GenerateDebugInformation>
+    </Link>
+  </ItemDefinitionGroup>
+  <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
+    <ClCompile>
+      <WarningLevel>Level3</WarningLevel>
+      <FunctionLevelLinking>true</FunctionLevelLinking>
+      <IntrinsicFunctions>true</IntrinsicFunctions>
+      <SDLCheck>true</SDLCheck>
+      <PreprocessorDefinitions>NDEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
+      <ConformanceMode>true</ConformanceMode>
+    </ClCompile>
+    <Link>
+      <SubSystem>Console</SubSystem>
+      <EnableCOMDATFolding>true</EnableCOMDATFolding>
+      <OptimizeReferences>true</OptimizeReferences>
+      <GenerateDebugInformation>true</GenerateDebugInformation>
+    </Link>
+  </ItemDefinitionGroup>
+  <ItemGroup>
+    <ClInclude Include="../include/NvPerfCounterConfiguration.h" />
+    <ClInclude Include="../include/NvPerfCounterData.h" />
+    <ClInclude Include="../include/NvPerfD3D.h" />
+    <ClInclude Include="../include/NvPerfD3D12.h" />
+    <ClInclude Include="../include/NvPerfDeviceProperties.h" />
+    <ClInclude Include="../include/NvPerfInit.h" />
+    <ClInclude Include="../include/NvPerfMetricsConfigBuilder.h" />
+    <ClInclude Include="../include/NvPerfMetricsEvaluator.h" />
+    <ClInclude Include="../include/NvPerfRangeProfiler.h" />
+    <ClInclude Include="../include/NvPerfRangeProfilerD3D12.h" />
+    <ClInclude Include="../include/NvPerfRangeProfilerVulkan.h" />
+    <ClInclude Include="../include/NvPerfReportDefinition.h" />
+    <ClInclude Include="../include/NvPerfReportDefinitionGA10X.h" />
+    <ClInclude Include="../include/NvPerfReportDefinitionGV100.h" />
+    <ClInclude Include="../include/NvPerfReportDefinitionHAL.h" />
+    <ClInclude Include="../include/NvPerfReportDefinitionTU10X.h" />
+    <ClInclude Include="../include/NvPerfReportDefinitionTU11X.h" />
+    <ClInclude Include="../include/NvPerfReportGenerator.h" />
+    <ClInclude Include="../include/NvPerfReportGeneratorD3D12.h" />
+    <ClInclude Include="../include/NvPerfReportGeneratorVulkan.h" />
+    <ClInclude Include="../include/NvPerfVulkan.h" />
+    <ClInclude Include="$(NvPerfSdkPath)/include/nvperf_d3d12_host.h" />
+    <ClInclude Include="$(NvPerfSdkPath)/include/nvperf_d3d12_target.h" />
+    <ClInclude Include="$(NvPerfSdkPath)/include/nvperf_device_host.h" />
+    <ClInclude Include="$(NvPerfSdkPath)/include/nvperf_device_target.h" />
+    <ClInclude Include="$(NvPerfSdkPath)/include/nvperf_host.h" />
+    <ClInclude Include="$(NvPerfSdkPath)/include/nvperf_target.h" />
+    <ClInclude Include="$(NvPerfSdkPath)/include/nvperf_versions_target.h" />
+    <ClInclude Include="$(NvPerfSdkPath)/include/nvperf_vulkan_host.h" />
+    <ClInclude Include="$(NvPerfSdkPath)/include/nvperf_vulkan_target.h" />
+  </ItemGroup>
+  <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
+  <ImportGroup Label="ExtensionTargets">
+  </ImportGroup>
+</Project>
--- a/ruins64k/tools/NvPerfUtility/build/NvPerfUtility.vcxproj.filters
+++ b/ruins64k/tools/NvPerfUtility/build/NvPerfUtility.vcxproj.filters
@@ -0,0 +1,35 @@
+<?xml version="1.0" encoding="utf-8"?>
+<Project ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
+  <ItemGroup>
+    <ClInclude Include="$(NvPerfSdkPath)/include/nvperf_d3d12_host.h" />
+    <ClInclude Include="$(NvPerfSdkPath)/include/nvperf_d3d12_target.h" />
+    <ClInclude Include="$(NvPerfSdkPath)/include/nvperf_device_host.h" />
+    <ClInclude Include="$(NvPerfSdkPath)/include/nvperf_device_target.h" />
+    <ClInclude Include="$(NvPerfSdkPath)/include/nvperf_host.h" />
+    <ClInclude Include="$(NvPerfSdkPath)/include/nvperf_target.h" />
+    <ClInclude Include="$(NvPerfSdkPath)/include/nvperf_versions_target.h" />
+    <ClInclude Include="$(NvPerfSdkPath)/include/nvperf_vulkan_host.h" />
+    <ClInclude Include="$(NvPerfSdkPath)/include/nvperf_vulkan_target.h" />
+    <ClInclude Include="../include/NvPerfCounterConfiguration.h" />
+    <ClInclude Include="../include/NvPerfCounterData.h" />
+    <ClInclude Include="../include/NvPerfD3D.h" />
+    <ClInclude Include="../include/NvPerfD3D12.h" />
+    <ClInclude Include="../include/NvPerfDeviceProperties.h" />
+    <ClInclude Include="../include/NvPerfInit.h" />
+    <ClInclude Include="../include/NvPerfMetricsConfigBuilder.h" />
+    <ClInclude Include="../include/NvPerfMetricsEvaluator.h" />
+    <ClInclude Include="../include/NvPerfRangeProfiler.h" />
+    <ClInclude Include="../include/NvPerfRangeProfilerD3D12.h" />
+    <ClInclude Include="../include/NvPerfRangeProfilerVulkan.h" />
+    <ClInclude Include="../include/NvPerfReportDefinition.h" />
+    <ClInclude Include="../include/NvPerfReportDefinitionGA10X.h" />
+    <ClInclude Include="../include/NvPerfReportDefinitionGV100.h" />
+    <ClInclude Include="../include/NvPerfReportDefinitionHAL.h" />
+    <ClInclude Include="../include/NvPerfReportDefinitionTU10X.h" />
+    <ClInclude Include="../include/NvPerfReportDefinitionTU11X.h" />
+    <ClInclude Include="../include/NvPerfReportGenerator.h" />
+    <ClInclude Include="../include/NvPerfReportGeneratorD3D12.h" />
+    <ClInclude Include="../include/NvPerfReportGeneratorVulkan.h" />
+    <ClInclude Include="../include/NvPerfVulkan.h" />
+  </ItemGroup>
+</Project>
--- a/ruins64k/tools/NvPerfUtility/build/ReportGenerator/copyright.txt
+++ b/ruins64k/tools/NvPerfUtility/build/ReportGenerator/copyright.txt
@@ -0,0 +1,15 @@
+/*
+* Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
--- a/ruins64k/tools/NvPerfUtility/build/ReportGenerator/profiler_report_generator.py
+++ b/ruins64k/tools/NvPerfUtility/build/ReportGenerator/profiler_report_generator.py
@@ -0,0 +1,232 @@
+# Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from profiler_report_types import *
+import argparse
+import sys
+import os
+
+#===============================================================================
+# Dep File generation, compatible with Make or Ninja build systems
+#===============================================================================
+
+def get_loaded_module_file_names():
+    module_file_names = set()
+    for name, module in sys.modules.items():
+        path = getattr(module, "__file__", None)
+        if not path:
+            continue
+        path = os.path.realpath(path)
+        if path.endswith("<frozen>"):
+            continue
+        if not os.path.isabs(path):
+            path = os.path.abspath(path)
+        if not os.path.isfile(path):
+            continue # filter out directories
+        module_file_names.add(path)
+    return sorted(list(module_file_names))
+
+def gen_depfile(target_file_path, buildroot):
+    target_path_canonicalized = os.path.normpath(os.path.normcase(target_file_path))
+    buildroot_canonicalized = os.path.normpath(os.path.normcase(buildroot))
+    target_path_final = target_file_path
+    if target_path_canonicalized.startswith(buildroot_canonicalized):
+        target_path_final = target_file_path[len(buildroot_canonicalized):]
+    if target_path_final[:1] in ('\\', '/'):
+        target_path_final = target_path_final[1:]
+
+    module_file_names = get_loaded_module_file_names()
+
+    depfile_contents = []
+    depfile_contents.append(target_path_final + ':\\')
+    for module_file_name in module_file_names:
+        depfile_contents.append('\t' + module_file_name + ' \\')
+
+    return '\n'.join(depfile_contents)
+
+# target_file_path : the file being generated
+# buildroot        : root directory of the build system; this prefix is removed from target_file_path to pacify ninja
+# depfile_path     : the depfile to be written
+def write_depfile(target_file_path, buildroot, depfile_path):
+    with open(depfile_path, 'w', encoding='utf-8') as out_fd:
+        depfile_str = gen_depfile(target_file_path, buildroot)
+        out_fd.write(depfile_str)
+
+#===============================================================================
+# C++ header generation
+#===============================================================================
+
+def write_cpp_file(out_fd, report_definition):
+    out_fd.write(r'''
+    namespace {} {{
+'''.format(report_definition.name))
+
+    out_fd.write(r'''
+        inline ReportDefinition GetReportDefinition()
+        {''')
+
+    # counters
+    if len(report_definition.required_counters):
+        out_fd.write(r'''
+            static const char* const RequiredCounters[] = {
+''')
+        for counter in report_definition.required_counters:
+            out_fd.write(r'''                "{}",
+'''.format(counter))
+        out_fd.write(r'''            };
+''')
+
+    # ratios
+    if len(report_definition.required_ratios):
+        out_fd.write(r'''
+            static const char* const RequiredRatios[] = {
+''')
+        for ratio in report_definition.required_ratios:
+            out_fd.write(r'''                "{}",
+'''.format(ratio))
+        out_fd.write(r'''            };
+''')
+
+    # throughputs
+    if len(report_definition.required_throughputs):
+        out_fd.write(r'''
+            static const char* const RequiredThroughputs[] = {
+''')
+        for throughput in report_definition.required_throughputs:
+            out_fd.write(r'''                "{}",
+'''.format(throughput))
+        out_fd.write(r'''            };
+''')
+
+    # html template
+    assert len(report_definition.html)
+    out_fd.write(r'''
+            static const unsigned char ReportContents[] = {''')
+    barray = bytearray(report_definition.html, 'utf-8')
+    formatted_string = []
+    for index, b in enumerate(barray):
+        if index % 20 == 0:
+            formatted_string.append('\n                ')
+        assert(b <= 0xFF)
+        formatted_string.append('0x{:02x}, '.format(b))
+    out_fd.write("".join(formatted_string))
+    out_fd.write(r'''0x0
+            };
+''')
+
+    out_fd.write(r'''
+            ReportDefinition reportDefinition = {''')
+
+    if len(report_definition.required_counters):
+        out_fd.write(r'''
+                RequiredCounters,
+                sizeof(RequiredCounters) / sizeof(RequiredCounters[0]),''')
+    else:
+        out_fd.write(r'''
+                nullptr,
+                0,''')
+
+    if len(report_definition.required_ratios):
+        out_fd.write(r'''
+                RequiredRatios,
+                sizeof(RequiredRatios) / sizeof(RequiredRatios[0]),''')
+    else:
+        out_fd.write(r'''
+                nullptr,
+                0,''')
+
+    if len(report_definition.required_throughputs):
+        out_fd.write(r'''
+                RequiredThroughputs,
+                sizeof(RequiredThroughputs) / sizeof(RequiredThroughputs[0]),''')
+    else:
+        out_fd.write(r'''
+                nullptr,
+                0,''')
+    out_fd.write(r'''
+                (const char*)ReportContents
+            };
+            return reportDefinition;
+        }
+''')
+    out_fd.write(r'''
+    }} // namespace {}
+
+
+'''.format(report_definition.name))
+
+#===============================================================================
+# Main
+#===============================================================================
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser(description='Generate HTML report definition')
+    parser.add_argument('--chip', type=str, required=True, help='chip name, e.g. tu10x')
+    parser.add_argument('--outDir', type=str, required=True, help='output directory')
+    parser.add_argument('--pypath', default=[], action='append', required=False, help="Python module paths.")
+    parser.add_argument('--buildroot', default='', required=False, help="build root dir for depfile")
+    parser.add_argument('--copyright', type=str, help="Copyright header.")
+
+    args = parser.parse_args()
+    sys.path.extend(args.pypath)
+    sys.path.append(".")
+    chip = args.chip
+    report_module_name = "report_" + args.chip
+    try:
+        report_module = __import__(report_module_name)
+    except ImportError:
+        raise ImportError('Module "{}" was not found; this can happen because of an invalid chip name or an insufficient --pypath.'.format(report_module_name))
+    per_range_report_definition = report_module.get_per_range_report_definition()
+    summary_report_definition = report_module.get_summary_report_definition()
+
+    if not os.path.isdir(args.outDir):
+        raise Exception('Invalid argument for --outDir: {}'.format(args.outDir))
+
+    # Per-range report: debug HTML (useful for inspection; debug mode also lists the metrics used by each table)
+    range_debug_html_file_name = 'NvPerfReportDefinition{}_range_debug.html'.format(chip.upper())
+    range_debug_html_file_path = os.path.join(args.outDir, range_debug_html_file_name)
+    with open(range_debug_html_file_path, 'w', encoding='utf-8') as out_fd:
+        out_fd.write(per_range_report_definition.html)
+
+    # Summary report: debug html
+    summary_debug_html_file_name = 'NvPerfReportDefinition{}_summary_debug.html'.format(chip.upper())
+    summary_debug_html_file_path = os.path.join(args.outDir, summary_debug_html_file_name)
+    with open(summary_debug_html_file_path, 'w', encoding='utf-8') as out_fd:
+        out_fd.write(summary_report_definition.html)
+
+
+    # CPP file
+    cpp_file_name = 'NvPerfReportDefinition{}.h'.format(chip.upper())
+    cpp_file_path = os.path.join(args.outDir, cpp_file_name)
+    with open(cpp_file_path, 'w', encoding='utf-8') as out_fd:
+        if args.copyright:
+            with open(args.copyright, 'r', encoding='utf-8') as copyright_fd:
+                out_fd.write(copyright_fd.read())
+
+        out_fd.write(r'''
+#pragma once
+
+#include "NvPerfReportDefinition.h"
+
+namespace nv {{ namespace perf {{ namespace {} {{
+'''.format(args.chip))
+        # per-range report
+        write_cpp_file(out_fd, per_range_report_definition)
+        # summary report
+        write_cpp_file(out_fd, summary_report_definition)
+        out_fd.write(r'''
+} } }''')
+
+    # Emit a single depfile using the C++ file as the representative output.
+    write_depfile(cpp_file_path, args.buildroot, cpp_file_path + '.d')
--- a/ruins64k/tools/NvPerfUtility/build/ReportGenerator/profiler_report_types.py
+++ b/ruins64k/tools/NvPerfUtility/build/ReportGenerator/profiler_report_types.py
@@ -0,0 +1,69 @@
+# Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+class DataTable:
+    def __init__(dtable, name, html, jsfunc, jscall, required_counters, required_ratios, required_throughputs, workflow):
+        dtable.name = name
+        dtable.html = html
+        dtable.jsfunc = jsfunc
+        dtable.jscall = jscall
+        dtable.required_counters = required_counters
+        dtable.required_ratios = required_ratios
+        dtable.required_throughputs = required_throughputs
+        dtable.workflow = workflow
+
+class DataSection:
+    def __init__(section, dtables, inter_table_spacing=True, title=None):
+        section.dtables = dtables
+        section.inter_table_spacing = inter_table_spacing
+        section.title = title
+
+class ReportDefinition:
+    def __init__(rd, name, html, required_counters, required_ratios, required_throughputs):
+        rd.name = name
+        rd.html = html
+        rd.required_counters = required_counters
+        rd.required_ratios = required_ratios
+        rd.required_throughputs = required_throughputs
+
+def get_data_tables(sections):
+    dtables = [dtable for section in sections for dtable in section.dtables]
+    return dtables
+
+def get_required_counters(sections):
+    required_counters = set()
+    dtables = get_data_tables(sections)
+    for dtable in dtables:
+        for counter in dtable.required_counters:
+            required_counters.add(counter)
+    required_counters = sorted(required_counters)
+    return required_counters
+
+def get_required_ratios(sections):
+    required_ratios = set()
+    dtables = get_data_tables(sections)
+    for dtable in dtables:
+        for ratio in dtable.required_ratios:
+            required_ratios.add(ratio)
+    required_ratios = sorted(required_ratios)
+    return required_ratios
+
+def get_required_throughputs(sections):
+    required_throughputs = set()
+    dtables = get_data_tables(sections)
+    for dtable in dtables:
+        for throughput in dtable.required_throughputs:
+            required_throughputs.add(throughput)
+    required_throughputs = sorted(required_throughputs)
+    return required_throughputs
--- a/ruins64k/tools/NvPerfUtility/build/ReportGenerator/pub/ampere/report_ga10x.py
+++ b/ruins64k/tools/NvPerfUtility/build/ReportGenerator/pub/ampere/report_ga10x.py
@@ -0,0 +1,80 @@
+# Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from profiler_report_types import *
+import pub.tables_common as tables_common
+import pub.ampere.tables_ga10x as tables_ga10x
+
+def get_per_range_report_definition():
+    sections = [
+        DataSection([
+            tables_ga10x.DevicePropertiesGenerator().make_data_table(),
+            tables_common.ClocksGenerator().make_data_table(),
+        ], inter_table_spacing=False),
+        DataSection([
+            tables_common.TopLevelStatsGenerator().make_data_table(),
+            tables_ga10x.TopThroughputsGenerator().make_data_table(),
+            tables_common.CacheHitRates().make_data_table(),
+        ], title='Overview Section'),
+        DataSection([
+            tables_common.MainMemoryGenerator().make_data_table(),
+            tables_ga10x.L2TrafficByMemoryApertureShortBreakdownGenerator(show_generic_workflow=True).make_data_table(),
+            tables_ga10x.L2TrafficBySrcBreakdownGenerator(show_generic_workflow=False).make_data_table(),
+            tables_common.L1TexThroughputsGenerator().make_data_table(),
+            tables_common.L1TexTrafficBreakdownGenerator().make_data_table(),
+        ], title='Memory Performance Section'),
+        DataSection([
+            tables_ga10x.SmThroughputsGenerator().make_data_table(),
+            tables_ga10x.SmInstExecutedGenerator().make_data_table(),
+            tables_common.SmShaderExecutionGenerator().make_data_table(),
+            tables_ga10x.SmResourceUsageGenerator().make_data_table(),
+            tables_common.SmWarpLaunchStallsGenerator().make_data_table(),
+            tables_common.WarpIssueStallsGenerator().make_data_table(),
+        ], title='Shader Performance Section'),
+        DataSection([
+            tables_common.PrimitiveDataflowGenerator().make_data_table(),
+            tables_ga10x.RasterDataflowGenerator().make_data_table(),
+        ], title='3D Pipeline Section'),
+        DataSection([
+            tables_ga10x.L2TrafficByMemoryApertureBreakdownGenerator(show_generic_workflow=False).make_data_table(),
+            tables_ga10x.L2TrafficByOperationBreakdownGenerator(show_generic_workflow=False).make_data_table(),
+        ], title='Additional L2 Traffic Breakdowns Section'),
+        DataSection([
+            tables_common.AdditionalMetricsGenerator().make_data_table(),
+            tables_common.AllCountersGenerator().make_data_table(),
+            tables_common.AllRatiosGenerator().make_data_table(),
+            tables_common.AllThroughputsGenerator().make_data_table(),
+        ], title='Exhaustive Listings Section'),
+    ]
+    html = tables_common.generate_range_html_common(sections)
+    required_counters = get_required_counters(sections)
+    required_ratios = get_required_ratios(sections)
+    required_throughputs = get_required_throughputs(sections)
+    return ReportDefinition('PerRangeReport', html, required_counters, required_ratios, required_throughputs)
+
+def get_summary_report_definition():
+    sections = [
+        DataSection([
+            tables_common.CollectionInfoGenerator().make_data_table(),
+        ]),
+        DataSection([
+            tables_ga10x.RangesSummaryGenerator().make_data_table(),
+        ], title='Summary of Measured Ranges'),
+    ]
+    html = tables_common.generate_summary_html_common(sections)
+    required_counters = get_required_counters(sections)
+    required_ratios = get_required_ratios(sections)
+    required_throughputs = get_required_throughputs(sections)
+    return ReportDefinition('SummaryReport', html, required_counters, required_ratios, required_throughputs)
+
--- a/ruins64k/tools/NvPerfUtility/build/ReportGenerator/pub/ampere/tables_ga10x.py
+++ b/ruins64k/tools/NvPerfUtility/build/ReportGenerator/pub/ampere/tables_ga10x.py
@@ -0,0 +1,646 @@
+# Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from profiler_report_types import *
+import pub.tables_common as tables_common
+
+class DevicePropertiesGenerator(tables_common.DevicePropertiesGenerator):
+    def __init__(gen):
+        super().__init__()
+        gen.l2cacheSizePerLts = 128
+
+class TopThroughputsGenerator(tables_common.TopThroughputsGenerator):
+    def __init__(gen):
+        super().__init__()
+        gen.rows += [
+            gen.Row('Shader'      , '<a href="#SM-Instruction-Throughput">SM (Shader Cores)</a>'    , "getThroughputPct('sm__throughput')"),
+            gen.Row('Memory'      , '<a href="#L1TEX-Throughput">L1TEX Cache</a>'                   , "getThroughputPct('l1tex__throughput')"),
+            gen.Row('Memory'      , '<a href="#L2-Sector-Traffic">L2 Cache</a>'                     , "getThroughputPct('lts__throughput')"),
+            gen.Row('Memory'      , '<a href="#Main-Memory-Throughput">DRAM</a>'                    , "getThroughputPct('dram__throughput')"),
+            gen.Row('Memory'      , '<a href="#Main-Memory-Throughput">PCIe</a>'                    , "getThroughputPct('pcie__throughput')"),
+            gen.Row('World Pipe'  , '<a href="#Primitive-Data-Flow">PDA Index Fetch</a>'            , "getThroughputPct('pda__throughput')"),
+            gen.Row('World Pipe'  , '<a href="#Primitive-Data-Flow">Vertex Attr. Fetch</a>'         , "getThroughputPct('vaf__throughput')"),
+            gen.Row('World Pipe'  , '<a href="#Primitive-Data-Flow">Primitive Engine</a>'           , "getThroughputPct('pes__throughput')"),
+            gen.Row('Screen Pipe' , '<a href="#Raster-Data-Flow">RASTER</a>'                        , "getThroughputPct('raster__throughput')"),
+            gen.Row('Screen Pipe' , '<a href="#Raster-Data-Flow">PROP (Pre-ROP)</a>'                , "getThroughputPct('prop__throughput')"),
+            gen.Row('Screen Pipe' , '<a href="#Raster-Data-Flow">ZROP (Depth-Test)</a>'             , "getThroughputPct('zrop__throughput')"),
+            gen.Row('Screen Pipe' , '<a href="#Raster-Data-Flow">CROP (Color Blend)</a>'            , "getThroughputPct('crop__throughput')"),
+        ]
+        gen.required_throughputs += [
+            'crop__throughput',
+            'dram__throughput',
+            'l1tex__throughput',
+            'lts__throughput',
+            'pcie__throughput',
+            'pda__throughput',
+            'pes__throughput',
+            'prop__throughput',
+            'raster__throughput',
+            'sm__throughput',
+            'vaf__throughput',
+            'zrop__throughput',
+        ]
+
+class SmThroughputsGenerator(tables_common.SmThroughputsGenerator):
+    def __init__(gen):
+        super().__init__()
+        gen.pipes = [
+            gen.Pipe('adu'          , 'Computed branches and indexed constants'),
+            gen.Pipe('alu'          , 'INT32 except multiply; FP32 comparison'),
+            gen.Pipe('cbu'          , 'Divergent branches and control flow'),
+            gen.Pipe('fma'          , 'FP32 mul/add, FP16 mul/add'),
+            gen.Pipe('fmaheavy'     , 'FP32 mul/add and INT32 multiply'),
+            gen.Pipe('fp64'         , 'FP64 mul/add'),
+            gen.Pipe('ipa'          , 'Pixel shader attribute interpolation'),
+            gen.Pipe('lsu'          , 'Global, local, shared memory, and misc'),
+            gen.Pipe('tensor'       , 'Tensor matrix multiply (FP16, INT8/4/1)'),
+            gen.Pipe('tex'          , 'Texture and surface memory'),
+            gen.Pipe('uniform'      , 'Warp-level scalar operations'),
+            gen.Pipe('xu'           , 'Transcendentals and float/int conversion'),
+        ]
+        for pipe in gen.pipes:
+            gen.required_counters.extend(pipe.get_counter_names())
+
+
+class SmInstExecutedGenerator(tables_common.SmInstExecutedGenerator):
+    def __init__(gen):
+        super().__init__()
+        gen.pipes = [
+            gen.Pipe('total'        , 'All instructions', True),
+            gen.Pipe('adu'          , 'Computed branches and indexed constants'),
+            gen.Pipe('alu'          , 'INT32 except multiply; FP32 comparison', True),
+            gen.Pipe('cbu'          , 'Divergent branches and control flow'),
+            gen.Pipe('fma'          , 'FP32 mul/add, FP16 mul/add', True),
+            gen.Pipe('fmaheavy'     , 'FP32 mul/add and INT32 multiply', True),
+            gen.Pipe('fp64'         , 'FP64 mul/add', True),
+            gen.Pipe('ipa'          , 'Pixel shader attribute interpolation', True),
+            gen.Pipe('lsu'          , 'Global, local, shared memory, and misc', True),
+            gen.Pipe('tensor'       , 'Tensor matrix multiply (FP16, INT8/4/1)'),
+            gen.Pipe('tex'          , 'Texture and surface memory'),
+            gen.Pipe('uniform'      , 'Warp-level scalar operations'),
+            gen.Pipe('xu'           , 'Transcendentals and float/int conversion', True),
+        ]
+        for pipe in gen.pipes:
+            gen.required_counters.extend(pipe.get_counter_names())
+
+class L2TrafficByMemoryApertureShortBreakdownGenerator(tables_common.L2TrafficBreakdownGenerator):
+    def __init__(gen, show_generic_workflow):
+        super().__init__(show_generic_workflow=show_generic_workflow)
+        gen.name = 'L2SectorTrafficBreakdownByMemoryApertureShort'
+        gen.table_id = 'L2-Sector-Traffic-By-Memory-Aperture-Short'
+        gen.column_names = [
+            ('Memory Aperture', 'To Memory'),
+            ('Op', 'Op'),
+        ]
+        gen.workflow += ' This table decomposes L2 bandwidth to each destination, per operation. A <a href="#L2-Sector-Traffic-By-Memory-Aperture">more detailed version of this table</a> can be found below.'
+
+        gen.nodes = [
+            gen.Node('DRAM' , ['lts__average_t_sector_aperture_device'], [
+                gen.Node('Reads',      ['lts__average_t_sector_aperture_device_op_read'],  []),
+                gen.Node('Writes',     ['lts__average_t_sector_aperture_device_op_write'], []),
+                gen.Node('Atomics',    ['lts__average_t_sector_aperture_device_op_atom'], []),
+                gen.Node('Reductions', ['lts__average_t_sector_aperture_device_op_red'],  []),
+            ]),
+            gen.Node('System Memory', ['lts__average_t_sector_aperture_sysmem'], [
+                gen.Node('Reads',      ['lts__average_t_sector_aperture_sysmem_op_read'],  []),
+                gen.Node('Writes',     ['lts__average_t_sector_aperture_sysmem_op_write'], []),
+                gen.Node('Atomics',    ['lts__average_t_sector_aperture_sysmem_op_atom'], []),
+                gen.Node('Reductions', ['lts__average_t_sector_aperture_sysmem_op_red'],  []),
+            ]),
+            gen.Node('Peer Memory', ['lts__average_t_sector_aperture_peer'], [
+                gen.Node('Reads',      ['lts__average_t_sector_aperture_peer_op_read'],  []),
+                gen.Node('Writes',     ['lts__average_t_sector_aperture_peer_op_write'], []),
+                gen.Node('Atomics',    ['lts__average_t_sector_aperture_peer_op_atom'], []),
+                gen.Node('Reductions', ['lts__average_t_sector_aperture_peer_op_red'],  []),
+            ]),
+        ]
+
+        gen.required_ratios = gen.get_required_ratios()
+
+class L2TrafficBySrcBreakdownGenerator(tables_common.L2TrafficBreakdownGenerator):
+    def __init__(gen, show_generic_workflow):
+        super().__init__(show_generic_workflow=show_generic_workflow)
+        gen.name = 'L2SectorTrafficBreakdownBySource'
+        gen.table_id = 'L2-Sector-Traffic-By-Source'
+        gen.column_names = [
+            ('Source Breakdown', 'From Source'),
+            ('Unit Breakdown', 'From Unit'),
+            ('Memory Aperture', 'To Memory'),
+            ('Op', 'Op'),
+        ]
+        gen.workflow += ' This table decomposes L2 bandwidth from each source unit, to each destination, per operation. See also: these tables that prioritize <a href="#L2-Sector-Traffic-By-Memory-Aperture">destination Memory Aperture</a> and <a href="#L2-Sector-Traffic-By-Operation">Operation</a>.'
+
+        gen.nodes = [
+            gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc'], [
+                gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex'], [
+                    gen.Node('DRAM' , ['lts__average_t_sector_srcunit_tex_aperture_device'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_tex_aperture_device_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_tex_aperture_device_op_write'], []),
+                        gen.Node('Atomics',    ['lts__average_t_sector_srcunit_tex_aperture_device_op_atom'], []),
+                        gen.Node('Reductions', ['lts__average_t_sector_srcunit_tex_aperture_device_op_red'],  []),
+                    ]),
+                    gen.Node('Peer Memory', ['lts__average_t_sector_srcunit_tex_aperture_peer'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_tex_aperture_peer_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_tex_aperture_peer_op_write'], []),
+                        gen.Node('Atomics',    ['lts__average_t_sector_srcunit_tex_aperture_peer_op_atom'], []),
+                        gen.Node('Reductions', ['lts__average_t_sector_srcunit_tex_aperture_peer_op_red'],  []),
+                    ]),
+                    gen.Node('System Memory', ['lts__average_t_sector_srcunit_tex_aperture_sysmem'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_tex_aperture_sysmem_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_tex_aperture_sysmem_op_write'], []),
+                        gen.Node('Atomics',    ['lts__average_t_sector_srcunit_tex_aperture_sysmem_op_atom'], []),
+                        gen.Node('Reductions', ['lts__average_t_sector_srcunit_tex_aperture_sysmem_op_red'],  []),
+                    ]),
+                ]),
+                gen.Node('L1.5 Constant Cache', ['lts__average_t_sector_srcunit_gcc'], [
+                    gen.Node('DRAM' , ['lts__average_t_sector_srcunit_gcc_aperture_device'], [
+                        gen.Node('Reads' , ['lts__average_t_sector_srcunit_gcc_aperture_device'], [])
+                    ]),
+                    gen.Node('Peer Memory', ['lts__average_t_sector_srcunit_gcc_aperture_peer'], [
+                        gen.Node('Reads' , ['lts__average_t_sector_srcunit_gcc_aperture_peer'], [])
+                    ]),
+                    gen.Node('System Memory', ['lts__average_t_sector_srcunit_gcc_aperture_sysmem'], [
+                        gen.Node('Reads' , ['lts__average_t_sector_srcunit_gcc_aperture_sysmem'], [])
+                    ]),
+                ]),
+                gen.Node('Primitive Engine', ['lts__average_t_sector_srcunit_pe'], [
+                    gen.Node('DRAM' , ['lts__average_t_sector_srcunit_pe_aperture_device'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_pe_aperture_device_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_pe_aperture_device_op_write'], []),
+                    ]),
+                    gen.Node('Peer Memory', ['lts__average_t_sector_srcunit_pe_aperture_peer'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_pe_aperture_peer_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_pe_aperture_peer_op_write'], []),
+                    ]),
+                    gen.Node('System Memory', ['lts__average_t_sector_srcunit_pe_aperture_sysmem'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_pe_aperture_sysmem_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_pe_aperture_sysmem_op_write'], []),
+                    ]),
+                ]),
+                gen.Node('Raster', ['lts__average_t_sector_srcunit_raster'], [
+                    gen.Node('DRAM' , ['lts__average_t_sector_srcunit_raster_aperture_device'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_raster_aperture_device_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_raster_aperture_device_op_write'], []),
+                    ]),
+                    gen.Node('Peer Memory', ['lts__average_t_sector_srcunit_raster_aperture_peer'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_raster_aperture_peer_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_raster_aperture_peer_op_write'], []),
+                    ]),
+                    gen.Node('System Memory', ['lts__average_t_sector_srcunit_raster_aperture_sysmem'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_raster_aperture_sysmem_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_raster_aperture_sysmem_op_write'], []),
+                    ]),
+                ]),
+                gen.Node('ZROP', ['lts__average_t_sector_srcunit_zrop'], [
+                    gen.Node('DRAM' , ['lts__average_t_sector_srcunit_zrop_aperture_device'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_zrop_aperture_device_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_zrop_aperture_device_op_write'], []),
+                    ]),
+                    gen.Node('Peer Memory' , ['lts__average_t_sector_srcunit_zrop_aperture_peer'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_zrop_aperture_peer_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_zrop_aperture_peer_op_write'], []),
+                    ]),
+                    gen.Node('System Memory' , ['lts__average_t_sector_srcunit_zrop_aperture_sysmem'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_zrop_aperture_sysmem_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_zrop_aperture_sysmem_op_write'], []),
+                    ]),
+                ]),
+                gen.Node('CROP', ['lts__average_t_sector_srcunit_crop'], [
+                    gen.Node('DRAM' , ['lts__average_t_sector_srcunit_crop_aperture_device'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_crop_aperture_device_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_crop_aperture_device_op_write'], []),
+                    ]),
+                    gen.Node('Peer Memory' , ['lts__average_t_sector_srcunit_crop_aperture_peer'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_crop_aperture_peer_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_crop_aperture_peer_op_write'], []),
+                    ]),
+                    gen.Node('System Memory' , ['lts__average_t_sector_srcunit_crop_aperture_sysmem'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_crop_aperture_sysmem_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_crop_aperture_sysmem_op_write'], []),
+                    ]),
+                ]),
+            ]),
+            gen.Node('FBP Units', ['lts__average_t_sector_srcnode_fbp'], [
+                gen.Node('all FBP Units', ['lts__average_t_sector_srcnode_fbp'], [
+                    gen.Node('DRAM', ['lts__average_t_sector_srcnode_fbp_aperture_device'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcnode_fbp_aperture_device_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcnode_fbp_aperture_device_op_write'], []),
+                    ]),
+                    gen.Node('Peer Memory', ['lts__average_t_sector_srcnode_fbp_aperture_peer'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcnode_fbp_aperture_peer_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcnode_fbp_aperture_peer_op_write'], []),
+                    ]),
+                    gen.Node('System Memory', ['lts__average_t_sector_srcnode_fbp_aperture_sysmem'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcnode_fbp_aperture_sysmem_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcnode_fbp_aperture_sysmem_op_write'], []),
+                    ]),
+                ]),
+            ]),
+            gen.Node('HUB Units', [], [
+                gen.Node('all HUB Units', [], [
+                    gen.Node('DRAM' , ['lts__average_t_sector_srcnode_hub_aperture_device'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcnode_hub_aperture_device_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcnode_hub_aperture_device_op_write'], []),
+                    ]),
+                    gen.Node('Peer Memory', ['lts__average_t_sector_srcnode_hub_aperture_peer'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcnode_hub_aperture_peer_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcnode_hub_aperture_peer_op_write'], []),
+                    ]),
+                    gen.Node('System Memory', ['lts__average_t_sector_srcnode_hub_aperture_sysmem'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcnode_hub_aperture_sysmem_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcnode_hub_aperture_sysmem_op_write'], []),
+                    ]),
+                ]),
+            ]),
+        ]
+
+        gen.required_ratios = gen.get_required_ratios()
+
+class L2TrafficByMemoryApertureBreakdownGenerator(tables_common.L2TrafficBreakdownGenerator):
+    def __init__(gen, show_generic_workflow):
+        super().__init__(show_generic_workflow=show_generic_workflow)
+        gen.name = 'L2SectorTrafficBreakdownByMemoryAperture'
+        gen.table_id = 'L2-Sector-Traffic-By-Memory-Aperture'
+        gen.column_names = [
+            ('Memory Aperture', 'To Memory'),
+            ('Source Breakdown', 'From Source'),
+            ('Unit Breakdown', 'From Unit'),
+            ('Op', 'Op'),
+        ]
+        gen.workflow += ' This is an extended breakdown of <a href="#L2-Sector-Traffic-By-Memory-Aperture-Short">L2 Traffic by destination</a>. It decomposes L2 bandwidth to each destination, from each source unit, per operation.'
+
+        gen.nodes = [
+            gen.Node('DRAM' , ['lts__average_t_sector_aperture_device'], [
+                gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_device'], [
+                    gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_device'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_tex_aperture_device_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_tex_aperture_device_op_write'], []),
+                        gen.Node('Atomics',    ['lts__average_t_sector_srcunit_tex_aperture_device_op_atom'], []),
+                        gen.Node('Reductions', ['lts__average_t_sector_srcunit_tex_aperture_device_op_red'],  []),
+                    ]),
+                    gen.Node('L1.5 Constant Cache', ['lts__average_t_sector_srcunit_gcc_aperture_device'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_gcc_aperture_device'],  []),
+                    ]),
+                    gen.Node('Primitive Engine', ['lts__average_t_sector_srcunit_pe_aperture_device'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_pe_aperture_device_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_pe_aperture_device_op_write'], []),
+                    ]),
+                    gen.Node('Raster', ['lts__average_t_sector_srcunit_raster_aperture_device'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_raster_aperture_device_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_raster_aperture_device_op_write'], []),
+                    ]),
+                    gen.Node('ZROP', ['lts__average_t_sector_srcunit_zrop_aperture_device'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_zrop_aperture_device_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_zrop_aperture_device_op_write'], []),
+                    ]),
+                    gen.Node('CROP', ['lts__average_t_sector_srcunit_crop_aperture_device'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_crop_aperture_device_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_crop_aperture_device_op_write'], []),
+                    ]),
+                ]),
+                gen.Node('FBP Units', ['lts__average_t_sector_srcnode_fbp_aperture_device'], [
+                    gen.Node('all FBP Units', ['lts__average_t_sector_srcnode_fbp_aperture_device'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcnode_fbp_aperture_device_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcnode_fbp_aperture_device_op_write'], []),
+                    ]),
+                ]),
+                gen.Node('HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_device'], [
+                    gen.Node('all HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_device'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcnode_hub_aperture_device_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcnode_hub_aperture_device_op_write'], []),
+                    ]),
+                ]),
+            ]),
+            gen.Node('Peer Memory', ['lts__average_t_sector_aperture_peer'], [
+                gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_peer'], [
+                    gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_peer'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_tex_aperture_peer_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_tex_aperture_peer_op_write'], []),
+                        gen.Node('Atomics',    ['lts__average_t_sector_srcunit_tex_aperture_peer_op_atom'], []),
+                        gen.Node('Reductions', ['lts__average_t_sector_srcunit_tex_aperture_peer_op_red'],  []),
+                    ]),
+                    gen.Node('L1.5 Constant Cache', ['lts__average_t_sector_srcunit_gcc_aperture_peer'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_gcc_aperture_peer'],  []),
+                    ]),
+                    gen.Node('Primitive Engine', ['lts__average_t_sector_srcunit_pe_aperture_peer'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_pe_aperture_peer_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_pe_aperture_peer_op_write'], []),
+                    ]),
+                    gen.Node('Raster', ['lts__average_t_sector_srcunit_raster_aperture_peer'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_raster_aperture_peer_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_raster_aperture_peer_op_write'], []),
+                    ]),
+                    gen.Node('ZROP', ['lts__average_t_sector_srcunit_zrop_aperture_peer'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_zrop_aperture_peer_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_zrop_aperture_peer_op_write'], []),
+                    ]),
+                    gen.Node('CROP', ['lts__average_t_sector_srcunit_crop_aperture_peer'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_crop_aperture_peer_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_crop_aperture_peer_op_write'], []),
+                    ]),
+                ]),
+                gen.Node('FBP Units', ['lts__average_t_sector_srcnode_fbp_aperture_peer'], [
+                    gen.Node('all FBP Units', ['lts__average_t_sector_srcnode_fbp_aperture_peer'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcnode_fbp_aperture_peer_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcnode_fbp_aperture_peer_op_write'], []),
+                    ]),
+                ]),
+                gen.Node('HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_peer'], [
+                    gen.Node('all HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_peer'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcnode_hub_aperture_peer_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcnode_hub_aperture_peer_op_write'], []),
+                    ]),
+                ]),
+            ]),
+            gen.Node('System Memory', ['lts__average_t_sector_aperture_sysmem'], [
+                gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_sysmem'], [
+                    gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_sysmem'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_tex_aperture_sysmem_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_tex_aperture_sysmem_op_write'], []),
+                        gen.Node('Atomics',    ['lts__average_t_sector_srcunit_tex_aperture_sysmem_op_atom'], []),
+                        gen.Node('Reductions', ['lts__average_t_sector_srcunit_tex_aperture_sysmem_op_red'],  []),
+                    ]),
+                    gen.Node('L1.5 Constant Cache', ['lts__average_t_sector_srcunit_gcc_aperture_sysmem'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_gcc_aperture_sysmem'],  []),
+                    ]),
+                    gen.Node('Primitive Engine', ['lts__average_t_sector_srcunit_pe_aperture_sysmem'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_pe_aperture_sysmem_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_pe_aperture_sysmem_op_write'], []),
+                    ]),
+                    gen.Node('Raster', ['lts__average_t_sector_srcunit_raster_aperture_sysmem'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_raster_aperture_sysmem_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_raster_aperture_sysmem_op_write'], []),
+                    ]),
+                    gen.Node('ZROP', ['lts__average_t_sector_srcunit_zrop_aperture_sysmem'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_zrop_aperture_sysmem_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_zrop_aperture_sysmem_op_write'], []),
+                    ]),
+                    gen.Node('CROP', ['lts__average_t_sector_srcunit_crop_aperture_sysmem'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_crop_aperture_sysmem_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_crop_aperture_sysmem_op_write'], []),
+                    ]),
+                ]),
+                gen.Node('FBP Units', ['lts__average_t_sector_srcnode_fbp_aperture_sysmem'], [
+                    gen.Node('all FBP Units', ['lts__average_t_sector_srcnode_fbp_aperture_sysmem'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcnode_fbp_aperture_sysmem_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcnode_fbp_aperture_sysmem_op_write'], []),
+                    ]),
+                ]),
+                gen.Node('HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_sysmem'], [
+                    gen.Node('all HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_sysmem'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcnode_hub_aperture_sysmem_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcnode_hub_aperture_sysmem_op_write'], []),
+                    ]),
+                ]),
+            ]),
+        ]
+
+        gen.required_ratios = gen.get_required_ratios()
+
+class L2TrafficByOperationBreakdownGenerator(tables_common.L2TrafficBreakdownGenerator):
+    def __init__(gen, show_generic_workflow):
+        super().__init__(show_generic_workflow=show_generic_workflow)
+        gen.name = 'L2SectorTrafficBreakdownByOperation'
+        gen.table_id = 'L2-Sector-Traffic-By-Operation'
+        gen.column_names = [
+            ('Op', 'Op'),
+            ('Memory Aperture', 'To Memory'),
+            ('Source Breakdown', 'From Source'),
+            ('Unit Breakdown', 'From Unit'),
+        ]
+        gen.workflow += ' This table decomposes L2 bandwidth per operation, to each destination, from each source unit.'
+
+        gen.nodes = [
+            gen.Node('Reads',      ['lts__average_t_sector_op_read'],  [
+                gen.Node('DRAM', ['lts__average_t_sector_aperture_device_op_read'], [
+                    gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_device_op_read'], [
+                        gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_device_op_read'], []),
+                        gen.Node('L1.5 Constant Cache', ['lts__average_t_sector_srcunit_gcc_aperture_device'], []),
+                        gen.Node('Primitive Engine', ['lts__average_t_sector_srcunit_pe_aperture_device_op_read'], []),
+                        gen.Node('Raster', ['lts__average_t_sector_srcunit_raster_aperture_device_op_read'], []),
+                        gen.Node('ZROP', ['lts__average_t_sector_srcunit_zrop_aperture_device_op_read'], []),
+                        gen.Node('CROP', ['lts__average_t_sector_srcunit_crop_aperture_device_op_read'], []),
+                    ]),
+                    gen.Node('FBP Units', ['lts__average_t_sector_srcnode_fbp_aperture_device_op_read'], [
+                        gen.Node('all FBP Units', ['lts__average_t_sector_srcnode_fbp_aperture_device_op_read'], []),
+                    ]),
+                    gen.Node('HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_device_op_read'], [
+                        gen.Node('all HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_device_op_read'], []),
+                    ]),
+                ]),
+                gen.Node('Peer Memory', ['lts__average_t_sector_aperture_peer_op_read'], [
+                    gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_peer_op_read'], [
+                        gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_peer_op_read'], []),
+                        gen.Node('L1.5 Constant Cache', ['lts__average_t_sector_srcunit_gcc_aperture_peer'], []),
+                        gen.Node('Primitive Engine', ['lts__average_t_sector_srcunit_pe_aperture_peer_op_read'], []),
+                        gen.Node('Raster', ['lts__average_t_sector_srcunit_raster_aperture_peer_op_read'], []),
+                        gen.Node('ZROP', ['lts__average_t_sector_srcunit_zrop_aperture_peer_op_read'], []),
+                        gen.Node('CROP', ['lts__average_t_sector_srcunit_crop_aperture_peer_op_read'], []),
+                    ]),
+                    gen.Node('FBP Units', ['lts__average_t_sector_srcnode_fbp_aperture_peer_op_read'], [
+                        gen.Node('all FBP Units', ['lts__average_t_sector_srcnode_fbp_aperture_peer_op_read'], []),
+                    ]),
+                    gen.Node('HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_peer_op_read'], [
+                        gen.Node('all HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_peer_op_read'], []),
+                    ]),
+                ]),
+                gen.Node('System Memory', ['lts__average_t_sector_aperture_sysmem_op_read'], [
+                    gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_sysmem_op_read'], [
+                        gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_sysmem_op_read'], []),
+                        gen.Node('L1.5 Constant Cache', ['lts__average_t_sector_srcunit_gcc_aperture_sysmem'], []),
+                        gen.Node('Primitive Engine', ['lts__average_t_sector_srcunit_pe_aperture_sysmem_op_read'], []),
+                        gen.Node('Raster', ['lts__average_t_sector_srcunit_raster_aperture_sysmem_op_read'], []),
+                        gen.Node('ZROP', ['lts__average_t_sector_srcunit_zrop_aperture_sysmem_op_read'], []),
+                        gen.Node('CROP', ['lts__average_t_sector_srcunit_crop_aperture_sysmem_op_read'], []),
+                    ]),
+                    gen.Node('FBP Units', ['lts__average_t_sector_srcnode_fbp_aperture_sysmem_op_read'], [
+                        gen.Node('all FBP Units', ['lts__average_t_sector_srcnode_fbp_aperture_sysmem_op_read'], []),
+                    ]),
+                    gen.Node('HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_sysmem_op_read'], [
+                        gen.Node('all HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_sysmem_op_read'], []),
+                    ]),
+                ]),
+            ]),
+            gen.Node('Writes',     ['lts__average_t_sector_op_write'], [
+                gen.Node('DRAM', ['lts__average_t_sector_aperture_device_op_write'], [
+                    gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_device_op_write'], [
+                        gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_device_op_write'], []),
+                        gen.Node('Primitive Engine', ['lts__average_t_sector_srcunit_pe_aperture_device_op_write'], []),
+                        gen.Node('Raster', ['lts__average_t_sector_srcunit_raster_aperture_device_op_write'], []),
+                        gen.Node('ZROP', ['lts__average_t_sector_srcunit_zrop_aperture_device_op_write'], []),
+                        gen.Node('CROP', ['lts__average_t_sector_srcunit_crop_aperture_device_op_write'], []),
+                    ]),
+                    gen.Node('FBP Units', ['lts__average_t_sector_srcnode_fbp_aperture_device_op_write'], [
+                        gen.Node('all FBP Units', ['lts__average_t_sector_srcnode_fbp_aperture_device_op_write'], []),
+                    ]),
+                    gen.Node('HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_device_op_write'], [
+                        gen.Node('all HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_device_op_write'], []),
+                    ]),
+                ]),
+                gen.Node('Peer Memory', ['lts__average_t_sector_aperture_peer_op_write'], [
+                    gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_peer_op_write'], [
+                        gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_peer_op_write'], []),
+                        gen.Node('Primitive Engine', ['lts__average_t_sector_srcunit_pe_aperture_peer_op_write'], []),
+                        gen.Node('Raster', ['lts__average_t_sector_srcunit_raster_aperture_peer_op_write'], []),
+                        gen.Node('ZROP', ['lts__average_t_sector_srcunit_zrop_aperture_peer_op_write'], []),
+                        gen.Node('CROP', ['lts__average_t_sector_srcunit_crop_aperture_peer_op_write'], []),
+                    ]),
+                    gen.Node('FBP Units', ['lts__average_t_sector_srcnode_fbp_aperture_peer_op_write'], [
+                        gen.Node('all FBP Units', ['lts__average_t_sector_srcnode_fbp_aperture_peer_op_write'], []),
+                    ]),
+                    gen.Node('HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_peer_op_write'], [
+                        gen.Node('all HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_peer_op_write'], []),
+                    ]),
+                ]),
+                gen.Node('System Memory', ['lts__average_t_sector_aperture_sysmem_op_write'], [
+                    gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_sysmem_op_write'], [
+                        gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_sysmem_op_write'], []),
+                        gen.Node('Primitive Engine', ['lts__average_t_sector_srcunit_pe_aperture_sysmem_op_write'], []),
+                        gen.Node('Raster', ['lts__average_t_sector_srcunit_raster_aperture_sysmem_op_write'], []),
+                        gen.Node('ZROP', ['lts__average_t_sector_srcunit_zrop_aperture_sysmem_op_write'], []),
+                        gen.Node('CROP', ['lts__average_t_sector_srcunit_crop_aperture_sysmem_op_write'], []),
+                    ]),
+                    gen.Node('FBP Units', ['lts__average_t_sector_srcnode_fbp_aperture_sysmem_op_write'], [
+                        gen.Node('all FBP Units', ['lts__average_t_sector_srcnode_fbp_aperture_sysmem_op_write'], []),
+                    ]),
+                    gen.Node('HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_sysmem_op_write'], [
+                        gen.Node('all HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_sysmem_op_write'], []),
+                    ]),
+                ]),
+            ]),
+            gen.Node('Atomics',    ['lts__average_t_sector_op_atom'], [
+                gen.Node('DRAM', ['lts__average_t_sector_aperture_device_op_atom'], [
+                    gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_device_op_atom'], [
+                        gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_device_op_atom'], []),
+                    ]),
+                ]),
+                gen.Node('Peer Memory', ['lts__average_t_sector_aperture_peer_op_atom'], [
+                    gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_peer_op_atom'], [
+                        gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_peer_op_atom'], []),
+                    ]),
+                ]),
+                gen.Node('System Memory', ['lts__average_t_sector_aperture_sysmem_op_atom'], [
+                    gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_sysmem_op_atom'], [
+                        gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_sysmem_op_atom'], []),
+                    ]),
+                ]),
+            ]),
+            gen.Node('Reductions', ['lts__average_t_sector_op_red'], [
+                gen.Node('DRAM', ['lts__average_t_sector_aperture_device_op_red'], [
+                    gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_device_op_red'], [
+                        gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_device_op_red'], []),
+                    ]),
+                ]),
+                gen.Node('Peer Memory', ['lts__average_t_sector_aperture_peer_op_red'], [
+                    gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_peer_op_red'], [
+                        gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_peer_op_red'], []),
+                    ]),
+                ]),
+                gen.Node('System Memory', ['lts__average_t_sector_aperture_sysmem_op_red'], [
+                    gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_sysmem_op_red'], [
+                        gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_sysmem_op_red'], []),
+                    ]),
+                ]),
+            ]),
+        ]
+
+        gen.required_ratios = gen.get_required_ratios()
+
+class RasterDataflowGenerator(tables_common.RasterDataflowGenerator):
+    def __init__(gen):
+        super().__init__()
+        gen.zrop_pixels_input = r'''getCounterValue('prop__prop2zrop_pixels_realtime', 'sum')'''
+        gen.crop_pixels_input = r'''getCounterValue('prop__prop2crop_pixels_realtime', 'sum')'''
+        gen.required_counters.extend([
+            'prop__prop2zrop_pixels_realtime',
+            'prop__prop2crop_pixels_realtime'
+        ])
+
+class SmResourceUsageGenerator(tables_common.SmResourceUsageGenerator):
+    def __init__(gen):
+        super().__init__()
+        # TODO: this is using single-pass counters; use ordinary counters when available
+        # NOTE: the gfx column is a hack, for rows that cannot separately measure VTG vs. PS
+        gen.rows = [
+            #       resource                  total                                                       gfx                                                                     vtg                                                                         ps                                                                          cs
+            gen.Row('Warps'                 , 'sm__warps_active'                                        , 'NotApplicable'                                                       , 'tpc__warps_active_shader_vtg'                                            , 'tpc__warps_active_shader_ps'                                             , 'tpc__warps_active_shader_cs'                                                 ),
+            gen.Row('Registers'             , 'tpc__sm_rf_registers_allocated'                          , 'NotApplicable'                                                       , 'tpc__sm_rf_registers_allocated_shader_vtg'                               , 'tpc__sm_rf_registers_allocated_shader_ps'                                , 'tpc__sm_rf_registers_allocated_shader_cs'                                    ),
+            gen.Row('Attr/ShMem'            , 'NotApplicable'                                           , 'NotApplicable'                                                       , 'tpc__l1tex_mem_shared_data_isbe_bytes_allocated'                         , 'tpc__l1tex_mem_shared_data_tram_bytes_allocated'                         , 'tpc__l1tex_mem_shared_data_compute_bytes_allocated'                          ),
+            gen.Row('CTAs'                  , 'NotApplicable'                                           , 'NotApplicable'                                                       , 'NotApplicable'                                                           , 'NotApplicable'                                                           , 'sm__ctas_active'                                                             ),
+        ]
+        gen.required_counters = [
+            'sm__warps_active',
+            'tpc__warps_active_shader_vtg',
+            'tpc__warps_active_shader_ps',
+            'tpc__warps_active_shader_cs',
+            'tpc__sm_rf_registers_allocated',
+            'tpc__sm_rf_registers_allocated_shader_vtg',
+            'tpc__sm_rf_registers_allocated_shader_ps',
+            'tpc__sm_rf_registers_allocated_shader_cs',
+            'tpc__l1tex_mem_shared_data_isbe_bytes_allocated',
+            'tpc__l1tex_mem_shared_data_tram_bytes_allocated',
+            'tpc__l1tex_mem_shared_data_compute_bytes_allocated',
+            'sm__ctas_active',
+        ]
+
+class RangesSummaryGenerator(tables_common.RangesSummaryGenerator):
+    def __init__(gen):
+        super().__init__()
+        gen.cols = [
+            gen.Col('Duration μs'   , "getCounterValue('gpu__time_duration', 'avg')"                                                                            , 'format_avg'  , 'ra'),
+            gen.Col('GR Active%'    , "getCounterPct('gr__cycles_active', 'avg')"                                                                               , 'format_pct'  , 'ra'),
+            gen.Col('3D?'           , "getCounterValue('fe__draw_count', 'sum') ? '&#x2713;' : ''"                                                              , ''            , 'ra'),
+            gen.Col('Comp?'         , "getCounterValue('gr__dispatch_count', 'sum') ? '&#x2713;' : ''"                                                          , ''            , 'ra'),
+            gen.Col('#WFI'          , "getCounterValue('fe__output_ops_cmd_go_idle', 'sum')"                                                                    , 'format_sum'  , 'ra'),
+            gen.Col('#Prims'        , "getCounterValue('pda__input_prims', 'sum')"                                                                              , 'format_sum'  , 'ra'),
+            gen.Col('#Pixels-Z'     , "getCounterValue('prop__prop2zrop_pixels_realtime', 'sum')"                                                               , 'format_sum'  , 'ra'),
+            gen.Col('#Pixels-C'     , "getCounterValue('prop__prop2crop_pixels_realtime', 'sum')"                                                               , 'format_sum'  , 'ra'),
+            gen.Col('SM%'           , "getThroughputPct('sm__throughput')"                                                                                      , 'format_pct'  , 'ra'),
+            gen.Col('L1TEX%'        , "getThroughputPct('l1tex__throughput')"                                                                                   , 'format_pct'  , 'ra'),
+            gen.Col('L2%'           , "getThroughputPct('lts__throughput')"                                                                                     , 'format_pct'  , 'ra'),
+            gen.Col('DRAM%'         , "getThroughputPct('dram__throughput')"                                                                                    , 'format_pct'  , 'ra'),
+            gen.Col('PCIe%'         , "getThroughputPct('pcie__throughput')"                                                                                    , 'format_pct'  , 'ra'),
+            gen.Col('PD%'           , "getThroughputPct('pda__throughput')"                                                                                     , 'format_pct'  , 'ra'),
+            gen.Col('PE%'           , "Math.max(getThroughputPct('vaf__throughput'), getThroughputPct('vpc__throughput'), getThroughputPct('pes__throughput'))" , 'format_pct'  , 'ra'),
+            gen.Col('RSTR%'         , "getThroughputPct('raster__throughput')"                                                                                  , 'format_pct'  , 'ra'),
+            gen.Col('PROP%'         , "getThroughputPct('prop__throughput')"                                                                                    , 'format_pct'  , 'ra'),
+            gen.Col('ZROP%'         , "getThroughputPct('zrop__throughput')"                                                                                    , 'format_pct'  , 'ra'),
+            gen.Col('CROP%'         , "getThroughputPct('crop__throughput')"                                                                                    , 'format_pct'  , 'ra'),
+        ]
+        gen.required_counters = [
+            'fe__draw_count',
+            'fe__output_ops_cmd_go_idle',
+            'gpu__time_duration',
+            'gr__cycles_active',
+            'gr__dispatch_count',
+            'pda__input_prims',
+            'prop__prop2crop_pixels_realtime',
+            'prop__prop2zrop_pixels_realtime',
+        ]
+        gen.required_ratios = []
+        gen.required_throughputs = [
+            'crop__throughput',
+            'dram__throughput',
+            'l1tex__throughput',
+            'lts__throughput',
+            'pcie__throughput',
+            'pda__throughput',
+            'pes__throughput',
+            'prop__throughput',
+            'raster__throughput',
+            'sm__throughput',
+            'vaf__throughput',
+            'vpc__throughput',
+            'zrop__throughput',
+        ]
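The L2 traffic generators above build a `gen.nodes` tree and then call `gen.get_required_ratios()`, whose implementation lives in `tables_common.py` and is not shown in this diff. As a rough sketch of what such a step plausibly does, the snippet below walks a node tree and collects each metric name once. The `Node` dataclass and `collect_metrics` function are hypothetical stand-ins written for illustration, not the actual `tables_common` API; only the metric names are taken from the tables above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for gen.Node: a label, its metric names, and child nodes."""
    label: str
    metrics: List[str]
    children: List["Node"] = field(default_factory=list)


def collect_metrics(nodes):
    """Walk the tree depth-first; return each metric name once, in first-seen order."""
    seen = []
    for node in nodes:
        for name in node.metrics:
            if name not in seen:
                seen.append(name)
        for name in collect_metrics(node.children):
            if name not in seen:
                seen.append(name)
    return seen


# A small slice of the 'System Memory' subtree, using metric names from the diff.
nodes = [
    Node('System Memory', ['lts__average_t_sector_aperture_sysmem'], [
        Node('FBP Units', ['lts__average_t_sector_srcnode_fbp_aperture_sysmem'], [
            Node('all FBP Units', ['lts__average_t_sector_srcnode_fbp_aperture_sysmem'], [
                Node('Reads', ['lts__average_t_sector_srcnode_fbp_aperture_sysmem_op_read']),
                Node('Writes', ['lts__average_t_sector_srcnode_fbp_aperture_sysmem_op_write']),
            ]),
        ]),
    ]),
]

required = collect_metrics(nodes)
```

Deduplication matters here because rows such as 'FBP Units' and 'all FBP Units' reference the same metric, so a naive walk would request it twice.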
--- a/ruins64k/tools/NvPerfUtility/build/ReportGenerator/pub/tables_common.py
+++ b/ruins64k/tools/NvPerfUtility/build/ReportGenerator/pub/tables_common.py
--- a/ruins64k/tools/NvPerfUtility/build/ReportGenerator/pub/turing/report_tu10x.py
+++ b/ruins64k/tools/NvPerfUtility/build/ReportGenerator/pub/turing/report_tu10x.py
@@ -0,0 +1,79 @@
+# Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from profiler_report_types import *
+import pub.tables_common as tables_common
+import pub.turing.tables_turing as tables_turing
+
+def get_per_range_report_definition():
+    sections = [
+        DataSection([
+            tables_turing.DevicePropertiesGenerator().make_data_table(),
+            tables_common.ClocksGenerator().make_data_table(),
+        ], inter_table_spacing=False),
+        DataSection([
+            tables_common.TopLevelStatsGenerator().make_data_table(),
+            tables_turing.TopThroughputsGenerator().make_data_table(),
+            tables_common.CacheHitRates().make_data_table(),
+        ], title='Overview Section'),
+        DataSection([
+            tables_common.MainMemoryGenerator().make_data_table(),
+            tables_turing.L2TrafficByMemoryApertureShortBreakdownGenerator(show_generic_workflow=True).make_data_table(),
+            tables_turing.L2TrafficBySrcBreakdownGenerator(show_generic_workflow=False).make_data_table(),
+            tables_common.L1TexThroughputsGenerator().make_data_table(),
+            tables_common.L1TexTrafficBreakdownGenerator().make_data_table(),
+        ], title='Memory Performance Section'),
+        DataSection([
+            tables_turing.SmThroughputsGenerator_tu10x().make_data_table(),
+            tables_turing.SmInstExecutedGenerator().make_data_table(),
+            tables_common.SmShaderExecutionGenerator().make_data_table(),
+            tables_turing.SmResourceUsageGenerator().make_data_table(),
+            tables_common.SmWarpLaunchStallsGenerator().make_data_table(),
+            tables_common.WarpIssueStallsGenerator().make_data_table(),
+        ], title='Shader Performance Section'),
+        DataSection([
+            tables_common.PrimitiveDataflowGenerator().make_data_table(),
+            tables_turing.RasterDataflowGenerator().make_data_table(),
+        ], title='3D Pipeline Section'),
+        DataSection([
+            tables_turing.L2TrafficByMemoryApertureBreakdownGenerator(show_generic_workflow=False).make_data_table(),
+            tables_turing.L2TrafficByOperationBreakdownGenerator(show_generic_workflow=False).make_data_table(),
+        ], title='Additional L2 Traffic Breakdowns Section'),
+        DataSection([
+            tables_common.AdditionalMetricsGenerator().make_data_table(),
+            tables_common.AllCountersGenerator().make_data_table(),
+            tables_common.AllRatiosGenerator().make_data_table(),
+            tables_common.AllThroughputsGenerator().make_data_table(),
+        ], title='Exhaustive Listings Section'),
+    ]
+    html = tables_common.generate_range_html_common(sections)
+    required_counters = get_required_counters(sections)
+    required_ratios = get_required_ratios(sections)
+    required_throughputs = get_required_throughputs(sections)
+    return ReportDefinition('PerRangeReport', html, required_counters, required_ratios, required_throughputs)
+
+def get_summary_report_definition():
+    sections = [
+        DataSection([
+            tables_common.CollectionInfoGenerator().make_data_table(),
+        ]),
+        DataSection([
+            tables_turing.RangesSummaryGenerator().make_data_table(),
+        ], title='Summary of Measured Ranges'),
+    ]
+    html = tables_common.generate_summary_html_common(sections)
+    required_counters = get_required_counters(sections)
+    required_ratios = get_required_ratios(sections)
+    required_throughputs = get_required_throughputs(sections)
+    return ReportDefinition('SummaryReport', html, required_counters, required_ratios, required_throughputs)
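Both report functions above finish by calling `get_required_counters(sections)`, `get_required_ratios(sections)` and `get_required_throughputs(sections)`, which come from `profiler_report_types` and are not part of this diff. The sketch below shows one way such an aggregation could work, assuming each table carries its own list of required counters. `Table`, `Section` and `gather_required_counters` are hypothetical names used only for illustration; the counter names are taken from the generators in the diff, and the two sections are grouped together purely for the example.

```python
class Table:
    """Hypothetical data table that records the counters it needs."""
    def __init__(self, required_counters):
        self.required_counters = list(required_counters)


class Section:
    """Hypothetical section: an ordered group of tables with an optional title."""
    def __init__(self, tables, title=None):
        self.tables = tables
        self.title = title


def gather_required_counters(sections):
    """Union the counters of every table in every section, sorted for stable output."""
    names = set()
    for section in sections:
        for table in section.tables:
            names.update(table.required_counters)
    return sorted(names)


raster = Table([
    'prop__prop2zrop_pixels_realtime',
    'prop__prop2crop_pixels_realtime',
])
summary = Table([
    'fe__draw_count',
    'gpu__time_duration',
    'prop__prop2crop_pixels_realtime',
    'prop__prop2zrop_pixels_realtime',
])
sections = [
    Section([raster], title='3D Pipeline Section'),
    Section([summary], title='Summary of Measured Ranges'),
]

counters = gather_required_counters(sections)
```

Using a set keeps counters shared by several tables, such as the two `prop__` pixel counters, from being scheduled more than once.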
--- a/ruins64k/tools/NvPerfUtility/build/ReportGenerator/pub/turing/report_tu11x.py
+++ b/ruins64k/tools/NvPerfUtility/build/ReportGenerator/pub/turing/report_tu11x.py
@@ -0,0 +1,79 @@
+# Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from profiler_report_types import *
+import pub.tables_common as tables_common
+import pub.turing.tables_turing as tables_turing
+
+def get_per_range_report_definition():
+    sections = [
+        DataSection([
+            tables_turing.DevicePropertiesGenerator().make_data_table(),
+            tables_common.ClocksGenerator().make_data_table(),
+        ], inter_table_spacing=False),
+        DataSection([
+            tables_common.TopLevelStatsGenerator().make_data_table(),
+            tables_turing.TopThroughputsGenerator().make_data_table(),
+            tables_common.CacheHitRates().make_data_table(),
+        ], title='Overview Section'),
+        DataSection([
+            tables_common.MainMemoryGenerator().make_data_table(),
+            tables_turing.L2TrafficByMemoryApertureShortBreakdownGenerator(show_generic_workflow=True).make_data_table(),
+            tables_turing.L2TrafficBySrcBreakdownGenerator(show_generic_workflow=False).make_data_table(),
+            tables_common.L1TexThroughputsGenerator().make_data_table(),
+            tables_common.L1TexTrafficBreakdownGenerator().make_data_table(),
+        ], title='Memory Performance Section'),
+        DataSection([
+            tables_turing.SmThroughputsGenerator_tu11x().make_data_table(),
+            tables_turing.SmInstExecutedGenerator().make_data_table(),
+            tables_common.SmShaderExecutionGenerator().make_data_table(),
+            tables_turing.SmResourceUsageGenerator().make_data_table(),
+            tables_common.SmWarpLaunchStallsGenerator().make_data_table(),
+            tables_common.WarpIssueStallsGenerator().make_data_table(),
+        ], title='Shader Performance Section'),
+        DataSection([
+            tables_common.PrimitiveDataflowGenerator().make_data_table(),
+            tables_turing.RasterDataflowGenerator().make_data_table(),
+        ], title='3D Pipeline Section'),
+        DataSection([
+            tables_turing.L2TrafficByMemoryApertureBreakdownGenerator(show_generic_workflow=False).make_data_table(),
+            tables_turing.L2TrafficByOperationBreakdownGenerator(show_generic_workflow=False).make_data_table(),
+        ], title='Additional L2 Traffic Breakdowns Section'),
+        DataSection([
+            tables_common.AdditionalMetricsGenerator().make_data_table(),
+            tables_common.AllCountersGenerator().make_data_table(),
+            tables_common.AllRatiosGenerator().make_data_table(),
+            tables_common.AllThroughputsGenerator().make_data_table(),
+        ], title='Exhaustive Listings Section'),
+    ]
+    html = tables_common.generate_range_html_common(sections)
+    required_counters = get_required_counters(sections)
+    required_ratios = get_required_ratios(sections)
+    required_throughputs = get_required_throughputs(sections)
+    return ReportDefinition('PerRangeReport', html, required_counters, required_ratios, required_throughputs)
+
+def get_summary_report_definition():
+    sections = [
+        DataSection([
+            tables_common.CollectionInfoGenerator().make_data_table(),
+        ]),
+        DataSection([
+            tables_turing.RangesSummaryGenerator().make_data_table(),
+        ], title='Summary of Measured Ranges'),
+    ]
+    html = tables_common.generate_summary_html_common(sections)
+    required_counters = get_required_counters(sections)
+    required_ratios = get_required_ratios(sections)
+    required_throughputs = get_required_throughputs(sections)
+    return ReportDefinition('SummaryReport', html, required_counters, required_ratios, required_throughputs)
--- a/ruins64k/tools/NvPerfUtility/build/ReportGenerator/pub/turing/tables_turing.py
+++ b/ruins64k/tools/NvPerfUtility/build/ReportGenerator/pub/turing/tables_turing.py
@@ -0,0 +1,610 @@
+# Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from profiler_report_types import *
+import pub.tables_common as tables_common
+
+class DevicePropertiesGenerator(tables_common.DevicePropertiesGenerator):
+    def __init__(gen):
+        super().__init__()
+        gen.l2cacheSizePerLts = 128
+
+class TopThroughputsGenerator(tables_common.TopThroughputsGenerator):
+    def __init__(gen):
+        super().__init__()
+        gen.rows += [
+            gen.Row('Shader'      , '<a href="#SM-Instruction-Throughput">SM (Shader Cores)</a>'    , "getThroughputPct('sm__throughput')"),
+            gen.Row('Memory'      , '<a href="#L1TEX-Throughput">L1TEX Cache</a>'                   , "getThroughputPct('l1tex__throughput')"),
+            gen.Row('Memory'      , '<a href="#L2-Sector-Traffic">L2 Cache</a>'                     , "getThroughputPct('lts__throughput')"),
+            gen.Row('Memory'      , '<a href="#Main-Memory-Throughput">DRAM</a>'                    , "getThroughputPct('dram__throughput')"),
+            gen.Row('Memory'      , '<a href="#Main-Memory-Throughput">PCIe</a>'                    , "getThroughputPct('pcie__throughput')"),
+            gen.Row('World Pipe'  , '<a href="#Primitive-Data-Flow">PDA Index Fetch</a>'            , "getThroughputPct('pda__throughput')"),
+            gen.Row('World Pipe'  , '<a href="#Primitive-Data-Flow">Vertex Attr. Fetch</a>'         , "getThroughputPct('vaf__throughput')"),
+            gen.Row('World Pipe'  , '<a href="#Primitive-Data-Flow">Primitive Engine</a>'           , "getThroughputPct('pes__throughput')"),
+            gen.Row('Screen Pipe' , '<a href="#Raster-Data-Flow">RASTER</a>'                        , "getThroughputPct('raster__throughput')"),
+            gen.Row('Screen Pipe' , '<a href="#Raster-Data-Flow">PROP (Pre-ROP)</a>'                , "getThroughputPct('prop__throughput')"),
+            gen.Row('Screen Pipe' , '<a href="#Raster-Data-Flow">ZROP (Depth-Test)</a>'             , "getThroughputPct('zrop__throughput')"),
+            gen.Row('Screen Pipe' , '<a href="#Raster-Data-Flow">CROP (Color Blend)</a>'            , "getThroughputPct('crop__throughput')"),
+        ]
+        gen.required_throughputs += [
+            'crop__throughput',
+            'dram__throughput',
+            'l1tex__throughput',
+            'lts__throughput',
+            'pcie__throughput',
+            'pda__throughput',
+            'pes__throughput',
+            'prop__throughput',
+            'raster__throughput',
+            'sm__throughput',
+            'vaf__throughput',
+            'zrop__throughput',
+        ]
+
+class SmThroughputsGenerator_tu10x(tables_common.SmThroughputsGenerator):
+    def __init__(gen):
+        super().__init__()
+        gen.pipes += [
+            gen.Pipe('adu'          , 'Computed branches and indexed constants'),
+            gen.Pipe('alu'          , 'INT32 except multiply; FP32 comparison'),
+            gen.Pipe('cbu'          , 'Divergent branches and control flow'),
+            gen.Pipe('fma'          , 'FP32 mul/add and INT32 multiply'),
+            gen.Pipe('fp16'         , 'FP16 mul/add'),
+            gen.Pipe('fp64'         , 'FP64 mul/add'),
+            gen.Pipe('ipa'          , 'Pixel shader attribute interpolation'),
+            gen.Pipe('lsu'          , 'Global, local, shared memory, and misc'),
+            gen.Pipe('shared'       , 'Shared Pipe Dispatch (FP16, Tensor)', 'sm__pipe_shared_cycles_active', hasInstExecuted=False),
+            gen.Pipe('tensor'       , 'Tensor matrix multiply (FP16, INT8/4/1)'),
+            gen.Pipe('tex'          , 'Texture and surface memory'),
+            gen.Pipe('uniform'      , 'Warp-level scalar operations'),
+            gen.Pipe('xu'           , 'Transcendentals and float/int conversion'),
+        ]
+        for pipe in gen.pipes:
+            gen.required_counters.extend(pipe.get_counter_names())
+
+class SmThroughputsGenerator_tu11x(tables_common.SmThroughputsGenerator):
+    def __init__(gen):
+        super().__init__()
+        gen.pipes = []
+        gen.pipes += [
+            gen.Pipe('adu'          , 'Computed branches and indexed constants'),
+            gen.Pipe('alu'          , 'INT32 except multiply; FP32 comparison'),
+            gen.Pipe('cbu'          , 'Divergent branches and control flow'),
+            gen.Pipe('fma'          , 'FP32 mul/add and INT32 multiply'),
+            gen.Pipe('fp16'         , 'FP16 mul/add'),
+            gen.Pipe('fp64'         , 'FP64 mul/add'),
+            gen.Pipe('ipa'          , 'Pixel shader attribute interpolation'),
+            gen.Pipe('lsu'          , 'Global, local, shared memory, and misc'),
+            gen.Pipe('tensor'       , 'Tensor matrix multiply (FP16, INT8/4/1)'),
+            gen.Pipe('tex'          , 'Texture and surface memory'),
+            gen.Pipe('uniform'      , 'Warp-level scalar operations'),
+            gen.Pipe('xu'           , 'Transcendentals and float/int conversion'),
+        ]
+        for pipe in gen.pipes:
+            gen.required_counters.extend(pipe.get_counter_names())
+
+class SmInstExecutedGenerator(tables_common.SmInstExecutedGenerator):
+    def __init__(gen):
+        super().__init__()
+        gen.pipes = [
+            gen.Pipe('total'        , 'All instructions', True),
+            gen.Pipe('adu'          , 'Computed branches and indexed constants'),
+            gen.Pipe('alu'          , 'INT32 except multiply; FP32 comparison', True),
+            gen.Pipe('cbu'          , 'Divergent branches and control flow'),
+            gen.Pipe('fma'          , 'FP32 mul/add and INT32 multiply', True),
+            gen.Pipe('fp16'         , 'FP16 mul/add', True),
+            gen.Pipe('fp64'         , 'FP64 mul/add', True),
+            gen.Pipe('ipa'          , 'Pixel shader attribute interpolation', True),
+            gen.Pipe('lsu'          , 'Global, local, shared memory, and misc', True),
+            gen.Pipe('tensor'       , 'Tensor matrix multiply (FP16, INT8/4/1)'),
+            gen.Pipe('tex'          , 'Texture and surface memory'),
+            gen.Pipe('uniform'      , 'Warp-level scalar operations'),
+            gen.Pipe('xu'           , 'Transcendentals and float/int conversion', True),
+        ]
+        for pipe in gen.pipes:
+            gen.required_counters.extend(pipe.get_counter_names())
+
+class L2TrafficByMemoryApertureShortBreakdownGenerator(tables_common.L2TrafficBreakdownGenerator):
+    def __init__(gen, show_generic_workflow):
+        super().__init__(show_generic_workflow=show_generic_workflow)
+        gen.name = 'L2SectorTrafficBreakdownByMemoryApertureShort'
+        gen.table_id = 'L2-Sector-Traffic-By-Memory-Aperture-Short'
+        gen.column_names = [
+            ('Memory Aperture', 'To Memory'),
+            ('Op', 'Op'),
+        ]
+        gen.workflow += ' This table decomposes L2 bandwidth to each destination, per operation. A <a href="#L2-Sector-Traffic-By-Memory-Aperture">more detailed version of this table</a> can be found below.'
+
+        gen.nodes = [
+            gen.Node('DRAM' , ['lts__average_t_sector_aperture_device'], [
+                gen.Node('Reads',      ['lts__average_t_sector_aperture_device_op_read'],  []),
+                gen.Node('Writes',     ['lts__average_t_sector_aperture_device_op_write'], []),
+                gen.Node('Atomics',    ['lts__average_t_sector_aperture_device_op_atom'], []),
+                gen.Node('Reductions', ['lts__average_t_sector_aperture_device_op_red'],  []),
+            ]),
+            gen.Node('System Memory', ['lts__average_t_sector_aperture_sysmem'], [
+                gen.Node('Reads',      ['lts__average_t_sector_aperture_sysmem_op_read'],  []),
+                gen.Node('Writes',     ['lts__average_t_sector_aperture_sysmem_op_write'], []),
+                gen.Node('Atomics',    ['lts__average_t_sector_aperture_sysmem_op_atom'], []),
+                gen.Node('Reductions', ['lts__average_t_sector_aperture_sysmem_op_red'],  []),
+            ]),
+            gen.Node('Peer Memory', ['lts__average_t_sector_aperture_peer'], [
+                gen.Node('Reads',      ['lts__average_t_sector_aperture_peer_op_read'],  []),
+                gen.Node('Writes',     ['lts__average_t_sector_aperture_peer_op_write'], []),
+                gen.Node('Atomics',    ['lts__average_t_sector_aperture_peer_op_atom'], []),
+                gen.Node('Reductions', ['lts__average_t_sector_aperture_peer_op_red'],  []),
+            ]),
+        ]
+
+        gen.required_ratios = gen.get_required_ratios()
+
+class L2TrafficBySrcBreakdownGenerator(tables_common.L2TrafficBreakdownGenerator):
+    def __init__(gen, show_generic_workflow):
+        super().__init__(show_generic_workflow=show_generic_workflow)
+        gen.name = 'L2SectorTrafficBreakdownBySource'
+        gen.table_id = 'L2-Sector-Traffic-By-Source'
+        gen.column_names = [
+            ('Source Breakdown', 'From Source'),
+            ('Unit Breakdown', 'From Unit'),
+            ('Memory Aperture', 'To Memory'),
+            ('Op', 'Op'),
+        ]
+        gen.workflow += ' This table decomposes L2 bandwidth from each source unit, to each destination, per operation. See also: these tables that prioritize <a href="#L2-Sector-Traffic-By-Memory-Aperture">destination Memory Aperture</a> and <a href="#L2-Sector-Traffic-By-Operation">Operation</a>.'
+
+        gen.nodes = [
+            gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc'], [
+                gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex'], [
+                    gen.Node('DRAM' , ['lts__average_t_sector_srcunit_tex_aperture_device'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_tex_aperture_device_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_tex_aperture_device_op_write'], []),
+                        gen.Node('Atomics',    ['lts__average_t_sector_srcunit_tex_aperture_device_op_atom'], []),
+                        gen.Node('Reductions', ['lts__average_t_sector_srcunit_tex_aperture_device_op_red'],  []),
+                    ]),
+                    gen.Node('Peer Memory', ['lts__average_t_sector_srcunit_tex_aperture_peer'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_tex_aperture_peer_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_tex_aperture_peer_op_write'], []),
+                        gen.Node('Atomics',    ['lts__average_t_sector_srcunit_tex_aperture_peer_op_atom'], []),
+                        gen.Node('Reductions', ['lts__average_t_sector_srcunit_tex_aperture_peer_op_red'],  []),
+                    ]),
+                    gen.Node('System Memory', ['lts__average_t_sector_srcunit_tex_aperture_sysmem'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_tex_aperture_sysmem_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_tex_aperture_sysmem_op_write'], []),
+                        gen.Node('Atomics',    ['lts__average_t_sector_srcunit_tex_aperture_sysmem_op_atom'], []),
+                        gen.Node('Reductions', ['lts__average_t_sector_srcunit_tex_aperture_sysmem_op_red'],  []),
+                    ]),
+                ]),
+                gen.Node('Primitive Engine', ['lts__average_t_sector_srcunit_pe'], [
+                    gen.Node('DRAM' , ['lts__average_t_sector_srcunit_pe_aperture_device'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_pe_aperture_device_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_pe_aperture_device_op_write'], []),
+                    ]),
+                    gen.Node('Peer Memory', ['lts__average_t_sector_srcunit_pe_aperture_peer'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_pe_aperture_peer_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_pe_aperture_peer_op_write'], []),
+                    ]),
+                    gen.Node('System Memory', ['lts__average_t_sector_srcunit_pe_aperture_sysmem'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_pe_aperture_sysmem_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_pe_aperture_sysmem_op_write'], []),
+                    ]),
+                ]),
+                gen.Node('other GPC units', ['lts__average_t_sector_srcunit_gpcother'], [
+                    gen.Node('DRAM' , ['lts__average_t_sector_srcunit_gpcother_aperture_device'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_gpcother_aperture_device_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_gpcother_aperture_device_op_write'], []),
+                    ]),
+                    gen.Node('Peer Memory', ['lts__average_t_sector_srcunit_gpcother_aperture_peer'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_gpcother_aperture_peer_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_gpcother_aperture_peer_op_write'], []),
+                    ]),
+                    gen.Node('System Memory', ['lts__average_t_sector_srcunit_gpcother_aperture_sysmem'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_gpcother_aperture_sysmem_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_gpcother_aperture_sysmem_op_write'], []),
+                    ]),
+                ]),
+            ]),
+            gen.Node('FBP Units', ['lts__average_t_sector_srcnode_fbp'], [
+                gen.Node('ZROP', ['lts__average_t_sector_srcunit_zrop'], [
+                    gen.Node('DRAM' , ['lts__average_t_sector_srcunit_zrop_aperture_device'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_zrop_aperture_device_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_zrop_aperture_device_op_write'], []),
+                    ]),
+                    gen.Node('Peer Memory' , ['lts__average_t_sector_srcunit_zrop_aperture_peer'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_zrop_aperture_peer_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_zrop_aperture_peer_op_write'], []),
+                    ]),
+                    gen.Node('System Memory' , ['lts__average_t_sector_srcunit_zrop_aperture_sysmem'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_zrop_aperture_sysmem_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_zrop_aperture_sysmem_op_write'], []),
+                    ]),
+                ]),
+                gen.Node('CROP', ['lts__average_t_sector_srcunit_crop'], [
+                    gen.Node('DRAM' , ['lts__average_t_sector_srcunit_crop_aperture_device'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_crop_aperture_device_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_crop_aperture_device_op_write'], []),
+                    ]),
+                    gen.Node('Peer Memory' , ['lts__average_t_sector_srcunit_crop_aperture_peer'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_crop_aperture_peer_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_crop_aperture_peer_op_write'], []),
+                    ]),
+                    gen.Node('System Memory' , ['lts__average_t_sector_srcunit_crop_aperture_sysmem'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_crop_aperture_sysmem_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_crop_aperture_sysmem_op_write'], []),
+                    ]),
+                ]),
+            ]),
+            gen.Node('HUB Units', [], [  # lts__average_t_sector_srcnode_hub is buggy; use the sum of the children instead
+                gen.Node('all HUB Units', [], [  # lts__average_t_sector_srcnode_hub is buggy; use the sum of the children instead
+                    gen.Node('DRAM' , ['lts__average_t_sector_srcnode_hub_aperture_device'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcnode_hub_aperture_device_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcnode_hub_aperture_device_op_write'], []),
+                    ]),
+                    gen.Node('Peer Memory', ['lts__average_t_sector_srcnode_hub_aperture_peer'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcnode_hub_aperture_peer_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcnode_hub_aperture_peer_op_write'], []),
+                    ]),
+                    gen.Node('System Memory', ['lts__average_t_sector_srcnode_hub_aperture_sysmem'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcnode_hub_aperture_sysmem_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcnode_hub_aperture_sysmem_op_write'], []),
+                    ]),
+                ]),
+            ]),
+        ]
+
+        gen.required_ratios = gen.get_required_ratios()
+
+class L2TrafficByMemoryApertureBreakdownGenerator(tables_common.L2TrafficBreakdownGenerator):
+    def __init__(gen, show_generic_workflow):
+        super().__init__(show_generic_workflow=show_generic_workflow)
+        gen.name = 'L2SectorTrafficBreakdownByMemoryAperture'
+        gen.table_id = 'L2-Sector-Traffic-By-Memory-Aperture'
+        gen.column_names = [
+            ('Memory Aperture', 'To Memory'),
+            ('Source Breakdown', 'From Source'),
+            ('Unit Breakdown', 'From Unit'),
+            ('Op', 'Op'),
+        ]
+        gen.workflow += ' This is an extended breakdown of <a href="#L2-Sector-Traffic-By-Memory-Aperture-Short">L2 Traffic by destination</a>. It decomposes L2 bandwidth to each destination, from each source unit, per operation.'
+
+        gen.nodes = [
+            gen.Node('DRAM' , ['lts__average_t_sector_aperture_device'], [
+                gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_device'], [
+                    gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_device'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_tex_aperture_device_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_tex_aperture_device_op_write'], []),
+                        gen.Node('Atomics',    ['lts__average_t_sector_srcunit_tex_aperture_device_op_atom'], []),
+                        gen.Node('Reductions', ['lts__average_t_sector_srcunit_tex_aperture_device_op_red'],  []),
+                    ]),
+                    gen.Node('Primitive Engine', ['lts__average_t_sector_srcunit_pe_aperture_device'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_pe_aperture_device_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_pe_aperture_device_op_write'], []),
+                    ]),
+                    gen.Node('other GPC units', ['lts__average_t_sector_srcunit_gpcother_aperture_device'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_gpcother_aperture_device_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_gpcother_aperture_device_op_write'], []),
+                    ]),
+                ]),
+                gen.Node('FBP Units', ['lts__average_t_sector_srcnode_fbp_aperture_device'], [
+                    gen.Node('ZROP', ['lts__average_t_sector_srcunit_zrop_aperture_device'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_zrop_aperture_device_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_zrop_aperture_device_op_write'], []),
+                    ]),
+                    gen.Node('CROP', ['lts__average_t_sector_srcunit_crop_aperture_device'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_crop_aperture_device_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_crop_aperture_device_op_write'], []),
+                    ]),
+                ]),
+                gen.Node('HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_device'], [
+                    gen.Node('all HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_device'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcnode_hub_aperture_device_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcnode_hub_aperture_device_op_write'], []),
+                    ]),
+                ]),
+            ]),
+            gen.Node('Peer Memory', ['lts__average_t_sector_aperture_peer'], [
+                gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_peer'], [
+                    gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_peer'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_tex_aperture_peer_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_tex_aperture_peer_op_write'], []),
+                        gen.Node('Atomics',    ['lts__average_t_sector_srcunit_tex_aperture_peer_op_atom'], []),
+                        gen.Node('Reductions', ['lts__average_t_sector_srcunit_tex_aperture_peer_op_red'],  []),
+                    ]),
+                    gen.Node('Primitive Engine', ['lts__average_t_sector_srcunit_pe_aperture_peer'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_pe_aperture_peer_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_pe_aperture_peer_op_write'], []),
+                    ]),
+                    gen.Node('other GPC units', ['lts__average_t_sector_srcunit_gpcother_aperture_peer'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_gpcother_aperture_peer_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_gpcother_aperture_peer_op_write'], []),
+                    ]),
+                ]),
+                gen.Node('FBP Units', ['lts__average_t_sector_srcnode_fbp_aperture_peer'], [
+                    gen.Node('ZROP', ['lts__average_t_sector_srcunit_zrop_aperture_peer'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_zrop_aperture_peer_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_zrop_aperture_peer_op_write'], []),
+                    ]),
+                    gen.Node('CROP', ['lts__average_t_sector_srcunit_crop_aperture_peer'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_crop_aperture_peer_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_crop_aperture_peer_op_write'], []),
+                    ]),
+                ]),
+                gen.Node('HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_peer'], [
+                    gen.Node('all HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_peer'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcnode_hub_aperture_peer_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcnode_hub_aperture_peer_op_write'], []),
+                    ]),
+                ]),
+            ]),
+            gen.Node('System Memory', ['lts__average_t_sector_aperture_sysmem'], [
+                gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_sysmem'], [
+                    gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_sysmem'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_tex_aperture_sysmem_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_tex_aperture_sysmem_op_write'], []),
+                        gen.Node('Atomics',    ['lts__average_t_sector_srcunit_tex_aperture_sysmem_op_atom'], []),
+                        gen.Node('Reductions', ['lts__average_t_sector_srcunit_tex_aperture_sysmem_op_red'],  []),
+                    ]),
+                    gen.Node('Primitive Engine', ['lts__average_t_sector_srcunit_pe_aperture_sysmem'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_pe_aperture_sysmem_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_pe_aperture_sysmem_op_write'], []),
+                    ]),
+                    gen.Node('other GPC units', ['lts__average_t_sector_srcunit_gpcother_aperture_sysmem'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_gpcother_aperture_sysmem_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_gpcother_aperture_sysmem_op_write'], []),
+                    ]),
+                ]),
+                gen.Node('FBP Units', ['lts__average_t_sector_srcnode_fbp_aperture_sysmem'], [
+                    gen.Node('ZROP', ['lts__average_t_sector_srcunit_zrop_aperture_sysmem'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_zrop_aperture_sysmem_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_zrop_aperture_sysmem_op_write'], []),
+                    ]),
+                    gen.Node('CROP', ['lts__average_t_sector_srcunit_crop_aperture_sysmem'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_crop_aperture_sysmem_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_crop_aperture_sysmem_op_write'], []),
+                    ]),
+                ]),
+                gen.Node('HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_sysmem'], [
+                    gen.Node('all HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_sysmem'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcnode_hub_aperture_sysmem_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcnode_hub_aperture_sysmem_op_write'], []),
+                    ]),
+                ]),
+            ]),
+        ]
+
+        gen.required_ratios = gen.get_required_ratios()
+
+class L2TrafficByOperationBreakdownGenerator(tables_common.L2TrafficBreakdownGenerator):
+    def __init__(gen, show_generic_workflow):
+        super().__init__(show_generic_workflow=show_generic_workflow)
+        gen.name = 'L2SectorTrafficBreakdownByOperation'
+        gen.table_id = 'L2-Sector-Traffic-By-Operation'
+        gen.column_names = [
+            ('Op', 'Op'),
+            ('Memory Aperture', 'To Memory'),
+            ('Source Breakdown', 'From Source'),
+            ('Unit Breakdown', 'From Unit'),
+        ]
+        gen.workflow += ' This table decomposes L2 bandwidth per operation, to each destination, from each source unit.'
+
+        gen.nodes = [
+            gen.Node('Reads',      ['lts__average_t_sector_op_read'],  [
+                gen.Node('DRAM', ['lts__average_t_sector_aperture_device_op_read'], [
+                    gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_device_op_read'], [
+                        gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_device_op_read'], []),
+                        gen.Node('Primitive Engine', ['lts__average_t_sector_srcunit_pe_aperture_device_op_read'], []),
+                        gen.Node('other GPC units', ['lts__average_t_sector_srcunit_gpcother_aperture_device_op_read'], []),
+                    ]),
+                    gen.Node('FBP Units', ['lts__average_t_sector_srcnode_fbp_aperture_device_op_read'], [
+                        gen.Node('ZROP', ['lts__average_t_sector_srcunit_zrop_aperture_device_op_read'], []),
+                        gen.Node('CROP', ['lts__average_t_sector_srcunit_crop_aperture_device_op_read'], []),
+                    ]),
+                    gen.Node('HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_device_op_read'], [
+                        gen.Node('all HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_device_op_read'], []),
+                    ]),
+                ]),
+                gen.Node('Peer Memory', ['lts__average_t_sector_aperture_peer_op_read'], [
+                    gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_peer_op_read'], [
+                        gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_peer_op_read'], []),
+                        gen.Node('Primitive Engine', ['lts__average_t_sector_srcunit_pe_aperture_peer_op_read'], []),
+                        gen.Node('other GPC units', ['lts__average_t_sector_srcunit_gpcother_aperture_peer_op_read'], []),
+                    ]),
+                    gen.Node('FBP Units', ['lts__average_t_sector_srcnode_fbp_aperture_peer_op_read'], [
+                        gen.Node('ZROP', ['lts__average_t_sector_srcunit_zrop_aperture_peer_op_read'], []),
+                        gen.Node('CROP', ['lts__average_t_sector_srcunit_crop_aperture_peer_op_read'], []),
+                    ]),
+                    gen.Node('HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_peer_op_read'], [
+                        gen.Node('all HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_peer_op_read'], []),
+                    ]),
+                ]),
+                gen.Node('System Memory', ['lts__average_t_sector_aperture_sysmem_op_read'], [
+                    gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_sysmem_op_read'], [
+                        gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_sysmem_op_read'], []),
+                        gen.Node('Primitive Engine', ['lts__average_t_sector_srcunit_pe_aperture_sysmem_op_read'], []),
+                        gen.Node('other GPC units', ['lts__average_t_sector_srcunit_gpcother_aperture_sysmem_op_read'], []),
+                    ]),
+                    gen.Node('FBP Units', ['lts__average_t_sector_srcnode_fbp_aperture_sysmem_op_read'], [
+                        gen.Node('ZROP', ['lts__average_t_sector_srcunit_zrop_aperture_sysmem_op_read'], []),
+                        gen.Node('CROP', ['lts__average_t_sector_srcunit_crop_aperture_sysmem_op_read'], []),
+                    ]),
+                    gen.Node('HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_sysmem_op_read'], [
+                        gen.Node('all HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_sysmem_op_read'], []),
+                    ]),
+                ]),
+            ]),
+            gen.Node('Writes',     ['lts__average_t_sector_op_write'], [
+                gen.Node('DRAM', ['lts__average_t_sector_aperture_device_op_write'], [
+                    gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_device_op_write'], [
+                        gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_device_op_write'], []),
+                        gen.Node('Primitive Engine', ['lts__average_t_sector_srcunit_pe_aperture_device_op_write'], []),
+                        gen.Node('other GPC units', ['lts__average_t_sector_srcunit_gpcother_aperture_device_op_write'], []),
+                    ]),
+                    gen.Node('FBP Units', ['lts__average_t_sector_srcnode_fbp_aperture_device_op_write'], [
+                        gen.Node('ZROP', ['lts__average_t_sector_srcunit_zrop_aperture_device_op_write'], []),
+                        gen.Node('CROP', ['lts__average_t_sector_srcunit_crop_aperture_device_op_write'], []),
+                    ]),
+                    gen.Node('HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_device_op_write'], [
+                        gen.Node('all HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_device_op_write'], []),
+                    ]),
+                ]),
+                gen.Node('Peer Memory', ['lts__average_t_sector_aperture_peer_op_write'], [
+                    gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_peer_op_write'], [
+                        gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_peer_op_write'], []),
+                        gen.Node('Primitive Engine', ['lts__average_t_sector_srcunit_pe_aperture_peer_op_write'], []),
+                        gen.Node('other GPC units', ['lts__average_t_sector_srcunit_gpcother_aperture_peer_op_write'], []),
+                    ]),
+                    gen.Node('FBP Units', ['lts__average_t_sector_srcnode_fbp_aperture_peer_op_write'], [
+                        gen.Node('ZROP', ['lts__average_t_sector_srcunit_zrop_aperture_peer_op_write'], []),
+                        gen.Node('CROP', ['lts__average_t_sector_srcunit_crop_aperture_peer_op_write'], []),
+                    ]),
+                    gen.Node('HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_peer_op_write'], [
+                        gen.Node('all HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_peer_op_write'], []),
+                    ]),
+                ]),
+                gen.Node('System Memory', ['lts__average_t_sector_aperture_sysmem_op_write'], [
+                    gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_sysmem_op_write'], [
+                        gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_sysmem_op_write'], []),
+                        gen.Node('Primitive Engine', ['lts__average_t_sector_srcunit_pe_aperture_sysmem_op_write'], []),
+                        gen.Node('other GPC units', ['lts__average_t_sector_srcunit_gpcother_aperture_sysmem_op_write'], []),
+                    ]),
+                    gen.Node('FBP Units', ['lts__average_t_sector_srcnode_fbp_aperture_sysmem_op_write'], [
+                        gen.Node('ZROP', ['lts__average_t_sector_srcunit_zrop_aperture_sysmem_op_write'], []),
+                        gen.Node('CROP', ['lts__average_t_sector_srcunit_crop_aperture_sysmem_op_write'], []),
+                    ]),
+                    gen.Node('HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_sysmem_op_write'], [
+                        gen.Node('all HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_sysmem_op_write'], []),
+                    ]),
+                ]),
+            ]),
+            gen.Node('Atomics',    ['lts__average_t_sector_op_atom'], [
+                gen.Node('DRAM', ['lts__average_t_sector_aperture_device_op_atom'], [
+                    gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_device_op_atom'], [
+                        gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_device_op_atom'], []),
+                    ]),
+                ]),
+                gen.Node('Peer Memory', ['lts__average_t_sector_aperture_peer_op_atom'], [
+                    gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_peer_op_atom'], [
+                        gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_peer_op_atom'], []),
+                    ]),
+                ]),
+                gen.Node('System Memory', ['lts__average_t_sector_aperture_sysmem_op_atom'], [
+                    gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_sysmem_op_atom'], [
+                        gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_sysmem_op_atom'], []),
+                    ]),
+                ]),
+            ]),
+            gen.Node('Reductions', ['lts__average_t_sector_op_red'], [
+                gen.Node('DRAM', ['lts__average_t_sector_aperture_device_op_red'], [
+                    gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_device_op_red'], [
+                        gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_device_op_red'], []),
+                    ]),
+                ]),
+                gen.Node('Peer Memory', ['lts__average_t_sector_aperture_peer_op_red'], [
+                    gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_peer_op_red'], [
+                        gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_peer_op_red'], []),
+                    ]),
+                ]),
+                gen.Node('System Memory', ['lts__average_t_sector_aperture_sysmem_op_red'], [
+                    gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_sysmem_op_red'], [
+                        gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_sysmem_op_red'], []),
+                    ]),
+                ]),
+            ]),
+        ]
+
+        gen.required_ratios = gen.get_required_ratios()
+
+class RasterDataflowGenerator(tables_common.RasterDataflowGenerator):
+    def __init__(gen):
+        super().__init__()
+        gen.zrop_pixels_input = r'''getCounterValue('prop__prop2xbar_zrop_pixels_realtime', 'sum')'''
+        gen.crop_pixels_input = r'''getCounterValue('prop__prop2xbar_crop_pixels_realtime', 'sum')'''
+        gen.required_counters.extend([
+            'prop__prop2xbar_zrop_pixels_realtime',
+            'prop__prop2xbar_crop_pixels_realtime',
+        ])
+
+class SmResourceUsageGenerator(tables_common.SmResourceUsageGenerator):
+    def __init__(gen):
+        super().__init__()
+        gen.rows = [
+            #       resource                  total                                           gfx                                             vtg                                                 ps                                                  cs
+            gen.Row('Warps'                 , 'sm__warps_active'                            , 'NotApplicable'                               , 'tpc__warps_active_shader_vtg_realtime'           , 'tpc__warps_active_shader_ps_realtime'            , 'tpc__warps_active_shader_cs_realtime'                ),
+            gen.Row('Registers'             , 'tpc__sm_rf_registers_allocated'              , 'NotApplicable'                               , 'tpc__sm_rf_registers_allocated_shader_vtg'       , 'tpc__sm_rf_registers_allocated_shader_ps'        , 'tpc__sm_rf_registers_allocated_shader_cs'            ),
+            gen.Row('Attr/ShMem'            , 'NotApplicable'                               , 'NotApplicable'                               , 'tpc__l1tex_mem_shared_data_isbe_bytes_allocated' , 'tpc__l1tex_mem_shared_data_tram_bytes_allocated' , 'tpc__l1tex_mem_shared_data_compute_bytes_allocated'  ),
+            gen.Row('CTAs'                  , 'NotApplicable'                               , 'NotApplicable'                               , 'NotApplicable'                                   , 'NotApplicable'                                   , 'sm__ctas_active'                                     ),
+        ]
+        gen.required_counters = [
+            'sm__ctas_active',
+            'sm__warps_active',
+            'tpc__warps_active_shader_cs_realtime',
+            'tpc__warps_active_shader_ps_realtime',
+            'tpc__warps_active_shader_vtg_realtime',
+            'tpc__sm_rf_registers_allocated',
+            'tpc__sm_rf_registers_allocated_shader_vtg',
+            'tpc__sm_rf_registers_allocated_shader_ps',
+            'tpc__sm_rf_registers_allocated_shader_cs',
+            'tpc__l1tex_mem_shared_data_isbe_bytes_allocated',
+            'tpc__l1tex_mem_shared_data_tram_bytes_allocated',
+            'tpc__l1tex_mem_shared_data_compute_bytes_allocated',
+        ]
+
+class RangesSummaryGenerator(tables_common.RangesSummaryGenerator):
+    def __init__(gen):
+        super().__init__()
+        gen.cols = [
+            gen.Col('Duration μs'   , "getCounterValue('gpu__time_duration', 'avg')"                                                                            , 'format_avg'  , 'ra'),
+            gen.Col('GR Active%'    , "getCounterPct('gr__cycles_active', 'avg')"                                                                               , 'format_pct'  , 'ra'),
+            gen.Col('3D?'           , "getCounterValue('fe__draw_count', 'sum') ? '&#x2713;' : ''"                                                              , ''            , 'ra'),
+            gen.Col('Comp?'         , "getCounterValue('gr__dispatch_count', 'sum') ? '&#x2713;' : ''"                                                          , ''            , 'ra'),
+            gen.Col('#WFI'          , "getCounterValue('fe__output_ops_type_bundle_cmd_go_idle', 'sum')"                                                        , 'format_sum'  , 'ra'),
+            gen.Col('#Prims'        , "getCounterValue('pda__input_prims', 'sum')"                                                                              , 'format_sum'  , 'ra'),
+            gen.Col('#Pixels-Z'     , "getCounterValue('prop__prop2xbar_zrop_pixels_realtime', 'sum')"                                                          , 'format_sum'  , 'ra'),
+            gen.Col('#Pixels-C'     , "getCounterValue('prop__prop2xbar_crop_pixels_realtime', 'sum')"                                                          , 'format_sum'  , 'ra'),
+            gen.Col('SM%'           , "getThroughputPct('sm__throughput')"                                                                                      , 'format_pct'  , 'ra'),
+            gen.Col('L1TEX%'        , "getThroughputPct('l1tex__throughput')"                                                                                   , 'format_pct'  , 'ra'),
+            gen.Col('L2%'           , "getThroughputPct('lts__throughput')"                                                                                     , 'format_pct'  , 'ra'),
+            gen.Col('DRAM%'         , "getThroughputPct('dram__throughput')"                                                                                    , 'format_pct'  , 'ra'),
+            gen.Col('PCIe%'         , "getThroughputPct('pcie__throughput')"                                                                                    , 'format_pct'  , 'ra'),
+            gen.Col('PD%'           , "getThroughputPct('pda__throughput')"                                                                                     , 'format_pct'  , 'ra'),
+            gen.Col('PE%'           , "Math.max(getThroughputPct('vaf__throughput'), getThroughputPct('vpc__throughput'), getThroughputPct('pes__throughput'))" , 'format_pct'  , 'ra'),
+            gen.Col('RSTR%'         , "getThroughputPct('raster__throughput')"                                                                                  , 'format_pct'  , 'ra'),
+            gen.Col('PROP%'         , "getThroughputPct('prop__throughput')"                                                                                    , 'format_pct'  , 'ra'),
+            gen.Col('ZROP%'         , "getThroughputPct('zrop__throughput')"                                                                                    , 'format_pct'  , 'ra'),
+            gen.Col('CROP%'         , "getThroughputPct('crop__throughput')"                                                                                    , 'format_pct'  , 'ra'),
+        ]
+        gen.required_counters = [
+            'fe__draw_count',
+            'fe__output_ops_type_bundle_cmd_go_idle',
+            'gpu__time_duration',
+            'gr__cycles_active',
+            'gr__dispatch_count',
+            'pda__input_prims',
+            'prop__prop2xbar_crop_pixels_realtime',
+            'prop__prop2xbar_zrop_pixels_realtime',
+        ]
+        gen.required_ratios = []
+        gen.required_throughputs = [
+            'crop__throughput',
+            'dram__throughput',
+            'l1tex__throughput',
+            'lts__throughput',
+            'pcie__throughput',
+            'pda__throughput',
+            'pes__throughput',
+            'prop__throughput',
+            'raster__throughput',
+            'sm__throughput',
+            'vaf__throughput',
+            'vpc__throughput',
+            'zrop__throughput',
+        ]
--- a/ruins64k/tools/NvPerfUtility/build/ReportGenerator/pub/volta/report_gv100.py
+++ b/ruins64k/tools/NvPerfUtility/build/ReportGenerator/pub/volta/report_gv100.py
@@ -0,0 +1,75 @@
+# Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from profiler_report_types import *
+import pub.tables_common as tables_common
+import pub.volta.tables_gv100 as tables_gv100
+
+def get_per_range_report_definition():
+    sections = [
+        DataSection([
+            tables_gv100.DevicePropertiesGenerator().make_data_table(),
+            tables_common.ClocksGenerator().make_data_table(),
+        ], inter_table_spacing=False),
+        DataSection([
+            tables_common.TopLevelStatsGenerator().make_data_table(),
+            tables_gv100.TopThroughputsGenerator().make_data_table(),
+            tables_common.CacheHitRates().make_data_table(),
+        ], title='Overview Section'),
+        DataSection([
+            tables_common.MainMemoryGenerator().make_data_table(),
+            tables_gv100.L2TrafficByMemoryApertureShortBreakdownGenerator(show_generic_workflow=True).make_data_table(),
+            tables_gv100.L2TrafficBySrcBreakdownGenerator(show_generic_workflow=False).make_data_table(),
+            tables_common.L1TexThroughputsGenerator().make_data_table(),
+            tables_common.L1TexTrafficBreakdownGenerator().make_data_table(),
+        ], title='Memory Performance Section'),
+        DataSection([
+            tables_gv100.SmThroughputsGenerator().make_data_table(),
+            tables_gv100.SmInstExecutedGenerator().make_data_table(),
+            tables_common.SmShaderExecutionGenerator().make_data_table(),
+            tables_common.SmWarpLaunchStallsGenerator().make_data_table(),
+            tables_common.WarpIssueStallsGenerator().make_data_table(),
+        ], title='Shader Performance Section'),
+        DataSection([
+            tables_gv100.L2TrafficByMemoryApertureBreakdownGenerator(show_generic_workflow=False).make_data_table(),
+            tables_gv100.L2TrafficByOperationBreakdownGenerator(show_generic_workflow=False).make_data_table(),
+        ], title='Additional L2 Traffic Breakdowns Section'),
+        DataSection([
+            tables_common.AdditionalMetricsGenerator().make_data_table(),
+            tables_common.AllCountersGenerator().make_data_table(),
+            tables_common.AllRatiosGenerator().make_data_table(),
+            tables_common.AllThroughputsGenerator().make_data_table(),
+        ], title='Exhaustive Listings Section'),
+    ]
+    html = tables_common.generate_range_html_common(sections)
+    required_counters = get_required_counters(sections)
+    required_ratios = get_required_ratios(sections)
+    required_throughputs = get_required_throughputs(sections)
+    return ReportDefinition('PerRangeReport', html, required_counters, required_ratios, required_throughputs)
+
+def get_summary_report_definition():
+    sections = [
+        DataSection([
+            tables_common.CollectionInfoGenerator().make_data_table(),
+        ]),
+        DataSection([
+            tables_gv100.RangesSummaryGenerator().make_data_table(),
+        ], title='Summary of Measured Ranges'),
+    ]
+    html = tables_common.generate_summary_html_common(sections)
+    required_counters = get_required_counters(sections)
+    required_ratios = get_required_ratios(sections)
+    required_throughputs = get_required_throughputs(sections)
+    return ReportDefinition('SummaryReport', html, required_counters, required_ratios, required_throughputs)
+
--- a/ruins64k/tools/NvPerfUtility/build/ReportGenerator/pub/volta/tables_gv100.py
+++ b/ruins64k/tools/NvPerfUtility/build/ReportGenerator/pub/volta/tables_gv100.py
@@ -0,0 +1,555 @@
+# Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from profiler_report_types import *
+import pub.tables_common as tables_common
+
+class DevicePropertiesGenerator(tables_common.DevicePropertiesGenerator):
+    def __init__(gen):
+        super().__init__()
+        gen.l2cacheSizePerLts = 96
+
+class TopThroughputsGenerator(tables_common.TopThroughputsGenerator):
+    def __init__(gen):
+        super().__init__()
+        gen.rows += [
+            gen.Row('Shader'      , '<a href="#SM-Instruction-Throughput">SM (Shader Cores)</a>'    , "getThroughputPct('sm__throughput')"),
+            gen.Row('Memory'      , '<a href="#L1TEX-Throughput">L1TEX Cache</a>'                   , "getThroughputPct('l1tex__throughput')"),
+            gen.Row('Memory'      , '<a href="#L2-Sector-Traffic">L2 Cache</a>'                     , "getThroughputPct('lts__throughput')"),
+            gen.Row('Memory'      , '<a href="#Main-Memory-Throughput">DRAM</a>'                    , "getThroughputPct('dram__throughput')"),
+            gen.Row('Memory'      , '<a href="#Main-Memory-Throughput">PCIe</a>'                    , "getThroughputPct('pcie__throughput')"),
+            gen.Row('World Pipe'  , '<a href="#Primitive-Data-Flow">PDA Index Fetch</a>'            , "getThroughputPct('pda__throughput')"),
+            gen.Row('World Pipe'  , '<a href="#Primitive-Data-Flow">Vertex Attr. Fetch</a>'         , "getThroughputPct('vaf__throughput')"),
+            gen.Row('World Pipe'  , '<a href="#Primitive-Data-Flow">Primitive Engine</a>'           , "getThroughputPct('pes__throughput')"),
+            gen.Row('Screen Pipe' , '<a href="#Raster-Data-Flow">RASTER</a>'                        , "getThroughputPct('raster__throughput')"),
+            gen.Row('Screen Pipe' , '<a href="#Raster-Data-Flow">ZROP (Depth-Test)</a>'             , "getThroughputPct('zrop__throughput')"),
+            gen.Row('Screen Pipe' , '<a href="#Raster-Data-Flow">CROP (Color Blend)</a>'            , "getThroughputPct('crop__throughput')"),
+        ]
+        gen.required_throughputs += [
+            'crop__throughput',
+            'dram__throughput',
+            'l1tex__throughput',
+            'lts__throughput',
+            'pcie__throughput',
+            'pda__throughput',
+            'pes__throughput',
+            'raster__throughput',
+            'sm__throughput',
+            'vaf__throughput',
+            'zrop__throughput',
+        ]
+
+class SmThroughputsGenerator(tables_common.SmThroughputsGenerator):
+    def __init__(gen):
+        super().__init__()
+        gen.pipes += [
+            gen.Pipe('adu'          , 'Computed branches and indexed constants'),
+            gen.Pipe('alu'          , 'INT32 except multiply; FP32 comparison', 'sm__pipe_alu_cycles_active'),
+            gen.Pipe('cbu'          , 'Divergent branches and control flow'),
+            gen.Pipe('fma'          , 'FP32 mul/add and INT32 multiply', 'sm__pipe_fma_cycles_active'),
+            gen.Pipe('fp16'         , 'FP16 mul/add'),
+            gen.Pipe('fp64'         , 'FP64 mul/add'),
+            gen.Pipe('ipa'          , 'Pixel shader attribute interpolation'),
+            gen.Pipe('lsu'          , 'Global, local, shared memory, and misc'),
+            gen.Pipe('shared'       , 'Shared Pipe Dispatch (FP64,FP16,Tensor)', 'sm__pipe_shared_cycles_active', hasInstExecuted=False),
+            gen.Pipe('tensor'       , 'Tensor matrix multiply (FP16)', 'sm__pipe_tensor_cycles_active'),
+            gen.Pipe('tex'          , 'Texture and surface memory'),
+            gen.Pipe('xu'           , 'Transcendentals and float/int conversion'),
+        ]
+        for pipe in gen.pipes:
+            gen.required_counters.extend(pipe.get_counter_names())
+
+
+class SmInstExecutedGenerator(tables_common.SmInstExecutedGenerator):
+    def __init__(gen):
+        super().__init__()
+        gen.pipes += [
+            gen.Pipe('total'        , 'All instructions', True),
+            gen.Pipe('adu'          , 'Computed branches and indexed constants'),
+            gen.Pipe('alu'          , 'INT32 except multiply; FP32 comparison', True),
+            gen.Pipe('cbu'          , 'Divergent branches and control flow'),
+            gen.Pipe('fma'          , 'FP32 mul/add and INT32 multiply', True),
+            gen.Pipe('fp16'         , 'FP16 mul/add', True),
+            gen.Pipe('fp64'         , 'FP64 mul/add', True),
+            gen.Pipe('ipa'          , 'Pixel shader attribute interpolation', True),
+            gen.Pipe('lsu'          , 'Global, local, shared memory, and misc', True),
+            gen.Pipe('tensor'       , 'Tensor matrix multiply (FP16)'),
+            gen.Pipe('tex'          , 'Texture and surface memory'),
+            gen.Pipe('xu'           , 'Transcendentals and float/int conversion', True),
+        ]
+        for pipe in gen.pipes:
+            gen.required_counters.extend(pipe.get_counter_names())
+
+class L2TrafficByMemoryApertureShortBreakdownGenerator(tables_common.L2TrafficBreakdownGenerator):
+    def __init__(gen, show_generic_workflow):
+        super().__init__(show_generic_workflow=show_generic_workflow)
+        gen.name = 'L2SectorTrafficBreakdownByMemoryApertureShort'
+        gen.table_id = 'L2-Sector-Traffic-By-Memory-Aperture-Short'
+        gen.column_names = [
+            ('Memory Aperture', 'To Memory'),
+            ('Op', 'Op'),
+        ]
+        gen.workflow += ' This table decomposes L2 bandwidth to each destination, per operation. A <a href="#L2-Sector-Traffic-By-Memory-Aperture">more detailed version of this table</a> can be found below.'
+
+        gen.nodes = [
+            gen.Node('DRAM' , ['lts__average_t_sector_aperture_device'], [
+                gen.Node('Reads',      ['lts__average_t_sector_aperture_device_op_read'],  []),
+                gen.Node('Writes',     ['lts__average_t_sector_aperture_device_op_write'], []),
+                gen.Node('Atomics',    ['lts__average_t_sector_aperture_device_op_atom'], []),
+                gen.Node('Reductions', ['lts__average_t_sector_aperture_device_op_red'],  []),
+            ]),
+            gen.Node('System Memory', ['lts__average_t_sector_aperture_sysmem'], [
+                gen.Node('Reads',      ['lts__average_t_sector_aperture_sysmem_op_read'],  []),
+                gen.Node('Writes',     ['lts__average_t_sector_aperture_sysmem_op_write'], []),
+                gen.Node('Atomics',    ['lts__average_t_sector_aperture_sysmem_op_atom'], []),
+                gen.Node('Reductions', ['lts__average_t_sector_aperture_sysmem_op_red'],  []),
+            ]),
+            gen.Node('Peer Memory', ['lts__average_t_sector_aperture_peer'], [
+                gen.Node('Reads',      ['lts__average_t_sector_aperture_peer_op_read'],  []),
+                gen.Node('Writes',     ['lts__average_t_sector_aperture_peer_op_write'], []),
+                gen.Node('Atomics',    ['lts__average_t_sector_aperture_peer_op_atom'], []),
+                gen.Node('Reductions', ['lts__average_t_sector_aperture_peer_op_red'],  []),
+            ]),
+        ]
+
+        gen.required_ratios = gen.get_required_ratios()
+
+class L2TrafficBySrcBreakdownGenerator(tables_common.L2TrafficBreakdownGenerator):
+    def __init__(gen, show_generic_workflow):
+        super().__init__(show_generic_workflow=show_generic_workflow)
+        gen.name = 'L2SectorTrafficBreakdownBySource'
+        gen.table_id = 'L2-Sector-Traffic-By-Source'
+        gen.column_names = [
+            ('Source Breakdown', 'From Source'),
+            ('Unit Breakdown', 'From Unit'),
+            ('Memory Aperture', 'To Memory'),
+            ('Op', 'Op'),
+        ]
+        gen.workflow += ' This table decomposes L2 bandwidth from each source unit, to each destination, per operation. See also: these tables that prioritize <a href="#L2-Sector-Traffic-By-Memory-Aperture">destination Memory Aperture</a> and <a href="#L2-Sector-Traffic-By-Operation">Operation</a>.'
+
+        gen.nodes = [
+            gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc'], [
+                gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex'], [
+                    gen.Node('DRAM' , ['lts__average_t_sector_srcunit_tex_aperture_device'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_tex_aperture_device_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_tex_aperture_device_op_write'], []),
+                        gen.Node('Atomics',    ['lts__average_t_sector_srcunit_tex_aperture_device_op_atom'], []),
+                        gen.Node('Reductions', ['lts__average_t_sector_srcunit_tex_aperture_device_op_red'],  []),
+                    ]),
+                    gen.Node('Peer Memory', ['lts__average_t_sector_srcunit_tex_aperture_peer'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_tex_aperture_peer_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_tex_aperture_peer_op_write'], []),
+                        gen.Node('Atomics',    ['lts__average_t_sector_srcunit_tex_aperture_peer_op_atom'], []),
+                        gen.Node('Reductions', ['lts__average_t_sector_srcunit_tex_aperture_peer_op_red'],  []),
+                    ]),
+                    gen.Node('System Memory', ['lts__average_t_sector_srcunit_tex_aperture_sysmem'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_tex_aperture_sysmem_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_tex_aperture_sysmem_op_write'], []),
+                        gen.Node('Atomics',    ['lts__average_t_sector_srcunit_tex_aperture_sysmem_op_atom'], []),
+                        gen.Node('Reductions', ['lts__average_t_sector_srcunit_tex_aperture_sysmem_op_red'],  []),
+                    ]),
+                ]),
+                gen.Node('Primitive Engine', ['lts__average_t_sector_srcunit_pe'], [
+                    gen.Node('DRAM' , ['lts__average_t_sector_srcunit_pe_aperture_device'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_pe_aperture_device_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_pe_aperture_device_op_write'], []),
+                    ]),
+                    gen.Node('Peer Memory', ['lts__average_t_sector_srcunit_pe_aperture_peer'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_pe_aperture_peer_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_pe_aperture_peer_op_write'], []),
+                    ]),
+                    gen.Node('System Memory', ['lts__average_t_sector_srcunit_pe_aperture_sysmem'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_pe_aperture_sysmem_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_pe_aperture_sysmem_op_write'], []),
+                    ]),
+                ]),
+                gen.Node('other GPC units', ['lts__average_t_sector_srcunit_gpcother'], [
+                    gen.Node('DRAM' , ['lts__average_t_sector_srcunit_gpcother_aperture_device'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_gpcother_aperture_device_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_gpcother_aperture_device_op_write'], []),
+                    ]),
+                    gen.Node('Peer Memory', ['lts__average_t_sector_srcunit_gpcother_aperture_peer'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_gpcother_aperture_peer_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_gpcother_aperture_peer_op_write'], []),
+                    ]),
+                    gen.Node('System Memory', ['lts__average_t_sector_srcunit_gpcother_aperture_sysmem'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_gpcother_aperture_sysmem_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_gpcother_aperture_sysmem_op_write'], []),
+                    ]),
+                ]),
+            ]),
+            gen.Node('FBP Units', ['lts__average_t_sector_srcnode_fbp'], [
+                gen.Node('ZROP', ['lts__average_t_sector_srcunit_zrop'], [
+                    gen.Node('DRAM' , ['lts__average_t_sector_srcunit_zrop_aperture_device'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_zrop_aperture_device_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_zrop_aperture_device_op_write'], []),
+                    ]),
+                    gen.Node('Peer Memory' , ['lts__average_t_sector_srcunit_zrop_aperture_peer'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_zrop_aperture_peer_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_zrop_aperture_peer_op_write'], []),
+                    ]),
+                    gen.Node('System Memory' , ['lts__average_t_sector_srcunit_zrop_aperture_sysmem'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_zrop_aperture_sysmem_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_zrop_aperture_sysmem_op_write'], []),
+                    ]),
+                ]),
+                gen.Node('CROP', ['lts__average_t_sector_srcunit_crop'], [
+                    gen.Node('DRAM' , ['lts__average_t_sector_srcunit_crop_aperture_device'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_crop_aperture_device_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_crop_aperture_device_op_write'], []),
+                    ]),
+                    gen.Node('Peer Memory' , ['lts__average_t_sector_srcunit_crop_aperture_peer'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_crop_aperture_peer_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_crop_aperture_peer_op_write'], []),
+                    ]),
+                    gen.Node('System Memory' , ['lts__average_t_sector_srcunit_crop_aperture_sysmem'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_crop_aperture_sysmem_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_crop_aperture_sysmem_op_write'], []),
+                    ]),
+                ]),
+            ]),
+            gen.Node('HUB Units', [], [ # lts__average_t_sector_srcnode_hub is buggy, use sum of children instead
+                gen.Node('all HUB Units', [], [ # lts__average_t_sector_srcnode_hub is buggy, use sum of children instead
+                    gen.Node('DRAM' , ['lts__average_t_sector_srcnode_hub_aperture_device'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcnode_hub_aperture_device_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcnode_hub_aperture_device_op_write'], []),
+                    ]),
+                    gen.Node('Peer Memory', ['lts__average_t_sector_srcnode_hub_aperture_peer'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcnode_hub_aperture_peer_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcnode_hub_aperture_peer_op_write'], []),
+                    ]),
+                    gen.Node('System Memory', ['lts__average_t_sector_srcnode_hub_aperture_sysmem'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcnode_hub_aperture_sysmem_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcnode_hub_aperture_sysmem_op_write'], []),
+                    ]),
+                ]),
+            ]),
+        ]
+
+        gen.required_ratios = gen.get_required_ratios()
+
+class L2TrafficByMemoryApertureBreakdownGenerator(tables_common.L2TrafficBreakdownGenerator):
+    def __init__(gen, show_generic_workflow):
+        super().__init__(show_generic_workflow=show_generic_workflow)
+        gen.name = 'L2SectorTrafficBreakdownByMemoryAperture'
+        gen.table_id = 'L2-Sector-Traffic-By-Memory-Aperture'
+        gen.column_names = [
+            ('Memory Aperture', 'To Memory'),
+            ('Source Breakdown', 'From Source'),
+            ('Unit Breakdown', 'From Unit'),
+            ('Op', 'Op'),
+        ]
+        gen.workflow += ' This is an extended breakdown of <a href="#L2-Sector-Traffic-By-Memory-Aperture-Short">L2 Traffic by destination</a>. It decomposes L2 bandwidth to each destination, from each source unit, per operation.'
+
+        gen.nodes = [
+            gen.Node('DRAM' , ['lts__average_t_sector_aperture_device'], [
+                gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_device'], [
+                    gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_device'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_tex_aperture_device_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_tex_aperture_device_op_write'], []),
+                        gen.Node('Atomics',    ['lts__average_t_sector_srcunit_tex_aperture_device_op_atom'], []),
+                        gen.Node('Reductions', ['lts__average_t_sector_srcunit_tex_aperture_device_op_red'],  []),
+                    ]),
+                    gen.Node('Primitive Engine', ['lts__average_t_sector_srcunit_pe_aperture_device'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_pe_aperture_device_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_pe_aperture_device_op_write'], []),
+                    ]),
+                    gen.Node('other GPC units', ['lts__average_t_sector_srcunit_gpcother_aperture_device'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_gpcother_aperture_device_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_gpcother_aperture_device_op_write'], []),
+                    ]),
+                ]),
+                gen.Node('FBP Units', ['lts__average_t_sector_srcnode_fbp_aperture_device'], [
+                    gen.Node('ZROP', ['lts__average_t_sector_srcunit_zrop_aperture_device'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_zrop_aperture_device_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_zrop_aperture_device_op_write'], []),
+                    ]),
+                    gen.Node('CROP', ['lts__average_t_sector_srcunit_crop_aperture_device'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_crop_aperture_device_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_crop_aperture_device_op_write'], []),
+                    ]),
+                ]),
+                gen.Node('HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_device'], [
+                    gen.Node('all HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_device'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcnode_hub_aperture_device_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcnode_hub_aperture_device_op_write'], []),
+                    ]),
+                ]),
+            ]),
+            gen.Node('Peer Memory', ['lts__average_t_sector_aperture_peer'], [
+                gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_peer'], [
+                    gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_peer'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_tex_aperture_peer_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_tex_aperture_peer_op_write'], []),
+                        gen.Node('Atomics',    ['lts__average_t_sector_srcunit_tex_aperture_peer_op_atom'], []),
+                        gen.Node('Reductions', ['lts__average_t_sector_srcunit_tex_aperture_peer_op_red'],  []),
+                    ]),
+                    gen.Node('Primitive Engine', ['lts__average_t_sector_srcunit_pe_aperture_peer'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_pe_aperture_peer_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_pe_aperture_peer_op_write'], []),
+                    ]),
+                    gen.Node('other GPC units', ['lts__average_t_sector_srcunit_gpcother_aperture_peer'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_gpcother_aperture_peer_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_gpcother_aperture_peer_op_write'], []),
+                    ]),
+                ]),
+                gen.Node('FBP Units', ['lts__average_t_sector_srcnode_fbp_aperture_peer'], [
+                    gen.Node('ZROP', ['lts__average_t_sector_srcunit_zrop_aperture_peer'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_zrop_aperture_peer_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_zrop_aperture_peer_op_write'], []),
+                    ]),
+                    gen.Node('CROP', ['lts__average_t_sector_srcunit_crop_aperture_peer'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_crop_aperture_peer_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_crop_aperture_peer_op_write'], []),
+                    ]),
+                ]),
+                gen.Node('HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_peer'], [
+                    gen.Node('all HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_peer'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcnode_hub_aperture_peer_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcnode_hub_aperture_peer_op_write'], []),
+                    ]),
+                ]),
+            ]),
+            gen.Node('System Memory', ['lts__average_t_sector_aperture_sysmem'], [
+                gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_sysmem'], [
+                    gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_sysmem'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_tex_aperture_sysmem_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_tex_aperture_sysmem_op_write'], []),
+                        gen.Node('Atomics',    ['lts__average_t_sector_srcunit_tex_aperture_sysmem_op_atom'], []),
+                        gen.Node('Reductions', ['lts__average_t_sector_srcunit_tex_aperture_sysmem_op_red'],  []),
+                    ]),
+                    gen.Node('Primitive Engine', ['lts__average_t_sector_srcunit_pe_aperture_sysmem'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_pe_aperture_sysmem_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_pe_aperture_sysmem_op_write'], []),
+                    ]),
+                    gen.Node('other GPC units', ['lts__average_t_sector_srcunit_gpcother_aperture_sysmem'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_gpcother_aperture_sysmem_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_gpcother_aperture_sysmem_op_write'], []),
+                    ]),
+                ]),
+                gen.Node('FBP Units', ['lts__average_t_sector_srcnode_fbp_aperture_sysmem'], [
+                    gen.Node('ZROP', ['lts__average_t_sector_srcunit_zrop_aperture_sysmem'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_zrop_aperture_sysmem_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_zrop_aperture_sysmem_op_write'], []),
+                    ]),
+                    gen.Node('CROP', ['lts__average_t_sector_srcunit_crop_aperture_sysmem'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcunit_crop_aperture_sysmem_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcunit_crop_aperture_sysmem_op_write'], []),
+                    ]),
+                ]),
+                gen.Node('HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_sysmem'], [
+                    gen.Node('all HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_sysmem'], [
+                        gen.Node('Reads',      ['lts__average_t_sector_srcnode_hub_aperture_sysmem_op_read'],  []),
+                        gen.Node('Writes',     ['lts__average_t_sector_srcnode_hub_aperture_sysmem_op_write'], []),
+                    ]),
+                ]),
+            ]),
+        ]
+
+        gen.required_ratios = gen.get_required_ratios()
+
+class L2TrafficByOperationBreakdownGenerator(tables_common.L2TrafficBreakdownGenerator):
+    def __init__(gen, show_generic_workflow):
+        super().__init__(show_generic_workflow=show_generic_workflow)
+        gen.name = 'L2SectorTrafficBreakdownByOperation'
+        gen.table_id = 'L2-Sector-Traffic-By-Operation'
+        gen.column_names = [
+            ('Op', 'Op'),
+            ('Memory Aperture', 'To Memory'),
+            ('Source Breakdown', 'From Source'),
+            ('Unit Breakdown', 'From Unit'),
+        ]
+        gen.workflow += ' This table decomposes L2 bandwidth per operation, to each destination, from each source unit.'
+
+        gen.nodes = [
+            gen.Node('Reads',      ['lts__average_t_sector_op_read'],  [
+                gen.Node('DRAM', ['lts__average_t_sector_aperture_device_op_read'], [
+                    gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_device_op_read'], [
+                        gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_device_op_read'], []),
+                        gen.Node('Primitive Engine', ['lts__average_t_sector_srcunit_pe_aperture_device_op_read'], []),
+                        gen.Node('other GPC units', ['lts__average_t_sector_srcunit_gpcother_aperture_device_op_read'], []),
+                    ]),
+                    gen.Node('FBP Units', ['lts__average_t_sector_srcnode_fbp_aperture_device_op_read'], [
+                        gen.Node('ZROP', ['lts__average_t_sector_srcunit_zrop_aperture_device_op_read'], []),
+                        gen.Node('CROP', ['lts__average_t_sector_srcunit_crop_aperture_device_op_read'], []),
+                    ]),
+                    gen.Node('HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_device_op_read'], [
+                        gen.Node('all HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_device_op_read'], []),
+                    ]),
+                ]),
+                gen.Node('Peer Memory', ['lts__average_t_sector_aperture_peer_op_read'], [
+                    gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_peer_op_read'], [
+                        gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_peer_op_read'], []),
+                        gen.Node('Primitive Engine', ['lts__average_t_sector_srcunit_pe_aperture_peer_op_read'], []),
+                        gen.Node('other GPC units', ['lts__average_t_sector_srcunit_gpcother_aperture_peer_op_read'], []),
+                    ]),
+                    gen.Node('FBP Units', ['lts__average_t_sector_srcnode_fbp_aperture_peer_op_read'], [
+                        gen.Node('ZROP', ['lts__average_t_sector_srcunit_zrop_aperture_peer_op_read'], []),
+                        gen.Node('CROP', ['lts__average_t_sector_srcunit_crop_aperture_peer_op_read'], []),
+                    ]),
+                    gen.Node('HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_peer_op_read'], [
+                        gen.Node('all HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_peer_op_read'], []),
+                    ]),
+                ]),
+                gen.Node('System Memory', ['lts__average_t_sector_aperture_sysmem_op_read'], [
+                    gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_sysmem_op_read'], [
+                        gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_sysmem_op_read'], []),
+                        gen.Node('Primitive Engine', ['lts__average_t_sector_srcunit_pe_aperture_sysmem_op_read'], []),
+                        gen.Node('other GPC units', ['lts__average_t_sector_srcunit_gpcother_aperture_sysmem_op_read'], []),
+                    ]),
+                    gen.Node('FBP Units', ['lts__average_t_sector_srcnode_fbp_aperture_sysmem_op_read'], [
+                        gen.Node('ZROP', ['lts__average_t_sector_srcunit_zrop_aperture_sysmem_op_read'], []),
+                        gen.Node('CROP', ['lts__average_t_sector_srcunit_crop_aperture_sysmem_op_read'], []),
+                    ]),
+                    gen.Node('HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_sysmem_op_read'], [
+                        gen.Node('all HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_sysmem_op_read'], []),
+                    ]),
+                ]),
+            ]),
+            gen.Node('Writes',     ['lts__average_t_sector_op_write'], [
+                gen.Node('DRAM', ['lts__average_t_sector_aperture_device_op_write'], [
+                    gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_device_op_write'], [
+                        gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_device_op_write'], []),
+                        gen.Node('Primitive Engine', ['lts__average_t_sector_srcunit_pe_aperture_device_op_write'], []),
+                        gen.Node('other GPC units', ['lts__average_t_sector_srcunit_gpcother_aperture_device_op_write'], []),
+                    ]),
+                    gen.Node('FBP Units', ['lts__average_t_sector_srcnode_fbp_aperture_device_op_write'], [
+                        gen.Node('ZROP', ['lts__average_t_sector_srcunit_zrop_aperture_device_op_write'], []),
+                        gen.Node('CROP', ['lts__average_t_sector_srcunit_crop_aperture_device_op_write'], []),
+                    ]),
+                    gen.Node('HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_device_op_write'], [
+                        gen.Node('all HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_device_op_write'], []),
+                    ]),
+                ]),
+                gen.Node('Peer Memory', ['lts__average_t_sector_aperture_peer_op_write'], [
+                    gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_peer_op_write'], [
+                        gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_peer_op_write'], []),
+                        gen.Node('Primitive Engine', ['lts__average_t_sector_srcunit_pe_aperture_peer_op_write'], []),
+                        gen.Node('other GPC units', ['lts__average_t_sector_srcunit_gpcother_aperture_peer_op_write'], []),
+                    ]),
+                    gen.Node('FBP Units', ['lts__average_t_sector_srcnode_fbp_aperture_peer_op_write'], [
+                        gen.Node('ZROP', ['lts__average_t_sector_srcunit_zrop_aperture_peer_op_write'], []),
+                        gen.Node('CROP', ['lts__average_t_sector_srcunit_crop_aperture_peer_op_write'], []),
+                    ]),
+                    gen.Node('HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_peer_op_write'], [
+                        gen.Node('all HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_peer_op_write'], []),
+                    ]),
+                ]),
+                gen.Node('System Memory', ['lts__average_t_sector_aperture_sysmem_op_write'], [
+                    gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_sysmem_op_write'], [
+                        gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_sysmem_op_write'], []),
+                        gen.Node('Primitive Engine', ['lts__average_t_sector_srcunit_pe_aperture_sysmem_op_write'], []),
+                        gen.Node('other GPC units', ['lts__average_t_sector_srcunit_gpcother_aperture_sysmem_op_write'], []),
+                    ]),
+                    gen.Node('FBP Units', ['lts__average_t_sector_srcnode_fbp_aperture_sysmem_op_write'], [
+                        gen.Node('ZROP', ['lts__average_t_sector_srcunit_zrop_aperture_sysmem_op_write'], []),
+                        gen.Node('CROP', ['lts__average_t_sector_srcunit_crop_aperture_sysmem_op_write'], []),
+                    ]),
+                    gen.Node('HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_sysmem_op_write'], [
+                        gen.Node('all HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_sysmem_op_write'], []),
+                    ]),
+                ]),
+            ]),
+            gen.Node('Atomics',    ['lts__average_t_sector_op_atom'], [
+                gen.Node('DRAM', ['lts__average_t_sector_aperture_device_op_atom'], [
+                    gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_device_op_atom'], [
+                        gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_device_op_atom'], []),
+                    ]),
+                ]),
+                gen.Node('Peer Memory', ['lts__average_t_sector_aperture_peer_op_atom'], [
+                    gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_peer_op_atom'], [
+                        gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_peer_op_atom'], []),
+                    ]),
+                ]),
+                gen.Node('System Memory', ['lts__average_t_sector_aperture_sysmem_op_atom'], [
+                    gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_sysmem_op_atom'], [
+                        gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_sysmem_op_atom'], []),
+                    ]),
+                ]),
+            ]),
+            gen.Node('Reductions', ['lts__average_t_sector_op_red'], [
+                gen.Node('DRAM', ['lts__average_t_sector_aperture_device_op_red'], [
+                    gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_device_op_red'], [
+                        gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_device_op_red'], []),
+                    ]),
+                ]),
+                gen.Node('Peer Memory', ['lts__average_t_sector_aperture_peer_op_red'], [
+                    gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_peer_op_red'], [
+                        gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_peer_op_red'], []),
+                    ]),
+                ]),
+                gen.Node('System Memory', ['lts__average_t_sector_aperture_sysmem_op_red'], [
+                    gen.Node('GPC Units', ['lts__average_t_sector_srcnode_gpc_aperture_sysmem_op_red'], [
+                        gen.Node('<a href="#L1TEX-Sector-Traffic">L1TEX Cache</a>', ['lts__average_t_sector_srcunit_tex_aperture_sysmem_op_red'], []),
+                    ]),
+                ]),
+            ]),
+        ]
+
+        gen.required_ratios = gen.get_required_ratios()
+
+class RasterDataflowGenerator(tables_common.RasterDataflowGenerator):
+    def __init__(gen):
+        super().__init__()
+        gen.zrop_pixels_input = r'''getCounterValue('prop__prop2zrop_pixels_realtime', 'sum')'''
+        gen.crop_pixels_input = r'''getCounterValue('prop__prop2crop_pixels_realtime', 'sum')'''
+        gen.required_counters.extend([
+            'prop__prop2zrop_pixels_realtime',
+            'prop__prop2crop_pixels_realtime'
+        ])
+
+class RangesSummaryGenerator(tables_common.RangesSummaryGenerator):
+    def __init__(gen):
+        super().__init__()
+        gen.cols = [
+            gen.Col('Duration μs'   , "getCounterValue('gpu__time_duration', 'avg')"                                                                            , 'format_avg'  , 'ra'),
+            gen.Col('GR Active%'    , "getCounterPct('gr__cycles_active', 'avg')"                                                                               , 'format_pct'  , 'ra'),
+            gen.Col('3D?'           , "getCounterValue('fe__draw_count', 'sum') ? '&#x2713;' : ''"                                                              , ''            , 'ra'),
+            gen.Col('Comp?'         , "getCounterValue('gr__dispatch_count', 'sum') ? '&#x2713;' : ''"                                                          , ''            , 'ra'),
+            gen.Col('#WFI'          , "getCounterValue('fe__output_ops_type_bundle_cmd_go_idle', 'sum')"                                                        , 'format_sum'  , 'ra'),
+            gen.Col('#Prims'        , "getCounterValue('pda__input_prims', 'sum')"                                                                              , 'format_sum'  , 'ra'),
+            gen.Col('SM%'           , "getThroughputPct('sm__throughput')"                                                                                      , 'format_pct'  , 'ra'),
+            gen.Col('L1TEX%'        , "getThroughputPct('l1tex__throughput')"                                                                                   , 'format_pct'  , 'ra'),
+            gen.Col('L2%'           , "getThroughputPct('lts__throughput')"                                                                                     , 'format_pct'  , 'ra'),
+            gen.Col('DRAM%'         , "getThroughputPct('dram__throughput')"                                                                                    , 'format_pct'  , 'ra'),
+            gen.Col('PCIe%'         , "getThroughputPct('pcie__throughput')"                                                                                    , 'format_pct'  , 'ra'),
+            gen.Col('PD%'           , "getThroughputPct('pda__throughput')"                                                                                     , 'format_pct'  , 'ra'),
+            gen.Col('PE%'           , "Math.max(getThroughputPct('vaf__throughput'), getThroughputPct('vpc__throughput'), getThroughputPct('pes__throughput'))" , 'format_pct'  , 'ra'),
+            gen.Col('RSTR%'         , "getThroughputPct('raster__throughput')"                                                                                  , 'format_pct'  , 'ra'),
+            gen.Col('ZROP%'         , "getThroughputPct('zrop__throughput')"                                                                                    , 'format_pct'  , 'ra'),
+            gen.Col('CROP%'         , "getThroughputPct('crop__throughput')"                                                                                    , 'format_pct'  , 'ra'),
+        ]
+        gen.required_counters = [
+            'fe__draw_count',
+            'fe__output_ops_type_bundle_cmd_go_idle',
+            'gpu__time_duration',
+            'gr__cycles_active',
+            'gr__dispatch_count',
+            'pda__input_prims',
+        ]
+        gen.required_ratios = []
+        gen.required_throughputs = [
+            'crop__throughput',
+            'dram__throughput',
+            'l1tex__throughput',
+            'lts__throughput',
+            'pcie__throughput',
+            'pda__throughput',
+            'pes__throughput',
+            'raster__throughput',
+            'sm__throughput',
+            'vaf__throughput',
+            'vpc__throughput',
+            'zrop__throughput',
+        ]
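The generators above each build a `gen.Node` tree and then call `gen.get_required_ratios()`, whose implementation lives in `tables_common` and is not shown in this diff. As a rough illustration only, the sketch below shows one way the metric names of such a tree could be flattened into a de-duplicated list. The `Node` tuple and `collect_required_ratios` are hypothetical stand-ins, not the actual NvPerfUtility code.

```python
from collections import namedtuple

# Hypothetical stand-in for gen.Node: (label, metric names, child nodes).
Node = namedtuple('Node', ['label', 'metrics', 'children'])


def collect_required_ratios(nodes):
    """Return every metric name found in the tree once, in first-seen order."""
    seen = []

    def visit(node):
        for metric in node.metrics:
            if metric not in seen:
                seen.append(metric)
        for child in node.children:
            visit(child)

    for node in nodes:
        visit(node)
    return seen


# A small excerpt shaped like the 'HUB Units' subtree of the DRAM aperture.
nodes = [
    Node('HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_device'], [
        Node('all HUB Units', ['lts__average_t_sector_srcnode_hub_aperture_device'], [
            Node('Reads',  ['lts__average_t_sector_srcnode_hub_aperture_device_op_read'],  []),
            Node('Writes', ['lts__average_t_sector_srcnode_hub_aperture_device_op_write'], []),
        ]),
    ]),
]

required_ratios = collect_required_ratios(nodes)
```

De-duplication matters in this sketch because a parent and its only child can reference the same metric, as 'HUB Units' and 'all HUB Units' do in the aperture and operation tables.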
--- a/ruins64k/tools/NvPerfUtility/build/ReportGenerator/readme.txt
+++ b/ruins64k/tools/NvPerfUtility/build/ReportGenerator/readme.txt
@@ -0,0 +1,11 @@
+This is an offline tool that generates the C++ "report definition" header.
+
+Example command:
+```
+python3 profiler_report_generator.py --chip ga10x --outDir=PATH/TO/YOUR/OUTPUT_DIR --pypath pub/ampere
+```
+
+* This has been tested with Python 3.5.2
+* Please use "profiler_report_generator.py --help" for more details
+
+Pre-generated versions of the header files have been deployed to both the "/gen" sub-directory and the "NvPerfUtility" directory.
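For scripted builds, the example command from the readme can be assembled as an argument list rather than a hand-edited string. The helper below is a hypothetical sketch: `build_generator_command` is not part of the tool, and it only uses the three flags that appear in the readme (`--chip`, `--outDir`, `--pypath`). It builds the command without running it.

```python
import shlex


def build_generator_command(chip, out_dir, pypath):
    """Build the argv list for the readme's example invocation (does not run it)."""
    return [
        'python3', 'profiler_report_generator.py',
        '--chip', chip,
        '--outDir=' + out_dir,
        '--pypath', pypath,
    ]


cmd = build_generator_command('ga10x', 'PATH/TO/YOUR/OUTPUT_DIR', 'pub/ampere')
# Shell-quoted form, suitable for logging or pasting into a terminal.
command_line = ' '.join(shlex.quote(arg) for arg in cmd)
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` from the directory that contains `profiler_report_generator.py`, which avoids shell-quoting problems when the output path contains spaces.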
--- a/ruins64k/tools/NvPerfUtility/imports/json-3.9.1/LICENSE.MIT
+++ b/ruins64k/tools/NvPerfUtility/imports/json-3.9.1/LICENSE.MIT
@@ -0,0 +1,21 @@
+MIT License 
+
+Copyright (c) 2013-2020 Niels Lohmann
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
--- a/ruins64k/tools/NvPerfUtility/imports/json-3.9.1/README.md
+++ b/ruins64k/tools/NvPerfUtility/imports/json-3.9.1/README.md
--- a/ruins64k/tools/NvPerfUtility/imports/json-3.9.1/json/json.hpp
+++ b/ruins64k/tools/NvPerfUtility/imports/json-3.9.1/json/json.hpp
--- a/ruins64k/tools/NvPerfUtility/include/NvPerfCounterConfiguration.h
+++ b/ruins64k/tools/NvPerfUtility/include/NvPerfCounterConfiguration.h
@@ -0,0 +1,107 @@
+/*
+* Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+#pragma once
+
+#include <stdint.h>
+#include <vector>
+#include "NvPerfInit.h"
+#include "NvPerfMetricsConfigBuilder.h"
+
+namespace nv { namespace perf {
+
+    struct CounterConfiguration
+    {
+        std::vector<uint8_t> configImage;
+        std::vector<uint8_t> counterDataPrefix;
+        size_t numPipelinedPasses;
+        size_t numIsolatedPasses;
+    };
+
+    /// Transforms configBuilder into configuration.
+    inline bool CreateConfiguration(
+        MetricsConfigBuilder& configBuilder,
+        CounterConfiguration& configuration)
+    {
+        bool res = false;
+        res = configBuilder.PrepareConfigImage();
+        if (!res)
+        {
+            // std::cerr << "FAILED: CreateConfiguration - PrepareConfigImage failed\n";
+            return false;
+        }
+
+        const size_t configImageSize = configBuilder.GetConfigImageSize();
+        if (!configImageSize)
+        {
+            // std::cerr << "FAILED: CreateConfiguration - GetConfigImageSize returned 0\n";
+            return false;
+        }
+        configuration.configImage.resize(configImageSize);
+        if (!configBuilder.GetConfigImage(configuration.configImage.size(), &configuration.configImage[0]))
+        {
+            // std::cerr << "FAILED: CreateConfiguration - GetConfigImage failed\n";
+            return false;
+        }
+
+        const size_t counterDataPrefixSize = configBuilder.GetCounterDataPrefixSize();
+        if (!counterDataPrefixSize)
+        {
+            // std::cerr << "FAILED: CreateConfiguration - GetCounterDataPrefixSize returned 0\n";
+            return false;
+        }
+        configuration.counterDataPrefix.resize(counterDataPrefixSize);
+        if (!configBuilder.GetCounterDataPrefix(configuration.counterDataPrefix.size(), &configuration.counterDataPrefix[0]))
+        {
+            // std::cerr << "FAILED: CreateConfiguration - GetCounterDataPrefix failed\n";
+            return false;
+        }
+
+        NVPW_Config_GetNumPasses_Params getNumPassesParams = { NVPW_Config_GetNumPasses_Params_STRUCT_SIZE };
+        getNumPassesParams.pConfig = &configuration.configImage[0];
+        NVPA_Status nvpaStatus = NVPW_Config_GetNumPasses(&getNumPassesParams);
+        if (nvpaStatus)
+        {
+            return false;
+        }
+        configuration.numPipelinedPasses = getNumPassesParams.numPipelinedPasses;
+        configuration.numIsolatedPasses = getNumPassesParams.numIsolatedPasses;
+
+        return true;
+    }
+
+
+    /// Adds pMetricNames[0..numMetrics-1] into configBuilder, then transforms configBuilder into configuration.
+    inline bool CreateConfiguration(
+        MetricsConfigBuilder& configBuilder,
+        size_t numMetrics,
+        const char* const pMetricNames[],
+        CounterConfiguration& configuration)
+    {
+        bool succeeded = configBuilder.AddMetrics(pMetricNames, numMetrics);
+        if (!succeeded)
+        {
+            return false;
+        }
+
+        succeeded = CreateConfiguration(configBuilder, configuration);
+        if (!succeeded)
+        {
+            return false;
+        }
+        return true;
+    }
+
+}}
--- a/ruins64k/tools/NvPerfUtility/include/NvPerfCounterData.h
+++ b/ruins64k/tools/NvPerfUtility/include/NvPerfCounterData.h
@@ -0,0 +1,80 @@
+/*
+* Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+#pragma once
+
+#include "nvperf_host.h"
+#include "nvperf_target.h"
+#include <string>
+#include <vector>
+
+namespace nv { namespace perf {
+    inline size_t CounterDataGetNumRanges(const uint8_t* pCounterDataImage)
+    {
+        NVPW_CounterData_GetNumRanges_Params getNumRangeParams = { NVPW_CounterData_GetNumRanges_Params_STRUCT_SIZE };
+        getNumRangeParams.pCounterDataImage = pCounterDataImage;
+        NVPA_Status nvpaStatus = NVPW_CounterData_GetNumRanges(&getNumRangeParams);
+        if (nvpaStatus)
+        {
+            return 0;
+        }
+        return getNumRangeParams.numRanges;
+    }
+
+    // TODO: this function performs dynamic allocations; either need a non-malloc'ing variant, or move this to an appropriate place
+    inline std::string CounterDataGetRangeName(const uint8_t* pCounterDataImage, size_t rangeIndex, char delimiter, const char** ppLeafName = nullptr)
+    {
+        std::string rangeName;
+
+        NVPW_CounterData_GetRangeDescriptions_Params params = { NVPW_CounterData_GetRangeDescriptions_Params_STRUCT_SIZE };
+        params.pCounterDataImage = pCounterDataImage;
+        params.rangeIndex = rangeIndex;
+        NVPA_Status nvpaStatus = NVPW_CounterData_GetRangeDescriptions(&params);
+        if (nvpaStatus)
+        {
+            return "";
+        }
+
+        if (!params.numDescriptions)
+        {
+            return "";
+        }
+
+        std::vector<const char*> descriptions;
+        descriptions.resize(params.numDescriptions);
+        params.ppDescriptions = descriptions.data();
+        nvpaStatus = NVPW_CounterData_GetRangeDescriptions(&params);
+        if (nvpaStatus)
+        {
+            return "";
+        }
+
+        rangeName += descriptions[0];
+        for (size_t descriptionIdx = 1; descriptionIdx < params.numDescriptions; ++descriptionIdx)
+        {
+            const char* pDescription = params.ppDescriptions[descriptionIdx];
+            rangeName += delimiter;
+            rangeName += pDescription;
+        }
+
+        if (ppLeafName)
+        {
+            *ppLeafName = descriptions.back();
+        }
+
+        return rangeName;
+    }
+}}
--- a/ruins64k/tools/NvPerfUtility/include/NvPerfD3D.h
+++ b/ruins64k/tools/NvPerfUtility/include/NvPerfD3D.h
@@ -0,0 +1,82 @@
+/*
+* Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+#pragma once
+
+#include "NvPerfInit.h"
+#include "NvPerfDeviceProperties.h"
+
+#include <dxgi.h>
+
+namespace nv { namespace perf {
+
+    inline bool DxgiIsNvidiaDevice(IDXGIAdapter* pAdapter)
+    {
+        DXGI_ADAPTER_DESC adapterDesc = {};
+        HRESULT hr = pAdapter->GetDesc(&adapterDesc);
+        if (FAILED(hr))
+        {
+            return false;
+        }
+
+        if (adapterDesc.VendorId != 0x10de)
+        {
+            return false;
+        }
+
+        return true;
+    }
+
+    inline size_t D3DGetNvperfDeviceIndex(IDXGIAdapter* pDXGIAdapter, size_t sliIndex = 0)
+    {
+        NVPW_Adapter_GetDeviceIndex_Params getDeviceIndexParams = { NVPW_Adapter_GetDeviceIndex_Params_STRUCT_SIZE };
+        getDeviceIndexParams.pAdapter = pDXGIAdapter;
+        getDeviceIndexParams.sliIndex = sliIndex;
+        NVPA_Status nvpaStatus = NVPW_Adapter_GetDeviceIndex(&getDeviceIndexParams);
+        if (nvpaStatus)
+        {
+            return ~size_t(0);
+        }
+
+        return getDeviceIndexParams.deviceIndex;
+    }
+
+    inline DeviceIdentifiers D3DGetDeviceIdentifiers(IDXGIAdapter* pDXGIAdapter, size_t sliIndex = 0)
+    {
+        const size_t deviceIndex = D3DGetNvperfDeviceIndex(pDXGIAdapter, sliIndex);
+
+        DeviceIdentifiers deviceIdentifiers = GetDeviceIdentifiers(deviceIndex);
+        return deviceIdentifiers;
+    }
+
+    inline NVPW_Device_ClockStatus D3DGetDeviceClockState(IDXGIAdapter* pDXGIAdapter)
+    {
+        size_t nvperfDeviceIndex = D3DGetNvperfDeviceIndex(pDXGIAdapter);
+        return GetDeviceClockState(nvperfDeviceIndex);
+    }
+
+    inline bool D3DSetDeviceClockState(IDXGIAdapter* pDXGIAdapter, NVPW_Device_ClockSetting clockSetting)
+    {
+        size_t nvperfDeviceIndex = D3DGetNvperfDeviceIndex(pDXGIAdapter);
+        return SetDeviceClockState(nvperfDeviceIndex, clockSetting);
+    }
+
+    inline bool D3DSetDeviceClockState(IDXGIAdapter* pDXGIAdapter, NVPW_Device_ClockStatus clockStatus)
+    {
+        size_t nvperfDeviceIndex = D3DGetNvperfDeviceIndex(pDXGIAdapter);
+        return SetDeviceClockState(nvperfDeviceIndex, clockStatus);
+    }
+}}
--- a/ruins64k/tools/NvPerfUtility/include/NvPerfD3D11.h
+++ b/ruins64k/tools/NvPerfUtility/include/NvPerfD3D11.h
@@ -0,0 +1,252 @@
+/*
+* Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+#pragma once
+
+#include "NvPerfD3D.h"
+
+#include "nvperf_d3d11_host.h"
+#include "nvperf_d3d11_target.h"
+#include <D3D11.h>
+#include <atlbase.h>
+
+namespace nv { namespace perf {
+
+    //
+    // D3D11 Only Utilities
+    //
+
+    inline bool D3D11FindAdapterForDevice(ID3D11Device* pDevice, IDXGIAdapter** ppDXGIAdapter, DXGI_ADAPTER_DESC* pAdapterDesc = nullptr)
+    {
+        CComPtr<IDXGIDevice> pDXGIDevice;
+        HRESULT hr = pDevice->QueryInterface(IID_PPV_ARGS(&pDXGIDevice));
+        if (FAILED(hr))
+        {
+            return false;
+        }
+
+        hr = pDXGIDevice->GetAdapter(ppDXGIAdapter);
+        if (FAILED(hr))
+        {
+            return false;
+        }
+
+        if (pAdapterDesc)
+        {
+            hr = (*ppDXGIAdapter)->GetDesc(pAdapterDesc);
+            if (FAILED(hr))
+            {
+                return false;
+            }
+        }
+
+        return true;
+    }
+
+    inline std::wstring D3D11GetDeviceName(ID3D11Device* pDevice)
+    {
+        DXGI_ADAPTER_DESC adapterDesc = {};
+        CComPtr<IDXGIAdapter> pDXGIAdapter;
+        if (!D3D11FindAdapterForDevice(pDevice, &pDXGIAdapter, &adapterDesc))
+        {
+            return L"";
+        }
+
+        return adapterDesc.Description;
+    }
+
+    inline bool D3D11IsNvidiaDevice(ID3D11Device* pDevice)
+    {
+        CComPtr<IDXGIAdapter> pDXGIAdapter;
+        if (!D3D11FindAdapterForDevice(pDevice, &pDXGIAdapter))
+        {
+            return false;
+        }
+
+        const bool isNvidiaDevice = DxgiIsNvidiaDevice(pDXGIAdapter);
+
+        return isNvidiaDevice;
+    }
+
+    inline bool D3D11IsNvidiaDevice(ID3D11DeviceContext* pDeviceContext)
+    {
+        CComPtr<ID3D11Device> pDevice;
+        pDeviceContext->GetDevice(&pDevice);
+        if (!pDevice)
+        {
+            return false;
+        }
+
+        const bool isNvidiaDevice = D3D11IsNvidiaDevice(pDevice);
+        return isNvidiaDevice;
+    }
+
+    //
+    // D3D11 NvPerf Utilities
+    //
+
+    inline bool D3D11LoadDriver()
+    {
+        NVPW_D3D11_LoadDriver_Params loadDriverParams = { NVPW_D3D11_LoadDriver_Params_STRUCT_SIZE };
+        NVPA_Status nvpaStatus = NVPW_D3D11_LoadDriver(&loadDriverParams);
+        if (nvpaStatus)
+        {
+            NV_PERF_LOG_ERR(10, "NVPW_D3D11_LoadDriver failed\n");
+            return false;
+        }
+        return true;
+    }
+
+    inline size_t D3D11GetNvperfDeviceIndex(ID3D11Device* pDevice, size_t sliIndex = 0)
+    {
+        NVPW_D3D11_Device_GetDeviceIndex_Params getDeviceIndexParams = { NVPW_D3D11_Device_GetDeviceIndex_Params_STRUCT_SIZE };
+        getDeviceIndexParams.pDevice = pDevice;
+        getDeviceIndexParams.sliIndex = sliIndex;
+        NVPA_Status nvpaStatus = NVPW_D3D11_Device_GetDeviceIndex(&getDeviceIndexParams);
+        if (nvpaStatus)
+        {
+            return ~size_t(0);
+        }
+
+        return getDeviceIndexParams.deviceIndex;
+    }
+
+    inline DeviceIdentifiers D3D11GetDeviceIdentifiers(ID3D11Device* pDevice, size_t sliIndex = 0)
+    {
+        CComPtr<IDXGIAdapter> pDXGIAdapter;
+        if (!D3D11FindAdapterForDevice(pDevice, &pDXGIAdapter))
+        {
+            return {};
+        }
+
+        return D3DGetDeviceIdentifiers(pDXGIAdapter, sliIndex);
+    }
+
+    inline NVPW_Device_ClockStatus D3D11GetDeviceClockState(ID3D11Device* pDevice)
+    {
+        size_t nvperfDeviceIndex = D3D11GetNvperfDeviceIndex(pDevice);
+        return GetDeviceClockState(nvperfDeviceIndex);
+    }
+
+    inline bool D3D11SetDeviceClockState(ID3D11Device* pDevice, NVPW_Device_ClockSetting clockSetting)
+    {
+        size_t nvperfDeviceIndex = D3D11GetNvperfDeviceIndex(pDevice);
+        return SetDeviceClockState(nvperfDeviceIndex, clockSetting);
+    }
+
+    inline bool D3D11SetDeviceClockState(ID3D11Device* pDevice, NVPW_Device_ClockStatus clockStatus)
+    {
+        size_t nvperfDeviceIndex = D3D11GetNvperfDeviceIndex(pDevice);
+        return SetDeviceClockState(nvperfDeviceIndex, clockStatus);
+    }
+
+    inline size_t D3D11CalculateMetricsEvaluatorScratchBufferSize(const char* pChipName)
+    {
+        NVPW_D3D11_MetricsEvaluator_CalculateScratchBufferSize_Params calculateScratchBufferSizeParams = { NVPW_D3D11_MetricsEvaluator_CalculateScratchBufferSize_Params_STRUCT_SIZE };
+        calculateScratchBufferSizeParams.pChipName = pChipName;
+        NVPA_Status nvpaStatus = NVPW_D3D11_MetricsEvaluator_CalculateScratchBufferSize(&calculateScratchBufferSizeParams);
+        if (nvpaStatus)
+        {
+            NV_PERF_LOG_ERR(20, "NVPW_D3D11_MetricsEvaluator_CalculateScratchBufferSize failed\n");
+            return 0;
+        }
+        return calculateScratchBufferSizeParams.scratchBufferSize;
+    }
+
+    inline NVPW_MetricsEvaluator* D3D11CreateMetricsEvaluator(uint8_t* pScratchBuffer, size_t scratchBufferSize, const char* pChipName)
+    {
+        NVPW_D3D11_MetricsEvaluator_Initialize_Params initializeParams = { NVPW_D3D11_MetricsEvaluator_Initialize_Params_STRUCT_SIZE };
+        initializeParams.pScratchBuffer = pScratchBuffer;
+        initializeParams.scratchBufferSize = scratchBufferSize;
+        initializeParams.pChipName = pChipName;
+        NVPA_Status nvpaStatus = NVPW_D3D11_MetricsEvaluator_Initialize(&initializeParams);
+        if (nvpaStatus)
+        {
+            NV_PERF_LOG_ERR(20, "NVPW_D3D11_MetricsEvaluator_Initialize failed\n");
+            return nullptr;
+        }
+        return initializeParams.pMetricsEvaluator;
+    }
+}}
+
+namespace nv { namespace perf { namespace profiler {
+
+    inline NVPA_RawMetricsConfig* D3D11CreateRawMetricsConfig(const char* pChipName)
+    {
+        NVPW_D3D11_RawMetricsConfig_Create_Params configParams = { NVPW_D3D11_RawMetricsConfig_Create_Params_STRUCT_SIZE };
+        configParams.activityKind = NVPA_ACTIVITY_KIND_PROFILER;
+        configParams.pChipName = pChipName;
+
+        NVPA_Status nvpaStatus = NVPW_D3D11_RawMetricsConfig_Create(&configParams);
+        if (nvpaStatus)
+        {
+            NV_PERF_LOG_ERR(20, "NVPW_D3D11_RawMetricsConfig_Create failed\n");
+            return nullptr;
+        }
+
+        return configParams.pRawMetricsConfig;
+    }
+
+    inline bool D3D11IsGpuSupported(ID3D11Device* pDevice, size_t sliIndex = 0)
+    {
+        const size_t deviceIndex = D3D11GetNvperfDeviceIndex(pDevice, sliIndex);
+        if (deviceIndex == ~size_t(0))
+        {
+            NV_PERF_LOG_ERR(10, "D3D11GetNvperfDeviceIndex failed on %ls\n", D3D11GetDeviceName(pDevice).c_str());
+            return false;
+        }
+
+        NVPW_D3D11_Profiler_IsGpuSupported_Params params = { NVPW_D3D11_Profiler_IsGpuSupported_Params_STRUCT_SIZE };
+        params.deviceIndex = deviceIndex;
+        NVPA_Status nvpaStatus = NVPW_D3D11_Profiler_IsGpuSupported(&params);
+        if (nvpaStatus)
+        {
+            NV_PERF_LOG_ERR(10, "NVPW_D3D11_Profiler_IsGpuSupported failed on %ls\n", D3D11GetDeviceName(pDevice).c_str());
+            return false;
+        }
+
+        if (!params.isSupported)
+        {
+            NV_PERF_LOG_ERR(10, "%ls is not supported\n", D3D11GetDeviceName(pDevice).c_str());
+            if (params.gpuArchitectureSupportLevel != NVPW_GPU_ARCHITECTURE_SUPPORT_LEVEL_SUPPORTED)
+            {
+                const DeviceIdentifiers deviceIdentifiers = D3D11GetDeviceIdentifiers(pDevice, sliIndex);
+                NV_PERF_LOG_ERR(10, "Unsupported GPU architecture %s\n", deviceIdentifiers.pChipName);
+            }
+            if (params.sliSupportLevel == NVPW_SLI_SUPPORT_LEVEL_UNSUPPORTED)
+            {
+                NV_PERF_LOG_ERR(10, "Devices in SLI configuration are not supported.\n");
+            }
+            return false;
+        }
+
+        return true;
+    }
+
+    inline bool D3D11IsGpuSupported(ID3D11DeviceContext* pDeviceContext, size_t sliIndex = 0)
+    {
+        CComPtr<ID3D11Device> pDevice;
+        pDeviceContext->GetDevice(&pDevice);
+        if (!pDevice)
+        {
+            return false;
+        }
+
+        const bool isGpuSupported = D3D11IsGpuSupported(pDevice, sliIndex);
+        return isGpuSupported;
+    }
+
+}}}
--- a/ruins64k/tools/NvPerfUtility/include/NvPerfD3D12.h
+++ b/ruins64k/tools/NvPerfUtility/include/NvPerfD3D12.h
@@ -0,0 +1,351 @@
+/*
+* Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+#pragma once
+
+#include "NvPerfInit.h"
+#include "NvPerfDeviceProperties.h"
+#include "NvPerfD3D.h"
+#include "nvperf_d3d12_host.h"
+#include "nvperf_d3d12_target.h"
+#include <D3D12.h>
+#include <atlbase.h>
+
+namespace nv { namespace perf {
+
+    //
+    // D3D12 Only Utilities
+    //
+
+    inline bool D3D12FindAdapterForDevice(ID3D12Device* pDevice, IDXGIAdapter1** ppDXGIAdapter, DXGI_ADAPTER_DESC1* pAdapterDesc = nullptr)
+    {
+        const LUID deviceLuid = pDevice->GetAdapterLuid();
+
+        CComPtr<IDXGIFactory1> pDXGIFactory;
+        HRESULT hr = CreateDXGIFactory1(IID_PPV_ARGS(&pDXGIFactory));
+        if (FAILED(hr))
+        {
+            return false;
+        }
+
+        for (UINT adapterIndex = 0; ; ++adapterIndex)
+        {
+            CComPtr<IDXGIAdapter1> pDXGIAdapter;
+            hr = pDXGIFactory->EnumAdapters1(adapterIndex, &pDXGIAdapter);
+            if (FAILED(hr))
+            {
+                break; // the intended loop termination
+            }
+
+            DXGI_ADAPTER_DESC1 adapterDesc = {};
+            hr = pDXGIAdapter->GetDesc1(&adapterDesc); // reuse the outer hr instead of shadowing it
+            if (FAILED(hr))
+            {
+                continue;
+            }
+
+            if (!memcmp(&adapterDesc.AdapterLuid, &deviceLuid, sizeof(deviceLuid)))
+            {
+                *ppDXGIAdapter = pDXGIAdapter.Detach();
+                if (pAdapterDesc)
+                {
+                    *pAdapterDesc = adapterDesc;
+                }
+                return true;
+            }
+        }
+
+        return false;
+    }
+
+    inline std::wstring D3D12GetDeviceName(ID3D12Device* pDevice)
+    {
+        DXGI_ADAPTER_DESC1 adapterDesc = {};
+        CComPtr<IDXGIAdapter1> pDXGIAdapter;
+        if (!D3D12FindAdapterForDevice(pDevice, &pDXGIAdapter, &adapterDesc))
+        {
+            return L"";
+        }
+
+        return adapterDesc.Description;
+    }
+
+
+    inline bool D3D12IsNvidiaDevice(ID3D12Device* pDevice)
+    {
+        CComPtr<IDXGIAdapter1> pDXGIAdapter;
+        if (!D3D12FindAdapterForDevice(pDevice, &pDXGIAdapter))
+        {
+            return false;
+        }
+
+        const bool isNvidiaDevice = DxgiIsNvidiaDevice(pDXGIAdapter);
+        return isNvidiaDevice;
+    }
+
+    inline bool D3D12IsNvidiaDevice(ID3D12CommandQueue* pCommandQueue)
+    {
+        CComPtr<ID3D12Device> pDevice;
+        HRESULT hr = pCommandQueue->GetDevice(IID_PPV_ARGS(&pDevice));
+        if (FAILED(hr))
+        {
+            return false;
+        }
+
+        const bool isNvidiaDevice = D3D12IsNvidiaDevice(pDevice);
+        return isNvidiaDevice;
+    }
+
+    //
+    // D3D12 NvPerf Utilities
+    //
+
+    inline bool D3D12LoadDriver()
+    {
+        NVPW_D3D12_LoadDriver_Params loadDriverParams = { NVPW_D3D12_LoadDriver_Params_STRUCT_SIZE };
+        NVPA_Status nvpaStatus = NVPW_D3D12_LoadDriver(&loadDriverParams);
+        if (nvpaStatus)
+        {
+            NV_PERF_LOG_ERR(10, "NVPW_D3D12_LoadDriver failed\n");
+            return false;
+        }
+        return true;
+    }
+
+
+    inline size_t D3D12GetNvperfDeviceIndex(ID3D12Device* pDevice, size_t sliIndex = 0)
+    {
+        NVPW_D3D12_Device_GetDeviceIndex_Params getDeviceIndexParams = { NVPW_D3D12_Device_GetDeviceIndex_Params_STRUCT_SIZE };
+        getDeviceIndexParams.pDevice = pDevice;
+        getDeviceIndexParams.sliIndex = sliIndex;
+        NVPA_Status nvpaStatus = NVPW_D3D12_Device_GetDeviceIndex(&getDeviceIndexParams);
+        if (nvpaStatus)
+        {
+            return ~size_t(0);
+        }
+
+        return getDeviceIndexParams.deviceIndex;
+    }
+
+    inline DeviceIdentifiers D3D12GetDeviceIdentifiers(ID3D12Device* pDevice, size_t sliIndex = 0)
+    {
+        CComPtr<IDXGIAdapter1> pDXGIAdapter;
+        if (!D3D12FindAdapterForDevice(pDevice, &pDXGIAdapter))
+        {
+            return {};
+        }
+
+        return D3DGetDeviceIdentifiers(pDXGIAdapter, sliIndex);
+    }
+
+    inline NVPW_Device_ClockStatus D3D12GetDeviceClockState(ID3D12Device* pDevice)
+    {
+        size_t nvperfDeviceIndex = D3D12GetNvperfDeviceIndex(pDevice);
+        return GetDeviceClockState(nvperfDeviceIndex);
+    }
+
+    inline bool D3D12SetDeviceClockState(ID3D12Device* pDevice, NVPW_Device_ClockSetting clockSetting)
+    {
+        size_t nvperfDeviceIndex = D3D12GetNvperfDeviceIndex(pDevice);
+        return SetDeviceClockState(nvperfDeviceIndex, clockSetting);
+    }
+
+    inline bool D3D12SetDeviceClockState(ID3D12Device* pDevice, NVPW_Device_ClockStatus clockStatus)
+    {
+        size_t nvperfDeviceIndex = D3D12GetNvperfDeviceIndex(pDevice);
+        return SetDeviceClockState(nvperfDeviceIndex, clockStatus);
+    }
+
+    inline size_t D3D12CalculateMetricsEvaluatorScratchBufferSize(const char* pChipName)
+    {
+        NVPW_D3D12_MetricsEvaluator_CalculateScratchBufferSize_Params calculateScratchBufferSizeParams = { NVPW_D3D12_MetricsEvaluator_CalculateScratchBufferSize_Params_STRUCT_SIZE };
+        calculateScratchBufferSizeParams.pChipName = pChipName;
+        NVPA_Status nvpaStatus = NVPW_D3D12_MetricsEvaluator_CalculateScratchBufferSize(&calculateScratchBufferSizeParams);
+        if (nvpaStatus)
+        {
+            NV_PERF_LOG_ERR(20, "NVPW_D3D12_MetricsEvaluator_CalculateScratchBufferSize failed\n");
+            return 0;
+        }
+        return calculateScratchBufferSizeParams.scratchBufferSize;
+    }
+
+    inline NVPW_MetricsEvaluator* D3D12CreateMetricsEvaluator(uint8_t* pScratchBuffer, size_t scratchBufferSize, const char* pChipName)
+    {
+        NVPW_D3D12_MetricsEvaluator_Initialize_Params initializeParams = { NVPW_D3D12_MetricsEvaluator_Initialize_Params_STRUCT_SIZE };
+        initializeParams.pScratchBuffer = pScratchBuffer;
+        initializeParams.scratchBufferSize = scratchBufferSize;
+        initializeParams.pChipName = pChipName;
+        NVPA_Status nvpaStatus = NVPW_D3D12_MetricsEvaluator_Initialize(&initializeParams);
+        if (nvpaStatus)
+        {
+            NV_PERF_LOG_ERR(20, "NVPW_D3D12_MetricsEvaluator_Initialize failed\n");
+            return nullptr;
+        }
+        return initializeParams.pMetricsEvaluator;
+    }
+
+}}
+
+namespace nv { namespace perf { namespace profiler {
+
+    inline NVPA_RawMetricsConfig* D3D12CreateRawMetricsConfig(const char* pChipName)
+    {
+        NVPW_D3D12_RawMetricsConfig_Create_Params configParams = { NVPW_D3D12_RawMetricsConfig_Create_Params_STRUCT_SIZE };
+        configParams.activityKind = NVPA_ACTIVITY_KIND_PROFILER;
+        configParams.pChipName = pChipName;
+
+        NVPA_Status nvpaStatus = NVPW_D3D12_RawMetricsConfig_Create(&configParams);
+        if (nvpaStatus)
+        {
+            NV_PERF_LOG_ERR(20, "NVPW_D3D12_RawMetricsConfig_Create failed\n");
+            return nullptr;
+        }
+
+        return configParams.pRawMetricsConfig;
+    }
+
+    inline bool D3D12IsGpuSupported(ID3D12Device* pDevice, size_t sliIndex = 0)
+    {
+        const size_t deviceIndex = D3D12GetNvperfDeviceIndex(pDevice, sliIndex);
+        if (deviceIndex == ~size_t(0))
+        {
+            NV_PERF_LOG_ERR(10, "D3D12GetNvperfDeviceIndex failed on %ls\n", D3D12GetDeviceName(pDevice).c_str());
+            return false;
+        }
+
+        NVPW_D3D12_Profiler_IsGpuSupported_Params params = { NVPW_D3D12_Profiler_IsGpuSupported_Params_STRUCT_SIZE };
+        params.deviceIndex = deviceIndex;
+        NVPA_Status nvpaStatus = NVPW_D3D12_Profiler_IsGpuSupported(&params);
+        if (nvpaStatus)
+        {
+            NV_PERF_LOG_ERR(10, "NVPW_D3D12_Profiler_IsGpuSupported failed on %ls\n", D3D12GetDeviceName(pDevice).c_str());
+            return false;
+        }
+
+        if (!params.isSupported)
+        {
+            NV_PERF_LOG_ERR(10, "%ls is not supported\n", D3D12GetDeviceName(pDevice).c_str());
+            if (params.gpuArchitectureSupportLevel != NVPW_GPU_ARCHITECTURE_SUPPORT_LEVEL_SUPPORTED)
+            {
+                const DeviceIdentifiers deviceIdentifiers = D3D12GetDeviceIdentifiers(pDevice, sliIndex);
+                NV_PERF_LOG_ERR(10, "Unsupported GPU architecture %s\n", deviceIdentifiers.pChipName);
+            }
+            if (params.sliSupportLevel == NVPW_SLI_SUPPORT_LEVEL_UNSUPPORTED)
+            {
+                NV_PERF_LOG_ERR(10, "Devices in SLI configuration are not supported.\n");
+            }
+            return false;
+        }
+
+        return true;
+    }
+
+    inline bool D3D12IsGpuSupported(ID3D12CommandQueue* pCommandQueue, size_t sliIndex = 0)
+    {
+        CComPtr<ID3D12Device> pDevice;
+        HRESULT hr = pCommandQueue->GetDevice(IID_PPV_ARGS(&pDevice));
+        if (FAILED(hr))
+        {
+            return false;
+        }
+
+        const bool isGpuSupported = D3D12IsGpuSupported(pDevice, sliIndex);
+        return isGpuSupported;
+    }
+
+
+    inline bool D3D12PushRange(ID3D12GraphicsCommandList* pCommandList, const char* pRangeName)
+    {
+        NVPW_D3D12_Profiler_CommandList_PushRange_Params pushRangeParams = { NVPW_D3D12_Profiler_CommandList_PushRange_Params_STRUCT_SIZE };
+        pushRangeParams.pRangeName = pRangeName;
+        pushRangeParams.rangeNameLength = 0;
+        pushRangeParams.pCommandList = pCommandList;
+        NVPA_Status nvpaStatus = NVPW_D3D12_Profiler_CommandList_PushRange(&pushRangeParams);
+        if (nvpaStatus)
+        {
+            NV_PERF_LOG_ERR(50, "NVPW_D3D12_Profiler_CommandList_PushRange failed\n");
+            return false;
+        }
+        return true;
+    }
+
+    inline bool D3D12PopRange(ID3D12GraphicsCommandList* pCommandList)
+    {
+        NVPW_D3D12_Profiler_CommandList_PopRange_Params popParams = { NVPW_D3D12_Profiler_CommandList_PopRange_Params_STRUCT_SIZE };
+        popParams.pCommandList = pCommandList;
+        NVPA_Status nvpaStatus = NVPW_D3D12_Profiler_CommandList_PopRange(&popParams);
+        if (nvpaStatus)
+        {
+            NV_PERF_LOG_ERR(50, "NVPW_D3D12_Profiler_CommandList_PopRange failed\n");
+            return false;
+        }
+        return true;
+    }
+
+    inline bool D3D12PushRange_Nop(ID3D12GraphicsCommandList* pCommandList, const char* pRangeName)
+    {
+        return false;
+    }
+
+    inline bool D3D12PopRange_Nop(ID3D12GraphicsCommandList* pCommandList)
+    {
+        return false;
+    }
+
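+    // Dispatches PushRange/PopRange to the NvPerf implementation on NVIDIA devices, and to no-op stubs (which return false) otherwise.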
+    struct D3D12RangeCommands
+    {
+        bool isNvidiaDevice;
+        bool(*PushRange)(ID3D12GraphicsCommandList* pCommandList, const char* pRangeName);
+        bool(*PopRange)(ID3D12GraphicsCommandList* pCommandList);
+
+    public:
+        D3D12RangeCommands()
+            : isNvidiaDevice(false)
+            , PushRange(&D3D12PushRange_Nop)
+            , PopRange(&D3D12PopRange_Nop)
+        {
+        }
+
+        void Initialize(bool isNvidiaDevice_)
+        {
+            isNvidiaDevice = isNvidiaDevice_;
+            if (isNvidiaDevice_)
+            {
+                PushRange = &D3D12PushRange;
+                PopRange = &D3D12PopRange;
+            }
+            else
+            {
+                PushRange = &D3D12PushRange_Nop;
+                PopRange = &D3D12PopRange_Nop;
+            }
+        }
+
+        void Initialize(IDXGIAdapter* pDXGIAdapter)
+        {
+            const bool isNvidiaDevice_ = DxgiIsNvidiaDevice(pDXGIAdapter);
+            return Initialize(isNvidiaDevice_);
+        }
+
+        void Initialize(ID3D12Device* pDevice)
+        {
+            const bool isNvidiaDevice_ = D3D12IsNvidiaDevice(pDevice);
+            return Initialize(isNvidiaDevice_);
+        }
+    };
+
+}}}
--- a/ruins64k/tools/NvPerfUtility/include/NvPerfDeviceProperties.h
+++ b/ruins64k/tools/NvPerfUtility/include/NvPerfDeviceProperties.h
@@ -0,0 +1,125 @@
+/*
+* Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+#pragma once
+
+#include "nvperf_host.h"
+#include "nvperf_target.h"
+#include "NvPerfInit.h"
+#include <vector>
+
+namespace nv { namespace perf {
+    enum
+    {
+        NVIDIA_VENDOR_ID = 0x10de
+    };
+
+    struct DeviceIdentifiers
+    {
+        const char* pDeviceName;
+        const char* pChipName;
+    };
+
+    inline DeviceIdentifiers GetDeviceIdentifiers(size_t deviceIndex)
+    {
+        NVPW_Device_GetNames_Params getNamesParams = { NVPW_Device_GetNames_Params_STRUCT_SIZE };
+        getNamesParams.deviceIndex = deviceIndex;
+        NVPA_Status nvpaStatus = NVPW_Device_GetNames(&getNamesParams);
+        if (nvpaStatus)
+        {
+            NV_PERF_LOG_ERR(10, "NVPW_Device_GetNames failed\n");
+            return {};
+        }
+
+        DeviceIdentifiers deviceIdentifiers = {};
+        deviceIdentifiers.pDeviceName = getNamesParams.pDeviceName;
+        deviceIdentifiers.pChipName = getNamesParams.pChipName;
+
+        return deviceIdentifiers;
+    }
+
+    inline NVPW_Device_ClockStatus GetDeviceClockState(size_t nvperfDeviceIndex)
+    {
+        NVPW_Device_GetClockStatus_Params getClockStatusParams = { NVPW_Device_GetClockStatus_Params_STRUCT_SIZE };
+        getClockStatusParams.deviceIndex = nvperfDeviceIndex;
+        NVPA_Status nvpaStatus = NVPW_Device_GetClockStatus(&getClockStatusParams);
+        if (nvpaStatus)
+        {
+            NV_PERF_LOG_ERR(10, "NVPW_Device_GetClockStatus() failed on %s\n", GetDeviceIdentifiers(nvperfDeviceIndex).pDeviceName);
+            return NVPW_DEVICE_CLOCK_STATUS_UNKNOWN;
+        }
+        return getClockStatusParams.clockStatus;
+    }
+
+    inline const char* ToCString(NVPW_Device_ClockSetting clockSetting)
+    {
+        switch(clockSetting)
+        {
+            case NVPW_DEVICE_CLOCK_SETTING_INVALID: return "NVPW_DEVICE_CLOCK_SETTING_INVALID";
+            case NVPW_DEVICE_CLOCK_SETTING_DEFAULT: return "NVPW_DEVICE_CLOCK_SETTING_DEFAULT";
+            case NVPW_DEVICE_CLOCK_SETTING_LOCK_TO_RATED_TDP: return "NVPW_DEVICE_CLOCK_SETTING_LOCK_TO_RATED_TDP";
+            default: return "Unknown NVPW_Device_ClockSetting";
+        }
+    }
+
+    inline bool SetDeviceClockState(size_t nvperfDeviceIndex, NVPW_Device_ClockSetting clockSetting)
+    {
+        NVPW_Device_SetClockSetting_Params setClockSettingParams = { NVPW_Device_SetClockSetting_Params_STRUCT_SIZE };
+        setClockSettingParams.deviceIndex = nvperfDeviceIndex;
+        setClockSettingParams.clockSetting = clockSetting;
+        NVPA_Status nvpaStatus = NVPW_Device_SetClockSetting(&setClockSettingParams);
+        if (nvpaStatus)
+        {
+            NV_PERF_LOG_ERR(10, "NVPW_Device_SetClockSetting( %s ) failed on %s\n", ToCString(clockSetting), GetDeviceIdentifiers(nvperfDeviceIndex).pDeviceName);
+            return false;
+        }
+        return true;
+    }
+
+    inline const char* ToCString(NVPW_Device_ClockStatus clockStatus)
+    {
+        switch(clockStatus)
+        {
+            case NVPW_DEVICE_CLOCK_STATUS_UNKNOWN: return "NVPW_DEVICE_CLOCK_STATUS_UNKNOWN";
+            case NVPW_DEVICE_CLOCK_STATUS_LOCKED_TO_RATED_TDP: return "NVPW_DEVICE_CLOCK_STATUS_LOCKED_TO_RATED_TDP";
+            case NVPW_DEVICE_CLOCK_STATUS_BOOST_ENABLED: return "NVPW_DEVICE_CLOCK_STATUS_BOOST_ENABLED";
+            case NVPW_DEVICE_CLOCK_STATUS_BOOST_DISABLED: return "NVPW_DEVICE_CLOCK_STATUS_BOOST_DISABLED";
+            case NVPW_DEVICE_CLOCK_STATUS__COUNT: return "NVPW_DEVICE_CLOCK_STATUS__COUNT";
+            default: return "Unknown NVPW_Device_ClockStatus";
+        }
+    }
+
+    inline bool SetDeviceClockState(size_t nvperfDeviceIndex, NVPW_Device_ClockStatus clockStatus)
+    {
+        // convert to NVPW_Device_ClockSetting
+        NVPW_Device_ClockSetting clockSetting = NVPW_DEVICE_CLOCK_SETTING_INVALID;
+        switch (clockStatus)
+        {
+            case NVPW_DEVICE_CLOCK_STATUS_UNKNOWN:
+            case NVPW_DEVICE_CLOCK_STATUS_BOOST_ENABLED:
+            case NVPW_DEVICE_CLOCK_STATUS_BOOST_DISABLED:
+                // default driver setting (normally unlocked and not boosted, but could be unlocked boosted, or locked to rated TDP)
+                clockSetting = NVPW_DEVICE_CLOCK_SETTING_DEFAULT;
+                break;
+            case NVPW_DEVICE_CLOCK_STATUS_LOCKED_TO_RATED_TDP:
+                clockSetting = NVPW_DEVICE_CLOCK_SETTING_LOCK_TO_RATED_TDP;
+                break;
+            default:
+                NV_PERF_LOG_ERR(10, "Invalid clockStatus: %s\n", ToCString(clockStatus));
+                return false;
+        }
+        return SetDeviceClockState(nvperfDeviceIndex, clockSetting);
+    }
+}}
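Note on NvPerfDeviceProperties.h: the `SetDeviceClockState` overload that accepts an `NVPW_Device_ClockStatus` looks suited to a save/lock/restore pattern, where the status read before profiling is handed back afterwards. The sketch below is only an illustration of that pattern under stated assumptions. `FakeClockStatus` and `ScopedClockLock` are hypothetical names that do not exist in NvPerfUtility, and the getter and setter are injected callables so the example builds without the NvPerf SDK.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for NVPW_Device_ClockStatus (assumption, not the SDK enum).
enum class FakeClockStatus { Unknown, LockedToRatedTdp, BoostEnabled, BoostDisabled };

// Hypothetical RAII guard: remembers the current clock status, requests a lock
// to rated TDP, and hands the remembered status back to the setter on scope exit.
class ScopedClockLock
{
public:
    using Getter = std::function<FakeClockStatus()>;
    using Setter = std::function<bool(FakeClockStatus)>;

    ScopedClockLock(Getter get, Setter set)
        : m_set(std::move(set))
        , m_previous(get())
        , m_locked(false)
    {
        m_locked = m_set(FakeClockStatus::LockedToRatedTdp);
    }

    ~ScopedClockLock()
    {
        // Only undo what was actually applied.
        if (m_locked)
        {
            m_set(m_previous);
        }
    }

    ScopedClockLock(const ScopedClockLock&) = delete;
    ScopedClockLock& operator=(const ScopedClockLock&) = delete;

    bool IsLocked() const { return m_locked; }

private:
    Setter m_set;
    FakeClockStatus m_previous;
    bool m_locked;
};
```

With the real header, the getter would presumably wrap `GetDeviceClockState(deviceIndex)` and the setter the `NVPW_Device_ClockStatus` overload of `SetDeviceClockState`. Because that overload maps the unknown and boost statuses to `NVPW_DEVICE_CLOCK_SETTING_DEFAULT`, the restore step returns the device to the driver default rather than guaranteeing the exact earlier status.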
--- a/ruins64k/tools/NvPerfUtility/include/NvPerfInit.h
+++ b/ruins64k/tools/NvPerfUtility/include/NvPerfInit.h
@@ -0,0 +1,432 @@
+/*
+* Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+#pragma once
+
+#include <stdio.h>
+#include <stdarg.h>
+#include <string>
+#include <cassert>
+#include "nvperf_host.h"
+#include "nvperf_target.h"
+#if defined(_WIN32)
+#include <Windows.h>
+#else
+#include <sys/time.h>
+#endif
+
+namespace nv { namespace perf {
+
+    inline int FormatTimeCommon(char* pBuf, size_t size, uint32_t hour, uint32_t minute, uint32_t second, uint32_t milliSecond)
+    {
+        const int written = snprintf(pBuf, size, "%02u:%02u:%02u:%03u", hour, minute, second, milliSecond);
+        return written;
+    }
+
+    inline int FormatDateCommon(char* pBuf, size_t size, uint32_t year, uint32_t month, uint32_t day)
+    {
+        const char* pMonth = [&](){
+            static const char* s_months[12] = {
+                "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
+            };
+
+            if (1 <= month && month <= 12)
+            {
+                return s_months[month - 1];
+            }
+            return "???";
+        }();
+        const int written = snprintf(pBuf, size, "%4u-%s-%02u", year, pMonth, day);
+        return written;
+    }
+
+#if defined(_WIN32)
+    typedef struct _FILETIME LogTimeStamp;
+
+    inline void UserLogImplPlatform(const char* pMessage)
+    {
+        OutputDebugStringA(pMessage);
+    }
+
+    inline void GetTimeStamp(LogTimeStamp* pTimestamp)
+    {
+        GetSystemTimeAsFileTime(pTimestamp);
+    }
+
+    inline size_t FormatTime(LogTimeStamp* pTimestamp, char* pBuf, size_t size)
+    {
+        SYSTEMTIME utc, stime;
+        FileTimeToSystemTime(pTimestamp, &utc);
+        SystemTimeToTzSpecificLocalTime(NULL, &utc, &stime);
+        return FormatTimeCommon(pBuf, size, (uint32_t)stime.wHour, (uint32_t)stime.wMinute, (uint32_t)stime.wSecond, (uint32_t)stime.wMilliseconds);
+    }
+
+    inline size_t FormatDate(LogTimeStamp* pTimestamp, char* pBuf, size_t size)
+    {
+        SYSTEMTIME utc, stime;
+        FileTimeToSystemTime(pTimestamp, &utc);
+        SystemTimeToTzSpecificLocalTime(NULL, &utc, &stime);
+        return FormatDateCommon(pBuf, size, (uint32_t)stime.wYear, (uint32_t)stime.wMonth, (uint32_t)stime.wDay);
+    }
+#else // !defined(_WIN32)
+    typedef struct timeval LogTimeStamp;
+
+    inline void UserLogImplPlatform(const char* pMessage)
+    {
+        (void)pMessage; // unused: there is no platform debug channel on non-Windows builds
+    }
+
+    inline void GetTimeStamp(LogTimeStamp* pTimestamp)
+    {
+        gettimeofday(pTimestamp, 0);
+    }
+
+    inline size_t FormatTime(LogTimeStamp* pTimestamp, char* pBuf, size_t size)
+    {
+        const struct tm* ltm = localtime(&pTimestamp->tv_sec);
+        int milliseconds = pTimestamp->tv_usec / 1000;
+        return FormatTimeCommon(pBuf, size, (uint32_t)ltm->tm_hour, (uint32_t)ltm->tm_min, (uint32_t)ltm->tm_sec, (uint32_t)milliseconds);
+    }
+
+    inline size_t FormatDate(LogTimeStamp* pTimestamp, char* pBuf, size_t size)
+    {
+        const struct tm* ltm = localtime(&pTimestamp->tv_sec);
+        return FormatDateCommon(pBuf, size, (uint32_t)ltm->tm_year + 1900, (uint32_t)ltm->tm_mon + 1, (uint32_t)ltm->tm_mday);
+    }
+#endif // defined(_WIN32)
+
+}}
+
+#ifndef NV_PERF_LOG_INF
+#define NV_PERF_LOG_INF(level_, ...) ::nv::perf::UserLog(::nv::perf::LogSeverity::Inf, level_, __FUNCTION__, __VA_ARGS__)
+#endif
+
+#ifndef NV_PERF_LOG_WRN
+#define NV_PERF_LOG_WRN(level_, ...) ::nv::perf::UserLog(::nv::perf::LogSeverity::Wrn, level_, __FUNCTION__, __VA_ARGS__)
+#endif
+
+#ifndef NV_PERF_LOG_ERR
+#define NV_PERF_LOG_ERR(level_, ...) ::nv::perf::UserLog(::nv::perf::LogSeverity::Err, level_, __FUNCTION__, __VA_ARGS__)
+#endif
+
+namespace nv { namespace perf {
+
+    enum class LogSeverity
+    {
+        Inf,
+        Wrn,
+        Err,
+        COUNT
+    };
+
+    struct LogSettings
+    {
+        uint32_t volumeLevels[(unsigned)LogSeverity::COUNT] = { 50, 50, 50 };
+
+#if defined(_WIN32)
+        bool writePlatform = true;
+#else
+        bool writePlatform = false;
+#endif
+        bool writeStderr                        = true;
+        FILE* writeFileFD                       = nullptr;
+        bool appendToFile                       = true;
+        LogSeverity flushFileSeverity           = LogSeverity::Err;
+
+        bool logDate                            = true;
+        bool logTime                            = true;
+
+        LogSettings()
+        {
+#if defined(_WIN32)
+            {
+                const char* const pEnvValue = getenv("NV_PERF_LOG_ENABLE_PLATFORM");
+                if (pEnvValue)
+                {
+                    char* pEnd = nullptr;
+                    writePlatform = !!strtol(pEnvValue, &pEnd, 0);
+                }
+            }
+#endif
+            {
+                const char* const pEnvValue = getenv("NV_PERF_LOG_ENABLE_STDERR");
+                if (pEnvValue)
+                {
+                    char* pEnd = nullptr;
+                    writeStderr = !!strtol(pEnvValue, &pEnd, 0);
+                }
+            }
+            {
+                const char* const pEnvValue = getenv("NV_PERF_LOG_ENABLE_FILE");
+                if (pEnvValue)
+                {
+                    FILE* fp = fopen(pEnvValue, appendToFile ? "a" : "w");
+                    assert(fp);
+                    writeFileFD = fp;
+                }
+            }
+            {
+                const char* const pEnvValue = getenv("NV_PERF_LOG_FILE_FLUSH_SEVERITY");
+                if (pEnvValue)
+                {
+                    char* pEnd = nullptr;
+                    int severity = strtol(pEnvValue, &pEnd, 0);
+                    if (0 <= severity && severity < (int)LogSeverity::COUNT)
+                    {
+                        flushFileSeverity = (LogSeverity)severity;
+                    }
+                }
+            }
+        }
+
+        ~LogSettings()
+        {
+            if (writeFileFD)
+            {
+                fclose(writeFileFD);
+            }
+        }
+    };
+
+
+    inline LogSettings* GetLogSettingsStorage_()
+    {
+        static LogSettings settings;
+        return &settings;
+    }
+
+    inline uint32_t GetLogVolumeLevel(LogSeverity severity)
+    {
+        LogSettings* pSettings = GetLogSettingsStorage_();
+        if ((uint32_t)severity < (uint32_t)LogSeverity::COUNT)
+        {
+            return pSettings->volumeLevels[(uint32_t)severity];
+        }
+        return 0;
+    }
+
+    // Higher values produce more log output.  0 <= volumeLevel <= 100
+    // A message is emitted only if its level is <= volumeLevel; messages with a higher level are treated as noise and discarded.
+    inline void SetLogVolumeLevel(LogSeverity severity, uint32_t volumeLevel)
+    {
+        LogSettings* pSettings = GetLogSettingsStorage_();
+        if ((uint32_t)severity < (uint32_t)LogSeverity::COUNT)
+        {
+            pSettings->volumeLevels[(uint32_t)severity] = volumeLevel;
+        }
+    }
+
+    inline void SetLogAppendToFile(bool enable)
+    {
+        LogSettings* pSettings = GetLogSettingsStorage_();
+        pSettings->appendToFile = enable;
+    }
+
+    inline void SetLogFlushSeverity(LogSeverity severity)
+    {
+        LogSettings* pSettings = GetLogSettingsStorage_();
+        if (0 <= (int)severity && (int)severity < (int)LogSeverity::COUNT)
+        {
+            pSettings->flushFileSeverity = severity;
+        }
+    }
+
+    inline void SetLogDate(bool enable)
+    {
+        LogSettings* pSettings = GetLogSettingsStorage_();
+        pSettings->logDate = enable;
+    }
+
+    inline void SetLogTime(bool enable)
+    {
+        LogSettings* pSettings = GetLogSettingsStorage_();
+        pSettings->logTime = enable;
+    }
+
+    inline bool UserLogEnablePlatform(bool enable)
+    {
+        LogSettings* pSettings = GetLogSettingsStorage_();
+        pSettings->writePlatform = enable;
+        return true;
+    }
+
+    inline bool UserLogEnableStderr(bool enable)
+    {
+        LogSettings* pSettings = GetLogSettingsStorage_();
+        pSettings->writeStderr = enable;
+        return true;
+    }
+
+    inline bool UserLogEnableFile(const char* filename)
+    {
+        LogSettings* pSettings = GetLogSettingsStorage_();
+        if (filename)
+        {
+            FILE* fp = fopen(filename, pSettings->appendToFile ? "a" : "w");
+            if (!fp)
+            {
+                return false;
+            }
+            pSettings->writeFileFD = fp;
+        }
+        return true;
+    }
+
+    inline void UserLogImplStderr(const char* pMessage) 
+    {
+        fprintf(stderr, "%s", pMessage);
+    }
+
+    inline void UserLogImplFile(const char* pMessage, FILE* fd)
+    {
+        fprintf(fd, "%s", pMessage);
+    }
+
+    inline void UserLogImplFileFlush(FILE* fd)
+    {
+        fflush(fd);
+    }
+
+    inline void UserLog(LogSeverity severity, uint32_t level, const char* pFunctionName, const char* pFormat, ...)
+    {
+        const uint32_t volumeLevel = GetLogVolumeLevel(severity);
+        if (volumeLevel < level)
+        {
+            return;
+        }
+
+        LogSettings& settings = *GetLogSettingsStorage_();
+
+        va_list args;
+
+        va_start(args, pFormat);
+        const int length = vsnprintf(nullptr, 0, pFormat, args);
+        va_end(args);
+
+        std::string str;
+        str.append(static_cast<size_t>(length > 0 ? length : 0) + 1, ' '); // never empty, even if vsnprintf reported an error
+        va_start(args, pFormat);
+        vsnprintf(&str[0], str.size(), pFormat, args);
+        va_end(args);
+        str.back() = '\0'; // ensure NUL-terminated
+
+        const char* const pPrefix = [&]() {
+            switch (severity)
+            {
+                case (LogSeverity::Inf): return "NVPERF|INF|";
+                case (LogSeverity::Wrn): return "NVPERF|WRN|";
+                case (LogSeverity::Err): return "NVPERF|ERR|";
+                default:                 return "NVPERF|???|";
+            }
+        }();
+
+        char datebuf[16];
+        char timebuf[16];
+        if (settings.logDate || settings.logTime)
+        {
+            LogTimeStamp time;
+            GetTimeStamp(&time);
+            if (settings.logDate)
+            {
+                FormatDate(&time, datebuf, sizeof(datebuf));
+            }
+            if (settings.logTime)
+            {
+                FormatTime(&time, timebuf, sizeof(timebuf));
+            }
+        }
+
+        if (settings.writePlatform)
+        {
+            UserLogImplPlatform(pPrefix);
+            if (settings.logDate)
+            {
+                UserLogImplPlatform(datebuf);
+                UserLogImplPlatform("|");
+            }
+            if (settings.logTime)
+            {
+                UserLogImplPlatform(timebuf);
+                UserLogImplPlatform("|");
+            }
+            UserLogImplPlatform(pFunctionName);
+            UserLogImplPlatform(" || ");
+            UserLogImplPlatform(str.c_str());
+        }
+        if (settings.writeStderr)
+        {
+            UserLogImplStderr(pPrefix);
+            if (settings.logDate)
+            {
+                UserLogImplStderr(datebuf);
+                UserLogImplStderr("|");
+            }
+            if (settings.logTime)
+            {
+                UserLogImplStderr(timebuf);
+                UserLogImplStderr("|");
+            }
+            UserLogImplStderr(pFunctionName);
+            UserLogImplStderr(" || ");
+            UserLogImplStderr(str.c_str());
+        }
+        if (settings.writeFileFD)
+        {
+            UserLogImplFile(pPrefix, settings.writeFileFD);
+            if (settings.logDate)
+            {
+                UserLogImplFile(datebuf, settings.writeFileFD);
+                UserLogImplFile("|", settings.writeFileFD);
+            }
+            if (settings.logTime)
+            {
+                UserLogImplFile(timebuf, settings.writeFileFD);
+                UserLogImplFile("|", settings.writeFileFD);
+            }
+            UserLogImplFile(pFunctionName, settings.writeFileFD);
+            UserLogImplFile(" || ", settings.writeFileFD);
+            UserLogImplFile(str.c_str(), settings.writeFileFD);
+            if (severity >= settings.flushFileSeverity)
+            {
+                UserLogImplFileFlush(settings.writeFileFD);
+            }
+        }
+    }
+
+    inline bool InitializeNvPerf()
+    {
+        NVPA_Status nvpaStatus;
+
+        NVPW_InitializeHost_Params initializeHostParams = { NVPW_InitializeHost_Params_STRUCT_SIZE };
+        nvpaStatus = NVPW_InitializeHost(&initializeHostParams);
+        if (nvpaStatus)
+        {
+            NV_PERF_LOG_ERR(10, "NVPW_InitializeHost failed\n");
+            return false;
+        }
+
+        NVPW_InitializeTarget_Params initializeTargetParams = { NVPW_InitializeTarget_Params_STRUCT_SIZE };
+        nvpaStatus = NVPW_InitializeTarget(&initializeTargetParams);
+        if (nvpaStatus)
+        {
+            NV_PERF_LOG_ERR(10, "NVPW_InitializeTarget failed\n");
+            return false;
+        }
+
+        return true;
+    }
+
+}}
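Note on NvPerfInit.h: `UserLog()` combines two small techniques, a volume filter (`volumeLevel < level` discards the message) and two-pass `vsnprintf` formatting that first measures and then writes. The following is a minimal standalone sketch of both, assuming nothing from the NvPerf SDK. `FormatToString` and `PassesVolumeFilter` are hypothetical helper names introduced for illustration and are not part of NvPerfUtility.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: formats printf-style arguments into a std::string
// using a measure-then-write pair of vsnprintf calls.
inline std::string FormatToString(const char* pFormat, ...)
{
    va_list args;

    // Pass 1: measure. With a null buffer and size 0, vsnprintf returns the
    // length the output would have (excluding the terminator), or a negative
    // value on error.
    va_start(args, pFormat);
    const int length = vsnprintf(nullptr, 0, pFormat, args);
    va_end(args);
    if (length < 0)
    {
        return std::string();
    }

    // Pass 2: write into a buffer with room for the terminator, then drop it.
    std::string str(static_cast<size_t>(length) + 1, '\0');
    va_start(args, pFormat);
    vsnprintf(&str[0], str.size(), pFormat, args);
    va_end(args);
    str.resize(static_cast<size_t>(length));
    return str;
}

// Hypothetical stand-in for the check at the top of UserLog(): a message is
// kept only when its level does not exceed the configured volume level.
inline bool PassesVolumeFilter(uint32_t volumeLevel, uint32_t messageLevel)
{
    return !(volumeLevel < messageLevel);
}
```

With the default volume level of 50 per severity, a call such as `NV_PERF_LOG_ERR(10, ...)` passes the filter, a message at level 50 still passes, and a message at level 51 is dropped.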
--- a/ruins64k/tools/NvPerfUtility/include/NvPerfMetricsConfigBuilder.h
+++ b/ruins64k/tools/NvPerfUtility/include/NvPerfMetricsConfigBuilder.h
@@ -0,0 +1,299 @@
+/*
+* Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+#pragma once
+
+#include <utility>
+#include "NvPerfMetricsEvaluator.h"
+
+namespace nv { namespace perf {
+
+    class MetricsConfigBuilder
+    {
+    protected:
+        NVPW_MetricsEvaluator* m_pMetricsEvaluator;         // not owned
+        NVPA_RawMetricsConfig* m_pRawMetricsConfig;         // owned
+        NVPA_CounterDataBuilder* m_pCounterDataBuilder;     // owned
+        bool m_configuring;
+
+    protected:
+        void MoveAssign(MetricsConfigBuilder&& rhs)
+        {
+            Reset();
+            m_pMetricsEvaluator = rhs.m_pMetricsEvaluator;
+            m_pRawMetricsConfig = rhs.m_pRawMetricsConfig;
+            m_pCounterDataBuilder = rhs.m_pCounterDataBuilder;
+            m_configuring = rhs.m_configuring;
+
+            rhs.m_pMetricsEvaluator = nullptr;
+            rhs.m_pRawMetricsConfig = nullptr;
+            rhs.m_pCounterDataBuilder = nullptr;
+        }
+
+    public:
+        ~MetricsConfigBuilder()
+        {
+            Reset();
+        }
+        MetricsConfigBuilder() : m_pMetricsEvaluator(nullptr), m_pRawMetricsConfig(nullptr), m_pCounterDataBuilder(nullptr), m_configuring(false)
+        {
+        }
+        MetricsConfigBuilder(MetricsConfigBuilder&& rhs) : m_pMetricsEvaluator(nullptr), m_pRawMetricsConfig(nullptr), m_pCounterDataBuilder(nullptr), m_configuring(false)
+        {
+            MoveAssign(std::move(rhs));
+        }
+        MetricsConfigBuilder& operator=(MetricsConfigBuilder&& rhs)
+        {
+            MoveAssign(std::move(rhs));
+            return *this;
+        }
+
+        void Reset()
+        {
+            NVPW_RawMetricsConfig_Destroy_Params rawMetricsConfigParams = { NVPW_RawMetricsConfig_Destroy_Params_STRUCT_SIZE };
+            rawMetricsConfigParams.pRawMetricsConfig = m_pRawMetricsConfig;
+            NVPW_RawMetricsConfig_Destroy(&rawMetricsConfigParams);
+
+            NVPW_CounterDataBuilder_Destroy_Params counterDataBuilderParams = { NVPW_CounterDataBuilder_Destroy_Params_STRUCT_SIZE };
+            counterDataBuilderParams.pCounterDataBuilder = m_pCounterDataBuilder;
+            NVPW_CounterDataBuilder_Destroy(&counterDataBuilderParams);
+
+            m_pMetricsEvaluator     = nullptr;
+            m_pRawMetricsConfig     = nullptr;
+            m_pCounterDataBuilder   = nullptr;
+        }
+
+        bool Initialize(NVPW_MetricsEvaluator* pMetricsEvaluator, NVPA_RawMetricsConfig* pRawMetricsConfig, const char* chipName)
+        {
+            NVPA_Status nvpaStatus;
+
+            Reset(); // destroy any existing objects
+            m_pMetricsEvaluator = pMetricsEvaluator;
+            m_pRawMetricsConfig = pRawMetricsConfig;
+            NVPW_CounterDataBuilder_Create_Params counterDataBuilderParams = { NVPW_CounterDataBuilder_Create_Params_STRUCT_SIZE };
+            counterDataBuilderParams.pChipName = chipName;
+            nvpaStatus = NVPW_CounterDataBuilder_Create(&counterDataBuilderParams);
+            if (nvpaStatus)
+            {
+                return false;
+            }
+
+            m_pCounterDataBuilder = counterDataBuilderParams.pCounterDataBuilder;
+
+            NVPW_RawMetricsConfig_BeginPassGroup_Params beginPassGroupParams = { NVPW_RawMetricsConfig_BeginPassGroup_Params_STRUCT_SIZE };
+            beginPassGroupParams.pRawMetricsConfig = m_pRawMetricsConfig;
+            nvpaStatus = NVPW_RawMetricsConfig_BeginPassGroup(&beginPassGroupParams);
+            if (nvpaStatus)
+            {
+                return false;
+            }
+            m_configuring = true;
+
+            return true;
+        }
+
+        bool AddMetrics(const NVPW_MetricEvalRequest* pMetricEvalRequests, size_t numMetricEvalRequests)
+        {
+            NVPA_Status nvpaStatus;
+            NVPW_MetricsEvaluator_GetMetricRawDependencies_Params getMetricRawDependenciesParams = { NVPW_MetricsEvaluator_GetMetricRawDependencies_Params_STRUCT_SIZE };
+            getMetricRawDependenciesParams.pMetricsEvaluator = m_pMetricsEvaluator;
+            getMetricRawDependenciesParams.pMetricEvalRequests = pMetricEvalRequests;
+            getMetricRawDependenciesParams.numMetricEvalRequests = numMetricEvalRequests;
+            getMetricRawDependenciesParams.metricEvalRequestStructSize = NVPW_MetricEvalRequest_STRUCT_SIZE;
+            getMetricRawDependenciesParams.metricEvalRequestStrideSize = sizeof(NVPW_MetricEvalRequest);
+            nvpaStatus = NVPW_MetricsEvaluator_GetMetricRawDependencies(&getMetricRawDependenciesParams);
+            if (nvpaStatus)
+            {
+                NV_PERF_LOG_ERR(50, "NVPW_MetricsEvaluator_GetMetricRawDependencies failed\n");
+                return false;
+            }
+
+            std::vector<const char*> rawDependencies(getMetricRawDependenciesParams.numRawDependencies);
+            getMetricRawDependenciesParams.ppRawDependencies = rawDependencies.data();
+            nvpaStatus = NVPW_MetricsEvaluator_GetMetricRawDependencies(&getMetricRawDependenciesParams);
+            if (nvpaStatus)
+            {
+                NV_PERF_LOG_ERR(50, "NVPW_MetricsEvaluator_GetMetricRawDependencies failed\n");
+                return false;
+            }
+            for (const char* const pRawMetricName : rawDependencies)
+            {
+                NVPA_RawMetricRequest rawMetricRequest = { NVPA_RAW_METRIC_REQUEST_STRUCT_SIZE };
+                rawMetricRequest.pMetricName = pRawMetricName;
+                rawMetricRequest.isolated = true;
+                rawMetricRequest.keepInstances = true;
+
+                NVPW_CounterDataBuilder_AddMetrics_Params addMetricParams = { NVPW_CounterDataBuilder_AddMetrics_Params_STRUCT_SIZE };
+                addMetricParams.numMetricRequests = 1;
+                addMetricParams.pCounterDataBuilder = m_pCounterDataBuilder;
+                addMetricParams.pRawMetricRequests = &rawMetricRequest;
+                nvpaStatus = NVPW_CounterDataBuilder_AddMetrics(&addMetricParams);
+                if (nvpaStatus)
+                {
+                    NV_PERF_LOG_ERR(50, "NVPW_CounterDataBuilder_AddMetrics failed\n");
+                    return false;
+                }
+
+                NVPW_RawMetricsConfig_AddMetrics_Params configAddMetricParams = { NVPW_RawMetricsConfig_AddMetrics_Params_STRUCT_SIZE };
+                configAddMetricParams.numMetricRequests = 1;
+                configAddMetricParams.pRawMetricRequests = &rawMetricRequest;
+                configAddMetricParams.pRawMetricsConfig = m_pRawMetricsConfig;
+                nvpaStatus = NVPW_RawMetricsConfig_AddMetrics(&configAddMetricParams);
+                if (nvpaStatus)
+                {
+                    NV_PERF_LOG_ERR(50, "NVPW_RawMetricsConfig_AddMetrics failed\n");
+                    return false;
+                }
+            }
+
+            return true;
+        }
+
+        bool AddMetric(const char* pMetricName)
+        {
+            NVPW_MetricEvalRequest metricEvalRequest{};
+            bool success = ToMetricEvalRequest(m_pMetricsEvaluator, pMetricName, metricEvalRequest);
+            if (!success)
+            {
+                NV_PERF_LOG_ERR(50, "ToMetricEvalRequest failed for metric: %s\n", pMetricName);
+                return false;
+            }
+            success = AddMetrics(&metricEvalRequest, 1);
+            if (!success)
+            {
+                NV_PERF_LOG_ERR(50, "AddMetrics failed for metric: %s\n", pMetricName);
+                return false;
+            }
+            return true;
+        }
+
+        bool AddMetrics(const char* const pMetricNames[], size_t numMetrics)
+        {
+            bool success = true;
+            for (size_t metricIdx = 0; metricIdx < numMetrics; ++metricIdx)
+            {
+                const bool addMetricSuccess = AddMetric(pMetricNames[metricIdx]);
+                if (!addMetricSuccess)
+                {
+                    success = false;
+                }
+            }
+            if (!success)
+            {
+                return false;
+            }
+            return true;
+        }
+
+        bool PrepareConfigImage()
+        {
+            NVPA_Status nvpaStatus;
+            m_configuring = false;
+
+            NVPW_RawMetricsConfig_EndPassGroup_Params endPassGroupParam = { NVPW_RawMetricsConfig_EndPassGroup_Params_STRUCT_SIZE };
+            endPassGroupParam.pRawMetricsConfig = m_pRawMetricsConfig;
+            nvpaStatus = NVPW_RawMetricsConfig_EndPassGroup(&endPassGroupParam);
+            if (nvpaStatus)
+            {
+                return false;
+            }
+
+            NVPW_RawMetricsConfig_GenerateConfigImage_Params generateConfigImageParam = { NVPW_RawMetricsConfig_GenerateConfigImage_Params_STRUCT_SIZE };
+            generateConfigImageParam.pRawMetricsConfig = m_pRawMetricsConfig;
+            nvpaStatus = NVPW_RawMetricsConfig_GenerateConfigImage(&generateConfigImageParam);
+            if (nvpaStatus)
+            {
+                return false;
+            }
+
+            // Start a new PassGroup so that subsequent AddMetrics() calls will succeed.
+            // This will not result in optimal scheduling, but it obeys the principle of least surprise.
+            NVPW_RawMetricsConfig_BeginPassGroup_Params beginPassGroupParams = { NVPW_RawMetricsConfig_BeginPassGroup_Params_STRUCT_SIZE };
+            beginPassGroupParams.pRawMetricsConfig = m_pRawMetricsConfig;
+            nvpaStatus = NVPW_RawMetricsConfig_BeginPassGroup(&beginPassGroupParams);
+            if (nvpaStatus)
+            {
+                return false;
+            }
+            m_configuring = true;
+
+            return true;
+        }
+
+        // Returns the buffer size needed for the ConfigImage, or zero on error.
+        size_t GetConfigImageSize() const
+        {
+            NVPW_RawMetricsConfig_GetConfigImage_Params getConfigImageParam = { NVPW_RawMetricsConfig_GetConfigImage_Params_STRUCT_SIZE };
+            getConfigImageParam.pBuffer = nullptr;
+            getConfigImageParam.bytesAllocated = 0;
+            getConfigImageParam.pRawMetricsConfig = m_pRawMetricsConfig;
+            NVPA_Status nvpaStatus = NVPW_RawMetricsConfig_GetConfigImage(&getConfigImageParam);
+            if (nvpaStatus)
+            {
+                return 0;
+            }
+
+            return getConfigImageParam.bytesCopied;
+        }
+
+        // Copies the generated ConfigImage into pBuffer.
+        bool GetConfigImage(size_t bufferSize, uint8_t* pBuffer) const
+        {
+            NVPW_RawMetricsConfig_GetConfigImage_Params getConfigImageParam = { NVPW_RawMetricsConfig_GetConfigImage_Params_STRUCT_SIZE };
+            getConfigImageParam.pRawMetricsConfig = m_pRawMetricsConfig;
+            getConfigImageParam.bytesAllocated = bufferSize;
+            getConfigImageParam.pBuffer = pBuffer;
+            NVPA_Status nvpaStatus = NVPW_RawMetricsConfig_GetConfigImage(&getConfigImageParam);
+            if (nvpaStatus)
+            {
+                return false;
+            }
+            return true;
+        }
+
+        // Returns the buffer size needed for the CounterDataPrefix, or zero on error.
+        size_t GetCounterDataPrefixSize() const
+        {
+            NVPW_CounterDataBuilder_GetCounterDataPrefix_Params getCounterDataPrefixParams = { NVPW_CounterDataBuilder_GetCounterDataPrefix_Params_STRUCT_SIZE };
+            getCounterDataPrefixParams.bytesAllocated = 0;
+            getCounterDataPrefixParams.pBuffer = nullptr;
+            getCounterDataPrefixParams.pCounterDataBuilder = m_pCounterDataBuilder;
+            NVPA_Status nvpaStatus = NVPW_CounterDataBuilder_GetCounterDataPrefix(&getCounterDataPrefixParams);
+            if (nvpaStatus)
+            {
+                return 0;
+            }
+
+            return getCounterDataPrefixParams.bytesCopied;
+        }
+
+        // Copies the generated CounterDataPrefix into pBuffer.
+        bool GetCounterDataPrefix(size_t bufferSize, uint8_t* pBuffer) const
+        {
+            NVPW_CounterDataBuilder_GetCounterDataPrefix_Params getCounterDataPrefixParams = { NVPW_CounterDataBuilder_GetCounterDataPrefix_Params_STRUCT_SIZE };
+            getCounterDataPrefixParams.bytesAllocated = bufferSize;
+            getCounterDataPrefixParams.pBuffer = pBuffer;
+            getCounterDataPrefixParams.pCounterDataBuilder = m_pCounterDataBuilder;
+            NVPA_Status nvpaStatus = NVPW_CounterDataBuilder_GetCounterDataPrefix(&getCounterDataPrefixParams);
+            if (nvpaStatus)
+            {
+                return false;
+            }
+            return true;
+        }
+    };
+
+}}
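Note on NvPerfMetricsConfigBuilder.h: the `GetConfigImageSize()` / `GetConfigImage()` and `GetCounterDataPrefixSize()` / `GetCounterDataPrefix()` pairs follow a query-size-then-copy convention, where a size of zero signals an error. A caller-side sketch of that convention is shown below. `FakeImageSource` and `FetchImage` are hypothetical names used only so the example compiles without the NvPerf SDK; they are not part of NvPerfUtility.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in exposing a size-query / copy pair shaped like
// MetricsConfigBuilder::GetConfigImageSize() and GetConfigImage().
struct FakeImageSource
{
    std::vector<uint8_t> image;

    // Returns the buffer size needed, or zero when nothing is available.
    size_t GetImageSize() const
    {
        return image.size();
    }

    // Copies the image into pBuffer; fails if the buffer is missing or too small.
    bool GetImage(size_t bufferSize, uint8_t* pBuffer) const
    {
        if (!pBuffer || bufferSize < image.size())
        {
            return false;
        }
        if (!image.empty())
        {
            memcpy(pBuffer, image.data(), image.size());
        }
        return true;
    }
};

// Caller side: ask for the size, allocate, then copy. Zero is treated as failure.
template <class TSource>
bool FetchImage(const TSource& source, std::vector<uint8_t>& out)
{
    const size_t size = source.GetImageSize();
    if (!size)
    {
        return false;
    }
    out.resize(size);
    if (!source.GetImage(out.size(), out.data()))
    {
        out.clear();
        return false;
    }
    return true;
}
```

Against the real class, the same shape would presumably be used after `PrepareConfigImage()` has succeeded, once for the config image and once for the counter data prefix.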
--- a/ruins64k/tools/NvPerfUtility/include/NvPerfMetricsEvaluator.h
+++ b/ruins64k/tools/NvPerfUtility/include/NvPerfMetricsEvaluator.h
@@ -0,0 +1,766 @@
+/*
+* Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+#pragma once
+
+#include <sstream>
+#include <utility>
+#include <vector>
+#include <string>
+#include "NvPerfInit.h"
+
+namespace nv { namespace perf {
+
+    // Smart Pointer for NVPW_MetricsEvaluator
+    class MetricsEvaluator
+    {
+    protected:
+        NVPW_MetricsEvaluator* m_pMetricsEvaluator;
+        std::vector<uint8_t> m_scratchBuffer;
+
+    private:
+        // Prevent accidental use of "delete" keyword on this class' implicit conversions.
+        // Introducing a second 'operator CompileErrorOnOperatorDelete*()' triggers an 'ambiguous conversion to void*'
+        // on the 'delete', which catches the usage error at compile time.  c.f. http://stackoverflow.com/a/3312507
+        struct CompileErrorOnOperatorDelete;
+        operator CompileErrorOnOperatorDelete*() const;
+
+    private:
+        // non-copyable
+        MetricsEvaluator(const MetricsEvaluator& rhs);
+        MetricsEvaluator& operator=(const MetricsEvaluator& rhs);
+
+    public:
+        ~MetricsEvaluator()
+        {
+            Reset();
+        }
+
+        MetricsEvaluator()
+            : m_pMetricsEvaluator()
+        {
+        }
+
+        // Takes ownership of pMetricsEvaluator and of the scratch buffer's contents.
+        MetricsEvaluator(NVPW_MetricsEvaluator* pMetricsEvaluator, std::vector<uint8_t>&& scratchBuffer)
+            : m_pMetricsEvaluator(pMetricsEvaluator)
+            , m_scratchBuffer(std::move(scratchBuffer))
+        {
+            scratchBuffer.clear();
+        }
+
+        MetricsEvaluator(MetricsEvaluator&& evaluator)
+            : m_pMetricsEvaluator(evaluator.m_pMetricsEvaluator)
+            , m_scratchBuffer(std::move(evaluator.m_scratchBuffer))
+        {
+            evaluator.m_pMetricsEvaluator = nullptr;
+            evaluator.m_scratchBuffer.clear();
+        }
+
+        MetricsEvaluator& operator=(MetricsEvaluator&& evaluator)
+        {
+            Reset();
+            m_pMetricsEvaluator = evaluator.m_pMetricsEvaluator;
+            m_scratchBuffer = std::move(evaluator.m_scratchBuffer);
+            evaluator.m_pMetricsEvaluator = nullptr;
+            evaluator.m_scratchBuffer.clear();
+            return *this;
+        }
+
+        operator NVPW_MetricsEvaluator*() const
+        {
+            return m_pMetricsEvaluator;
+        }
+
+        void Reset()
+        {
+            if (m_pMetricsEvaluator != nullptr)
+            {
+                NVPW_MetricsEvaluator_Destroy_Params destroyParams = { NVPW_MetricsEvaluator_Destroy_Params_STRUCT_SIZE };
+                destroyParams.pMetricsEvaluator = m_pMetricsEvaluator;
+                NVPA_Status status = NVPW_MetricsEvaluator_Destroy(&destroyParams);
+                if (status != NVPA_STATUS_SUCCESS)
+                {
+                    NV_PERF_LOG_ERR(80, "NVPW_MetricsEvaluator_Destroy failed\n");
+                }
+                m_pMetricsEvaluator = nullptr;
+            }
+            m_scratchBuffer.clear();
+        }
+    };
+
+    class MetricsEnumerator
+    {
+    public:
+        class Iterator
+        {
+        private:
+            // Note: these point into the read-only (.RO) section of the library, so their lifetimes are not bound to any particular metrics enumerator or metrics evaluator instance.
+            const char* m_pMetricNames;
+            const size_t* m_pMetricNameBeginIndices;
+            size_t m_numMetrics;
+            size_t m_metricIndex;
+        public:
+            Iterator()
+                : m_pMetricNames(nullptr)
+                , m_pMetricNameBeginIndices(nullptr)
+                , m_numMetrics(0)
+                , m_metricIndex(0)
+            {
+            }
+
+            Iterator(const char* pMetricNames, const size_t* pMetricNameBeginIndices, size_t numMetrics, size_t metricIndex)
+                : m_pMetricNames(pMetricNames)
+                , m_pMetricNameBeginIndices(pMetricNameBeginIndices)
+                , m_numMetrics(numMetrics)
+                , m_metricIndex(metricIndex)
+            {
+            }
+
+            Iterator(const Iterator& iterator)
+                : m_pMetricNames(iterator.m_pMetricNames)
+                , m_pMetricNameBeginIndices(iterator.m_pMetricNameBeginIndices)
+                , m_numMetrics(iterator.m_numMetrics)
+                , m_metricIndex(iterator.m_metricIndex)
+            {
+            }
+
+            Iterator& operator=(const Iterator& rhs)
+            {
+                m_pMetricNames = rhs.m_pMetricNames;
+                m_pMetricNameBeginIndices = rhs.m_pMetricNameBeginIndices;
+                m_numMetrics = rhs.m_numMetrics;
+                m_metricIndex = rhs.m_metricIndex;
+                return *this;
+            }
+
+            bool operator!=(const Iterator& rhs) const
+            {
+                return !(*this == rhs);
+            }
+
+            bool operator==(const Iterator& rhs) const
+            {
+                return m_pMetricNames == rhs.m_pMetricNames
+                    && m_pMetricNameBeginIndices == rhs.m_pMetricNameBeginIndices
+                    && m_numMetrics == rhs.m_numMetrics
+                    && m_metricIndex == rhs.m_metricIndex;
+            }
+
+            Iterator& operator++()
+            {
+                if (m_metricIndex < m_numMetrics)
+                {
+                    ++m_metricIndex;
+                }
+                return *this;
+            }
+
+            Iterator operator++(int)
+            {
+                Iterator prev = *this;
+                ++*this;
+                return prev;
+            }
+
+            // No validity check: the caller must ensure the iterator is dereferenceable (not end(), not default-constructed).
+            const char* operator*() const
+            {
+                const char* pMetricName = &m_pMetricNames[m_pMetricNameBeginIndices[m_metricIndex]];
+                return pMetricName;
+            }
+        };
+
+    private:
+        // Note: these point into the read-only (.RO) section of the library, so their lifetimes are not bound to any particular metrics evaluator instance.
+        const char* m_pMetricNames;
+        const size_t* m_pMetricNameBeginIndices;
+        size_t m_numMetrics;
+
+    public:
+        MetricsEnumerator()
+            : m_pMetricNames(nullptr)
+            , m_pMetricNameBeginIndices(nullptr)
+            , m_numMetrics(0)
+        {
+        }
+
+        MetricsEnumerator(const char* pMetricNames, const size_t* pMetricNameBeginIndices, size_t numMetrics)
+            : m_pMetricNames(pMetricNames)
+            , m_pMetricNameBeginIndices(pMetricNameBeginIndices)
+            , m_numMetrics(numMetrics)
+        {
+        }
+
+        MetricsEnumerator(const MetricsEnumerator& metricsEnumerator)
+            : m_pMetricNames(metricsEnumerator.m_pMetricNames)
+            , m_pMetricNameBeginIndices(metricsEnumerator.m_pMetricNameBeginIndices)
+            , m_numMetrics(metricsEnumerator.m_numMetrics)
+        {
+        }
+
+        MetricsEnumerator& operator=(const MetricsEnumerator& rhs)
+        {
+            m_pMetricNames = rhs.m_pMetricNames;
+            m_pMetricNameBeginIndices = rhs.m_pMetricNameBeginIndices;
+            m_numMetrics = rhs.m_numMetrics;
+            return *this;
+        }
+
+        // No bounds check: the caller must ensure index < size().
+        const char* operator[](size_t index) const
+        {
+            const char* pMetricName = &m_pMetricNames[m_pMetricNameBeginIndices[index]];
+            return pMetricName;
+        }
+
+        Iterator begin() const
+        {
+            return Iterator(m_pMetricNames, m_pMetricNameBeginIndices, m_numMetrics, 0);
+        }
+
+        Iterator end() const
+        {
+            return Iterator(m_pMetricNames, m_pMetricNameBeginIndices, m_numMetrics, m_numMetrics);
+        }
+
+        size_t size() const
+        {
+            return m_numMetrics;
+        }
+
+        bool empty() const
+        {
+            return !m_numMetrics;
+        }
+    };
+
+    inline MetricsEnumerator EnumerateMetrics(NVPW_MetricsEvaluator* pMetricsEvaluator, NVPW_MetricType metricType)
+    {
+        NVPW_MetricsEvaluator_GetMetricNames_Params metricsEvaluatorGetMetricNamesParams = { NVPW_MetricsEvaluator_GetMetricNames_Params_STRUCT_SIZE };
+        metricsEvaluatorGetMetricNamesParams.pMetricsEvaluator = pMetricsEvaluator;
+        metricsEvaluatorGetMetricNamesParams.metricType = static_cast<uint8_t>(metricType);
+        const NVPA_Status status = NVPW_MetricsEvaluator_GetMetricNames(&metricsEvaluatorGetMetricNamesParams);
+        if (status != NVPA_STATUS_SUCCESS)
+        {
+            return MetricsEnumerator();
+        }
+        return MetricsEnumerator(metricsEvaluatorGetMetricNamesParams.pMetricNames, metricsEvaluatorGetMetricNamesParams.pMetricNameBeginIndices, metricsEvaluatorGetMetricNamesParams.numMetrics);
+    }
+
+    inline MetricsEnumerator EnumerateCounters(NVPW_MetricsEvaluator* pMetricsEvaluator)
+    {
+        return EnumerateMetrics(pMetricsEvaluator, NVPW_METRIC_TYPE_COUNTER);
+    }
+
+    inline MetricsEnumerator EnumerateRatios(NVPW_MetricsEvaluator* pMetricsEvaluator)
+    {
+        return EnumerateMetrics(pMetricsEvaluator, NVPW_METRIC_TYPE_RATIO);
+    }
+
+    inline MetricsEnumerator EnumerateThroughputs(NVPW_MetricsEvaluator* pMetricsEvaluator)
+    {
+        return EnumerateMetrics(pMetricsEvaluator, NVPW_METRIC_TYPE_THROUGHPUT);
+    }
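+
+    // Usage sketch (illustrative only, not part of the original header). It assumes a hypothetical,
+    // already-created `NVPW_MetricsEvaluator* pEvaluator`. MetricsEnumerator provides begin()/end(),
+    // so it can be used in a range-based for loop:
+    //
+    //     std::vector<std::string> counterNames;
+    //     for (const char* pCounterName : EnumerateCounters(pEvaluator))
+    //     {
+    //         counterNames.push_back(pCounterName);
+    //     }
+    //
+    // If the underlying query fails, an empty enumerator is returned and the loop body does not run.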
+
+    inline const char* ToCString(NVPW_MetricType metricType)
+    {
+        switch (metricType)
+        {
+            case NVPW_METRIC_TYPE_COUNTER:
+                return "Counter";
+            case NVPW_METRIC_TYPE_RATIO:
+                return "Ratio";
+            case NVPW_METRIC_TYPE_THROUGHPUT:
+                return "Throughput";
+            default:
+                return "";
+        }
+    }
+
+    inline const char* ToCString(NVPW_RollupOp rollupOp)
+    {
+        switch (rollupOp)
+        {
+            case NVPW_ROLLUP_OP_AVG:
+                return ".avg";
+            case NVPW_ROLLUP_OP_MAX:
+                return ".max";
+            case NVPW_ROLLUP_OP_MIN:
+                return ".min";
+            case NVPW_ROLLUP_OP_SUM:
+                return ".sum";
+            default:
+                return "";
+        }
+    }
+
+    inline const char* ToCString(NVPW_Submetric submetric)
+    {
+        switch (submetric)
+        {
+            case NVPW_SUBMETRIC_NONE:
+                return "";
+            case NVPW_SUBMETRIC_PEAK_SUSTAINED:
+                return ".peak_sustained";
+            case NVPW_SUBMETRIC_PEAK_SUSTAINED_ACTIVE:
+                return ".peak_sustained_active";
+            case NVPW_SUBMETRIC_PEAK_SUSTAINED_ACTIVE_PER_SECOND:
+                return ".peak_sustained_active.per_second";
+            case NVPW_SUBMETRIC_PEAK_SUSTAINED_ELAPSED:
+                return ".peak_sustained_elapsed";
+            case NVPW_SUBMETRIC_PEAK_SUSTAINED_ELAPSED_PER_SECOND:
+                return ".peak_sustained_elapsed.per_second";
+            case NVPW_SUBMETRIC_PEAK_SUSTAINED_FRAME:
+                return ".peak_sustained_frame";
+            case NVPW_SUBMETRIC_PEAK_SUSTAINED_FRAME_PER_SECOND:
+                return ".peak_sustained_frame.per_second";
+            case NVPW_SUBMETRIC_PEAK_SUSTAINED_REGION:
+                return ".peak_sustained_region";
+            case NVPW_SUBMETRIC_PEAK_SUSTAINED_REGION_PER_SECOND:
+                return ".peak_sustained_region.per_second";
+            case NVPW_SUBMETRIC_PER_CYCLE_ACTIVE:
+                return ".per_cycle_active";
+            case NVPW_SUBMETRIC_PER_CYCLE_ELAPSED:
+                return ".per_cycle_elapsed";
+            case NVPW_SUBMETRIC_PER_CYCLE_IN_FRAME:
+                return ".per_cycle_in_frame";
+            case NVPW_SUBMETRIC_PER_CYCLE_IN_REGION:
+                return ".per_cycle_in_region";
+            case NVPW_SUBMETRIC_PER_SECOND:
+                return ".per_second";
+            case NVPW_SUBMETRIC_PCT_OF_PEAK_SUSTAINED_ACTIVE:
+                return ".pct_of_peak_sustained_active";
+            case NVPW_SUBMETRIC_PCT_OF_PEAK_SUSTAINED_ELAPSED:
+                return ".pct_of_peak_sustained_elapsed";
+            case NVPW_SUBMETRIC_PCT_OF_PEAK_SUSTAINED_FRAME:
+                return ".pct_of_peak_sustained_frame";
+            case NVPW_SUBMETRIC_PCT_OF_PEAK_SUSTAINED_REGION:
+                return ".pct_of_peak_sustained_region";
+            case NVPW_SUBMETRIC_MAX_RATE:
+                return ".max_rate";
+            case NVPW_SUBMETRIC_PCT:
+                return ".pct";
+            case NVPW_SUBMETRIC_RATIO:
+                return ".ratio";
+            default:
+                return "";
+        }
+    }
+
+    inline const char* ToCString(const MetricsEnumerator& countersEnumerator, const MetricsEnumerator& ratiosEnumerator, const MetricsEnumerator& throughputsEnumerator, NVPW_MetricType metricType, size_t metricIndex)
+    {
+        if (metricType == NVPW_METRIC_TYPE_COUNTER)
+        {
+            if (metricIndex < countersEnumerator.size())
+            {
+                return countersEnumerator[metricIndex];
+            }
+        }
+        else if (metricType == NVPW_METRIC_TYPE_RATIO)
+        {
+            if (metricIndex < ratiosEnumerator.size())
+            {
+                return ratiosEnumerator[metricIndex];
+            }
+        }
+        else if (metricType == NVPW_METRIC_TYPE_THROUGHPUT)
+        {
+            if (metricIndex < throughputsEnumerator.size())
+            {
+                return throughputsEnumerator[metricIndex];
+            }
+        }
+        NV_PERF_LOG_WRN(50, "ToCString failed\n");
+        return "";
+    }
+
+    inline const char* ToCString(NVPW_MetricsEvaluator* pMetricsEvaluator, NVPW_MetricType metricType, size_t metricIndex)
+    {
+        if (metricType == NVPW_METRIC_TYPE_COUNTER)
+        {
+            const MetricsEnumerator countersEnumerator = EnumerateCounters(pMetricsEvaluator);
+            if (metricIndex < countersEnumerator.size())
+            {
+                return countersEnumerator[metricIndex];
+            }
+        }
+        else if (metricType == NVPW_METRIC_TYPE_RATIO)
+        {
+            const MetricsEnumerator ratiosEnumerator = EnumerateRatios(pMetricsEvaluator);
+            if (metricIndex < ratiosEnumerator.size())
+            {
+                return ratiosEnumerator[metricIndex];
+            }
+        }
+        else if (metricType == NVPW_METRIC_TYPE_THROUGHPUT)
+        {
+            const MetricsEnumerator throughputsEnumerator = EnumerateThroughputs(pMetricsEvaluator);
+            if (metricIndex < throughputsEnumerator.size())
+            {
+                return throughputsEnumerator[metricIndex];
+            }
+        }
+        NV_PERF_LOG_WRN(50, "ToCString failed\n");
+        return "";
+    }
+
+    inline std::string ToString(const MetricsEnumerator& countersEnumerator, const MetricsEnumerator& ratiosEnumerator, const MetricsEnumerator& throughputsEnumerator, const NVPW_MetricEvalRequest& metricEvalRequest)
+    {
+        std::string metricName(ToCString(countersEnumerator, ratiosEnumerator, throughputsEnumerator, static_cast<NVPW_MetricType>(metricEvalRequest.metricType), metricEvalRequest.metricIndex));
+        if (metricEvalRequest.metricType == NVPW_METRIC_TYPE_COUNTER || metricEvalRequest.metricType == NVPW_METRIC_TYPE_THROUGHPUT)
+        {
+            metricName += ToCString(static_cast<NVPW_RollupOp>(metricEvalRequest.rollupOp));
+        }
+        metricName += ToCString(static_cast<NVPW_Submetric>(metricEvalRequest.submetric));
+        return metricName;
+    }
+
+    inline std::string ToString(NVPW_MetricsEvaluator* pMetricsEvaluator, const NVPW_MetricEvalRequest& metricEvalRequest)
+    {
+        std::string metricName(ToCString(pMetricsEvaluator, static_cast<NVPW_MetricType>(metricEvalRequest.metricType), metricEvalRequest.metricIndex));
+        if (metricEvalRequest.metricType == NVPW_METRIC_TYPE_COUNTER || metricEvalRequest.metricType == NVPW_METRIC_TYPE_THROUGHPUT)
+        {
+            metricName += ToCString(static_cast<NVPW_RollupOp>(metricEvalRequest.rollupOp));
+        }
+        metricName += ToCString(static_cast<NVPW_Submetric>(metricEvalRequest.submetric));
+        return metricName;
+    }
+
+    inline bool ToMetricEvalRequest(NVPW_MetricsEvaluator* pMetricsEvaluator, const char* pMetricName, NVPW_MetricEvalRequest& metricEvalRequest)
+    {
+        NVPW_MetricsEvaluator_ConvertMetricNameToMetricEvalRequest_Params toMetricEvalRequestParams = { NVPW_MetricsEvaluator_ConvertMetricNameToMetricEvalRequest_Params_STRUCT_SIZE };
+        toMetricEvalRequestParams.pMetricsEvaluator = pMetricsEvaluator;
+        toMetricEvalRequestParams.pMetricName = pMetricName;
+        toMetricEvalRequestParams.pMetricEvalRequest = &metricEvalRequest;
+        toMetricEvalRequestParams.metricEvalRequestStructSize = NVPW_MetricEvalRequest_STRUCT_SIZE;
+        const NVPA_Status status = NVPW_MetricsEvaluator_ConvertMetricNameToMetricEvalRequest(&toMetricEvalRequestParams);
+        if (status != NVPA_STATUS_SUCCESS)
+        {
+            NV_PERF_LOG_WRN(80, "NVPW_MetricsEvaluator_ConvertMetricNameToMetricEvalRequest failed\n");
+            return false;
+        }
+        return true;
+    }
+
+    inline bool GetMetricTypeAndIndex(NVPW_MetricsEvaluator* pMetricsEvaluator, const char* pMetricName, NVPW_MetricType& metricType, size_t& metricIndex)
+    {
+        NVPW_MetricsEvaluator_GetMetricTypeAndIndex_Params getMetricTypeAndIndexParams = { NVPW_MetricsEvaluator_GetMetricTypeAndIndex_Params_STRUCT_SIZE };
+        getMetricTypeAndIndexParams.pMetricsEvaluator = pMetricsEvaluator;
+        getMetricTypeAndIndexParams.pMetricName = pMetricName;
+        NVPA_Status status = NVPW_MetricsEvaluator_GetMetricTypeAndIndex(&getMetricTypeAndIndexParams);
+        if (status != NVPA_STATUS_SUCCESS)
+        {
+            NV_PERF_LOG_WRN(80, "NVPW_MetricsEvaluator_GetMetricTypeAndIndex failed\n");
+            return false;
+        }
+        metricType = static_cast<NVPW_MetricType>(getMetricTypeAndIndexParams.metricType);
+        metricIndex = getMetricTypeAndIndexParams.metricIndex;
+        return true;
+    }
+
+    inline bool GetSupportedSubmetrics(NVPW_MetricsEvaluator* pMetricsEvaluator, NVPW_MetricType metricType, std::vector<NVPW_Submetric>& submetrics)
+    {
+        NVPW_MetricsEvaluator_GetSupportedSubmetrics_Params getSupportedSubmetrics = { NVPW_MetricsEvaluator_GetSupportedSubmetrics_Params_STRUCT_SIZE };
+        getSupportedSubmetrics.pMetricsEvaluator = pMetricsEvaluator;
+        getSupportedSubmetrics.metricType = static_cast<uint8_t>(metricType);
+        NVPA_Status status = NVPW_MetricsEvaluator_GetSupportedSubmetrics(&getSupportedSubmetrics);
+        if (status != NVPA_STATUS_SUCCESS)
+        {
+            NV_PERF_LOG_ERR(80, "NVPW_MetricsEvaluator_GetSupportedSubmetrics failed for metric type: %u\n", getSupportedSubmetrics.metricType);
+            return false;
+        }
+        submetrics.reserve(getSupportedSubmetrics.numSupportedSubmetrics);
+        for (size_t ii = 0; ii < getSupportedSubmetrics.numSupportedSubmetrics; ++ii)
+        {
+            submetrics.push_back(static_cast<NVPW_Submetric>(getSupportedSubmetrics.pSupportedSubmetrics[ii]));
+        }
+        return true;
+    }
+
+    inline bool MetricsEvaluatorSetDeviceAttributes(NVPW_MetricsEvaluator* pMetricsEvaluator, const uint8_t* pCounterDataImage, size_t counterDataImageSize)
+    {
+        NVPW_MetricsEvaluator_SetDeviceAttributes_Params setDeviceAttributesParams = { NVPW_MetricsEvaluator_SetDeviceAttributes_Params_STRUCT_SIZE };
+        setDeviceAttributesParams.pMetricsEvaluator = pMetricsEvaluator;
+        setDeviceAttributesParams.pCounterDataImage = pCounterDataImage;
+        setDeviceAttributesParams.counterDataImageSize = counterDataImageSize;
+        const NVPA_Status status = NVPW_MetricsEvaluator_SetDeviceAttributes(&setDeviceAttributesParams);
+        if (status != NVPA_STATUS_SUCCESS)
+        {
+            NV_PERF_LOG_ERR(50, "NVPW_MetricsEvaluator_SetDeviceAttributes failed\n");
+            return false;
+        }
+        return true;
+    }
+
+    // Evaluates the requested metrics from (CounterDataImage, rangeIndex) and stores the results in pMetricValues,
+    // which must have room for numMetricEvalRequests values.
+    inline bool EvaluateToGpuValues(
+        NVPW_MetricsEvaluator* pMetricsEvaluator,
+        const uint8_t* pCounterDataImage,
+        size_t counterDataImageSize,
+        size_t rangeIndex,
+        size_t numMetricEvalRequests,
+        const NVPW_MetricEvalRequest* pMetricEvalRequests,
+        double* pMetricValues)
+    {
+        NVPW_MetricsEvaluator_EvaluateToGpuValues_Params evaluateToGpuValuesParams = { NVPW_MetricsEvaluator_EvaluateToGpuValues_Params_STRUCT_SIZE };
+        evaluateToGpuValuesParams.pMetricsEvaluator = pMetricsEvaluator;
+        evaluateToGpuValuesParams.pMetricEvalRequests = pMetricEvalRequests;
+        evaluateToGpuValuesParams.numMetricEvalRequests = numMetricEvalRequests;
+        evaluateToGpuValuesParams.metricEvalRequestStructSize = NVPW_MetricEvalRequest_STRUCT_SIZE;
+        evaluateToGpuValuesParams.metricEvalRequestStrideSize = sizeof(NVPW_MetricEvalRequest);
+        evaluateToGpuValuesParams.pCounterDataImage = pCounterDataImage;
+        evaluateToGpuValuesParams.counterDataImageSize = counterDataImageSize;
+        evaluateToGpuValuesParams.rangeIndex = rangeIndex;
+        evaluateToGpuValuesParams.isolated = (NVPA_Bool)true;
+        evaluateToGpuValuesParams.pMetricValues = pMetricValues;
+        NVPA_Status status = NVPW_MetricsEvaluator_EvaluateToGpuValues(&evaluateToGpuValuesParams);
+        if (status != NVPA_STATUS_SUCCESS)
+        {
+            NV_PERF_LOG_ERR(80, "NVPW_MetricsEvaluator_EvaluateToGpuValues failed\n");
+            return false;
+        }
+        return true;
+    }
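+
+    // Usage sketch (illustrative only, not part of the original header). `pEvaluator`, `pImage`,
+    // `imageSize` and the metric name are hypothetical placeholders supplied by the caller. Calling
+    // MetricsEvaluatorSetDeviceAttributes before evaluation is an assumption about typical usage:
+    //
+    //     NVPW_MetricEvalRequest request = {};
+    //     double value = 0.0;
+    //     if (ToMetricEvalRequest(pEvaluator, "<metric name>", request)
+    //         && MetricsEvaluatorSetDeviceAttributes(pEvaluator, pImage, imageSize)
+    //         && EvaluateToGpuValues(pEvaluator, pImage, imageSize, 0 /*rangeIndex*/, 1, &request, &value))
+    //     {
+    //         // `value` holds the result of the single request for range 0.
+    //     }
+    //
+    // Each helper returns false on failure, so the chain stops at the first failing step.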
+
+    inline bool operator==(const NVPW_DimUnitFactor& lhs, const NVPW_DimUnitFactor& rhs)
+    {
+        return (lhs.dimUnit == rhs.dimUnit) && (lhs.exponent == rhs.exponent);
+    }
+
+    inline bool operator<(const NVPW_DimUnitFactor& lhs, const NVPW_DimUnitFactor& rhs)
+    {
+        if (lhs.dimUnit != rhs.dimUnit)
+        {
+            return lhs.dimUnit < rhs.dimUnit;
+        }
+        if (lhs.exponent != rhs.exponent)
+        {
+            return lhs.exponent < rhs.exponent;
+        }
+        return false;
+    }
+
+    inline bool GetMetricDimUnits(NVPW_MetricsEvaluator* pMetricsEvaluator, const NVPW_MetricEvalRequest& metricRequest, std::vector<NVPW_DimUnitFactor>& dimUnits)
+    {
+        NVPW_MetricsEvaluator_GetMetricDimUnits_Params getMetricDimUnitsParams = { NVPW_MetricsEvaluator_GetMetricDimUnits_Params_STRUCT_SIZE };
+        getMetricDimUnitsParams.pMetricsEvaluator = pMetricsEvaluator;
+        getMetricDimUnitsParams.pMetricEvalRequest = &metricRequest;
+        getMetricDimUnitsParams.metricEvalRequestStructSize = NVPW_MetricEvalRequest_STRUCT_SIZE;
+        getMetricDimUnitsParams.dimUnitFactorStructSize = NVPW_DimUnitFactor_STRUCT_SIZE;
+        NVPA_Status status = NVPW_MetricsEvaluator_GetMetricDimUnits(&getMetricDimUnitsParams);
+        if (status != NVPA_STATUS_SUCCESS || !getMetricDimUnitsParams.numDimUnits)
+        {
+            NV_PERF_LOG_WRN(80, "NVPW_MetricsEvaluator_GetMetricDimUnits failed for metric = %s\n", ToString(pMetricsEvaluator, metricRequest).c_str());
+            return false;
+        }
+        dimUnits.resize(getMetricDimUnitsParams.numDimUnits);
+        getMetricDimUnitsParams.pDimUnits = dimUnits.data();
+        status = NVPW_MetricsEvaluator_GetMetricDimUnits(&getMetricDimUnitsParams);
+        if (status != NVPA_STATUS_SUCCESS)
+        {
+            NV_PERF_LOG_WRN(80, "NVPW_MetricsEvaluator_GetMetricDimUnits failed for metric = %s\n", ToString(pMetricsEvaluator, metricRequest).c_str());
+            return false;
+        }
+        return true;
+    }
+
+    inline const char* GetMetricDescription(NVPW_MetricsEvaluator* pMetricsEvaluator, NVPW_MetricType metricType, size_t metricIndex)
+    {
+        if (metricType == NVPW_METRIC_TYPE_COUNTER)
+        {
+            NVPW_MetricsEvaluator_GetCounterProperties_Params params{ NVPW_MetricsEvaluator_GetCounterProperties_Params_STRUCT_SIZE };
+            params.pMetricsEvaluator = pMetricsEvaluator;
+            params.counterIndex = metricIndex;
+            NVPA_Status status = NVPW_MetricsEvaluator_GetCounterProperties(&params);
+            if (status == NVPA_STATUS_SUCCESS)
+            {
+                return params.pDescription;
+            }
+        }
+        else if (metricType == NVPW_METRIC_TYPE_RATIO)
+        {
+            NVPW_MetricsEvaluator_GetRatioMetricProperties_Params params{ NVPW_MetricsEvaluator_GetRatioMetricProperties_Params_STRUCT_SIZE };
+            params.pMetricsEvaluator = pMetricsEvaluator;
+            params.ratioMetricIndex = metricIndex;
+            NVPA_Status status = NVPW_MetricsEvaluator_GetRatioMetricProperties(&params);
+            if (status == NVPA_STATUS_SUCCESS)
+            {
+                return params.pDescription;
+            }
+        }
+        else if (metricType == NVPW_METRIC_TYPE_THROUGHPUT)
+        {
+            NVPW_MetricsEvaluator_GetThroughputMetricProperties_Params params{ NVPW_MetricsEvaluator_GetThroughputMetricProperties_Params_STRUCT_SIZE };
+            params.pMetricsEvaluator = pMetricsEvaluator;
+            params.throughputMetricIndex = metricIndex;
+            NVPA_Status status = NVPW_MetricsEvaluator_GetThroughputMetricProperties(&params);
+            if (status == NVPA_STATUS_SUCCESS)
+            {
+                return params.pDescription;
+            }
+        }
+        NV_PERF_LOG_WRN(50, "GetMetricDescription failed for metricType = %u, metricIndex = %u\n", (uint32_t)metricType, (uint32_t)metricIndex);
+        return nullptr;
+    }
+
+    inline const char* ToCString(NVPW_MetricsEvaluator* pMetricsEvaluator, NVPW_HwUnit hwUnit)
+    {
+        NVPW_MetricsEvaluator_HwUnitToString_Params params{ NVPW_MetricsEvaluator_HwUnitToString_Params_STRUCT_SIZE };
+        params.pMetricsEvaluator = pMetricsEvaluator;
+        params.hwUnit = hwUnit;
+        NVPA_Status status = NVPW_MetricsEvaluator_HwUnitToString(&params);
+        if (status != NVPA_STATUS_SUCCESS)
+        {
+            NV_PERF_LOG_WRN(50, "NVPW_MetricsEvaluator_HwUnitToString failed for hwUnit: %u\n", (uint32_t)hwUnit);
+            return nullptr;
+        }
+        return params.pHwUnitName;
+    }
+
+    inline NVPW_HwUnit GetMetricHwUnit(NVPW_MetricsEvaluator* pMetricsEvaluator, NVPW_MetricType metricType, size_t metricIndex)
+    {
+        if (metricType == NVPW_METRIC_TYPE_COUNTER)
+        {
+            NVPW_MetricsEvaluator_GetCounterProperties_Params params{ NVPW_MetricsEvaluator_GetCounterProperties_Params_STRUCT_SIZE };
+            params.pMetricsEvaluator = pMetricsEvaluator;
+            params.counterIndex = metricIndex;
+            NVPA_Status status = NVPW_MetricsEvaluator_GetCounterProperties(&params);
+            if (status == NVPA_STATUS_SUCCESS)
+            {
+                return static_cast<NVPW_HwUnit>(params.hwUnit);
+            }
+        }
+        else if (metricType == NVPW_METRIC_TYPE_RATIO)
+        {
+            NVPW_MetricsEvaluator_GetRatioMetricProperties_Params params{ NVPW_MetricsEvaluator_GetRatioMetricProperties_Params_STRUCT_SIZE };
+            params.pMetricsEvaluator = pMetricsEvaluator;
+            params.ratioMetricIndex = metricIndex;
+            NVPA_Status status = NVPW_MetricsEvaluator_GetRatioMetricProperties(&params);
+            if (status == NVPA_STATUS_SUCCESS)
+            {
+                return static_cast<NVPW_HwUnit>(params.hwUnit);
+            }
+        }
+        else if (metricType == NVPW_METRIC_TYPE_THROUGHPUT)
+        {
+            NVPW_MetricsEvaluator_GetThroughputMetricProperties_Params params{ NVPW_MetricsEvaluator_GetThroughputMetricProperties_Params_STRUCT_SIZE };
+            params.pMetricsEvaluator = pMetricsEvaluator;
+            params.throughputMetricIndex = metricIndex;
+            NVPA_Status status = NVPW_MetricsEvaluator_GetThroughputMetricProperties(&params);
+            if (status == NVPA_STATUS_SUCCESS)
+            {
+                return static_cast<NVPW_HwUnit>(params.hwUnit);
+            }
+        }
+        NV_PERF_LOG_WRN(50, "GetMetricHwUnit failed for metricType = %u, metricIndex = %u\n", (uint32_t)metricType, (uint32_t)metricIndex);
+        return NVPW_HW_UNIT_INVALID;
+    }
+
+    inline const char* GetMetricHwUnitStr(NVPW_MetricsEvaluator* pMetricsEvaluator, NVPW_MetricType metricType, size_t metricIndex)
+    {
+        const NVPW_HwUnit hwUnit = GetMetricHwUnit(pMetricsEvaluator, metricType, metricIndex);
+        const char* pHwUnitStr = ToCString(pMetricsEvaluator, hwUnit);
+        return pHwUnitStr;
+    }
+
+    inline const char* ToCString(NVPW_MetricsEvaluator* pMetricsEvaluator, NVPW_DimUnitName dimUnit, bool plural)
+    {
+        NVPW_MetricsEvaluator_DimUnitToString_Params dimUnitToStringParams = { NVPW_MetricsEvaluator_DimUnitToString_Params_STRUCT_SIZE };
+        dimUnitToStringParams.pMetricsEvaluator = pMetricsEvaluator;
+        dimUnitToStringParams.dimUnit = static_cast<uint32_t>(dimUnit);
+        NVPA_Status status = NVPW_MetricsEvaluator_DimUnitToString(&dimUnitToStringParams);
+        if (status != NVPA_STATUS_SUCCESS)
+        {
+            NV_PERF_LOG_WRN(80, "NVPW_MetricsEvaluator_DimUnitToString failed for dimUnit = %u\n", (uint32_t)dimUnit);
+            return "";
+        }
+        const char* pDimUnitStr = plural ? dimUnitToStringParams.pPluralName : dimUnitToStringParams.pSingularName;
+        return pDimUnitStr;
+    }
+
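+    // Formats dimensional units as "numerator / denominator", e.g. "(a * b) / c^2"; returns "<unitless>" if `dimUnitFactors` is empty.
+    // `getDimUnitStrFunctor` must be in the form of const char*(NVPW_DimUnitName dimUnit, bool plural).
+    // It is called with plural == true for numerator units and plural == false for denominator units.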
+    template <typename GetDimUnitStrFunctor>
+    inline std::string ToString(const std::vector<NVPW_DimUnitFactor>& dimUnitFactors, GetDimUnitStrFunctor&& getDimUnitStrFunctor)
+    {
+        if (dimUnitFactors.empty())
+        {
+            return "<unitless>";
+        }
+
+        std::stringstream sstream;
+        size_t numeratorCount = 0;
+        size_t denominatorCount = 0;
+        auto isNumerator = [](const NVPW_DimUnitFactor& dimUnitFactor) {
+            return dimUnitFactor.exponent > 0;
+        };
+        // if printNumerator == false, print the denominator
+        auto printFormattedDimUnits = [&](size_t count, bool printNumerator) {
+            if (count > 1)
+            {
+                sstream << "(";
+            }
+            bool isFirst = true;
+            for (const NVPW_DimUnitFactor& dimUnitFactor : dimUnitFactors)
+            {
+                if (printNumerator != isNumerator(dimUnitFactor))
+                {
+                    continue;
+                }
+
+                if (!isFirst)
+                {
+                    sstream << " * ";
+                }
+                const bool plural = printNumerator;
+                sstream << getDimUnitStrFunctor(static_cast<NVPW_DimUnitName>(dimUnitFactor.dimUnit), plural);
+                if (std::abs(dimUnitFactor.exponent) != 1)
+                {
+                    sstream << "^" << (uint32_t)std::abs(dimUnitFactor.exponent);
+                }
+                isFirst = false;
+            }
+            if (count > 1)
+            {
+                sstream << ")";
+            }
+        };
+
+        for (const NVPW_DimUnitFactor& dimUnitFactor : dimUnitFactors)
+        {
+            isNumerator(dimUnitFactor) ? ++numeratorCount : ++denominatorCount;
+        }
+
+        if (numeratorCount)
+        {
+            const bool printNumerator = true;
+            printFormattedDimUnits(numeratorCount, printNumerator);
+        }
+        else
+        {
+            sstream << "1";
+        }
+
+        if (denominatorCount)
+        {
+            sstream << " / ";
+            const bool printNumerator = false;
+            printFormattedDimUnits(denominatorCount, printNumerator);
+        }
+        return sstream.str();
+    }
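+
+    // Usage sketch (illustrative only, not part of the original header). It assumes a hypothetical
+    // `NVPW_MetricsEvaluator* pEvaluator` and an already-filled `NVPW_MetricEvalRequest request`.
+    // The lambda forwards to the ToCString(pMetricsEvaluator, dimUnit, plural) overload above:
+    //
+    //     std::vector<NVPW_DimUnitFactor> dimUnits;
+    //     if (GetMetricDimUnits(pEvaluator, request, dimUnits))
+    //     {
+    //         const std::string unitStr = ToString(dimUnits, [&](NVPW_DimUnitName dimUnit, bool plural) {
+    //             return ToCString(pEvaluator, dimUnit, plural);
+    //         });
+    //     }
+    //
+    // Note that GetMetricDimUnits returns false when a metric reports zero dimensional units.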
+
+}}
--- a/ruins64k/tools/NvPerfUtility/include/NvPerfOpenGL.h
+++ b/ruins64k/tools/NvPerfUtility/include/NvPerfOpenGL.h
@@ -0,0 +1,185 @@
+/*
+* Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+#pragma once
+
+#include "NvPerfInit.h"
+#include "NvPerfDeviceProperties.h"
+#include "nvperf_opengl_host.h"
+#include "nvperf_opengl_target.h"
+#include "GL/gl.h"
+#include <string.h>
+namespace nv { namespace perf {
+
+    // OpenGL Only Utilities
+    //
+    inline std::string OpenGLGetDeviceName()
+    {
+        const GLubyte* pRenderer = glGetString(GL_RENDERER);
+        if (!pRenderer)
+        {
+            return "";
+        }
+
+        return (const char*) pRenderer;
+    }
+
+    inline bool OpenGLIsNvidiaDevice()
+    {
+        const GLubyte* pVendor = glGetString(GL_VENDOR);
+        if (!pVendor)
+        {
+            return false;
+        }
+
+        if (strstr((const char*)pVendor, "NVIDIA"))
+        {
+            return true;
+        }
+        return false;
+    }
+
+    inline bool OpenGLLoadDriver()
+    {
+        NVPW_OpenGL_LoadDriver_Params loadDriverParams = { NVPW_OpenGL_LoadDriver_Params_STRUCT_SIZE };
+        NVPA_Status nvpaStatus = NVPW_OpenGL_LoadDriver(&loadDriverParams);
+        if (nvpaStatus)
+        {
+            NV_PERF_LOG_ERR(10, "NVPW_OpenGL_LoadDriver failed\n");
+            return false;
+        }
+        return true;
+    }
+
+    inline size_t OpenGLGetNvperfDeviceIndex(size_t sliIndex = 0)
+    {
+        NVPW_OpenGL_GraphicsContext_GetDeviceIndex_Params getDeviceIndexParams = { NVPW_OpenGL_GraphicsContext_GetDeviceIndex_Params_STRUCT_SIZE };
+        getDeviceIndexParams.sliIndex = sliIndex;
+
+        NVPA_Status nvpaStatus = NVPW_OpenGL_GraphicsContext_GetDeviceIndex(&getDeviceIndexParams);
+        if (nvpaStatus)
+        {
+            return ~size_t(0);
+        }
+
+        return getDeviceIndexParams.deviceIndex;
+    }
+
+    inline DeviceIdentifiers OpenGLGetDeviceIdentifiers(size_t sliIndex = 0)
+    {
+        const size_t deviceIndex = OpenGLGetNvperfDeviceIndex(sliIndex);
+
+        DeviceIdentifiers deviceIdentifiers = GetDeviceIdentifiers(deviceIndex);
+        return deviceIdentifiers;
+    }
+
+    inline NVPW_Device_ClockStatus OpenGLGetDeviceClockState()
+    {
+        size_t nvperfDeviceIndex = OpenGLGetNvperfDeviceIndex();
+        return GetDeviceClockState(nvperfDeviceIndex);
+    }
+
+    inline bool OpenGLSetDeviceClockState(NVPW_Device_ClockSetting clockSetting)
+    {
+        size_t nvperfDeviceIndex = OpenGLGetNvperfDeviceIndex();
+        return SetDeviceClockState(nvperfDeviceIndex, clockSetting);
+    }
+
+    inline bool OpenGLSetDeviceClockState(NVPW_Device_ClockStatus clockStatus)
+    {
+        size_t nvperfDeviceIndex = OpenGLGetNvperfDeviceIndex();
+        return SetDeviceClockState(nvperfDeviceIndex, clockStatus);
+    }
+
+    inline size_t OpenGLCalculateMetricsEvaluatorScratchBufferSize(const char* pChipName)
+    {
+        NVPW_OpenGL_MetricsEvaluator_CalculateScratchBufferSize_Params calculateScratchBufferSizeParams = { NVPW_OpenGL_MetricsEvaluator_CalculateScratchBufferSize_Params_STRUCT_SIZE };
+        calculateScratchBufferSizeParams.pChipName = pChipName;
+        NVPA_Status nvpaStatus = NVPW_OpenGL_MetricsEvaluator_CalculateScratchBufferSize(&calculateScratchBufferSizeParams);
+        if (nvpaStatus)
+        {
+            NV_PERF_LOG_ERR(20, "NVPW_OpenGL_MetricsEvaluator_CalculateScratchBufferSize failed\n");
+            return 0;
+        }
+        return calculateScratchBufferSizeParams.scratchBufferSize;
+    }
+
+    inline NVPW_MetricsEvaluator* OpenGLCreateMetricsEvaluator(uint8_t* pScratchBuffer, size_t scratchBufferSize, const char* pChipName)
+    {
+        NVPW_OpenGL_MetricsEvaluator_Initialize_Params initializeParams = { NVPW_OpenGL_MetricsEvaluator_Initialize_Params_STRUCT_SIZE };
+        initializeParams.pScratchBuffer = pScratchBuffer;
+        initializeParams.scratchBufferSize = scratchBufferSize;
+        initializeParams.pChipName = pChipName;
+        NVPA_Status nvpaStatus = NVPW_OpenGL_MetricsEvaluator_Initialize(&initializeParams);
+        if (nvpaStatus)
+        {
+            NV_PERF_LOG_ERR(20, "NVPW_OpenGL_MetricsEvaluator_Initialize failed\n");
+            return nullptr;
+        }
+        return initializeParams.pMetricsEvaluator;
+    }
+
+}}
+
+namespace nv { namespace perf { namespace profiler {
+
+    inline NVPA_RawMetricsConfig* OpenGLCreateRawMetricsConfig(const char* pChipName)
+    {
+        NVPW_OpenGL_RawMetricsConfig_Create_Params configParams = { NVPW_OpenGL_RawMetricsConfig_Create_Params_STRUCT_SIZE };
+        configParams.activityKind = NVPA_ACTIVITY_KIND_PROFILER;
+        configParams.pChipName = pChipName;
+
+        NVPA_Status nvpaStatus = NVPW_OpenGL_RawMetricsConfig_Create(&configParams);
+        if (nvpaStatus)
+        {
+            return nullptr;
+        }
+
+        return configParams.pRawMetricsConfig;
+    }
+
+    inline bool OpenGLIsGpuSupported(size_t sliIndex = 0)
+    {
+        const size_t deviceIndex = OpenGLGetNvperfDeviceIndex(sliIndex);
+
+        NVPW_OpenGL_Profiler_IsGpuSupported_Params params = { NVPW_OpenGL_Profiler_IsGpuSupported_Params_STRUCT_SIZE };
+        params.deviceIndex = deviceIndex;
+        NVPA_Status nvpaStatus = NVPW_OpenGL_Profiler_IsGpuSupported(&params);
+        if (nvpaStatus)
+        {
+            NV_PERF_LOG_ERR(10, "NVPW_OpenGL_Profiler_IsGpuSupported failed on %s\n", OpenGLGetDeviceName().c_str());
+            return false;
+        }
+
+        if (!params.isSupported)
+        {
+            NV_PERF_LOG_ERR(10, "%s is not supported\n", OpenGLGetDeviceName().c_str());
+            if (params.gpuArchitectureSupportLevel != NVPW_GPU_ARCHITECTURE_SUPPORT_LEVEL_SUPPORTED)
+            {
+                const DeviceIdentifiers deviceIdentifiers = OpenGLGetDeviceIdentifiers(sliIndex);
+                NV_PERF_LOG_ERR(10, "Unsupported GPU architecture %s\n", deviceIdentifiers.pChipName);
+            }
+            if (params.sliSupportLevel == NVPW_SLI_SUPPORT_LEVEL_UNSUPPORTED)
+            {
+                NV_PERF_LOG_ERR(10, "Devices in SLI configuration are not supported.\n");
+            }
+            return false;
+        }
+
+        return true;
+    }
+
+}}}
--- a/ruins64k/tools/NvPerfUtility/include/NvPerfRangeProfiler.h
+++ b/ruins64k/tools/NvPerfUtility/include/NvPerfRangeProfiler.h
@@ -0,0 +1,336 @@
+/*
+* Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+#pragma once
+#include <list>
+#include <utility>
+#include <vector>
+
+#ifdef __linux__
+#include <sys/stat.h>
+#endif
+
+#include "NvPerfCounterData.h"
+#include "NvPerfCounterConfiguration.h"
+
+namespace nv { namespace perf { namespace profiler {
+
+    // safe defaults for realtime
+    struct SessionOptions
+    {
+        size_t maxNumRanges = 16;
+        size_t avgRangeNameLength = 128;
+        size_t numTraceBuffers = 5;                 // recommended: SwapChainDepth + 2
+    };
+
+    struct SetConfigParams
+    {
+        const uint8_t* pConfigImage;
+        size_t configImageSize;
+        const uint8_t* pCounterDataPrefix;
+        size_t counterDataPrefixSize;
+        size_t numPipelinedPasses;
+        size_t numIsolatedPasses;
+        uint16_t numNestingLevels;
+        size_t numStatisticalSamples;
+
+        SetConfigParams()
+            : pConfigImage()
+            , configImageSize()
+            , pCounterDataPrefix()
+            , counterDataPrefixSize()
+            , numPipelinedPasses()
+            , numIsolatedPasses()
+            , numNestingLevels()
+            , numStatisticalSamples()
+        {
+        }
+
+        SetConfigParams(const CounterConfiguration& configuration, uint16_t numNestingLevels = 1, size_t numStatisticalSamples = 1)
+            : pConfigImage(configuration.configImage.data())
+            , configImageSize(configuration.configImage.size())
+            , pCounterDataPrefix(configuration.counterDataPrefix.data())
+            , counterDataPrefixSize(configuration.counterDataPrefix.size())
+            , numPipelinedPasses(configuration.numPipelinedPasses)
+            , numIsolatedPasses(configuration.numIsolatedPasses)
+            , numNestingLevels(numNestingLevels)
+            , numStatisticalSamples(numStatisticalSamples)
+        {
+        }
+    };
+
+    // out-param from DecodeCounters
+    struct DecodeResult
+    {
+        bool onePassDecoded;
+        bool allPassesDecoded;
+        bool allStatisticalSamplesCollected;
+        std::vector<uint8_t> counterDataImage;      // non-empty only when allStatisticalSamplesCollected is true
+    };
+
+    class RangeProfilerStateMachine
+    {
+    public: // types
+        struct IProfilerApi
+        {
+            virtual bool CreateCounterData(const SetConfigParams& config, std::vector<uint8_t>& counterDataImage, std::vector<uint8_t>& counterDataScratch) const = 0;
+            virtual bool SetConfig(const SetConfigParams& config) const = 0;
+            virtual bool BeginPass() const = 0;
+            virtual bool EndPass() const = 0;
+            virtual bool PushRange(const char* pRangeName) = 0;
+            virtual bool PopRange() = 0;
+            virtual bool DecodeCounters(std::vector<uint8_t>& counterDataImage, std::vector<uint8_t>& counterDataScratch, bool& onePassDecoded, bool& allPassesDecoded) const = 0;
+        };
+
+    protected: // types
+        struct CounterStateMachine
+        {
+            // state updated per-pass
+            size_t numPassesSubmitted;                          /// number of passes submitted (incremented at EndPass)
+            size_t numStatisticalSamplesCollected;              /// number of times all passes were collected
+
+            // state derived from the configuration
+            size_t numPassesPerStatisticalSample;               /// number of passes required by the {ConfigImage, numNestingLevels}
+            size_t numStatisticalSamplesRequired;               /// number of repeated samplings required by SetConfig
+            std::vector<uint8_t> counterDataImage;              /// opaque buffer containing HW counter data; updated in DecodeCounters on each frame
+            std::vector<uint8_t> counterDataScratch;            /// opaque buffer needed by DecodeCounters
+
+            bool AllPassesSubmitted() const
+            {
+                const bool allPassesSubmitted = (numPassesSubmitted == numPassesPerStatisticalSample * numStatisticalSamplesRequired);
+                return allPassesSubmitted;
+            }
+        };
+
+    protected: // members
+        IProfilerApi& m_profilerApi;
+        bool m_inPass;
+
+        // Use std::list for stable iterators and a guarantee of no-copy.
+        typedef std::list<SetConfigParams> ConfigQueue;
+        typedef std::list<CounterStateMachine> CountersQueue;
+        bool m_needSetConfig;
+        ConfigQueue m_configQueue;                      // m_configQueue.front() is the active configuration (by SetConfig), and is popped after all passes are submitted
+        CountersQueue m_countersQueue;                  // queued CounterData, which may lag the configQueue when frames are rendered asynchronously
+        CountersQueue::iterator m_submitCounterItr;     // points at the CounterData corresponding to m_configQueue.front()
+
+    private:
+        // non-copyable
+        RangeProfilerStateMachine(const RangeProfilerStateMachine&) = delete;
+
+    public:
+        ~RangeProfilerStateMachine()
+        {
+            Reset();
+        }
+
+        RangeProfilerStateMachine(IProfilerApi& profilerApi)
+            : m_profilerApi(profilerApi)
+            , m_inPass(false)
+            , m_needSetConfig()
+            , m_configQueue()
+            , m_countersQueue()
+            , m_submitCounterItr()
+        {
+        }
+
+        void Reset()
+        {
+            m_submitCounterItr = {};
+            m_countersQueue.clear();
+            m_configQueue.clear();
+            m_needSetConfig = false;
+            m_inPass = false;
+        }
+
+        bool IsInPass() const
+        {
+            return m_inPass;
+        }
+
+        bool EnqueueCounterCollection(const SetConfigParams& config)
+        {
+            CounterStateMachine counterStateMachine = {};
+            counterStateMachine.numPassesPerStatisticalSample = config.numPipelinedPasses + config.numIsolatedPasses * config.numNestingLevels;
+            counterStateMachine.numStatisticalSamplesRequired = config.numStatisticalSamples;
+            if (!m_profilerApi.CreateCounterData(config, counterStateMachine.counterDataImage, counterStateMachine.counterDataScratch))
+            {
+                return false;
+            }
+
+            if (m_configQueue.empty())
+            {
+                m_needSetConfig = true;
+            }
+            m_configQueue.push_back(config);
+
+            const bool countersQueueWasEmpty = m_countersQueue.empty();
+            m_countersQueue.emplace_back(std::move(counterStateMachine));
+            if (countersQueueWasEmpty)
+            {
+                m_submitCounterItr = m_countersQueue.begin();
+            }
+
+            return true;
+        }
+
+        bool BeginPass()
+        {
+            if (m_inPass)
+            {
+                // TODO: error - must be called in session, but outside of a pass
+                return false;
+            }
+            if (m_configQueue.empty())
+            {
+                // Do not enqueue additional HW data collection.
+                return true;
+            }
+
+            if (m_needSetConfig)
+            {
+                if (!m_profilerApi.SetConfig(m_configQueue.front()))
+                {
+                    return false;
+                }
+                m_needSetConfig = false;
+            }
+
+            if (!m_profilerApi.BeginPass())
+            {
+                return false;
+            }
+
+            m_inPass = true;
+            return true;
+        }
+
+        bool EndPass()
+        {
+            if (!m_inPass)
+            {
+                // TODO: error - must be called in session, and inside of a pass
+                return false;
+            }
+
+            if (m_configQueue.empty())
+            {
+                // Do not enqueue additional HW data collection.
+                return true;
+            }
+
+            if (!m_profilerApi.EndPass())
+            {
+                return false;
+            }
+
+            CounterStateMachine& counterStateMachine = *m_submitCounterItr;
+            counterStateMachine.numPassesSubmitted += 1;
+            if (counterStateMachine.AllPassesSubmitted())
+            {
+                ++m_submitCounterItr;
+                m_configQueue.pop_front();
+                if (!m_configQueue.empty())
+                {
+                    m_needSetConfig = true;
+                }
+            }
+
+            m_inPass = false;
+            return true;
+        }
+
+        bool PushRange(const char* pRangeName)
+        {
+            if (!m_inPass)
+            {
+                // TODO: error - must be called in session, and inside of a pass
+                return false;
+            }
+
+            if (m_configQueue.empty())
+            {
+                // Do not enqueue additional HW data collection.
+                return true;
+            }
+
+            if (!m_profilerApi.PushRange(pRangeName))
+            {
+                return false;
+            }
+
+            return true;
+        }
+
+        bool PopRange()
+        {
+            if (!m_inPass)
+            {
+                // TODO: error - must be called in session, and inside of a pass
+                return false;
+            }
+
+            if (m_configQueue.empty())
+            {
+                // Do not enqueue additional HW data collection.
+                return true;
+            }
+
+            if (!m_profilerApi.PopRange())
+            {
+                return false;
+            }
+
+            return true;
+        }
+
+        bool DecodeCounters(DecodeResult& decodeResult)
+        {
+            if (m_countersQueue.empty())
+            {
+                // TODO: error - nothing is queued for collection.  see EnqueueCounterCollection
+                return false;
+            }
+
+            CounterStateMachine& counterStateMachine = m_countersQueue.front();
+
+            decodeResult = {};
+            if (!m_profilerApi.DecodeCounters(counterStateMachine.counterDataImage, counterStateMachine.counterDataScratch, decodeResult.onePassDecoded, decodeResult.allPassesDecoded))
+            {
+                // TODO: error - the session must be torn down
+                return false;
+            }
+
+            if (decodeResult.allPassesDecoded)
+            {
+                counterStateMachine.numStatisticalSamplesCollected += 1;
+                if (counterStateMachine.numStatisticalSamplesCollected == counterStateMachine.numStatisticalSamplesRequired)
+                {
+                    decodeResult.allStatisticalSamplesCollected = true;
+                    decodeResult.counterDataImage = std::move(counterStateMachine.counterDataImage);
+                    m_countersQueue.pop_front();
+                }
+            }
+            return true;
+        }
+
+        bool AllPassesSubmitted() const
+        {
+            const bool allPassesSubmitted = m_configQueue.empty();
+            return allPassesSubmitted;
+        }
+    };
+
+}}}
--- a/ruins64k/tools/NvPerfUtility/include/NvPerfRangeProfilerD3D11.h
+++ b/ruins64k/tools/NvPerfUtility/include/NvPerfRangeProfilerD3D11.h
@@ -0,0 +1,373 @@
+/*
+* Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+#pragma once
+
+#include "NvPerfRangeProfiler.h"
+#include "NvPerfD3D11.h"
+#include <atlbase.h>
+
+namespace nv { namespace perf { namespace profiler {
+
+    class RangeProfilerD3D11
+    {
+    private:
+        struct ProfilerApi : RangeProfilerStateMachine::IProfilerApi
+        {
+            CComPtr<ID3D11DeviceContext> pDeviceContext;
+            SessionOptions sessionOptions;
+
+            ProfilerApi()
+                : pDeviceContext(nullptr)
+                , sessionOptions()
+            {
+            }
+
+            virtual bool CreateCounterData(const SetConfigParams& config, std::vector<uint8_t>& counterDataImage, std::vector<uint8_t>& counterDataScratch) const override
+            {
+                NVPA_Status nvpaStatus;
+
+                NVPW_D3D11_Profiler_CounterDataImageOptions counterDataImageOptions = { NVPW_D3D11_Profiler_CounterDataImageOptions_STRUCT_SIZE };
+                counterDataImageOptions.pCounterDataPrefix = config.pCounterDataPrefix;
+                counterDataImageOptions.counterDataPrefixSize = config.counterDataPrefixSize;
+                counterDataImageOptions.maxNumRanges = static_cast<uint32_t>(sessionOptions.maxNumRanges);
+                counterDataImageOptions.maxNumRangeTreeNodes = static_cast<uint32_t>(2 * sessionOptions.maxNumRanges);
+                counterDataImageOptions.maxRangeNameLength = static_cast<uint32_t>(sessionOptions.avgRangeNameLength);
+
+                NVPW_D3D11_Profiler_CounterDataImage_CalculateSize_Params calculateSizeParams = { NVPW_D3D11_Profiler_CounterDataImage_CalculateSize_Params_STRUCT_SIZE };
+                calculateSizeParams.counterDataImageOptionsSize = NVPW_D3D11_Profiler_CounterDataImageOptions_STRUCT_SIZE;
+                calculateSizeParams.pOptions = &counterDataImageOptions;
+                nvpaStatus = NVPW_D3D11_Profiler_CounterDataImage_CalculateSize(&calculateSizeParams);
+                if (nvpaStatus)
+                {
+                    return false;
+                }
+
+                counterDataImage.resize(calculateSizeParams.counterDataImageSize);
+
+                NVPW_D3D11_Profiler_CounterDataImage_Initialize_Params initializeParams = { NVPW_D3D11_Profiler_CounterDataImage_Initialize_Params_STRUCT_SIZE };
+                initializeParams.counterDataImageOptionsSize = NVPW_D3D11_Profiler_CounterDataImageOptions_STRUCT_SIZE;
+                initializeParams.pOptions = &counterDataImageOptions;
+                initializeParams.counterDataImageSize = calculateSizeParams.counterDataImageSize;
+                initializeParams.pCounterDataImage = counterDataImage.data();
+                nvpaStatus = NVPW_D3D11_Profiler_CounterDataImage_Initialize(&initializeParams);
+                if (nvpaStatus)
+                {
+                    return false;
+                }
+
+                NVPW_D3D11_Profiler_CounterDataImage_CalculateScratchBufferSize_Params scratchBufferSizeParams = { NVPW_D3D11_Profiler_CounterDataImage_CalculateScratchBufferSize_Params_STRUCT_SIZE };
+                scratchBufferSizeParams.counterDataImageSize = calculateSizeParams.counterDataImageSize;
+                scratchBufferSizeParams.pCounterDataImage = initializeParams.pCounterDataImage;
+                nvpaStatus = NVPW_D3D11_Profiler_CounterDataImage_CalculateScratchBufferSize(&scratchBufferSizeParams);
+                if (nvpaStatus)
+                {
+                    return false;
+                }
+
+                counterDataScratch.resize(scratchBufferSizeParams.counterDataScratchBufferSize);
+
+                NVPW_D3D11_Profiler_CounterDataImage_InitializeScratchBuffer_Params initScratchBufferParams = { NVPW_D3D11_Profiler_CounterDataImage_InitializeScratchBuffer_Params_STRUCT_SIZE };
+                initScratchBufferParams.counterDataImageSize = calculateSizeParams.counterDataImageSize;
+                initScratchBufferParams.pCounterDataImage = initializeParams.pCounterDataImage;
+                initScratchBufferParams.counterDataScratchBufferSize = scratchBufferSizeParams.counterDataScratchBufferSize;
+                initScratchBufferParams.pCounterDataScratchBuffer = counterDataScratch.data();
+                nvpaStatus = NVPW_D3D11_Profiler_CounterDataImage_InitializeScratchBuffer(&initScratchBufferParams);
+                if (nvpaStatus)
+                {
+                    return false;
+                }
+
+                return true;
+            }
+
+            virtual bool SetConfig(const SetConfigParams& config) const override
+            {
+                NVPW_D3D11_Profiler_DeviceContext_SetConfig_Params setConfigParams = { NVPW_D3D11_Profiler_DeviceContext_SetConfig_Params_STRUCT_SIZE };
+                setConfigParams.pDeviceContext = pDeviceContext;
+                setConfigParams.pConfig = config.pConfigImage;
+                setConfigParams.configSize = config.configImageSize;
+                setConfigParams.minNestingLevel = 1;
+                setConfigParams.numNestingLevels = config.numNestingLevels;
+                setConfigParams.passIndex = 0;
+                setConfigParams.targetNestingLevel = 1;
+                NVPA_Status nvpaStatus = NVPW_D3D11_Profiler_DeviceContext_SetConfig(&setConfigParams);
+                if (nvpaStatus)
+                {
+                    return false;
+                }
+
+                return true;
+            }
+
+            virtual bool BeginPass() const override
+            {
+                NVPW_D3D11_Profiler_DeviceContext_BeginPass_Params beginPassParams = { NVPW_D3D11_Profiler_DeviceContext_BeginPass_Params_STRUCT_SIZE };
+                beginPassParams.pDeviceContext = pDeviceContext;
+                NVPA_Status nvpaStatus = NVPW_D3D11_Profiler_DeviceContext_BeginPass(&beginPassParams);
+                if (nvpaStatus)
+                {
+                    return false;
+                }
+                return true;
+            }
+
+            virtual bool EndPass() const override
+            {
+                NVPW_D3D11_Profiler_DeviceContext_EndPass_Params endPassParams = { NVPW_D3D11_Profiler_DeviceContext_EndPass_Params_STRUCT_SIZE };
+                endPassParams.pDeviceContext = pDeviceContext;
+                NVPA_Status nvpaStatus = NVPW_D3D11_Profiler_DeviceContext_EndPass(&endPassParams);
+                if (nvpaStatus)
+                {
+                    return false;
+                }
+                return true;
+            }
+
+            virtual bool PushRange(const char* pRangeName) override
+            {
+                NVPW_D3D11_Profiler_DeviceContext_PushRange_Params pushRangeParams = { NVPW_D3D11_Profiler_DeviceContext_PushRange_Params_STRUCT_SIZE };
+                pushRangeParams.pDeviceContext = pDeviceContext;
+                pushRangeParams.pRangeName = pRangeName;
+                pushRangeParams.rangeNameLength = 0;
+                NVPA_Status nvpaStatus = NVPW_D3D11_Profiler_DeviceContext_PushRange(&pushRangeParams);
+                if (nvpaStatus)
+                {
+                    return false;
+                }
+                return true;
+            }
+
+            virtual bool PopRange() override
+            {
+                NVPW_D3D11_Profiler_DeviceContext_PopRange_Params popParams = { NVPW_D3D11_Profiler_DeviceContext_PopRange_Params_STRUCT_SIZE };
+                popParams.pDeviceContext = pDeviceContext;
+                NVPA_Status nvpaStatus = NVPW_D3D11_Profiler_DeviceContext_PopRange(&popParams);
+                if (nvpaStatus)
+                {
+                    return false;
+                }
+                return true;
+            }
+
+            virtual bool DecodeCounters(std::vector<uint8_t>& counterDataImage, std::vector<uint8_t>& counterDataScratch, bool& onePassDecoded, bool& allPassesDecoded) const override
+            {
+                NVPW_D3D11_Profiler_DeviceContext_DecodeCounters_Params decodeParams = { NVPW_D3D11_Profiler_DeviceContext_DecodeCounters_Params_STRUCT_SIZE };
+                decodeParams.pDeviceContext = pDeviceContext;
+                decodeParams.counterDataImageSize = counterDataImage.size();
+                decodeParams.pCounterDataImage = counterDataImage.data();
+                decodeParams.counterDataScratchBufferSize = counterDataScratch.size();
+                decodeParams.pCounterDataScratchBuffer = counterDataScratch.data();
+                NVPA_Status nvpaStatus = NVPW_D3D11_Profiler_DeviceContext_DecodeCounters(&decodeParams);
+                if (nvpaStatus)
+                {
+                    return false;
+                }
+
+                onePassDecoded = decodeParams.onePassCollected;
+                allPassesDecoded = decodeParams.allPassesCollected;
+                return true;
+            }
+
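+            bool Initialize(ID3D11DeviceContext* pDeviceContext_, SessionOptions sessionOptions_)
+            {
+                pDeviceContext = pDeviceContext_;
+                sessionOptions = sessionOptions_;
+                return true; // a non-void function must return a value; flowing off the end is undefined behavior
+            }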
+
+            void Reset()
+            {
+                NVPW_D3D11_Profiler_DeviceContext_EndSession_Params endSessionParams = {NVPW_D3D11_Profiler_DeviceContext_EndSession_Params_STRUCT_SIZE};
+                endSessionParams.pDeviceContext = pDeviceContext;
+                NVPA_Status nvpaStatus = NVPW_D3D11_Profiler_DeviceContext_EndSession(&endSessionParams);
+                if (nvpaStatus)
+                {
+                    NV_PERF_LOG_ERR(10, "NVPW_D3D11_Profiler_DeviceContext_EndSession failed, nvpaStatus = %d\n", nvpaStatus);
+                }
+
+                sessionOptions = {};
+                pDeviceContext = nullptr;
+            }
+        };
+
+    private:
+        ProfilerApi m_profilerApi;
+        RangeProfilerStateMachine m_stateMachine;
+
+    public:
+        ~RangeProfilerD3D11()
+        {
+        }
+
+        RangeProfilerD3D11(const RangeProfilerD3D11&) = delete;
+
+        RangeProfilerD3D11()
+            : m_profilerApi()
+            , m_stateMachine(m_profilerApi)
+        {
+        }
+        // TODO: make this move friendly
+
+        RangeProfilerD3D11& operator=(const RangeProfilerD3D11&) = delete;
+
+        bool IsInSession() const
+        {
+            return !!m_profilerApi.pDeviceContext;
+        }
+
+        bool IsInPass() const
+        {
+            return m_stateMachine.IsInPass();
+        }
+
+        ID3D11DeviceContext* GetDeviceContext() const
+        {
+            return m_profilerApi.pDeviceContext;
+        }
+
+        bool BeginSession(ID3D11DeviceContext* pDeviceContext, const SessionOptions& sessionOptions)
+        {
+            if (IsInSession())
+            {
+                NV_PERF_LOG_ERR(10, "already in a session\n");
+                return false;
+            }
+            if (!nv::perf::D3D11IsNvidiaDevice(pDeviceContext) || !nv::perf::profiler::D3D11IsGpuSupported(pDeviceContext))
+            {
+                NV_PERF_LOG_ERR(10, "device is not supported for profiling\n");
+                return false;
+            }
+
+            NVPA_Status nvpaStatus;
+
+            NVPW_D3D11_Profiler_CalcTraceBufferSize_Params calcTraceBufferSizeParam = { NVPW_D3D11_Profiler_CalcTraceBufferSize_Params_STRUCT_SIZE };
+            calcTraceBufferSizeParam.maxRangesPerPass = sessionOptions.maxNumRanges;
+            calcTraceBufferSizeParam.avgRangeNameLength = sessionOptions.avgRangeNameLength;
+            nvpaStatus = NVPW_D3D11_Profiler_CalcTraceBufferSize(&calcTraceBufferSizeParam);
+            if (nvpaStatus)
+            {
+                return false;
+            }
+
+            NVPW_D3D11_Profiler_DeviceContext_BeginSession_Params beginSessionParams = { NVPW_D3D11_Profiler_DeviceContext_BeginSession_Params_STRUCT_SIZE };
+            beginSessionParams.pDeviceContext = pDeviceContext;
+            beginSessionParams.numTraceBuffers = sessionOptions.numTraceBuffers;
+            beginSessionParams.traceBufferSize = calcTraceBufferSizeParam.traceBufferSize;
+            beginSessionParams.maxRangesPerPass = sessionOptions.maxNumRanges;
+            beginSessionParams.maxLaunchesPerPass = sessionOptions.maxNumRanges;
+            nvpaStatus = NVPW_D3D11_Profiler_DeviceContext_BeginSession(&beginSessionParams);
+            if (nvpaStatus)
+            {
+                if (nvpaStatus == NVPA_STATUS_INSUFFICIENT_PRIVILEGE)
+                {
+                    NV_PERF_LOG_ERR(10, "Failed to start profiler session: profiling permissions not enabled.  Please follow these instructions: https://developer.nvidia.com/ERR_NVGPUCTRPERM\n");
+                }
+                else if (nvpaStatus == NVPA_STATUS_INSUFFICIENT_DRIVER_VERSION)
+                {
+                    NV_PERF_LOG_ERR(10, "Failed to start profiler session: insufficient driver version.  Please install the latest NVIDIA driver from https://www.nvidia.com\n");
+                }
+                else
+                {
+                    NV_PERF_LOG_ERR(10, "Failed to start profiler session: unknown error.  It may be a resource conflict - only one profiler session can run at a time per GPU.\n");
+                }
+                return false;
+            }
+
+            m_profilerApi.sessionOptions = sessionOptions;
+            m_profilerApi.pDeviceContext = pDeviceContext;
+            return true;
+        }
+
+        bool EndSession()
+        {
+            if (!IsInSession())
+            {
+                NV_PERF_LOG_ERR(10, "must be called in a session\n");
+                return false;
+            }
+
+            m_stateMachine.Reset();
+            m_profilerApi.Reset();
+            return true;
+        }
+
+        bool EnqueueCounterCollection(const SetConfigParams& config)
+        {
+            const bool status = m_stateMachine.EnqueueCounterCollection(config);
+            return status;
+        }
+
+        bool EnqueueCounterCollection(const CounterConfiguration& configuration, uint16_t numNestingLevels = 1, size_t numStatisticalSamples = 1)
+        {
+            const bool status = m_stateMachine.EnqueueCounterCollection(SetConfigParams(configuration, numNestingLevels, numStatisticalSamples));
+            return status;
+        }
+
+        bool BeginPass()
+        {
+            if (!IsInSession())
+            {
+                NV_PERF_LOG_ERR(10, "must be called in a session\n");
+                return false;
+            }
+
+            const bool status = m_stateMachine.BeginPass();
+            return status;
+        }
+
+        bool EndPass()
+        {
+            if (!IsInSession())
+            {
+                NV_PERF_LOG_ERR(10, "must be called in a session\n");
+                return false;
+            }
+
+            const bool status = m_stateMachine.EndPass();
+            return status;
+        }
+
+        bool PushRange(const char* pRangeName)
+        {
+            const bool status = m_stateMachine.PushRange(pRangeName);
+            return status;
+        }
+
+        bool PopRange()
+        {
+            const bool status = m_stateMachine.PopRange();
+            return status;
+        }
+
+        bool DecodeCounters(DecodeResult& decodeResult)
+        {
+            if (!IsInSession())
+            {
+                NV_PERF_LOG_ERR(10, "must be called in a session\n");
+                return false;
+            }
+
+            const bool status = m_stateMachine.DecodeCounters(decodeResult);
+            return status;
+        }
+
+        bool AllPassesSubmitted() const
+        {
+            const bool allPassesSubmitted = m_stateMachine.AllPassesSubmitted();
+            return allPassesSubmitted;
+        }
+    };
+}}}
--- a/ruins64k/tools/NvPerfUtility/include/NvPerfRangeProfilerD3D12.h
+++ b/ruins64k/tools/NvPerfUtility/include/NvPerfRangeProfilerD3D12.h
@@ -0,0 +1,419 @@
+/*
+* Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+#pragma once
+
+#include <stdio.h>
+#include <thread>
+#include <vector>
+#include "NvPerfInit.h"
+#include "NvPerfCounterConfiguration.h"
+#include "NvPerfRangeProfiler.h"
+#include "NvPerfD3D12.h"
+
+struct ID3D12CommandQueue;
+
+namespace nv { namespace perf { namespace profiler {
+
+    class RangeProfilerD3D12
+    {
+    protected:
+        struct ProfilerApi : RangeProfilerStateMachine::IProfilerApi
+        {
+            CComPtr<ID3D12CommandQueue> pCommandQueue;
+            SessionOptions sessionOptions;
+
+            ProfilerApi()
+                : pCommandQueue(nullptr)
+                , sessionOptions()
+            {
+            }
+
+            virtual bool CreateCounterData(const SetConfigParams& config, std::vector<uint8_t>& counterDataImage, std::vector<uint8_t>& counterDataScratch) const override
+            {
+                NVPA_Status nvpaStatus;
+
+                NVPW_D3D12_Profiler_CounterDataImageOptions counterDataImageOptions = { NVPW_D3D12_Profiler_CounterDataImageOptions_STRUCT_SIZE };
+                counterDataImageOptions.pCounterDataPrefix = config.pCounterDataPrefix;
+                counterDataImageOptions.counterDataPrefixSize = config.counterDataPrefixSize;
+                counterDataImageOptions.maxNumRanges = static_cast<uint32_t>(sessionOptions.maxNumRanges);
+                counterDataImageOptions.maxNumRangeTreeNodes = static_cast<uint32_t>(2 * sessionOptions.maxNumRanges);
+                counterDataImageOptions.maxRangeNameLength = static_cast<uint32_t>(sessionOptions.avgRangeNameLength);
+
+                NVPW_D3D12_Profiler_CounterDataImage_CalculateSize_Params calculateSizeParams = { NVPW_D3D12_Profiler_CounterDataImage_CalculateSize_Params_STRUCT_SIZE };
+                calculateSizeParams.pOptions = &counterDataImageOptions;
+                calculateSizeParams.counterDataImageOptionsSize = NVPW_D3D12_Profiler_CounterDataImageOptions_STRUCT_SIZE;
+                nvpaStatus = NVPW_D3D12_Profiler_CounterDataImage_CalculateSize(&calculateSizeParams);
+                if (nvpaStatus)
+                {
+                    return false;
+                }
+
+                counterDataImage.resize(calculateSizeParams.counterDataImageSize);
+
+                NVPW_D3D12_Profiler_CounterDataImage_Initialize_Params initializeParams = { NVPW_D3D12_Profiler_CounterDataImage_Initialize_Params_STRUCT_SIZE };
+                initializeParams.counterDataImageOptionsSize = NVPW_D3D12_Profiler_CounterDataImageOptions_STRUCT_SIZE;
+                initializeParams.pOptions = &counterDataImageOptions;
+                initializeParams.counterDataImageSize = calculateSizeParams.counterDataImageSize;
+                initializeParams.pCounterDataImage = &counterDataImage[0];
+                nvpaStatus = NVPW_D3D12_Profiler_CounterDataImage_Initialize(&initializeParams);
+                if (nvpaStatus)
+                {
+                    return false;
+                }
+
+                NVPW_D3D12_Profiler_CounterDataImage_CalculateScratchBufferSize_Params scratchBufferSizeParams = { NVPW_D3D12_Profiler_CounterDataImage_CalculateScratchBufferSize_Params_STRUCT_SIZE };
+                scratchBufferSizeParams.counterDataImageSize = calculateSizeParams.counterDataImageSize;
+                scratchBufferSizeParams.pCounterDataImage = initializeParams.pCounterDataImage;
+                nvpaStatus = NVPW_D3D12_Profiler_CounterDataImage_CalculateScratchBufferSize(&scratchBufferSizeParams);
+                if (nvpaStatus)
+                {
+                    return false;
+                }
+
+                counterDataScratch.resize(scratchBufferSizeParams.counterDataScratchBufferSize);
+
+                NVPW_D3D12_Profiler_CounterDataImage_InitializeScratchBuffer_Params initScratchBufferParams = { NVPW_D3D12_Profiler_CounterDataImage_InitializeScratchBuffer_Params_STRUCT_SIZE };
+                initScratchBufferParams.counterDataImageSize = calculateSizeParams.counterDataImageSize;
+                initScratchBufferParams.pCounterDataImage = initializeParams.pCounterDataImage;
+                initScratchBufferParams.counterDataScratchBufferSize = scratchBufferSizeParams.counterDataScratchBufferSize;
+                initScratchBufferParams.pCounterDataScratchBuffer = &counterDataScratch[0];
+
+                nvpaStatus = NVPW_D3D12_Profiler_CounterDataImage_InitializeScratchBuffer(&initScratchBufferParams);
+                if (nvpaStatus)
+                {
+                    return false;
+                }
+
+                return true;
+            }
+
+            virtual bool SetConfig(const SetConfigParams& config) const override
+            {
+                NVPW_D3D12_Profiler_Queue_SetConfig_Params setConfigParams = { NVPW_D3D12_Profiler_Queue_SetConfig_Params_STRUCT_SIZE };
+                setConfigParams.pCommandQueue = pCommandQueue;
+                setConfigParams.pConfig = config.pConfigImage;
+                setConfigParams.configSize = config.configImageSize;
+                setConfigParams.minNestingLevel = 1;
+                setConfigParams.numNestingLevels = config.numNestingLevels;
+                setConfigParams.passIndex = 0;
+                setConfigParams.targetNestingLevel = 1;
+                NVPA_Status nvpaStatus = NVPW_D3D12_Profiler_Queue_SetConfig(&setConfigParams);
+                if (nvpaStatus)
+                {
+                    return false;
+                }
+
+                return true;
+            }
+
+            virtual bool BeginPass() const override
+            {
+                NVPW_D3D12_Profiler_Queue_BeginPass_Params beginPassParams = { NVPW_D3D12_Profiler_Queue_BeginPass_Params_STRUCT_SIZE };
+                beginPassParams.pCommandQueue = pCommandQueue;
+                NVPA_Status nvpaStatus = NVPW_D3D12_Profiler_Queue_BeginPass(&beginPassParams);
+                if (nvpaStatus)
+                {
+                    return false;
+                }
+                return true;
+            }
+
+            virtual bool EndPass() const override
+            {
+                NVPW_D3D12_Profiler_Queue_EndPass_Params endPassParams = { NVPW_D3D12_Profiler_Queue_EndPass_Params_STRUCT_SIZE };
+                endPassParams.pCommandQueue = pCommandQueue;
+                NVPA_Status nvpaStatus = NVPW_D3D12_Profiler_Queue_EndPass(&endPassParams);
+                if (nvpaStatus)
+                {
+                    return false;
+                }
+                return true;
+            }
+
+            virtual bool PushRange(const char* pRangeName) override
+            {
+                NVPW_D3D12_Profiler_Queue_PushRange_Params pushRangeParams = {NVPW_D3D12_Profiler_Queue_PushRange_Params_STRUCT_SIZE};
+                pushRangeParams.pRangeName = pRangeName;
+                pushRangeParams.rangeNameLength = 0;
+                pushRangeParams.pCommandQueue = pCommandQueue;
+                NVPA_Status nvpaStatus = NVPW_D3D12_Profiler_Queue_PushRange(&pushRangeParams);
+                if (nvpaStatus)
+                {
+                    return false;
+                }
+                return true;
+            }
+
+            virtual bool PopRange() override
+            {
+                NVPW_D3D12_Profiler_Queue_PopRange_Params popParams = {NVPW_D3D12_Profiler_Queue_PopRange_Params_STRUCT_SIZE};
+                popParams.pCommandQueue = pCommandQueue;
+                NVPA_Status nvpaStatus = NVPW_D3D12_Profiler_Queue_PopRange(&popParams);
+                if (nvpaStatus)
+                {
+                    return false;
+                }
+                return true;
+            }
+
+            virtual bool DecodeCounters(std::vector<uint8_t>& counterDataImage, std::vector<uint8_t>& counterDataScratch, bool& onePassDecoded, bool& allPassesDecoded) const
+            {
+                NVPW_D3D12_Profiler_Queue_DecodeCounters_Params decodeParams = { NVPW_D3D12_Profiler_Queue_DecodeCounters_Params_STRUCT_SIZE };
+                decodeParams.pCommandQueue = pCommandQueue;
+                decodeParams.counterDataImageSize = counterDataImage.size();
+                decodeParams.pCounterDataImage = counterDataImage.data();
+                decodeParams.counterDataScratchBufferSize = counterDataScratch.size();
+                decodeParams.pCounterDataScratchBuffer = counterDataScratch.data();
+                NVPA_Status nvpaStatus = NVPW_D3D12_Profiler_Queue_DecodeCounters(&decodeParams);
+                if (nvpaStatus)
+                {
+                    return false;
+                }
+
+                onePassDecoded = decodeParams.onePassCollected;
+                allPassesDecoded = decodeParams.allPassesCollected;
+                return true;
+            }
+
+            bool Initialize(ID3D12CommandQueue* pCommandQueue_, const SessionOptions& sessionOptions_)
+            {
+                pCommandQueue = pCommandQueue_;
+                sessionOptions = sessionOptions_;
+                return true;
+            }
+
+            void Reset()
+            {
+                NVPW_D3D12_Profiler_Queue_EndSession_Params endSessionParams = {NVPW_D3D12_Profiler_Queue_EndSession_Params_STRUCT_SIZE};
+                endSessionParams.pCommandQueue = pCommandQueue;
+                endSessionParams.timeout = INFINITE;
+                NVPA_Status nvpaStatus = NVPW_D3D12_Profiler_Queue_EndSession(&endSessionParams);
+                if (nvpaStatus)
+                {
+                    NV_PERF_LOG_ERR(10, "NVPW_D3D12_Profiler_Queue_EndSession failed, nvpaStatus = %d\n", nvpaStatus);
+                }
+
+                sessionOptions = {};
+                pCommandQueue = nullptr;
+            }
+        };
+
+    protected: // members
+        ProfilerApi m_profilerApi;
+        RangeProfilerStateMachine m_stateMachine;
+        std::thread m_spgoThread;
+        volatile bool m_spgoThreadExited;
+
+    private:
+        // non-copyable
+        RangeProfilerD3D12(const RangeProfilerD3D12&);
+
+        static void SpgoThreadProc(RangeProfilerD3D12* pRangeProfilerD3D12, ID3D12CommandQueue* pCommandQueue)
+        {
+            // Run continuously in the background, handling all BeginPass and EndPass GPU operations until EndSession().
+            NVPW_D3D12_Queue_ServicePendingGpuOperations_Params serviceGpuOpsParams = { NVPW_D3D12_Queue_ServicePendingGpuOperations_Params_STRUCT_SIZE };
+            serviceGpuOpsParams.pCommandQueue = pCommandQueue;
+            serviceGpuOpsParams.numOperations = 0; // run until EndSession()
+            serviceGpuOpsParams.timeout = INFINITE;
+            NVPA_Status nvpaStatus = NVPW_D3D12_Queue_ServicePendingGpuOperations(&serviceGpuOpsParams);
+            if (nvpaStatus)
+            {
+                NV_PERF_LOG_ERR(10, "NVPW_D3D12_Queue_ServicePendingGpuOperations failed, nvpaStatus = %d\n", nvpaStatus);
+            }
+
+            pRangeProfilerD3D12->m_spgoThreadExited = true;
+        }
+
+    public:
+        ~RangeProfilerD3D12()
+        {
+        }
+
+        RangeProfilerD3D12()
+            : m_profilerApi()
+            , m_stateMachine(m_profilerApi)
+            , m_spgoThread()
+            , m_spgoThreadExited()
+        {
+        }
+        // TODO: make this move friendly
+
+        bool IsInSession() const
+        {
+            return !!m_profilerApi.pCommandQueue;
+        }
+
+        bool IsInPass() const
+        {
+            return m_stateMachine.IsInPass();
+        }
+
+        ID3D12CommandQueue* GetCommandQueue() const
+        {
+            return m_profilerApi.pCommandQueue;
+        }
+
+        bool BeginSession(
+            ID3D12CommandQueue* pCommandQueue,
+            const SessionOptions& sessionOptions)
+        {
+            if (IsInSession())
+            {
+                NV_PERF_LOG_ERR(10, "already in a session\n");
+                return false;
+            }
+            if (!D3D12IsNvidiaDevice(pCommandQueue) || !D3D12IsGpuSupported(pCommandQueue))
+            {
+                NV_PERF_LOG_ERR(10, "device is not supported for profiling\n");
+                return false;
+            }
+
+            NVPA_Status nvpaStatus;
+
+            NVPW_D3D12_Profiler_CalcTraceBufferSize_Params calcTraceBufferSizeParam = { NVPW_D3D12_Profiler_CalcTraceBufferSize_Params_STRUCT_SIZE };
+            calcTraceBufferSizeParam.maxRangesPerPass = sessionOptions.maxNumRanges;
+            calcTraceBufferSizeParam.avgRangeNameLength = sessionOptions.avgRangeNameLength;
+            nvpaStatus = NVPW_D3D12_Profiler_CalcTraceBufferSize(&calcTraceBufferSizeParam);
+            if (nvpaStatus)
+            {
+                return false;
+            }
+
+            NVPW_D3D12_Profiler_Queue_BeginSession_Params beginSessionParams = { NVPW_D3D12_Profiler_Queue_BeginSession_Params_STRUCT_SIZE };
+            beginSessionParams.pCommandQueue = pCommandQueue;
+            beginSessionParams.numTraceBuffers = sessionOptions.numTraceBuffers;
+            beginSessionParams.traceBufferSize = calcTraceBufferSizeParam.traceBufferSize;
+            beginSessionParams.maxRangesPerPass = sessionOptions.maxNumRanges;
+            beginSessionParams.maxLaunchesPerPass = sessionOptions.maxNumRanges;
+            nvpaStatus = NVPW_D3D12_Profiler_Queue_BeginSession(&beginSessionParams);
+            if (nvpaStatus)
+            {
+                if (nvpaStatus == NVPA_STATUS_INSUFFICIENT_PRIVILEGE)
+                {
+                    NV_PERF_LOG_ERR(10, "Failed to start profiler session: profiling permissions not enabled.  Please follow these instructions: https://developer.nvidia.com/ERR_NVGPUCTRPERM\n");
+                }
+                else if (nvpaStatus == NVPA_STATUS_INSUFFICIENT_DRIVER_VERSION)
+                {
+                    NV_PERF_LOG_ERR(10, "Failed to start profiler session: insufficient driver version.  Please install the latest NVIDIA driver from https://www.nvidia.com\n");
+                }
+                else
+                {
+                    NV_PERF_LOG_ERR(10, "Failed to start profiler session: unknown error.  It may be a resource conflict - only one profiler session can run at a time per GPU.\n");
+                }
+                return false;
+            }
+
+            m_spgoThreadExited = false;
+            m_spgoThread = std::thread(SpgoThreadProc, this, pCommandQueue);
+
+            m_profilerApi.Initialize(pCommandQueue, sessionOptions);
+            return true;
+        }
+
+        bool EndSession()
+        {
+            if (!IsInSession())
+            {
+                NV_PERF_LOG_ERR(10, "must be called in a session\n");
+                return false;
+            }
+
+            m_stateMachine.Reset();
+            m_profilerApi.Reset();
+            m_spgoThread.join();
+            m_spgoThreadExited = false;
+
+            return true;
+        }
+
+        bool EnqueueCounterCollection(const SetConfigParams& config)
+        {
+            const bool status = m_stateMachine.EnqueueCounterCollection(config);
+            return status;
+        }
+
+        bool EnqueueCounterCollection(const CounterConfiguration& configuration, uint16_t numNestingLevels = 1, size_t numStatisticalSamples = 1)
+        {
+            const bool status = m_stateMachine.EnqueueCounterCollection(SetConfigParams(configuration, numNestingLevels, numStatisticalSamples));
+            return status;
+        }
+
+        bool BeginPass()
+        {
+            if (!IsInSession())
+            {
+                NV_PERF_LOG_ERR(10, "must be called in a session\n");
+                return false;
+            }
+
+            const bool status = m_stateMachine.BeginPass();
+            return status;
+        }
+
+        bool EndPass()
+        {
+            if (!IsInSession())
+            {
+                NV_PERF_LOG_ERR(10, "must be called in a session\n");
+                return false;
+            }
+
+            const bool status = m_stateMachine.EndPass();
+            return status;
+        }
+
+        // Convenience method to start a Queue-level range.  For CommandLists, use D3D12RangeCommands::PushRange.
+        bool PushRange(const char* pRangeName)
+        {
+            const bool status = m_stateMachine.PushRange(pRangeName);
+            return status;
+        }
+
+        // Convenience method to end a Queue-level range.  For CommandLists, use D3D12RangeCommands::PopRange.
+        bool PopRange()
+        {
+            const bool status = m_stateMachine.PopRange();
+            return status;
+        }
+
+        bool DecodeCounters(DecodeResult& decodeResult)
+        {
+            if (!IsInSession())
+            {
+                NV_PERF_LOG_ERR(10, "must be called in a session\n");
+                return false;
+            }
+
+            if (m_spgoThreadExited)
+            {
+                NV_PERF_LOG_ERR(10, "the background thread exited; possible hang on subsequent CPU-waiting-on-GPU calls\n");
+                return false;
+            }
+
+            const bool status = m_stateMachine.DecodeCounters(decodeResult);
+            return status;
+        }
+
+        bool AllPassesSubmitted() const
+        {
+            const bool allPassesSubmitted = m_stateMachine.AllPassesSubmitted();
+            return allPassesSubmitted;
+        }
+    };
+
+
+}}}
--- a/ruins64k/tools/NvPerfUtility/include/NvPerfRangeProfilerOpenGL.h
+++ b/ruins64k/tools/NvPerfUtility/include/NvPerfRangeProfilerOpenGL.h
@@ -0,0 +1,401 @@
+/*
+* Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+#pragma once
+
+#include <stdio.h>
+#include <vector>
+#include "NvPerfInit.h"
+#include "NvPerfCounterConfiguration.h"
+#include "NvPerfRangeProfiler.h"
+#include "NvPerfOpenGL.h"
+
+namespace nv { namespace perf { namespace profiler {
+
+    class RangeProfilerOpenGL
+    {
+    protected:
+        struct ProfilerApi : RangeProfilerStateMachine::IProfilerApi
+        {
+            size_t maxQueueRangesPerPass;
+            size_t nextCommandBufferIdx;
+            SessionOptions sessionOptions;
+            NVPW_OpenGL_GraphicsContext* pGraphicsContext;
+
+            ProfilerApi()
+                : maxQueueRangesPerPass(1)
+                , nextCommandBufferIdx()
+                , sessionOptions()
+                , pGraphicsContext()
+            {
+            }
+
+            virtual bool CreateCounterData(const SetConfigParams& config, std::vector<uint8_t>& counterDataImage, std::vector<uint8_t>& counterDataScratch) const override
+            {
+                NVPA_Status nvpaStatus;
+
+                NVPW_OpenGL_Profiler_CounterDataImageOptions counterDataImageOption = { NVPW_OpenGL_Profiler_CounterDataImageOptions_STRUCT_SIZE };
+                counterDataImageOption.pCounterDataPrefix = config.pCounterDataPrefix;
+                counterDataImageOption.counterDataPrefixSize = config.counterDataPrefixSize;
+                counterDataImageOption.maxNumRanges = static_cast<uint32_t>(sessionOptions.maxNumRanges);
+                counterDataImageOption.maxNumRangeTreeNodes = static_cast<uint32_t>(2 * sessionOptions.maxNumRanges);
+                counterDataImageOption.maxRangeNameLength = static_cast<uint32_t>(sessionOptions.avgRangeNameLength);
+
+                NVPW_OpenGL_Profiler_CounterDataImage_CalculateSize_Params calculateSizeParams = { NVPW_OpenGL_Profiler_CounterDataImage_CalculateSize_Params_STRUCT_SIZE };
+                calculateSizeParams.pOptions = &counterDataImageOption;
+                calculateSizeParams.counterDataImageOptionsSize = NVPW_OpenGL_Profiler_CounterDataImageOptions_STRUCT_SIZE;
+                nvpaStatus = NVPW_OpenGL_Profiler_CounterDataImage_CalculateSize(&calculateSizeParams);
+                if (nvpaStatus)
+                {
+                    return false;
+                }
+
+                NVPW_OpenGL_Profiler_CounterDataImage_Initialize_Params initializeParams = { NVPW_OpenGL_Profiler_CounterDataImage_Initialize_Params_STRUCT_SIZE };
+                initializeParams.counterDataImageOptionsSize = NVPW_OpenGL_Profiler_CounterDataImageOptions_STRUCT_SIZE;
+                initializeParams.pOptions = &counterDataImageOption;
+                initializeParams.counterDataImageSize = calculateSizeParams.counterDataImageSize;
+
+                counterDataImage.resize(calculateSizeParams.counterDataImageSize);
+                initializeParams.pCounterDataImage = &counterDataImage[0];
+                nvpaStatus = NVPW_OpenGL_Profiler_CounterDataImage_Initialize(&initializeParams);
+                if (nvpaStatus)
+                {
+                    return false;
+                }
+
+                NVPW_OpenGL_Profiler_CounterDataImage_CalculateScratchBufferSize_Params scratchBufferSizeParams = { NVPW_OpenGL_Profiler_CounterDataImage_CalculateScratchBufferSize_Params_STRUCT_SIZE };
+                scratchBufferSizeParams.counterDataImageSize = calculateSizeParams.counterDataImageSize;
+                scratchBufferSizeParams.pCounterDataImage = initializeParams.pCounterDataImage;
+                nvpaStatus = NVPW_OpenGL_Profiler_CounterDataImage_CalculateScratchBufferSize(&scratchBufferSizeParams);
+                if (nvpaStatus)
+                {
+                    return false;
+                }
+                counterDataScratch.resize(scratchBufferSizeParams.counterDataScratchBufferSize);
+
+                NVPW_OpenGL_Profiler_CounterDataImage_InitializeScratchBuffer_Params initScratchBufferParams = { NVPW_OpenGL_Profiler_CounterDataImage_InitializeScratchBuffer_Params_STRUCT_SIZE };
+                initScratchBufferParams.counterDataImageSize = calculateSizeParams.counterDataImageSize;
+                initScratchBufferParams.pCounterDataImage = initializeParams.pCounterDataImage;
+                initScratchBufferParams.counterDataScratchBufferSize = scratchBufferSizeParams.counterDataScratchBufferSize;
+                initScratchBufferParams.pCounterDataScratchBuffer = &counterDataScratch[0];
+
+                nvpaStatus = NVPW_OpenGL_Profiler_CounterDataImage_InitializeScratchBuffer(&initScratchBufferParams);
+                if (nvpaStatus)
+                {
+                    return false;
+                }
+
+                return true;
+            }
+
+            virtual bool SetConfig(const SetConfigParams& config) const override
+            {
+                NVPW_OpenGL_Profiler_GraphicsContext_SetConfig_Params setConfigParams = { NVPW_OpenGL_Profiler_GraphicsContext_SetConfig_Params_STRUCT_SIZE };
+                setConfigParams.pConfig = config.pConfigImage;
+                setConfigParams.configSize = config.configImageSize;
+                setConfigParams.minNestingLevel = 1;
+                setConfigParams.numNestingLevels = config.numNestingLevels;
+                setConfigParams.passIndex = 0;
+                setConfigParams.targetNestingLevel = 1;
+                NVPA_Status nvpaStatus = NVPW_OpenGL_Profiler_GraphicsContext_SetConfig(&setConfigParams);
+                if (nvpaStatus)
+                {
+                    return false;
+                }
+
+                return true;
+            }
+
+            virtual bool BeginPass() const override
+            {
+                NVPW_OpenGL_Profiler_GraphicsContext_BeginPass_Params beginPassParams = { NVPW_OpenGL_Profiler_GraphicsContext_BeginPass_Params_STRUCT_SIZE };
+                NVPA_Status nvpaStatus = NVPW_OpenGL_Profiler_GraphicsContext_BeginPass(&beginPassParams);
+                if (nvpaStatus)
+                {
+                    return false;
+                }
+                return true;
+            }
+
+            virtual bool EndPass() const override
+            {
+                NVPW_OpenGL_Profiler_GraphicsContext_EndPass_Params endPassParams = { NVPW_OpenGL_Profiler_GraphicsContext_EndPass_Params_STRUCT_SIZE };
+                NVPA_Status nvpaStatus = NVPW_OpenGL_Profiler_GraphicsContext_EndPass(&endPassParams);
+                if (nvpaStatus)
+                {
+                    return false;
+                }
+                return true;
+            }
+
+            virtual bool PushRange(const char* pRangeName) override
+            {
+                NVPW_OpenGL_Profiler_GraphicsContext_PushRange_Params pushRangeParams = {NVPW_OpenGL_Profiler_GraphicsContext_PushRange_Params_STRUCT_SIZE};
+                pushRangeParams.pRangeName = pRangeName;
+                NVPA_Status nvpaStatus = NVPW_OpenGL_Profiler_GraphicsContext_PushRange(&pushRangeParams);
+                if (nvpaStatus)
+                {
+                    NV_PERF_LOG_ERR(10, "NVPW_OpenGL_Profiler_GraphicsContext_PushRange failed, nvpaStatus = %d\n", nvpaStatus);
+                    return false;
+                }
+                return true;
+            }
+
+            virtual bool PopRange() override
+            {
+                NVPW_OpenGL_Profiler_GraphicsContext_PopRange_Params popRangeParams = {NVPW_OpenGL_Profiler_GraphicsContext_PopRange_Params_STRUCT_SIZE};
+                NVPA_Status nvpaStatus = NVPW_OpenGL_Profiler_GraphicsContext_PopRange(&popRangeParams);
+                if (nvpaStatus)
+                {
+                    NV_PERF_LOG_ERR(10, "NVPW_OpenGL_Profiler_GraphicsContext_PopRange failed, nvpaStatus = %d\n", nvpaStatus);
+                    return false;
+                }
+                return true;
+            }
+            virtual bool DecodeCounters(std::vector<uint8_t>& counterDataImage, std::vector<uint8_t>& counterDataScratch, bool& onePassDecoded, bool& allPassesDecoded) const
+            {
+                NVPW_OpenGL_Profiler_GraphicsContext_DecodeCounters_Params decodeParams = { NVPW_OpenGL_Profiler_GraphicsContext_DecodeCounters_Params_STRUCT_SIZE };
+                decodeParams.counterDataImageSize = counterDataImage.size();
+                decodeParams.pCounterDataImage = counterDataImage.data();
+                decodeParams.counterDataScratchBufferSize = counterDataScratch.size();
+                decodeParams.pCounterDataScratchBuffer = counterDataScratch.data();
+                decodeParams.pGraphicsContext = pGraphicsContext;
+                NVPA_Status nvpaStatus = NVPW_OpenGL_Profiler_GraphicsContext_DecodeCounters(&decodeParams);
+                if (nvpaStatus)
+                {
+                    return false;
+                }
+
+                onePassDecoded = decodeParams.onePassCollected;
+                allPassesDecoded = decodeParams.allPassesCollected;
+                return true;
+            }
+
+            bool Initialize(const SessionOptions& sessionOptions_)
+            {
+                NVPW_OpenGL_GetCurrentGraphicsContext_Params getCurrentGraphicsContextParams = {NVPW_OpenGL_GetCurrentGraphicsContext_Params_STRUCT_SIZE};
+                NVPA_Status nvpaStatus = NVPW_OpenGL_GetCurrentGraphicsContext(&getCurrentGraphicsContextParams);
+                if (nvpaStatus)
+                {
+                    return false;
+                }
+                pGraphicsContext = getCurrentGraphicsContextParams.pGraphicsContext;
+                sessionOptions = sessionOptions_;
+                return true;
+            }
+
+            void Reset()
+            {
+                NVPW_OpenGL_Profiler_GraphicsContext_EndSession_Params endSessionParams = {NVPW_OpenGL_Profiler_GraphicsContext_EndSession_Params_STRUCT_SIZE};
+                NVPA_Status nvpaStatus = NVPW_OpenGL_Profiler_GraphicsContext_EndSession(&endSessionParams);
+                if (nvpaStatus)
+                {
+                    NV_PERF_LOG_ERR(10, "NVPW_OpenGL_Profiler_GraphicsContext_EndSession failed, nvpaStatus = %d\n", nvpaStatus);
+                }
+                sessionOptions = {};
+                pGraphicsContext = nullptr;
+            }
+        };
+
+    protected: // members
+        ProfilerApi m_profilerApi;
+        RangeProfilerStateMachine m_stateMachine;
+
+    private:
+        // non-copyable
+        RangeProfilerOpenGL(const RangeProfilerOpenGL&) = delete;
+
+    public:
+        ~RangeProfilerOpenGL()
+        {
+        }
+
+        RangeProfilerOpenGL()
+            : m_profilerApi()
+            , m_stateMachine(m_profilerApi)
+        {
+        }
+
+        bool IsInSession() const
+        {
+            return !!m_profilerApi.pGraphicsContext;
+        }
+
+        bool IsInPass() const
+        {
+            return m_stateMachine.IsInPass();
+        }
+
+        bool SetMaxQueueRangesPerPass(size_t maxQueueRangesPerPass)
+        {
+            if (IsInSession())
+            {
+                NV_PERF_LOG_ERR(10, "SetMaxQueueRangesPerPass must be called before the session starts.\n");
+                return false;
+            }
+            m_profilerApi.maxQueueRangesPerPass = maxQueueRangesPerPass;
+            return true;
+        }
+
+        bool BeginSession(
+            const SessionOptions& sessionOptions)
+        {
+            if (IsInSession())
+            {
+                NV_PERF_LOG_ERR(10, "already in a session\n");
+                return false;
+            }
+            if (!OpenGLIsNvidiaDevice() || !OpenGLIsGpuSupported())
+            {
+                NV_PERF_LOG_ERR(10, "device is not supported for profiling\n");
+                return false;
+            }
+
+            NVPA_Status nvpaStatus;
+
+            NVPW_OpenGL_Profiler_CalcTraceBufferSize_Params calcTraceBufferSizeParam = { NVPW_OpenGL_Profiler_CalcTraceBufferSize_Params_STRUCT_SIZE };
+            calcTraceBufferSizeParam.maxRangesPerPass = sessionOptions.maxNumRanges;
+            calcTraceBufferSizeParam.avgRangeNameLength = sessionOptions.avgRangeNameLength;
+            nvpaStatus = NVPW_OpenGL_Profiler_CalcTraceBufferSize(&calcTraceBufferSizeParam);
+            if (nvpaStatus)
+            {
+                return false;
+            }
+
+            NVPW_OpenGL_Profiler_GraphicsContext_BeginSession_Params beginSessionParams = { NVPW_OpenGL_Profiler_GraphicsContext_BeginSession_Params_STRUCT_SIZE };
+            beginSessionParams.numTraceBuffers = sessionOptions.numTraceBuffers;
+            beginSessionParams.traceBufferSize = calcTraceBufferSizeParam.traceBufferSize;
+            beginSessionParams.maxRangesPerPass = sessionOptions.maxNumRanges;
+            beginSessionParams.maxLaunchesPerPass = sessionOptions.maxNumRanges;
+            nvpaStatus = NVPW_OpenGL_Profiler_GraphicsContext_BeginSession(&beginSessionParams);
+            if (nvpaStatus)
+            {
+                if (nvpaStatus == NVPA_STATUS_INSUFFICIENT_PRIVILEGE)
+                {
+                    NV_PERF_LOG_ERR(10, "Failed to start profiler session: profiling permissions not enabled.  Please follow these instructions: https://developer.nvidia.com/nvidia-development-tools-solutions-ERR_NVGPUCTRPERM-permission-issue-performance-counters \n");
+                }
+                else if (nvpaStatus == NVPA_STATUS_INSUFFICIENT_DRIVER_VERSION)
+                {
+                    NV_PERF_LOG_ERR(10, "Failed to start profiler session: insufficient driver version.  Please install the latest NVIDIA driver from https://www.nvidia.com \n");
+                }
+                else
+                {
+                    NV_PERF_LOG_ERR(10, "Failed to start profiler session: unknown error.  It may be a resource conflict - only one profiler session can run at a time per GPU.\n");
+                }
+                return false;
+            }
+
+            if (!m_profilerApi.Initialize(sessionOptions))
+            {
+                return false;
+            }
+
+            return true;
+        }
+
+        bool EndSession()
+        {
+            if (!IsInSession())
+            {
+                NV_PERF_LOG_ERR(10, "must be called in a session\n");
+                return false;
+            }
+
+            m_stateMachine.Reset();
+            m_profilerApi.Reset();
+            return true;
+        }
+
+
+        bool EnqueueCounterCollection(const SetConfigParams& config)
+        {
+            const bool status = m_stateMachine.EnqueueCounterCollection(config);
+            return status;
+        }
+
+        bool EnqueueCounterCollection(const CounterConfiguration& configuration, uint16_t numNestingLevels = 1, size_t numStatisticalSamples = 1)
+        {
+            const bool status = m_stateMachine.EnqueueCounterCollection(SetConfigParams(configuration, numNestingLevels, numStatisticalSamples));
+            return status;
+        }
+
+        bool BeginPass()
+        {
+            if (!IsInSession())
+            {
+                NV_PERF_LOG_ERR(10, "must be called in a session\n");
+                return false;
+            }
+
+            const bool status = m_stateMachine.BeginPass();
+            return status;
+        }
+
+        bool EndPass()
+        {
+            if (!IsInSession())
+            {
+                NV_PERF_LOG_ERR(10, "must be called in a session\n");
+                return false;
+            }
+
+            const bool status = m_stateMachine.EndPass();
+            return status;
+        }
+
+        bool PushRange(const char* pRangeName)
+        {
+            if (!IsInPass())
+            {
+                return true;
+            }
+
+            const bool status = m_stateMachine.PushRange(pRangeName);
+            return status;
+        }
+
+        bool PopRange()
+        {
+            if (!IsInPass())
+            {
+                return true;
+            }
+
+            const bool status = m_stateMachine.PopRange();
+            return status;
+        }
+
+        bool DecodeCounters(DecodeResult& decodeResult)
+        {
+            if (!IsInSession())
+            {
+                NV_PERF_LOG_ERR(10, "must be called in a session\n");
+                return false;
+            }
+
+            const bool status = m_stateMachine.DecodeCounters(decodeResult);
+            return status;
+        }
+
+        bool AllPassesSubmitted() const
+        {
+            const bool allPassesSubmitted = m_stateMachine.AllPassesSubmitted();
+            return allPassesSubmitted;
+        }
+    };
+
+
+}}} // namespace nv::perf::profiler
--- a/ruins64k/tools/NvPerfUtility/include/NvPerfRangeProfilerVulkan.h
+++ b/ruins64k/tools/NvPerfUtility/include/NvPerfRangeProfilerVulkan.h
@@ -0,0 +1,574 @@
+/*
+* Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+#pragma once
+
+#include <stdio.h>
+#include <thread>
+#include <vector>
+#include "NvPerfInit.h"
+#include "NvPerfCounterConfiguration.h"
+#include "NvPerfRangeProfiler.h"
+#include "NvPerfVulkan.h"
+
+namespace nv { namespace perf { namespace profiler {
+
+    class RangeProfilerVulkan
+    {
+    protected:
+        struct ProfilerApi : RangeProfilerStateMachine::IProfilerApi
+        {
+            VkQueue queue;
+            VkDevice device;
+            VkCommandPool commandPool;
+            size_t maxQueueRangesPerPass;
+            std::vector<VkCommandBuffer> rangeCommandBuffers;
+            std::vector<VkFence> rangeFences;
+            size_t nextCommandBufferIdx;
+            SessionOptions sessionOptions;
+
+            ProfilerApi()
+                : queue()
+                , device()
+                , commandPool()
+                , maxQueueRangesPerPass(1)
+                , nextCommandBufferIdx()
+                , sessionOptions()
+            {
+            }
+
+            virtual bool CreateCounterData(const SetConfigParams& config, std::vector<uint8_t>& counterDataImage, std::vector<uint8_t>& counterDataScratch) const override
+            {
+                NVPA_Status nvpaStatus;
+
+                NVPW_VK_Profiler_CounterDataImageOptions counterDataImageOptions = { NVPW_VK_Profiler_CounterDataImageOptions_STRUCT_SIZE };
+                counterDataImageOptions.pCounterDataPrefix = config.pCounterDataPrefix;
+                counterDataImageOptions.counterDataPrefixSize = config.counterDataPrefixSize;
+                counterDataImageOptions.maxNumRanges = static_cast<uint32_t>(sessionOptions.maxNumRanges);
+                counterDataImageOptions.maxNumRangeTreeNodes = static_cast<uint32_t>(2 * sessionOptions.maxNumRanges);
+                counterDataImageOptions.maxRangeNameLength = static_cast<uint32_t>(sessionOptions.avgRangeNameLength);
+
+                NVPW_VK_Profiler_CounterDataImage_CalculateSize_Params calculateSizeParams = { NVPW_VK_Profiler_CounterDataImage_CalculateSize_Params_STRUCT_SIZE };
+                calculateSizeParams.pOptions = &counterDataImageOptions;
+                calculateSizeParams.counterDataImageOptionsSize = NVPW_VK_Profiler_CounterDataImageOptions_STRUCT_SIZE;
+                nvpaStatus = NVPW_VK_Profiler_CounterDataImage_CalculateSize(&calculateSizeParams);
+                if (nvpaStatus)
+                {
+                    return false;
+                }
+
+                counterDataImage.resize(calculateSizeParams.counterDataImageSize);
+
+                NVPW_VK_Profiler_CounterDataImage_Initialize_Params initializeParams = { NVPW_VK_Profiler_CounterDataImage_Initialize_Params_STRUCT_SIZE };
+                initializeParams.counterDataImageOptionsSize = NVPW_VK_Profiler_CounterDataImageOptions_STRUCT_SIZE;
+                initializeParams.pOptions = &counterDataImageOptions;
+                initializeParams.counterDataImageSize = calculateSizeParams.counterDataImageSize;
+                initializeParams.pCounterDataImage = &counterDataImage[0];
+                nvpaStatus = NVPW_VK_Profiler_CounterDataImage_Initialize(&initializeParams);
+                if (nvpaStatus)
+                {
+                    return false;
+                }
+
+                NVPW_VK_Profiler_CounterDataImage_CalculateScratchBufferSize_Params scratchBufferSizeParams = { NVPW_VK_Profiler_CounterDataImage_CalculateScratchBufferSize_Params_STRUCT_SIZE };
+                scratchBufferSizeParams.counterDataImageSize = calculateSizeParams.counterDataImageSize;
+                scratchBufferSizeParams.pCounterDataImage = initializeParams.pCounterDataImage;
+                nvpaStatus = NVPW_VK_Profiler_CounterDataImage_CalculateScratchBufferSize(&scratchBufferSizeParams);
+                if (nvpaStatus)
+                {
+                    return false;
+                }
+
+                counterDataScratch.resize(scratchBufferSizeParams.counterDataScratchBufferSize);
+
+                NVPW_VK_Profiler_CounterDataImage_InitializeScratchBuffer_Params initScratchBufferParams = { NVPW_VK_Profiler_CounterDataImage_InitializeScratchBuffer_Params_STRUCT_SIZE };
+                initScratchBufferParams.counterDataImageSize = calculateSizeParams.counterDataImageSize;
+                initScratchBufferParams.pCounterDataImage = initializeParams.pCounterDataImage;
+                initScratchBufferParams.counterDataScratchBufferSize = scratchBufferSizeParams.counterDataScratchBufferSize;
+                initScratchBufferParams.pCounterDataScratchBuffer = &counterDataScratch[0];
+
+                nvpaStatus = NVPW_VK_Profiler_CounterDataImage_InitializeScratchBuffer(&initScratchBufferParams);
+                if (nvpaStatus)
+                {
+                    return false;
+                }
+
+                return true;
+            }
+
+            virtual bool SetConfig(const SetConfigParams& config) const override
+            {
+                NVPW_VK_Profiler_Queue_SetConfig_Params setConfigParams = { NVPW_VK_Profiler_Queue_SetConfig_Params_STRUCT_SIZE };
+                setConfigParams.queue = queue;
+                setConfigParams.pConfig = config.pConfigImage;
+                setConfigParams.configSize = config.configImageSize;
+                setConfigParams.minNestingLevel = 1;
+                setConfigParams.numNestingLevels = config.numNestingLevels;
+                setConfigParams.passIndex = 0;
+                setConfigParams.targetNestingLevel = 1;
+                NVPA_Status nvpaStatus = NVPW_VK_Profiler_Queue_SetConfig(&setConfigParams);
+                if (nvpaStatus)
+                {
+                    return false;
+                }
+
+                return true;
+            }
+
+            virtual bool BeginPass() const override
+            {
+                NVPW_VK_Profiler_Queue_BeginPass_Params beginPassParams = { NVPW_VK_Profiler_Queue_BeginPass_Params_STRUCT_SIZE };
+                beginPassParams.queue = queue;
+                NVPA_Status nvpaStatus = NVPW_VK_Profiler_Queue_BeginPass(&beginPassParams);
+                if (nvpaStatus)
+                {
+                    return false;
+                }
+                return true;
+            }
+
+            virtual bool EndPass() const override
+            {
+                NVPW_VK_Profiler_Queue_EndPass_Params endPassParams = { NVPW_VK_Profiler_Queue_EndPass_Params_STRUCT_SIZE };
+                endPassParams.queue = queue;
+                NVPA_Status nvpaStatus = NVPW_VK_Profiler_Queue_EndPass(&endPassParams);
+                if (nvpaStatus)
+                {
+                    return false;
+                }
+                return true;
+            }
+
+            template <typename Functor>
+            bool SubmitRangeCommandBufferFunctor(Functor&& functor)
+            {
+                VkFence fence = rangeFences[nextCommandBufferIdx];
+                VkResult vkResult = vkWaitForFences(device, 1, &fence, VK_FALSE, 0);
+                if (vkResult == VK_TIMEOUT)
+                {
+                    NV_PERF_LOG_ERR(10, "No command buffer available for queue-level ranges; consider increasing the limit via SetMaxQueueRangesPerPass()\n");
+                    return false;
+                }
+
+                if (vkResult)
+                {
+                    NV_PERF_LOG_ERR(10, "vkWaitForFences failed, VkResult = %d\n", vkResult);
+                    return false;
+                }
+
+                VkCommandBuffer commandBuffer = rangeCommandBuffers[nextCommandBufferIdx];
+                ++nextCommandBufferIdx;
+                if (nextCommandBufferIdx >= rangeCommandBuffers.size())
+                {
+                    nextCommandBufferIdx = 0;
+                }
+
+                vkResult = vkResetCommandBuffer(commandBuffer, VK_COMMAND_BUFFER_RESET_RELEASE_RESOURCES_BIT);
+                if (vkResult)
+                {
+                    NV_PERF_LOG_ERR(10, "vkResetCommandBuffer failed, VkResult = %d\n", vkResult);
+                    return false;
+                }
+
+                VkCommandBufferBeginInfo commandBufferBeginInfo = {VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO};
+                vkResult = vkBeginCommandBuffer(commandBuffer, &commandBufferBeginInfo);
+                if (vkResult)
+                {
+                    NV_PERF_LOG_ERR(10, "vkBeginCommandBuffer failed, VkResult = %d\n", vkResult);
+                    return false;
+                }
+                if (!functor(commandBuffer))
+                {
+                    return false;
+                }
+
+                vkResult = vkEndCommandBuffer(commandBuffer);
+                if (vkResult)
+                {
+                    NV_PERF_LOG_ERR(10, "vkEndCommandBuffer failed, VkResult = %d\n", vkResult);
+                    return false;
+                }
+
+                vkResult = vkResetFences(device, 1, &fence);
+                if (vkResult)
+                {
+                    NV_PERF_LOG_ERR(10, "vkResetFences failed, VkResult = %d\n", vkResult);
+                    return false;
+                }
+
+                VkSubmitInfo submitInfo = {VK_STRUCTURE_TYPE_SUBMIT_INFO};
+                submitInfo.commandBufferCount = 1;
+                submitInfo.pCommandBuffers = &commandBuffer;
+                vkResult = vkQueueSubmit(queue, 1, &submitInfo, fence);
+                if (vkResult)
+                {
+                    NV_PERF_LOG_ERR(10, "vkQueueSubmit failed, VkResult = %d\n", vkResult);
+                    return false;
+                }
+                return true;
+            }
+
+            virtual bool PushRange(const char* pRangeName) override
+            {
+                return SubmitRangeCommandBufferFunctor([&](VkCommandBuffer commandBuffer)
+                {
+                    NVPW_VK_Profiler_CommandBuffer_PushRange_Params pushRangeParams = {NVPW_VK_Profiler_CommandBuffer_PushRange_Params_STRUCT_SIZE};
+                    pushRangeParams.commandBuffer = commandBuffer;
+                    pushRangeParams.pRangeName = pRangeName;
+                    NVPA_Status nvpaStatus = NVPW_VK_Profiler_CommandBuffer_PushRange(&pushRangeParams);
+                    if (nvpaStatus)
+                    {
+                        NV_PERF_LOG_ERR(10, "NVPW_VK_Profiler_CommandBuffer_PushRange failed, nvpaStatus = %d\n", nvpaStatus);
+                        return false;
+                    }
+                    return true;
+                });
+            }
+
+            virtual bool PopRange() override
+            {
+                return SubmitRangeCommandBufferFunctor([&](VkCommandBuffer commandBuffer)
+                {
+                    NVPW_VK_Profiler_CommandBuffer_PopRange_Params popRangeParams = {NVPW_VK_Profiler_CommandBuffer_PopRange_Params_STRUCT_SIZE};
+                    popRangeParams.commandBuffer = commandBuffer;
+                    NVPA_Status nvpaStatus = NVPW_VK_Profiler_CommandBuffer_PopRange(&popRangeParams);
+                    if (nvpaStatus)
+                    {
+                        NV_PERF_LOG_ERR(10, "NVPW_VK_Profiler_CommandBuffer_PopRange failed, nvpaStatus = %d\n", nvpaStatus);
+                        return false;
+                    }
+                    return true;
+                });
+            }
+            virtual bool DecodeCounters(std::vector<uint8_t>& counterDataImage, std::vector<uint8_t>& counterDataScratch, bool& onePassDecoded, bool& allPassesDecoded) const override
+            {
+                NVPW_VK_Profiler_Queue_DecodeCounters_Params decodeParams = { NVPW_VK_Profiler_Queue_DecodeCounters_Params_STRUCT_SIZE };
+                decodeParams.queue = queue;
+                decodeParams.counterDataImageSize = counterDataImage.size();
+                decodeParams.pCounterDataImage = counterDataImage.data();
+                decodeParams.counterDataScratchBufferSize = counterDataScratch.size();
+                decodeParams.pCounterDataScratchBuffer = counterDataScratch.data();
+                NVPA_Status nvpaStatus = NVPW_VK_Profiler_Queue_DecodeCounters(&decodeParams);
+                if (nvpaStatus)
+                {
+                    return false;
+                }
+
+                onePassDecoded = decodeParams.onePassCollected;
+                allPassesDecoded = decodeParams.allPassesCollected;
+                return true;
+            }
+
+            bool Initialize(VkDevice device_, VkQueue queue_, uint32_t queueFamilyIndex, const SessionOptions& sessionOptions_)
+            {
+                device = device_;
+                queue = queue_;
+                sessionOptions = sessionOptions_;
+
+                VkCommandPoolCreateInfo commandPoolCreateInfo = {VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO};
+                commandPoolCreateInfo.queueFamilyIndex = queueFamilyIndex;
+                commandPoolCreateInfo.flags = VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT;
+                VkResult vkResult = vkCreateCommandPool(device, &commandPoolCreateInfo, nullptr, &commandPool);
+                if (vkResult)
+                {
+                    return false;
+                }
+
+                const size_t maxRangeCommandBuffers = maxQueueRangesPerPass * 2 * sessionOptions.numTraceBuffers;
+                rangeCommandBuffers.resize(maxRangeCommandBuffers);
+                VkCommandBufferAllocateInfo commandBufferAllocateInfo = {VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO};
+                commandBufferAllocateInfo.commandPool = commandPool;
+                commandBufferAllocateInfo.commandBufferCount = (uint32_t)maxRangeCommandBuffers;
+                vkResult = vkAllocateCommandBuffers(device, &commandBufferAllocateInfo, rangeCommandBuffers.data());
+                if (vkResult)
+                {
+                    return false;
+                }
+
+                rangeFences.resize(maxRangeCommandBuffers);
+                VkFenceCreateInfo fenceCreateInfo = {VK_STRUCTURE_TYPE_FENCE_CREATE_INFO};
+                fenceCreateInfo.flags = VK_FENCE_CREATE_SIGNALED_BIT;
+                for (auto& rangeFence : rangeFences)
+                {
+                    vkResult = vkCreateFence(device, &fenceCreateInfo, nullptr, &rangeFence);
+                    if (vkResult)
+                    {
+                        return false;
+                    }
+                }
+
+                return true;
+            }
+
+            void Reset()
+            {
+                NVPW_VK_Profiler_Queue_EndSession_Params endSessionParams = {NVPW_VK_Profiler_Queue_EndSession_Params_STRUCT_SIZE};
+                endSessionParams.queue = queue;
+                endSessionParams.timeout = 0xFFFFFFFF;
+                NVPA_Status nvpaStatus = NVPW_VK_Profiler_Queue_EndSession(&endSessionParams);
+                if (nvpaStatus)
+                {
+                    NV_PERF_LOG_ERR(10, "NVPW_VK_Profiler_Queue_EndSession failed, nvpaStatus = %d\n", nvpaStatus);
+                }
+
+                sessionOptions = {};
+                nextCommandBufferIdx = 0;
+
+                vkFreeCommandBuffers(device, commandPool, (uint32_t)rangeCommandBuffers.size(), rangeCommandBuffers.data());
+                rangeCommandBuffers.clear();
+
+                vkDestroyCommandPool(device, commandPool, nullptr);
+                commandPool = VK_NULL_HANDLE;
+
+                for (auto fence : rangeFences)
+                {
+                    vkDestroyFence(device, fence, nullptr);
+                }
+                queue = VK_NULL_HANDLE;
+                device = VK_NULL_HANDLE;
+            }
+        };
+
+    protected: // members
+        ProfilerApi m_profilerApi;
+        RangeProfilerStateMachine m_stateMachine;
+        std::thread m_spgoThread;
+        volatile bool m_spgoThreadExited;
+
+    private:
+        // non-copyable
+        RangeProfilerVulkan(const RangeProfilerVulkan&) = delete;
+
+        static void SpgoThreadProc(RangeProfilerVulkan* pRangeProfiler, VkQueue queue)
+        {
+            // Run continuously in the background, handling all BeginPass and EndPass GPU operations until EndSession().
+            NVPW_VK_Queue_ServicePendingGpuOperations_Params serviceGpuOpsParams = { NVPW_VK_Queue_ServicePendingGpuOperations_Params_STRUCT_SIZE };
+            serviceGpuOpsParams.queue = queue;
+            serviceGpuOpsParams.numOperations = 0; // run until EndSession()
+            serviceGpuOpsParams.timeout = 0xFFFFFFFF;
+            NVPA_Status nvpaStatus = NVPW_VK_Queue_ServicePendingGpuOperations(&serviceGpuOpsParams);
+            if (nvpaStatus)
+            {
+                NV_PERF_LOG_ERR(10, "NVPW_VK_Queue_ServicePendingGpuOperations failed, nvpaStatus = %d\n", nvpaStatus);
+            }
+
+            pRangeProfiler->m_spgoThreadExited = true;
+        }
+
+    public:
+        ~RangeProfilerVulkan()
+        {
+        }
+
+        RangeProfilerVulkan()
+            : m_profilerApi()
+            , m_stateMachine(m_profilerApi)
+            , m_spgoThread()
+            , m_spgoThreadExited()
+        {
+        }
+        // TODO: make this move friendly
+
+        bool IsInSession() const
+        {
+            return !!m_profilerApi.queue;
+        }
+
+        bool IsInPass() const
+        {
+            return m_stateMachine.IsInPass();
+        }
+
+        VkQueue GetVkQueue() const
+        {
+            return m_profilerApi.queue;
+        }
+
+        bool SetMaxQueueRangesPerPass(size_t maxQueueRangesPerPass)
+        {
+            if (IsInSession())
+            {
+                NV_PERF_LOG_ERR(10, "SetMaxQueueRangesPerPass must be called before the session starts.\n");
+                return false;
+            }
+            m_profilerApi.maxQueueRangesPerPass = maxQueueRangesPerPass;
+            return true;
+        }
+
+        bool BeginSession(
+            VkInstance instance,
+            VkPhysicalDevice physicalDevice,
+            VkDevice device,
+            VkQueue queue,
+            uint32_t queueFamilyIndex,
+            const SessionOptions& sessionOptions)
+        {
+            if (IsInSession())
+            {
+                NV_PERF_LOG_ERR(10, "already in a session\n");
+                return false;
+            }
+            if (!VulkanIsNvidiaDevice(physicalDevice) || !VulkanIsGpuSupported(instance, physicalDevice, device))
+            {
+                NV_PERF_LOG_ERR(10, "device is not supported for profiling\n");
+                return false;
+            }
+
+            NVPA_Status nvpaStatus;
+
+            NVPW_VK_Profiler_CalcTraceBufferSize_Params calcTraceBufferSizeParam = { NVPW_VK_Profiler_CalcTraceBufferSize_Params_STRUCT_SIZE };
+            calcTraceBufferSizeParam.maxRangesPerPass = sessionOptions.maxNumRanges;
+            calcTraceBufferSizeParam.avgRangeNameLength = sessionOptions.avgRangeNameLength;
+            nvpaStatus = NVPW_VK_Profiler_CalcTraceBufferSize(&calcTraceBufferSizeParam);
+            if (nvpaStatus)
+            {
+                return false;
+            }
+
+            NVPW_VK_Profiler_Queue_BeginSession_Params beginSessionParams = { NVPW_VK_Profiler_Queue_BeginSession_Params_STRUCT_SIZE };
+            beginSessionParams.instance = instance;
+            beginSessionParams.physicalDevice = physicalDevice;
+            beginSessionParams.device = device;
+            beginSessionParams.queue = queue;
+            beginSessionParams.pfnGetInstanceProcAddr = (void*)vkGetInstanceProcAddr;
+            beginSessionParams.pfnGetDeviceProcAddr = (void*)vkGetDeviceProcAddr;
+            beginSessionParams.numTraceBuffers = sessionOptions.numTraceBuffers;
+            beginSessionParams.traceBufferSize = calcTraceBufferSizeParam.traceBufferSize;
+            beginSessionParams.maxRangesPerPass = sessionOptions.maxNumRanges;
+            beginSessionParams.maxLaunchesPerPass = sessionOptions.maxNumRanges;
+            nvpaStatus = NVPW_VK_Profiler_Queue_BeginSession(&beginSessionParams);
+            if (nvpaStatus)
+            {
+                if (nvpaStatus == NVPA_STATUS_INSUFFICIENT_PRIVILEGE)
+                {
+                    NV_PERF_LOG_ERR(10, "Failed to start profiler session: profiling permissions not enabled.  Please follow these instructions: https://developer.nvidia.com/ERR_NVGPUCTRPERM\n");
+                }
+                else if (nvpaStatus == NVPA_STATUS_INSUFFICIENT_DRIVER_VERSION)
+                {
+                    NV_PERF_LOG_ERR(10, "Failed to start profiler session: insufficient driver version.  Please install the latest NVIDIA driver from https://www.nvidia.com\n");
+                }
+                else
+                {
+                    NV_PERF_LOG_ERR(10, "Failed to start profiler session: unknown error.  It may be a resource conflict - only one profiler session can run at a time per GPU.\n");
+                }
+                return false;
+            }
+
+            m_spgoThreadExited = false;
+            m_spgoThread = std::thread(SpgoThreadProc, this, queue);
+            if (!m_profilerApi.Initialize(device, queue, queueFamilyIndex, sessionOptions))
+            {
+                return false;
+            }
+
+            return true;
+        }
+
+        bool EndSession()
+        {
+            if (!IsInSession())
+            {
+                NV_PERF_LOG_ERR(10, "must be called in a session\n");
+                return false;
+            }
+
+            m_stateMachine.Reset();
+            m_profilerApi.Reset();
+
+            m_spgoThread.join();
+            m_spgoThreadExited = false;
+
+            return true;
+        }
+
+
+        bool EnqueueCounterCollection(const SetConfigParams& config)
+        {
+            const bool status = m_stateMachine.EnqueueCounterCollection(config);
+            return status;
+        }
+
+        bool EnqueueCounterCollection(const CounterConfiguration& configuration, uint16_t numNestingLevels = 1, size_t numStatisticalSamples = 1)
+        {
+            const bool status = m_stateMachine.EnqueueCounterCollection(SetConfigParams(configuration, numNestingLevels, numStatisticalSamples));
+            return status;
+        }
+
+        bool BeginPass()
+        {
+            if (!IsInSession())
+            {
+                NV_PERF_LOG_ERR(10, "must be called in a session\n");
+                return false;
+            }
+
+            const bool status = m_stateMachine.BeginPass();
+            return status;
+        }
+
+        bool EndPass()
+        {
+            if (!IsInSession())
+            {
+                NV_PERF_LOG_ERR(10, "must be called in a session\n");
+                return false;
+            }
+
+            const bool status = m_stateMachine.EndPass();
+            return status;
+        }
+
+        // Convenience method to start a queue-level range.  For command buffers, use VulkanRangeCommands::PushRange.
+        bool PushRange(const char* pRangeName)
+        {
+            const bool status = m_stateMachine.PushRange(pRangeName);
+            return status;
+        }
+
+        // Convenience method to end a queue-level range.  For command buffers, use VulkanRangeCommands::PopRange.
+        bool PopRange()
+        {
+            const bool status = m_stateMachine.PopRange();
+            return status;
+        }
+
+        bool DecodeCounters(DecodeResult& decodeResult)
+        {
+            if (!IsInSession())
+            {
+                NV_PERF_LOG_ERR(10, "must be called in a session\n");
+                return false;
+            }
+
+            if (m_spgoThreadExited)
+            {
+                NV_PERF_LOG_ERR(10, "the background thread exited; possible hang on subsequent CPU-waiting-on-GPU calls\n");
+                return false;
+            }
+
+            const bool status = m_stateMachine.DecodeCounters(decodeResult);
+            return status;
+        }
+
+        bool AllPassesSubmitted() const
+        {
+            const bool allPassesSubmitted = m_stateMachine.AllPassesSubmitted();
+            return allPassesSubmitted;
+        }
+    };
+
+
+}}} // namespace nv::perf::profiler
--- a/ruins64k/tools/NvPerfUtility/include/NvPerfReportDefinition.h
+++ b/ruins64k/tools/NvPerfUtility/include/NvPerfReportDefinition.h
@@ -0,0 +1,34 @@
+/*
+* Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+#pragma once
+#include <stddef.h>
+
+namespace nv { namespace perf {
+
+    struct ReportDefinition
+    {
+        const char* const* ppCounterNames;
+        size_t numCounters;
+        const char* const* ppRatioNames;
+        size_t numRatios;
+        const char* const* ppThroughputNames;
+        size_t numThroughputs;
+
+        const char* pReportHtml;
+    };
+
+} }
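`ReportDefinition` stores three parallel (pointer, count) pairs instead of containers. As a rough sketch of how a consumer might flatten them into one list, the block below repeats the struct locally so that it compiles on its own. `CollectMetricNames` is a hypothetical helper, and the assumption that a zero count may come with a null pointer is mine, not the SDK's.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Local copy of the struct from NvPerfReportDefinition.h, repeated here
// only so that the sketch is self-contained.
struct ReportDefinition
{
    const char* const* ppCounterNames;
    size_t numCounters;
    const char* const* ppRatioNames;
    size_t numRatios;
    const char* const* ppThroughputNames;
    size_t numThroughputs;

    const char* pReportHtml;
};

// Hypothetical helper: gathers counter, ratio and throughput names, in that order.
inline std::vector<std::string> CollectMetricNames(const ReportDefinition& definition)
{
    std::vector<std::string> names;
    names.reserve(definition.numCounters + definition.numRatios + definition.numThroughputs);

    // A pointer is only dereferenced when its count is non-zero.
    auto append = [&names](const char* const* ppNames, size_t count) {
        for (size_t index = 0; index < count; ++index)
        {
            names.emplace_back(ppNames[index]);
        }
    };
    append(definition.ppCounterNames, definition.numCounters);
    append(definition.ppRatioNames, definition.numRatios);
    append(definition.ppThroughputNames, definition.numThroughputs);
    return names;
}
```

A value-initialized `ReportDefinition{}` has all counts at zero, so the helper returns an empty vector for it.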
--- a/ruins64k/tools/NvPerfUtility/include/NvPerfReportDefinitionGA10X.h
+++ b/ruins64k/tools/NvPerfUtility/include/NvPerfReportDefinitionGA10X.h
--- a/ruins64k/tools/NvPerfUtility/include/NvPerfReportDefinitionGV100.h
+++ b/ruins64k/tools/NvPerfUtility/include/NvPerfReportDefinitionGV100.h
--- a/ruins64k/tools/NvPerfUtility/include/NvPerfReportDefinitionHAL.h
+++ b/ruins64k/tools/NvPerfUtility/include/NvPerfReportDefinitionHAL.h
@@ -0,0 +1,79 @@
+/*
+* Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+#pragma once
+
+#include <string.h>
+#include "NvPerfInit.h"
+#include "NvPerfReportDefinition.h"
+#include "NvPerfReportDefinitionGV100.h"
+#include "NvPerfReportDefinitionTU10X.h"
+#include "NvPerfReportDefinitionTU11X.h"
+#include "NvPerfReportDefinitionGA10X.h"
+
+namespace nv { namespace perf {
+
+    namespace PerRangeReport {
+
+        inline ReportDefinition GetReportDefinition(const char* pChipName)
+        {
+            if (!strcmp(pChipName, "GV100"))
+            {
+                return gv100::PerRangeReport::GetReportDefinition();
+            }
+            else if (!strcmp(pChipName, "TU102") || !strcmp(pChipName, "TU104") || !strcmp(pChipName, "TU106"))
+            {
+                return tu10x::PerRangeReport::GetReportDefinition();
+            }
+            else if (!strcmp(pChipName, "TU116") || !strcmp(pChipName, "TU117"))
+            {
+                return tu11x::PerRangeReport::GetReportDefinition();
+            }
+            else if (!strcmp(pChipName, "GA102") || !strcmp(pChipName, "GA104") || !strcmp(pChipName, "GA106"))
+            {
+                return ga10x::PerRangeReport::GetReportDefinition();
+            }
+            return {};
+        }
+
+    } // namespace PerRangeReport
+
+    namespace SummaryReport {
+
+        inline ReportDefinition GetReportDefinition(const char* pChipName)
+        {
+            if (!strcmp(pChipName, "GV100"))
+            {
+                return gv100::SummaryReport::GetReportDefinition();
+            }
+            else if (!strcmp(pChipName, "TU102") || !strcmp(pChipName, "TU104") || !strcmp(pChipName, "TU106"))
+            {
+                return tu10x::SummaryReport::GetReportDefinition();
+            }
+            else if (!strcmp(pChipName, "TU116") || !strcmp(pChipName, "TU117"))
+            {
+                return tu11x::SummaryReport::GetReportDefinition();
+            }
+            else if (!strcmp(pChipName, "GA102") || !strcmp(pChipName, "GA104") || !strcmp(pChipName, "GA106"))
+            {
+                return ga10x::SummaryReport::GetReportDefinition();
+            }
+            return {};
+        }
+
+    } // namespace SummaryReport
+
+} }
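Both `GetReportDefinition` overloads above repeat the same `strcmp` chain and fall back to `return {}` for unknown chips, and `strcmp` itself has undefined behavior for a null `pChipName`. The block below is one possible table-driven variant of the lookup, written as a standalone sketch. `ChipFamily` and `ClassifyChip` are hypothetical names, and the chip list is copied from the code above.

```cpp
#include <cstring>

// Hypothetical enum naming the four report-definition families used above.
enum class ChipFamily
{
    Unknown,
    GV100,
    TU10X,
    TU11X,
    GA10X,
};

// Hypothetical helper: maps a chip name to its family.
// Returns ChipFamily::Unknown for null or unrecognized names.
inline ChipFamily ClassifyChip(const char* pChipName)
{
    struct Entry
    {
        const char* pName;
        ChipFamily family;
    };
    static const Entry table[] = {
        { "GV100", ChipFamily::GV100 },
        { "TU102", ChipFamily::TU10X }, { "TU104", ChipFamily::TU10X }, { "TU106", ChipFamily::TU10X },
        { "TU116", ChipFamily::TU11X }, { "TU117", ChipFamily::TU11X },
        { "GA102", ChipFamily::GA10X }, { "GA104", ChipFamily::GA10X }, { "GA106", ChipFamily::GA10X },
    };

    if (!pChipName)
    {
        return ChipFamily::Unknown;
    }
    for (const Entry& entry : table)
    {
        if (!std::strcmp(pChipName, entry.pName))  // case-sensitive, like the original
        {
            return entry.family;
        }
    }
    return ChipFamily::Unknown;
}
```

With a helper like this, both the `PerRangeReport` and `SummaryReport` dispatchers could switch on one shared result, and callers could tell an unsupported chip apart from an empty definition.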
--- a/ruins64k/tools/NvPerfUtility/include/NvPerfReportDefinitionTU10X.h
+++ b/ruins64k/tools/NvPerfUtility/include/NvPerfReportDefinitionTU10X.h
--- a/ruins64k/tools/NvPerfUtility/include/NvPerfReportDefinitionTU11X.h
+++ b/ruins64k/tools/NvPerfUtility/include/NvPerfReportDefinitionTU11X.h
--- a/ruins64k/tools/NvPerfUtility/include/NvPerfReportGenerator.h
+++ b/ruins64k/tools/NvPerfUtility/include/NvPerfReportGenerator.h
--- a/ruins64k/tools/NvPerfUtility/include/NvPerfReportGeneratorD3D11.h
+++ b/ruins64k/tools/NvPerfUtility/include/NvPerfReportGeneratorD3D11.h
@@ -0,0 +1,414 @@
+/*
+* Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+#pragma once
+
+#include "NvPerfReportGenerator.h"
+#include "NvPerfD3D11.h"
+#include "NvPerfRangeProfilerD3D11.h"
+
+namespace nv { namespace perf { namespace profiler {
+
+    class ReportGeneratorD3D11
+    {
+    protected:
+        struct ReportProfiler : ReportGeneratorStateMachine::IReportProfiler
+        {
+            RangeProfilerD3D11 rangeProfiler;
+
+            ReportProfiler()
+                : rangeProfiler()
+            {
+            }
+
+            virtual bool IsInSession() const override
+            {
+                return rangeProfiler.IsInSession();
+            }
+            virtual bool IsInPass() const override
+            {
+                return rangeProfiler.IsInPass();
+            }
+            virtual bool EndSession() override
+            {
+                return rangeProfiler.EndSession();
+            }
+            virtual bool EnqueueCounterCollection(const SetConfigParams& config) override
+            {
+                return rangeProfiler.EnqueueCounterCollection(config);
+            }
+            virtual bool BeginPass() override
+            {
+                return rangeProfiler.BeginPass();
+            }
+            virtual bool EndPass() override
+            {
+                return rangeProfiler.EndPass();
+            }
+            virtual bool PushRange(const char* pRangeName) override
+            {
+                return rangeProfiler.PushRange(pRangeName);
+            }
+            virtual bool PopRange() override
+            {
+                return rangeProfiler.PopRange();
+            }
+            virtual bool DecodeCounters(DecodeResult& decodeResult) override
+            {
+                return rangeProfiler.DecodeCounters(decodeResult);
+            }
+            virtual bool AllPassesSubmitted() const override
+            {
+                return rangeProfiler.AllPassesSubmitted();
+            }
+        };
+
+    protected:
+        ReportProfiler m_reportProfiler;
+        ReportGeneratorStateMachine m_stateMachine;
+
+        // When enabled, OnFrameStart() will check whether its argument's ID3D11Device == m_pDevice.
+        bool m_enableDeviceContextValidation;
+        CComPtr<ID3D11Device> m_pDevice;
+        ReportGeneratorInitStatus m_initStatus;  // the state of InitializeReportGenerator()
+
+    protected:
+        bool BeginSessionWithOptions(ID3D11DeviceContext* pDeviceContext, const SessionOptions* pSessionOptions = nullptr)
+        {
+            SessionOptions sessionOptions = {};
+            sessionOptions.maxNumRanges = ReportGeneratorStateMachine::MaxNumRangesDefault;
+            if (pSessionOptions)
+            {
+                sessionOptions = *pSessionOptions;
+            }
+
+            if (!m_reportProfiler.rangeProfiler.BeginSession(pDeviceContext, sessionOptions))
+            {
+                NV_PERF_LOG_ERR(10, "m_reportProfiler.rangeProfiler.BeginSession failed\n");
+                return false;
+            }
+            return true;
+        }
+
+        bool IsDeviceContextValid(ID3D11DeviceContext* pDeviceContext, const char* pFunctionName) const
+        {
+            if (!m_enableDeviceContextValidation)
+            {
+                return true;  // when validation is disabled, always assume the pDeviceContext is valid
+            }
+
+            if (!m_pDevice)
+            {
+                NV_PERF_LOG_WRN(50, "Cannot validate DeviceContext.  Please call EnableDeviceContextValidation(true) before InitializeReportGenerator().\n");
+                return true;  // allow it to proceed unvalidated
+            }
+
+            CComPtr<ID3D11Device> pDevice;
+            pDeviceContext->GetDevice(&pDevice);
+            if (!pDevice)
+            {
+                NV_PERF_LOG_ERR(10, "pDeviceContext->GetDevice() failed\n");
+                return false;
+            }
+
+            if (!pDevice.IsEqualObject(m_pDevice))
+            {
+                NV_PERF_LOG_ERR(10, "The pDeviceContext passed to %s does not match the ID3D11Device passed to InitializeReportGenerator().\n", pFunctionName);
+                return false;
+            }
+
+            return true;
+        }
+
+    public:
+        DeviceIdentifiers deviceIdentifiers;
+        std::vector<std::string> additionalMetrics;
+
+    public:
+        ~ReportGeneratorD3D11()
+        {
+            Reset();
+        }
+
+        ReportGeneratorD3D11()
+            : m_reportProfiler()
+            , m_stateMachine(m_reportProfiler)
+            , m_enableDeviceContextValidation(true)
+            , m_pDevice()
+            , m_initStatus(ReportGeneratorInitStatus::NeverCalled)
+            , deviceIdentifiers()
+            , additionalMetrics()
+        {
+        }
+
+        ReportGeneratorInitStatus GetInitStatus() const
+        {
+            return m_initStatus;
+        }
+
+        /// Ends all current sessions and frees all internal memory.
+        /// This object may be reused by calling InitializeReportGenerator() again.
+        /// Does not reset deviceIdentifiers.
+        void Reset()
+        {
+            if (m_reportProfiler.rangeProfiler.IsInSession())
+            {
+                const bool endSessionStatus = m_reportProfiler.rangeProfiler.EndSession();
+                if (!endSessionStatus)
+                {
+                    NV_PERF_LOG_ERR(10, "m_reportProfiler.EndSession failed\n");
+                }
+            }
+
+            m_stateMachine.Reset();
+
+            m_pDevice.Release();
+            if (m_initStatus != ReportGeneratorInitStatus::NeverCalled)
+            {
+                m_initStatus = ReportGeneratorInitStatus::Reset;
+            }
+        }
+
+        bool InitializeReportGenerator(ID3D11Device* pDevice)
+        {
+            m_pDevice.Release();
+            m_initStatus = ReportGeneratorInitStatus::Failed;
+
+            // Can this device be profiled by Nsight Perf SDK?
+            if (!nv::perf::D3D11IsNvidiaDevice(pDevice))
+            {
+                NV_PERF_LOG_ERR(10, "%ls is not an NVIDIA Device\n", D3D11GetDeviceName(pDevice).c_str());
+                return false;
+            }
+
+            if (!InitializeNvPerf())
+            {
+                NV_PERF_LOG_ERR(10, "InitializeNvPerf failed\n");
+                return false;
+            }
+
+            if (!nv::perf::D3D11LoadDriver())
+            {
+                NV_PERF_LOG_ERR(10, "Could not load driver\n");
+                return false;
+            }
+
+            if (!nv::perf::profiler::D3D11IsGpuSupported(pDevice))
+            {
+                NV_PERF_LOG_ERR(10, "GPU is not supported\n");
+                return false;
+            }
+
+            deviceIdentifiers = D3D11GetDeviceIdentifiers(pDevice);
+            if (!deviceIdentifiers.pChipName)
+            {
+                NV_PERF_LOG_ERR(10, "Unrecognized GPU\n");
+                return false;
+            }
+
+            auto createMetricsEvaluator = [&](std::vector<uint8_t>& scratchBuffer) {
+                const size_t scratchBufferSize = nv::perf::D3D11CalculateMetricsEvaluatorScratchBufferSize(deviceIdentifiers.pChipName);
+                if (!scratchBufferSize)
+                {
+                    return (NVPW_MetricsEvaluator*)nullptr;
+                }
+                scratchBuffer.resize(scratchBufferSize);
+                NVPW_MetricsEvaluator* pMetricsEvaluator = nv::perf::D3D11CreateMetricsEvaluator(scratchBuffer.data(), scratchBuffer.size(), deviceIdentifiers.pChipName);
+                return pMetricsEvaluator;
+            };
+            auto createRawMetricsConfig = [&]() {
+                NVPA_RawMetricsConfig* pRawMetricsConfig = nv::perf::profiler::D3D11CreateRawMetricsConfig(deviceIdentifiers.pChipName);
+                return pRawMetricsConfig;
+            };
+
+            if (!m_stateMachine.InitializeReportMetrics(deviceIdentifiers, createMetricsEvaluator, createRawMetricsConfig, additionalMetrics))
+            {
+                NV_PERF_LOG_ERR(100, "m_stateMachine.InitializeReportMetrics failed\n");
+                return false;
+            }
+
+            if (m_enableDeviceContextValidation)
+            {
+                m_pDevice = pDevice;
+            }
+            m_initStatus = ReportGeneratorInitStatus::Succeeded;
+
+            NV_PERF_LOG_INF(50, "Initialization succeeded\n");
+
+            return true;
+        }
+
+        /// Explicitly starts a session.  This allows you to control resource allocation.
+        /// Calling this function is optional; by default, OnFrameStart() will start a session if this isn't called.
+        /// The session must be explicitly ended by calling Reset().
+        /// The pDeviceContext must belong to the ID3D11Device passed into InitializeReportGenerator().
+        bool BeginSession(ID3D11DeviceContext* pDeviceContext, const SessionOptions* pSessionOptions = nullptr)
+        {
+            if (m_initStatus != ReportGeneratorInitStatus::Succeeded)
+            {
+                NV_PERF_LOG_WRN(100, "skipping; the state of InitializeReportGenerator() is %s.\n", ToCString(m_initStatus));
+                return false;
+            }
+            if (!IsDeviceContextValid(pDeviceContext, "BeginSession"))
+            {
+                return false;
+            }
+
+            auto beginSessionFn = [&]() {
+                return BeginSessionWithOptions(pDeviceContext, pSessionOptions);
+            };
+            if (!m_stateMachine.BeginSession(beginSessionFn))
+            {
+                return false;
+            }
+
+            return true;
+        }
+
+        /// Automatically starts collecting counters after StartCollectionOnNextFrame().
+        /// Call this at the start of each frame.
+        /// The pDeviceContext must belong to the ID3D11Device passed into InitializeReportGenerator().
+        bool OnFrameStart(ID3D11DeviceContext* pDeviceContext)
+        {
+            if (m_initStatus != ReportGeneratorInitStatus::Succeeded)
+            {
+                NV_PERF_LOG_WRN(100, "skipping; the state of InitializeReportGenerator() is %s.\n", ToCString(m_initStatus));
+                return false;
+            }
+            if (!IsDeviceContextValid(pDeviceContext, "OnFrameStart"))
+            {
+                return false;
+            }
+
+            auto beginSessionFn = [&]() {
+                return BeginSessionWithOptions(pDeviceContext);
+            };
+            if (!m_stateMachine.OnFrameStart(beginSessionFn))
+            {
+                return false;
+            }
+
+            return true;
+        }
+
+        /// Advances the counter-collection state-machine after rendering.
+        /// Call this at the end of each frame.
+        bool OnFrameEnd()
+        {
+            if (m_initStatus != ReportGeneratorInitStatus::Succeeded)
+            {
+                NV_PERF_LOG_WRN(100, "skipping; the state of InitializeReportGenerator() is %s.\n", ToCString(m_initStatus));
+                return false;
+            }
+
+            if (!m_stateMachine.OnFrameEnd())
+            {
+                return false;
+            }
+            return true;
+        }
+
+        bool PushRange(const char* pRangeName)
+        {
+            if (m_initStatus != ReportGeneratorInitStatus::Succeeded)
+            {
+                NV_PERF_LOG_WRN(100, "skipping; the state of InitializeReportGenerator() is %s.\n", ToCString(m_initStatus));
+                return false;
+            }
+            if (!m_reportProfiler.IsInPass())
+            {
+                NV_PERF_LOG_WRN(100, "skipping; not in a profiler pass\n");
+                return false;
+            }
+            if (!m_reportProfiler.PushRange(pRangeName))
+            {
+                return false;
+            }
+            return true;
+        }
+
+        bool PopRange()
+        {
+            if (m_initStatus != ReportGeneratorInitStatus::Succeeded)
+            {
+                NV_PERF_LOG_WRN(100, "skipping; the state of InitializeReportGenerator() is %s.\n", ToCString(m_initStatus));
+                return false;
+            }
+            if (!m_reportProfiler.IsInPass())
+            {
+                NV_PERF_LOG_WRN(100, "skipping; not in a profiler pass\n");
+                return false;
+            }
+            if (!m_reportProfiler.PopRange())
+            {
+                return false;
+            }
+            return true;
+        }
+
+        /// Reports true after StartCollectionOnNextFrame() is called, until the HTML Report has been written to disk.
+        /// This state is cleared by OnFrameEnd().
+        bool IsCollectingReport() const
+        {
+            return m_stateMachine.IsCollectingReport();
+        }
+
+        /// Enqueues report collection, starting on the next frame.
+        bool StartCollectionOnNextFrame(const char* pDirectoryName, AppendDateTime appendDateTime)
+        {
+            if (m_initStatus != ReportGeneratorInitStatus::Succeeded)
+            {
+                NV_PERF_LOG_WRN(100, "skipping; the state of InitializeReportGenerator() is %s.\n", ToCString(m_initStatus));
+                return false;
+            }
+            return m_stateMachine.StartCollectionOnNextFrame(pDirectoryName, appendDateTime);
+        }
+
+        /// Enables a frame-level parent range.
+        /// When enabled (non-NULL, non-empty pRangeName), every frame will have a parent range.
+        /// Pass in NULL or an empty string to disable this behavior.
+        /// The pRangeName string is copied by value, and may be modified or freed after this function returns.
+        void SetFrameLevelRangeName(const char* pRangeName)
+        {
+            m_stateMachine.SetFrameLevelRangeName(pRangeName);
+        }
+
+        /// Retrieves the current frame-level parent range.  An empty string signifies no parent range.
+        const std::string& GetFrameLevelRangeName() const
+        {
+            return m_stateMachine.GetFrameLevelRangeName();
+        }
+
+        /// Sets the number of Push/Pop nesting levels to collect in the report.
+        void SetNumNestingLevels(uint16_t numNestingLevels)
+        {
+            m_stateMachine.SetNumNestingLevels(numNestingLevels);
+        }
+
+        /// Retrieves the number of Push/Pop nesting levels being collected in the report.
+        uint16_t GetNumNestingLevels() const
+        {
+            return m_stateMachine.GetNumNestingLevels();
+        }
+
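+        /// When enabled, BeginSession() and OnFrameStart() check that their ID3D11DeviceContext argument
+        /// belongs to the ID3D11Device passed into InitializeReportGenerator().
+        /// Call this before InitializeReportGenerator(); the device is only captured while validation is enabled.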
+        void EnableDeviceContextValidation(bool enable = true)
+        {
+            m_enableDeviceContextValidation = enable;
+        }
+    };
+
+}}}
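The comments on `OnFrameStart()`, `PushRange()`, `PopRange()` and `OnFrameEnd()` above imply a per-frame call order. A minimal sketch of that order follows, written as a template so that it compiles without the SDK or D3D11 headers. `RenderProfiledFrame` is a hypothetical helper, and the policy of skipping `OnFrameEnd()` after a failed `OnFrameStart()` is an assumption of this sketch, not documented SDK behavior.

```cpp
#include <functional>

// Hypothetical helper (not part of NvPerfUtility).
// TReportGenerator needs OnFrameStart(TContext*), PushRange(const char*),
// PopRange() and OnFrameEnd(); ReportGeneratorD3D11 has these methods,
// with TContext = ID3D11DeviceContext.
template <class TReportGenerator, class TContext>
bool RenderProfiledFrame(TReportGenerator& generator, TContext* pContext,
                         const char* pRangeName, const std::function<void()>& draw)
{
    // A false return means the frame cannot be profiled; rendering still proceeds.
    const bool frameStarted = generator.OnFrameStart(pContext);

    // PushRange() fails outside a profiler pass, so remember whether to pop.
    const bool rangePushed = frameStarted && generator.PushRange(pRangeName);

    draw();

    if (rangePushed)
    {
        generator.PopRange();
    }

    // Assumption of this sketch: only advance the state machine for frames that started.
    return frameStarted && generator.OnFrameEnd();
}
```

With the real class, a capture would be armed once beforehand by calling `StartCollectionOnNextFrame(pDirectoryName, appendDateTime)`, and `IsCollectingReport()` could be polled to learn when the report has been written.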
--- a/ruins64k/tools/NvPerfUtility/include/NvPerfReportGeneratorD3D12.h
+++ b/ruins64k/tools/NvPerfUtility/include/NvPerfReportGeneratorD3D12.h
@@ -0,0 +1,394 @@
+/*
+* Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+#pragma once
+#include "NvPerfReportGenerator.h"
+#include "NvPerfRangeProfilerD3D12.h"
+
+namespace nv { namespace perf { namespace profiler {
+    
+    class ReportGeneratorD3D12
+    {
+    protected:
+        struct ReportProfiler : public ReportGeneratorStateMachine::IReportProfiler
+        {
+            RangeProfilerD3D12 rangeProfiler;
+
+            ReportProfiler()
+                : rangeProfiler()
+            {
+            }
+
+            virtual bool IsInSession() const override
+            {
+                return rangeProfiler.IsInSession();
+            }
+            virtual bool IsInPass() const override
+            {
+                return rangeProfiler.IsInPass();
+            }
+            virtual bool EndSession() override
+            {
+                return rangeProfiler.EndSession();
+            }
+            virtual bool EnqueueCounterCollection(const SetConfigParams& config) override
+            {
+                return rangeProfiler.EnqueueCounterCollection(config);
+            }
+            virtual bool BeginPass() override
+            {
+                return rangeProfiler.BeginPass();
+            }
+            virtual bool EndPass() override
+            {
+                return rangeProfiler.EndPass();
+            }
+            virtual bool PushRange(const char* pRangeName) override
+            {
+                return rangeProfiler.PushRange(pRangeName);
+            }
+            virtual bool PopRange() override
+            {
+                return rangeProfiler.PopRange();
+            }
+            virtual bool DecodeCounters(DecodeResult& decodeResult) override
+            {
+                return rangeProfiler.DecodeCounters(decodeResult);
+            }
+            virtual bool AllPassesSubmitted() const override
+            {
+                return rangeProfiler.AllPassesSubmitted();
+            }
+        };
+
+    protected:
+        ReportProfiler m_reportProfiler;
+        ReportGeneratorStateMachine m_stateMachine;
+
+        // When enabled, OnFrameStart() will check whether its argument's ID3D12Device == m_pDevice.
+        bool m_enableCommandQueueValidation;
+        CComPtr<ID3D12Device> m_pDevice;
+        ReportGeneratorInitStatus m_initStatus;  // the state of InitializeReportGenerator()
+
+    protected:
+        bool BeginSessionWithOptions(ID3D12CommandQueue* pCommandQueue, const SessionOptions* pSessionOptions = nullptr)
+        {
+            SessionOptions sessionOptions = {};
+            sessionOptions.maxNumRanges = ReportGeneratorStateMachine::MaxNumRangesDefault;
+            if (pSessionOptions)
+            {
+                sessionOptions = *pSessionOptions;
+            }
+
+            if (!m_reportProfiler.rangeProfiler.BeginSession(pCommandQueue, sessionOptions))
+            {
+                NV_PERF_LOG_ERR(10, "m_reportProfiler.rangeProfiler.BeginSession failed\n");
+                return false;
+            }
+
+            return true;
+        }
+
+        bool IsCommandQueueValid(ID3D12CommandQueue* pCommandQueue, const char* pFunctionName) const
+        {
+            if (!m_enableCommandQueueValidation)
+            {
+                return true;  // when validation is disabled, always assume the CommandQueue is valid
+            }
+
+            if (!m_pDevice)
+            {
+                NV_PERF_LOG_WRN(50, "Cannot validate CommandQueue.  Please call EnableCommandQueueValidation(true) before InitializeReportGenerator().\n");
+                return true;  // allow it to proceed unvalidated
+            }
+
+            CComPtr<ID3D12Device> pDevice;
+            HRESULT hr = pCommandQueue->GetDevice(IID_PPV_ARGS(&pDevice));
+            if (FAILED(hr) || !pDevice)
+            {
+                NV_PERF_LOG_ERR(10, "pCommandQueue->GetDevice() failed\n");
+                return false;
+            }
+
+            if (!pDevice.IsEqualObject(m_pDevice))
+            {
+                NV_PERF_LOG_ERR(10, "The pCommandQueue passed to %s does not match the ID3D12Device passed to InitializeReportGenerator().\n", pFunctionName);
+                return false;
+            }
+
+            return true;
+        }
+
+    public:
+        /// RangeCommands is safe to use on any CommandList belonging to the ID3D12Device used for initialization.
+        /// RangeCommands perform no operation when called on unsupported or non-NVIDIA devices.
+        D3D12RangeCommands rangeCommands;
+        /// NVIDIA device identifiers.
+        DeviceIdentifiers deviceIdentifiers;
+        std::vector<std::string> additionalMetrics;
+
+    public:
+        ~ReportGeneratorD3D12()
+        {
+            Reset();
+        }
+
+        ReportGeneratorD3D12()
+            : m_reportProfiler()
+            , m_stateMachine(m_reportProfiler)
+            , m_enableCommandQueueValidation(true)
+            , m_pDevice()
+            , m_initStatus(ReportGeneratorInitStatus::NeverCalled)
+            , rangeCommands()
+            , deviceIdentifiers()
+            , additionalMetrics()
+        {
+        }
+
+        ReportGeneratorInitStatus GetInitStatus() const
+        {
+            return m_initStatus;
+        }
+
+        /// Ends all current sessions and frees all internal memory.
+        /// This object may be reused by calling InitializeReportGenerator() again.
+        /// Does not reset rangeCommands and deviceIdentifiers.
+        void Reset()
+        {
+            if (m_reportProfiler.rangeProfiler.IsInSession())
+            {
+                const bool endSessionStatus = m_reportProfiler.rangeProfiler.EndSession();
+                if (!endSessionStatus)
+                {
+                    NV_PERF_LOG_ERR(10, "m_reportProfiler.EndSession failed\n");
+                }
+            }
+
+            m_stateMachine.Reset();
+
+            m_pDevice.Release();
+            if (m_initStatus != ReportGeneratorInitStatus::NeverCalled)
+            {
+                m_initStatus = ReportGeneratorInitStatus::Reset;
+            }
+        }
+
+        /// Initialize this object on the provided ID3D12Device.
+        bool InitializeReportGenerator(ID3D12Device* pDevice)
+        {
+            // Do this first, in case this object is re-initialized on a different device.
+            rangeCommands.Initialize(pDevice);
+
+            m_pDevice.Release();
+            m_initStatus = ReportGeneratorInitStatus::Failed;
+
+            // Can this device be profiled by Nsight Perf SDK?
+            if (!nv::perf::D3D12IsNvidiaDevice(pDevice))
+            {
+                NV_PERF_LOG_ERR(10, "%ls is not an NVIDIA Device\n", D3D12GetDeviceName(pDevice).c_str());
+                return false;
+            }
+
+            if (!InitializeNvPerf())
+            {
+                NV_PERF_LOG_ERR(10, "InitializeNvPerf failed\n");
+                return false;
+            }
+
+            if (!nv::perf::D3D12LoadDriver())
+            {
+                NV_PERF_LOG_ERR(10, "Could not load driver\n");
+                return false;
+            }
+
+            if (!nv::perf::profiler::D3D12IsGpuSupported(pDevice))
+            {
+                NV_PERF_LOG_ERR(10, "GPU is not supported\n");
+                return false;
+            }
+
+            deviceIdentifiers = D3D12GetDeviceIdentifiers(pDevice);
+            if (!deviceIdentifiers.pChipName)
+            {
+                NV_PERF_LOG_ERR(10, "Unrecognized GPU\n");
+                return false;
+            }
+
+            auto createMetricsEvaluator = [&](std::vector<uint8_t>& scratchBuffer) {
+                const size_t scratchBufferSize = nv::perf::D3D12CalculateMetricsEvaluatorScratchBufferSize(deviceIdentifiers.pChipName);
+                if (!scratchBufferSize)
+                {
+                    return (NVPW_MetricsEvaluator*)nullptr;
+                }
+                scratchBuffer.resize(scratchBufferSize);
+                NVPW_MetricsEvaluator* pMetricsEvaluator = nv::perf::D3D12CreateMetricsEvaluator(scratchBuffer.data(), scratchBuffer.size(), deviceIdentifiers.pChipName);
+                return pMetricsEvaluator;
+            };
+            auto createRawMetricsConfig = [&]() {
+                NVPA_RawMetricsConfig* pRawMetricsConfig = nv::perf::profiler::D3D12CreateRawMetricsConfig(deviceIdentifiers.pChipName);
+                return pRawMetricsConfig;
+            };
+            if (!m_stateMachine.InitializeReportMetrics(deviceIdentifiers, createMetricsEvaluator, createRawMetricsConfig, additionalMetrics))
+            {
+                NV_PERF_LOG_ERR(100, "m_stateMachine.InitializeReportMetrics failed\n");
+                return false;
+            }
+
+            if (m_enableCommandQueueValidation)
+            {
+                m_pDevice = pDevice;
+            }
+            m_initStatus = ReportGeneratorInitStatus::Succeeded;
+
+            NV_PERF_LOG_INF(50, "Initialization succeeded\n");
+
+            return true;
+        }
+
+        /// Explicitly starts a session.  This allows you to control resource allocation.
+        /// Calling this function is optional; by default, OnFrameStart() will start a session if this isn't called.
+        /// The session must be explicitly ended by calling Reset().
+        /// The pCommandQueue must belong to the ID3D12Device passed into InitializeReportGenerator().
+        bool BeginSession(ID3D12CommandQueue* pCommandQueue, const SessionOptions* pSessionOptions = nullptr)
+        {
+            if (m_initStatus != ReportGeneratorInitStatus::Succeeded)
+            {
+                NV_PERF_LOG_WRN(100, "skipping; the state of InitializeReportGenerator() is %s.\n", ToCString(m_initStatus));
+                return false;
+            }
+            if (!IsCommandQueueValid(pCommandQueue, "BeginSession"))
+            {
+                return false;
+            }
+
+            auto beginSessionFn = [&]() {
+                return BeginSessionWithOptions(pCommandQueue, pSessionOptions);
+            };
+            if (!m_stateMachine.BeginSession(beginSessionFn))
+            {
+                return false;
+            }
+            return true;
+        }
+
+        /// Automatically starts collecting counters after StartCollectionOnNextFrame().
+        /// Call this at the start of each frame.
+        /// The pCommandQueue must belong to the ID3D12Device passed into InitializeReportGenerator().
+        bool OnFrameStart(ID3D12CommandQueue* pCommandQueue)
+        {
+            if (m_initStatus != ReportGeneratorInitStatus::Succeeded)
+            {
+                NV_PERF_LOG_WRN(100, "skipping; the state of InitializeReportGenerator() is %s.\n", ToCString(m_initStatus));
+                return false;
+            }
+            if (!IsCommandQueueValid(pCommandQueue, "OnFrameStart"))
+            {
+                return false;
+            }
+
+            auto beginSessionFn = [&]() {
+                return BeginSessionWithOptions(pCommandQueue);
+            };
+            if (!m_stateMachine.OnFrameStart(beginSessionFn))
+            {
+                return false;
+            }
+            return true;
+        }
+
+        /// Advances the counter-collection state-machine after rendering.
+        /// Call this at the end of each frame.
+        bool OnFrameEnd()
+        {
+            if (m_initStatus != ReportGeneratorInitStatus::Succeeded)
+            {
+                NV_PERF_LOG_WRN(100, "skipping; the state of InitializeReportGenerator() is %s.\n", ToCString(m_initStatus));
+                return false;
+            }
+
+            if (!m_stateMachine.OnFrameEnd())
+            {
+                return false;
+            }
+            return true;
+        }
+
+        /// Reports true after StartCollectionOnNextFrame() is called, until the HTML Report has been written to disk.
+        /// This state is cleared by OnFrameEnd().
+        bool IsCollectingReport() const
+        {
+            return m_stateMachine.IsCollectingReport();
+        }
+
+        const std::string& GetReportDirectoryName() const
+        {
+            return m_stateMachine.GetReportDirectoryName();
+        }
+
+        /// Enqueues report collection, starting on the next frame.
+        bool StartCollectionOnNextFrame(const char* pDirectoryName, AppendDateTime appendDateTime)
+        {
+            if (m_initStatus != ReportGeneratorInitStatus::Succeeded)
+            {
+                NV_PERF_LOG_WRN(100, "skipping; the state of InitializeReportGenerator() is %s.\n", ToCString(m_initStatus));
+                return false;
+            }
+            return m_stateMachine.StartCollectionOnNextFrame(pDirectoryName, appendDateTime);
+        }
+
+        /// Enables a frame-level parent range.
+        /// When enabled (non-NULL, non-empty pRangeName), every frame will have a parent range.
+        /// This is also convenient for programs that have no CommandList-level ranges.
+        /// Pass in NULL or an empty string to disable this behavior.
+        /// The pRangeName string is copied by value, and may be modified or freed after this function returns.
+        void SetFrameLevelRangeName(const char* pRangeName)
+        {
+            m_stateMachine.SetFrameLevelRangeName(pRangeName);
+        }
+
+        /// Retrieves the current frame-level parent range.  An empty string signifies no parent range.
+        const std::string& GetFrameLevelRangeName() const
+        {
+            return m_stateMachine.GetFrameLevelRangeName();
+        }
+
+        /// Sets the number of Push/Pop nesting levels to collect in the report.
+        void SetNumNestingLevels(uint16_t numNestingLevels)
+        {
+            m_stateMachine.SetNumNestingLevels(numNestingLevels);
+        }
+
+        /// Retrieves the number of Push/Pop nesting levels being collected in the report.
+        uint16_t GetNumNestingLevels() const
+        {
+            return m_stateMachine.GetNumNestingLevels();
+        }
+
+        /// Opens the report directory in the file browser after perf data collection.
+        /// The default behavior is false, and can be changed by the environment variable NV_PERF_OPEN_REPORT_DIR_AFTER_COLLECTION.
+        void SetOpenReportDirectoryAfterCollection(bool openReportDirectoryAfterCollection)
+        {
+            m_stateMachine.SetOpenReportDirectoryAfterCollection(openReportDirectoryAfterCollection);
+        }
+
+        /// When enabled, BeginSession() and OnFrameStart() will check whether their argument's ID3D12Device
+        /// corresponds to the device passed into InitializeReportGenerator().
+        void EnableCommandQueueValidation(bool enable = true)
+        {
+            m_enableCommandQueueValidation = enable;
+        }
+    };
+
+}}}
--- a/ruins64k/tools/NvPerfUtility/include/NvPerfReportGeneratorOpenGL.h
+++ b/ruins64k/tools/NvPerfUtility/include/NvPerfReportGeneratorOpenGL.h
@@ -0,0 +1,367 @@
+/*
+* Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+#pragma once
+#include "NvPerfReportGenerator.h"
+#include "NvPerfRangeProfilerOpenGL.h"
+
+namespace nv { namespace perf { namespace profiler {
+
+    class ReportGeneratorOpenGL
+    {
+    protected:
+        struct ReportProfiler : public ReportGeneratorStateMachine::IReportProfiler
+        {
+            RangeProfilerOpenGL rangeProfiler;
+
+            ReportProfiler()
+                : rangeProfiler()
+            {
+            }
+
+            virtual bool IsInSession() const override
+            {
+                return rangeProfiler.IsInSession();
+            }
+            virtual bool IsInPass() const override
+            {
+                return rangeProfiler.IsInPass();
+            }
+            virtual bool EndSession() override
+            {
+                return rangeProfiler.EndSession();
+            }
+            virtual bool EnqueueCounterCollection(const SetConfigParams& config) override
+            {
+                return rangeProfiler.EnqueueCounterCollection(config);
+            }
+            virtual bool BeginPass() override
+            {
+                return rangeProfiler.BeginPass();
+            }
+            virtual bool EndPass() override
+            {
+                return rangeProfiler.EndPass();
+            }
+            virtual bool PushRange(const char* pRangeName) override
+            {
+                return rangeProfiler.PushRange(pRangeName);
+            }
+            virtual bool PopRange() override
+            {
+                return rangeProfiler.PopRange();
+            }
+            virtual bool DecodeCounters(DecodeResult& decodeResult) override
+            {
+                return rangeProfiler.DecodeCounters(decodeResult);
+            }
+            virtual bool AllPassesSubmitted() const override
+            {
+                return rangeProfiler.AllPassesSubmitted();
+            }
+        };
+
+    protected:
+        ReportProfiler m_reportProfiler;
+        ReportGeneratorStateMachine m_stateMachine;
+
+        // OpenGL device state, set at initialization
+        ReportGeneratorInitStatus m_initStatus;  // the state of InitializeReportGenerator()
+
+    protected:
+        bool BeginSessionWithOptions(
+            const SessionOptions* pSessionOptions = nullptr)
+        {
+            SessionOptions sessionOptions = {};
+            sessionOptions.maxNumRanges = ReportGeneratorStateMachine::MaxNumRangesDefault;
+            if (pSessionOptions)
+            {
+                sessionOptions = *pSessionOptions;
+            }
+
+            if (!m_reportProfiler.rangeProfiler.BeginSession(sessionOptions))
+            {
+                NV_PERF_LOG_ERR(10, "m_reportProfiler.rangeProfiler.BeginSession failed\n");
+                return false;
+            }
+
+            return true;
+        }
+
+    public:
+        DeviceIdentifiers deviceIdentifiers;
+        std::vector<std::string> additionalMetrics;
+
+    public:
+        ~ReportGeneratorOpenGL()
+        {
+            Reset();
+        }
+
+        ReportGeneratorOpenGL()
+            : m_reportProfiler()
+            , m_stateMachine(m_reportProfiler)
+            , m_initStatus(ReportGeneratorInitStatus::NeverCalled)
+            , deviceIdentifiers()
+            , additionalMetrics()
+        {
+        }
+
+        ReportGeneratorInitStatus GetInitStatus() const
+        {
+            return m_initStatus;
+        }
+
+        /// Ends all current sessions and frees all internal memory.
+        /// This object may be reused by calling InitializeReportGenerator() again.
+        /// Does not reset deviceIdentifiers.
+        void Reset()
+        {
+            if (m_reportProfiler.rangeProfiler.IsInSession())
+            {
+                const bool endSessionStatus = m_reportProfiler.rangeProfiler.EndSession();
+                if (!endSessionStatus)
+                {
+                    NV_PERF_LOG_ERR(10, "m_reportProfiler.EndSession failed\n");
+                }
+            }
+
+            m_stateMachine.Reset();
+
+            if (m_initStatus != ReportGeneratorInitStatus::NeverCalled)
+            {
+                m_initStatus = ReportGeneratorInitStatus::Reset;
+            }
+        }
+
+        /// Initializes this object on the current OpenGL context.
+        bool InitializeReportGenerator()
+        {
+            m_initStatus = ReportGeneratorInitStatus::Failed;
+
+            // Can this device be profiled by Nsight Perf SDK?
+            if (!nv::perf::OpenGLIsNvidiaDevice())
+            {
+                NV_PERF_LOG_ERR(10, "%s is not an NVIDIA Device\n", OpenGLGetDeviceName().c_str());
+                return false;
+            }
+
+            if (!InitializeNvPerf())
+            {
+                NV_PERF_LOG_ERR(10, "InitializeNvPerf failed\n");
+                return false;
+            }
+
+            if (!nv::perf::OpenGLLoadDriver())
+            {
+                NV_PERF_LOG_ERR(10, "Could not load driver\n");
+                return false;
+            }
+
+            if (!nv::perf::profiler::OpenGLIsGpuSupported())
+            {
+                NV_PERF_LOG_ERR(10, "GPU is not supported\n");
+                return false;
+            }
+
+            deviceIdentifiers = OpenGLGetDeviceIdentifiers();
+            if (!deviceIdentifiers.pChipName)
+            {
+                NV_PERF_LOG_ERR(10, "Unrecognized GPU\n");
+                return false;
+            }
+
+            auto createMetricsEvaluator = [&](std::vector<uint8_t>& scratchBuffer) {
+                const size_t scratchBufferSize = nv::perf::OpenGLCalculateMetricsEvaluatorScratchBufferSize(deviceIdentifiers.pChipName);
+                if (!scratchBufferSize)
+                {
+                    return (NVPW_MetricsEvaluator*)nullptr;
+                }
+                scratchBuffer.resize(scratchBufferSize);
+                NVPW_MetricsEvaluator* pMetricsEvaluator = nv::perf::OpenGLCreateMetricsEvaluator(scratchBuffer.data(), scratchBuffer.size(), deviceIdentifiers.pChipName);
+                return pMetricsEvaluator;
+            };
+            auto createRawMetricsConfig = [&]() {
+                NVPA_RawMetricsConfig* pRawMetricsConfig = nv::perf::profiler::OpenGLCreateRawMetricsConfig(deviceIdentifiers.pChipName);
+                return pRawMetricsConfig;
+            };
+            if (!m_stateMachine.InitializeReportMetrics(deviceIdentifiers, createMetricsEvaluator, createRawMetricsConfig, additionalMetrics))
+            {
+                NV_PERF_LOG_ERR(10, "m_stateMachine.InitializeReportMetrics failed\n");
+                return false;
+            }
+
+            m_initStatus = ReportGeneratorInitStatus::Succeeded;
+
+            NV_PERF_LOG_INF(50, "Initialization succeeded\n");
+
+            return true;
+        }
+
+        /// Explicitly starts a session.  This allows you to control resource allocation.
+        /// Calling this function is optional; by default, OnFrameStart() will start a session if this isn't called.
+        /// The session must be explicitly ended by calling Reset().
+        bool BeginSession(const SessionOptions* pSessionOptions = nullptr)
+        {
+            if (m_initStatus != ReportGeneratorInitStatus::Succeeded)
+            {
+                NV_PERF_LOG_WRN(100, "skipping; the state of InitializeReportGenerator() is %s.\n", ToCString(m_initStatus));
+                return false;
+            }
+
+            auto beginSessionFn = [&]() {
+                return BeginSessionWithOptions(pSessionOptions);
+            };
+            if (!m_stateMachine.BeginSession(beginSessionFn))
+            {
+                return false;
+            }
+            return true;
+        }
+
+        /// Automatically starts collecting counters for a report, after StartCollectionOnNextFrame().
+        /// Call this at the start of each frame.
+        bool OnFrameStart()
+        {
+            if (m_initStatus != ReportGeneratorInitStatus::Succeeded)
+            {
+                NV_PERF_LOG_WRN(100, "skipping; the state of InitializeReportGenerator() is %s.\n", ToCString(m_initStatus));
+                return false;
+            }
+
+            auto beginSessionFn = [&]() {
+                return BeginSessionWithOptions();
+            };
+            if (!m_stateMachine.OnFrameStart(beginSessionFn))
+            {
+                return false;
+            }
+            return true;
+        }
+
+        /// Advances the counter-collection state-machine after rendering.
+        /// Call this at the end of each frame.
+        bool OnFrameEnd()
+        {
+            if (m_initStatus != ReportGeneratorInitStatus::Succeeded)
+            {
+                NV_PERF_LOG_WRN(100, "skipping; the state of InitializeReportGenerator() is %s.\n", ToCString(m_initStatus));
+                return false;
+            }
+
+            if (!m_stateMachine.OnFrameEnd())
+            {
+                return false;
+            }
+            return true;
+        }
+
+        bool PushRange(const char* pRangeName)
+        {
+            if (m_initStatus != ReportGeneratorInitStatus::Succeeded)
+            {
+                NV_PERF_LOG_WRN(100, "skipping; the state of InitializeReportGenerator() is %s.\n", ToCString(m_initStatus));
+                return false;
+            }
+            if (!m_reportProfiler.IsInPass())
+            {
+                NV_PERF_LOG_WRN(100, "skipping; not in a profiler pass\n");
+                return false;
+            }
+            if (!m_reportProfiler.PushRange(pRangeName))
+            {
+                return false;
+            }
+            return true;
+        }
+
+        bool PopRange()
+        {
+            if (m_initStatus != ReportGeneratorInitStatus::Succeeded)
+            {
+                NV_PERF_LOG_WRN(100, "skipping; the state of InitializeReportGenerator() is %s.\n", ToCString(m_initStatus));
+                return false;
+            }
+            if (!m_reportProfiler.IsInPass())
+            {
+                NV_PERF_LOG_WRN(100, "skipping; not in a profiler pass\n");
+                return false;
+            }
+            if (!m_reportProfiler.PopRange())
+            {
+                return false;
+            }
+            return true;
+        }
+
+        /// Reports true after StartCollectionOnNextFrame() is called, until the HTML Report has been written to disk.
+        /// This state is cleared by OnFrameEnd().
+        bool IsCollectingReport() const
+        {
+            return m_stateMachine.IsCollectingReport();
+        }
+
+        const std::string& GetReportDirectoryName() const
+        {
+            return m_stateMachine.GetReportDirectoryName();
+        }
+
+        /// Enqueues report collection, starting on the next frame.
+        bool StartCollectionOnNextFrame(const char* pDirectoryName, AppendDateTime appendDateTime)
+        {
+            if (m_initStatus != ReportGeneratorInitStatus::Succeeded)
+            {
+                NV_PERF_LOG_WRN(100, "skipping; the state of InitializeReportGenerator() is %s.\n", ToCString(m_initStatus));
+                return false;
+            }
+            return m_stateMachine.StartCollectionOnNextFrame(pDirectoryName, appendDateTime);
+        }
+
+        /// Enables a frame-level parent range.
+        /// When enabled (non-NULL, non-empty pRangeName), every frame will have a parent range.
+        /// This is also convenient for programs that have no ranges of their own.
+        /// Pass in NULL or an empty string to disable this behavior.
+        /// The pRangeName string is copied by value, and may be modified or freed after this function returns.
+        void SetFrameLevelRangeName(const char* pRangeName)
+        {
+            m_stateMachine.SetFrameLevelRangeName(pRangeName);
+        }
+
+        /// Retrieves the current frame-level parent range.  An empty string signifies no parent range.
+        const std::string& GetFrameLevelRangeName() const
+        {
+            return m_stateMachine.GetFrameLevelRangeName();
+        }
+
+        /// Sets the number of Push/Pop nesting levels to collect in the report.
+        void SetNumNestingLevels(uint16_t numNestingLevels)
+        {
+            m_stateMachine.SetNumNestingLevels(numNestingLevels);
+        }
+
+        /// Retrieves the number of Push/Pop nesting levels being collected in the report.
+        uint16_t GetNumNestingLevels() const
+        {
+            return m_stateMachine.GetNumNestingLevels();
+        }
+
+        /// Opens the report directory in the file browser after perf data collection.
+        /// The default behavior is false, and can be changed by the environment variable NV_PERF_OPEN_REPORT_DIR_AFTER_COLLECTION.
+        void SetOpenReportDirectoryAfterCollection(bool openReportDirectoryAfterCollection)
+        {
+            m_stateMachine.SetOpenReportDirectoryAfterCollection(openReportDirectoryAfterCollection);
+        }
+    };
+}}}
--- a/ruins64k/tools/NvPerfUtility/include/NvPerfReportGeneratorVulkan.h
+++ b/ruins64k/tools/NvPerfUtility/include/NvPerfReportGeneratorVulkan.h
@@ -0,0 +1,358 @@
+/*
+* Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+#pragma once
+#include "NvPerfReportGenerator.h"
+#include "NvPerfRangeProfilerVulkan.h"
+
+namespace nv { namespace perf { namespace profiler {
+
+    class ReportGeneratorVulkan
+    {
+    protected:
+        struct ReportProfiler : public ReportGeneratorStateMachine::IReportProfiler
+        {
+            RangeProfilerVulkan rangeProfiler;
+
+            ReportProfiler()
+                : rangeProfiler()
+            {
+            }
+
+            virtual bool IsInSession() const override
+            {
+                return rangeProfiler.IsInSession();
+            }
+            virtual bool IsInPass() const override
+            {
+                return rangeProfiler.IsInPass();
+            }
+            virtual bool EndSession() override
+            {
+                return rangeProfiler.EndSession();
+            }
+            virtual bool EnqueueCounterCollection(const SetConfigParams& config) override
+            {
+                return rangeProfiler.EnqueueCounterCollection(config);
+            }
+            virtual bool BeginPass() override
+            {
+                return rangeProfiler.BeginPass();
+            }
+            virtual bool EndPass() override
+            {
+                return rangeProfiler.EndPass();
+            }
+            virtual bool PushRange(const char* pRangeName) override
+            {
+                return rangeProfiler.PushRange(pRangeName);
+            }
+            virtual bool PopRange() override
+            {
+                return rangeProfiler.PopRange();
+            }
+            virtual bool DecodeCounters(DecodeResult& decodeResult) override
+            {
+                return rangeProfiler.DecodeCounters(decodeResult);
+            }
+            virtual bool AllPassesSubmitted() const override
+            {
+                return rangeProfiler.AllPassesSubmitted();
+            }
+        };
+
+    protected:
+        ReportProfiler m_reportProfiler;
+        ReportGeneratorStateMachine m_stateMachine;
+
+        // Vulkan device state, set at initialization
+        VkInstance m_instance;
+        VkPhysicalDevice m_physicalDevice;
+        VkDevice m_device;
+        ReportGeneratorInitStatus m_initStatus;  // the state of InitializeReportGenerator()
+
+    protected:
+        bool BeginSessionWithOptions(
+            VkInstance instance,
+            VkPhysicalDevice physicalDevice,
+            VkDevice device,
+            VkQueue queue,
+            uint32_t queueFamilyIndex,
+            const SessionOptions* pSessionOptions = nullptr)
+        {
+            SessionOptions sessionOptions = {};
+            sessionOptions.maxNumRanges = ReportGeneratorStateMachine::MaxNumRangesDefault;
+            if (pSessionOptions)
+            {
+                sessionOptions = *pSessionOptions;
+            }
+
+            if (!m_reportProfiler.rangeProfiler.BeginSession(instance, physicalDevice, device, queue, queueFamilyIndex, sessionOptions))
+            {
+                NV_PERF_LOG_ERR(10, "m_reportProfiler.rangeProfiler.BeginSession failed\n");
+                return false;
+            }
+
+            return true;
+        }
+
+    public:
+        /// RangeCommands is safe to use on any CommandBuffer belonging to the VkDevice used for initialization.
+        /// RangeCommands perform no operation when called on unsupported or non-NVIDIA devices.
+        VulkanRangeCommands rangeCommands;
+        DeviceIdentifiers deviceIdentifiers;
+        std::vector<std::string> additionalMetrics;
+
+    public:
+        ~ReportGeneratorVulkan()
+        {
+            Reset();
+        }
+
+        ReportGeneratorVulkan()
+            : m_reportProfiler()
+            , m_stateMachine(m_reportProfiler)
+            , m_instance(VK_NULL_HANDLE)
+            , m_physicalDevice(VK_NULL_HANDLE)
+            , m_device(VK_NULL_HANDLE)
+            , m_initStatus(ReportGeneratorInitStatus::NeverCalled)
+            , rangeCommands()
+            , deviceIdentifiers()
+            , additionalMetrics()
+        {
+        }
+
+        ReportGeneratorInitStatus GetInitStatus() const
+        {
+            return m_initStatus;
+        }
+
+        /// Ends all current sessions and frees all internal memory.
+        /// This object may be reused by calling InitializeReportGenerator() again.
+        /// Does not reset rangeCommands and deviceIdentifiers.
+        void Reset()
+        {
+            if (m_reportProfiler.rangeProfiler.IsInSession())
+            {
+                const bool endSessionStatus = m_reportProfiler.rangeProfiler.EndSession();
+                if (!endSessionStatus)
+                {
+                    NV_PERF_LOG_ERR(10, "m_reportProfiler.EndSession failed\n");
+                }
+            }
+
+            m_stateMachine.Reset();
+
+            m_device = VK_NULL_HANDLE;
+            m_physicalDevice = VK_NULL_HANDLE;
+            m_instance = VK_NULL_HANDLE;
+            if (m_initStatus != ReportGeneratorInitStatus::NeverCalled)
+            {
+                m_initStatus = ReportGeneratorInitStatus::Reset;
+            }
+        }
+
+        /// Initialize this object on the provided VkDevice.
+        bool InitializeReportGenerator(VkInstance instance, VkPhysicalDevice physicalDevice, VkDevice device)
+        {
+            // Do this first, in case this object was previously initialized on an NVIDIA device, and is now re-initialized on non-NVIDIA.
+            rangeCommands.Initialize(physicalDevice);
+
+            m_instance = VK_NULL_HANDLE;
+            m_physicalDevice = VK_NULL_HANDLE;
+            m_device = VK_NULL_HANDLE;
+            m_initStatus = ReportGeneratorInitStatus::Failed;
+
+            // Can this device be profiled by Nsight Perf SDK?
+            if (!nv::perf::VulkanIsNvidiaDevice(physicalDevice))
+            {
+                NV_PERF_LOG_ERR(10, "%s is not an NVIDIA Device\n", VulkanGetDeviceName(physicalDevice).c_str());
+                return false;
+            }
+
+            if (!InitializeNvPerf())
+            {
+                NV_PERF_LOG_ERR(10, "InitializeNvPerf failed\n");
+                return false;
+            }
+
+            if (!nv::perf::VulkanLoadDriver(instance))
+            {
+                NV_PERF_LOG_ERR(10, "Could not load driver\n");
+                return false;
+            }
+
+            if (!nv::perf::profiler::VulkanIsGpuSupported(instance, physicalDevice, device))
+            {
+                NV_PERF_LOG_ERR(10, "GPU is not supported\n");
+                return false;
+            }
+
+            deviceIdentifiers = VulkanGetDeviceIdentifiers(instance, physicalDevice, device);
+            if (!deviceIdentifiers.pChipName)
+            {
+                NV_PERF_LOG_ERR(10, "Unrecognized GPU\n");
+                return false;
+            }
+
+            auto createMetricsEvaluator = [&](std::vector<uint8_t>& scratchBuffer) {
+                const size_t scratchBufferSize = nv::perf::VulkanCalculateMetricsEvaluatorScratchBufferSize(deviceIdentifiers.pChipName);
+                if (!scratchBufferSize)
+                {
+                    return (NVPW_MetricsEvaluator*)nullptr;
+                }
+                scratchBuffer.resize(scratchBufferSize);
+                NVPW_MetricsEvaluator* pMetricsEvaluator = nv::perf::VulkanCreateMetricsEvaluator(scratchBuffer.data(), scratchBuffer.size(), deviceIdentifiers.pChipName);
+                return pMetricsEvaluator;
+            };
+            auto createRawMetricsConfig = [&]() {
+                NVPA_RawMetricsConfig* pRawMetricsConfig = nv::perf::profiler::VulkanCreateRawMetricsConfig(deviceIdentifiers.pChipName);
+                return pRawMetricsConfig;
+            };
+            if (!m_stateMachine.InitializeReportMetrics(deviceIdentifiers, createMetricsEvaluator, createRawMetricsConfig, additionalMetrics))
+            {
+                NV_PERF_LOG_ERR(10, "m_stateMachine.InitializeReportMetrics failed\n");
+                return false;
+            }
+
+            m_instance = instance;
+            m_physicalDevice = physicalDevice;
+            m_device = device;
+            m_initStatus = ReportGeneratorInitStatus::Succeeded;
+
+            NV_PERF_LOG_INF(50, "Initialization succeeded\n");
+
+            return true;
+        }
+
+        /// Explicitly starts a session.  This allows you to control resource allocation.
+        /// Calling this function is optional; by default, OnFrameStart() will start a session if this isn't called.
+        /// The session must be explicitly ended by calling Reset().
+        /// The queue must belong to the VkDevice passed into InitializeReportGenerator().
+        bool BeginSession(VkQueue queue, uint32_t queueFamilyIndex, const SessionOptions* pSessionOptions = nullptr)
+        {
+            if (m_initStatus != ReportGeneratorInitStatus::Succeeded)
+            {
+                NV_PERF_LOG_WRN(100, "skipping; the state of InitializeReportGenerator() is %s.\n", ToCString(m_initStatus));
+                return false;
+            }
+
+            auto beginSessionFn = [&]() {
+                return BeginSessionWithOptions(m_instance, m_physicalDevice, m_device, queue, queueFamilyIndex, pSessionOptions);
+            };
+            if (!m_stateMachine.BeginSession(beginSessionFn))
+            {
+                return false;
+            }
+            return true;
+        }
+
+        /// Automatically starts collecting counters for a report, after StartCollectionOnNextFrame().
+        /// Call this at the start of each frame.
+        /// The queue must belong to the VkDevice passed into InitializeReportGenerator().
+        bool OnFrameStart(VkQueue queue, uint32_t queueFamilyIndex)
+        {
+            if (m_initStatus != ReportGeneratorInitStatus::Succeeded)
+            {
+                NV_PERF_LOG_WRN(100, "skipping; the state of InitializeReportGenerator() is %s.\n", ToCString(m_initStatus));
+                return false;
+            }
+
+            auto beginSessionFn = [&]() {
+                return BeginSessionWithOptions(m_instance, m_physicalDevice, m_device, queue, queueFamilyIndex);
+            };
+            if (!m_stateMachine.OnFrameStart(beginSessionFn))
+            {
+                return false;
+            }
+            return true;
+        }
+
+        /// Advances the counter-collection state-machine after rendering.
+        /// Call this at the end of each frame.
+        bool OnFrameEnd()
+        {
+            if (m_initStatus != ReportGeneratorInitStatus::Succeeded)
+            {
+                NV_PERF_LOG_WRN(100, "skipping; the state of InitializeReportGenerator() is %s.\n", ToCString(m_initStatus));
+                return false;
+            }
+
+            if (!m_stateMachine.OnFrameEnd())
+            {
+                return false;
+            }
+            return true;
+        }
+
+        /// Reports true after StartCollectionOnNextFrame() is called, until the HTML Report has been written to disk.
+        /// This state is cleared by OnFrameEnd().
+        bool IsCollectingReport() const
+        {
+            return m_stateMachine.IsCollectingReport();
+        }
+
+        const std::string& GetReportDirectoryName() const
+        {
+            return m_stateMachine.GetReportDirectoryName();
+        }
+
+        /// Enqueues report collection, starting on the next frame.
+        bool StartCollectionOnNextFrame(const char* pDirectoryName, AppendDateTime appendDateTime)
+        {
+            if (m_initStatus != ReportGeneratorInitStatus::Succeeded)
+            {
+                NV_PERF_LOG_WRN(100, "skipping; the state of InitializeReportGenerator() is %s.\n", ToCString(m_initStatus));
+                return false;
+            }
+            return m_stateMachine.StartCollectionOnNextFrame(pDirectoryName, appendDateTime);
+        }
+
+        /// Enables a frame-level parent range.
+        /// When enabled (non-NULL, non-empty pRangeName), every frame will have a parent range.
+        /// This is also convenient for programs that have no CommandBuffer-level ranges.
+        /// Pass in NULL or an empty string to disable this behavior.
+        /// The pRangeName string is copied by value, and may be modified or freed after this function returns.
+        void SetFrameLevelRangeName(const char* pRangeName)
+        {
+            m_stateMachine.SetFrameLevelRangeName(pRangeName);
+        }
+
+        /// Retrieves the current frame-level parent range.  An empty string signifies no parent range.
+        const std::string& GetFrameLevelRangeName() const
+        {
+            return m_stateMachine.GetFrameLevelRangeName();
+        }
+
+        /// Sets the number of Push/Pop nesting levels to collect in the report.
+        void SetNumNestingLevels(uint16_t numNestingLevels)
+        {
+            m_stateMachine.SetNumNestingLevels(numNestingLevels);
+        }
+
+        /// Retrieves the number of Push/Pop nesting levels being collected in the report.
+        uint16_t GetNumNestingLevels() const
+        {
+            return m_stateMachine.GetNumNestingLevels();
+        }
+
+        /// Opens the report directory in the file browser after perf data collection.
+        /// The default behavior is false, and can be changed by the environment variable NV_PERF_OPEN_REPORT_DIR_AFTER_COLLECTION.
+        void SetOpenReportDirectoryAfterCollection(bool openReportDirectoryAfterCollection)
+        {
+            m_stateMachine.SetOpenReportDirectoryAfterCollection(openReportDirectoryAfterCollection);
+        }
+    };
+}}}
--- a/ruins64k/tools/NvPerfUtility/include/NvPerfVulkan.h
+++ b/ruins64k/tools/NvPerfUtility/include/NvPerfVulkan.h
@@ -0,0 +1,374 @@
+/*
+* Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+#pragma once
+
+#include <vulkan/vulkan.h>
+#include "NvPerfInit.h"
+#include "NvPerfDeviceProperties.h"
+#include "nvperf_vulkan_host.h"
+#include "nvperf_vulkan_target.h"
+
+namespace nv { namespace perf {
+
+    //
+    // Vulkan Only Utilities
+    //
+
+    inline std::string VulkanGetDeviceName(VkPhysicalDevice physicalDevice)
+    {
+        VkPhysicalDeviceProperties properties;
+        vkGetPhysicalDeviceProperties(physicalDevice, &properties);
+        return properties.deviceName;
+    }
+
+    inline bool VulkanIsNvidiaDevice(VkPhysicalDevice physicalDevice)
+    {
+        VkPhysicalDeviceProperties properties;
+        vkGetPhysicalDeviceProperties(physicalDevice, &properties);
+        if (properties.vendorID != NVIDIA_VENDOR_ID)
+        {
+            return false;
+        }
+
+        return true;
+    }
+
+    inline uint32_t VulkanGetInstanceApiVersion()
+    {
+        PFN_vkEnumerateInstanceVersion pfnVkEnumerateInstanceVersion = (PFN_vkEnumerateInstanceVersion)vkGetInstanceProcAddr(VK_NULL_HANDLE, "vkEnumerateInstanceVersion");
+        // vkEnumerateInstanceVersion does not exist on a Vulkan 1.0 loader
+        if (!pfnVkEnumerateInstanceVersion)
+        {
+            return VK_API_VERSION_1_0;
+        }
+        
+        uint32_t loaderVersion;
+        VkResult res = pfnVkEnumerateInstanceVersion(&loaderVersion);
+        if (res != VK_SUCCESS)
+        {
+            NV_PERF_LOG_ERR(10, "Couldn't enumerate instance version!\n");
+            return 0;
+        }
+        return loaderVersion;
+    }
+
+    inline uint32_t VulkanGetPhysicalDeviceApiVersion(VkPhysicalDevice physicalDevice)
+    {
+        VkPhysicalDeviceProperties properties;
+        vkGetPhysicalDeviceProperties(physicalDevice, &properties);
+        return properties.apiVersion;
+    }
+
+    //
+    // Vulkan NvPerf Utilities
+    //
+    inline bool VulkanAppendInstanceRequiredExtensions(std::vector<const char*>& instanceExtensionNames)
+    {
+        NVPW_VK_Profiler_GetRequiredInstanceExtensions_Params getRequiredInstanceExtensionsParams = { NVPW_VK_Profiler_GetRequiredInstanceExtensions_Params_STRUCT_SIZE };
+        getRequiredInstanceExtensionsParams.apiVersion = VulkanGetInstanceApiVersion();
+
+        NVPA_Status nvpaStatus = NVPW_VK_Profiler_GetRequiredInstanceExtensions(&getRequiredInstanceExtensionsParams);
+        if (nvpaStatus)
+        {
+            NV_PERF_LOG_ERR(10, "NVPW_VK_Profiler_GetRequiredInstanceExtensions failed\n");
+            return false;
+        }
+
+        if (!getRequiredInstanceExtensionsParams.isOfficiallySupportedVersion)
+        {
+            uint32_t major = VK_VERSION_MAJOR(getRequiredInstanceExtensionsParams.apiVersion);
+            uint32_t minor = VK_VERSION_MINOR(getRequiredInstanceExtensionsParams.apiVersion);
+            uint32_t patch = VK_VERSION_PATCH(getRequiredInstanceExtensionsParams.apiVersion);
+            // not an error - NvPerf treats any unknown version as the same as its latest known version.
+            //                Unknown version warnings should be reported back to the Nsight Perf team to get official support
+            NV_PERF_LOG_WRN(10, "Vulkan Instance API Version: %u.%u.%u - is not an officially supported version\n", major, minor, patch);
+        }
+
+        for (uint32_t extensionIndex=0; extensionIndex < getRequiredInstanceExtensionsParams.numInstanceExtensionNames; ++ extensionIndex)
+        {
+            instanceExtensionNames.push_back(getRequiredInstanceExtensionsParams.ppInstanceExtensionNames[extensionIndex]);
+        }
+        return true;
+    }
+
+    inline bool VulkanAppendDeviceRequiredExtensions(VkInstance instance, VkPhysicalDevice physicalDevice, void* pfnGetInstanceProcAddr, std::vector<const char*>& deviceExtensionNames)
+    {
+        if (!VulkanIsNvidiaDevice(physicalDevice))
+        {
+            return true; // do nothing on non-NVIDIA devices
+        }
+
+        NVPW_VK_Profiler_GetRequiredDeviceExtensions_Params getRequiredDeviceExtensionsParams = { NVPW_VK_Profiler_GetRequiredDeviceExtensions_Params_STRUCT_SIZE };
+        getRequiredDeviceExtensionsParams.apiVersion = VulkanGetPhysicalDeviceApiVersion(physicalDevice);
+
+        // optional parameters - this allows NvPerf to query if certain advanced features are available for use
+        getRequiredDeviceExtensionsParams.instance = instance;
+        getRequiredDeviceExtensionsParams.physicalDevice = physicalDevice;
+        getRequiredDeviceExtensionsParams.pfnGetInstanceProcAddr = pfnGetInstanceProcAddr;
+
+        NVPA_Status nvpaStatus = NVPW_VK_Profiler_GetRequiredDeviceExtensions(&getRequiredDeviceExtensionsParams);
+        if (nvpaStatus)
+        {
+            NV_PERF_LOG_ERR(10, "NVPW_VK_Profiler_GetRequiredDeviceExtensions failed\n");
+            return false;
+        }
+
+        if (!getRequiredDeviceExtensionsParams.isOfficiallySupportedVersion)
+        {
+            uint32_t major = VK_VERSION_MAJOR(getRequiredDeviceExtensionsParams.apiVersion);
+            uint32_t minor = VK_VERSION_MINOR(getRequiredDeviceExtensionsParams.apiVersion);
+            uint32_t patch = VK_VERSION_PATCH(getRequiredDeviceExtensionsParams.apiVersion);
+            // not an error - NvPerf treats any unknown version as the same as its latest known version.
+            //                Unknown version warnings should be reported back to the Nsight Perf team to get official support
+            NV_PERF_LOG_WRN(100, "Vulkan Device API Version: %u.%u.%u - is not an officially supported version\n", major, minor, patch);
+        }
+
+        for (uint32_t extensionIndex=0; extensionIndex < getRequiredDeviceExtensionsParams.numDeviceExtensionNames; ++ extensionIndex)
+        {
+            deviceExtensionNames.push_back(getRequiredDeviceExtensionsParams.ppDeviceExtensionNames[extensionIndex]);
+        }
+
+        return true;
+    }
+
+    inline bool VulkanAppendRequiredExtensions(std::vector<const char*>& instanceExtensionNames, std::vector<const char*>& deviceExtensionNames)
+    {
+        bool status = VulkanAppendInstanceRequiredExtensions(instanceExtensionNames);
+        if (!status)
+        {
+            return false;
+        }
+
+        status = VulkanAppendDeviceRequiredExtensions(VK_NULL_HANDLE, VK_NULL_HANDLE, nullptr, deviceExtensionNames);
+        if (!status)
+        {
+            return false;
+        }
+
+        return true;
+    }
+
+    inline bool VulkanLoadDriver(VkInstance instance)
+    {
+        NVPW_VK_LoadDriver_Params loadDriverParams = { NVPW_VK_LoadDriver_Params_STRUCT_SIZE };
+        loadDriverParams.instance = instance;
+        NVPA_Status nvpaStatus = NVPW_VK_LoadDriver(&loadDriverParams);
+        if (nvpaStatus)
+        {
+            NV_PERF_LOG_ERR(10, "NVPW_VK_LoadDriver failed\n");
+            return false;
+        }
+        return true;
+    }
+
+    inline size_t VulkanGetNvperfDeviceIndex(VkInstance instance, VkPhysicalDevice physicalDevice, VkDevice device, size_t sliIndex = 0)
+    {
+        NVPW_VK_Device_GetDeviceIndex_Params getDeviceIndexParams = { NVPW_VK_Device_GetDeviceIndex_Params_STRUCT_SIZE };
+        getDeviceIndexParams.instance = instance;
+        getDeviceIndexParams.physicalDevice = physicalDevice;
+        getDeviceIndexParams.device = device;
+        getDeviceIndexParams.sliIndex = sliIndex;
+        getDeviceIndexParams.pfnGetInstanceProcAddr = (void*)vkGetInstanceProcAddr;
+        getDeviceIndexParams.pfnGetDeviceProcAddr = (void*)vkGetDeviceProcAddr;
+
+        NVPA_Status nvpaStatus = NVPW_VK_Device_GetDeviceIndex(&getDeviceIndexParams);
+        if (nvpaStatus)
+        {
+            return ~size_t(0);
+        }
+
+        return getDeviceIndexParams.deviceIndex;
+    }
+
+    inline DeviceIdentifiers VulkanGetDeviceIdentifiers(VkInstance instance, VkPhysicalDevice physicalDevice, VkDevice device, size_t sliIndex = 0)
+    {
+        const size_t deviceIndex = VulkanGetNvperfDeviceIndex(instance, physicalDevice, device, sliIndex);
+
+        DeviceIdentifiers deviceIdentifiers = GetDeviceIdentifiers(deviceIndex);
+        return deviceIdentifiers;
+    }
+
+    inline NVPW_Device_ClockStatus VulkanGetDeviceClockState(VkInstance instance, VkPhysicalDevice physicalDevice, VkDevice device)
+    {
+        size_t nvperfDeviceIndex = VulkanGetNvperfDeviceIndex(instance, physicalDevice, device);
+        return GetDeviceClockState(nvperfDeviceIndex);
+    }
+
+    inline bool VulkanSetDeviceClockState(VkInstance instance, VkPhysicalDevice physicalDevice, VkDevice device, NVPW_Device_ClockSetting clockSetting)
+    {
+        size_t nvperfDeviceIndex = VulkanGetNvperfDeviceIndex(instance, physicalDevice, device);
+        return SetDeviceClockState(nvperfDeviceIndex, clockSetting);
+    }
+
+    inline bool VulkanSetDeviceClockState(VkInstance instance, VkPhysicalDevice physicalDevice, VkDevice device, NVPW_Device_ClockStatus clockStatus)
+    {
+        size_t nvperfDeviceIndex = VulkanGetNvperfDeviceIndex(instance, physicalDevice, device);
+        return SetDeviceClockState(nvperfDeviceIndex, clockStatus);
+    }
+
+    inline size_t VulkanCalculateMetricsEvaluatorScratchBufferSize(const char* pChipName)
+    {
+        NVPW_VK_MetricsEvaluator_CalculateScratchBufferSize_Params calculateScratchBufferSizeParams = { NVPW_VK_MetricsEvaluator_CalculateScratchBufferSize_Params_STRUCT_SIZE };
+        calculateScratchBufferSizeParams.pChipName = pChipName;
+        NVPA_Status nvpaStatus = NVPW_VK_MetricsEvaluator_CalculateScratchBufferSize(&calculateScratchBufferSizeParams);
+        if (nvpaStatus)
+        {
+            NV_PERF_LOG_ERR(20, "NVPW_VK_MetricsEvaluator_CalculateScratchBufferSize failed\n");
+            return 0;
+        }
+        return calculateScratchBufferSizeParams.scratchBufferSize;
+    }
+
+    inline NVPW_MetricsEvaluator* VulkanCreateMetricsEvaluator(uint8_t* pScratchBuffer, size_t scratchBufferSize, const char* pChipName)
+    {
+        NVPW_VK_MetricsEvaluator_Initialize_Params initializeParams = { NVPW_VK_MetricsEvaluator_Initialize_Params_STRUCT_SIZE };
+        initializeParams.pScratchBuffer = pScratchBuffer;
+        initializeParams.scratchBufferSize = scratchBufferSize;
+        initializeParams.pChipName = pChipName;
+        NVPA_Status nvpaStatus = NVPW_VK_MetricsEvaluator_Initialize(&initializeParams);
+        if (nvpaStatus)
+        {
+            NV_PERF_LOG_ERR(20, "NVPW_VK_MetricsEvaluator_Initialize failed\n");
+            return nullptr;
+        }
+        return initializeParams.pMetricsEvaluator;
+    }
+
+}}
+
+namespace nv { namespace perf { namespace profiler {
+
+    inline NVPA_RawMetricsConfig* VulkanCreateRawMetricsConfig(const char* pChipName)
+    {
+        NVPW_VK_RawMetricsConfig_Create_Params configParams = { NVPW_VK_RawMetricsConfig_Create_Params_STRUCT_SIZE };
+        configParams.activityKind = NVPA_ACTIVITY_KIND_PROFILER;
+        configParams.pChipName = pChipName;
+
+        NVPA_Status nvpaStatus = NVPW_VK_RawMetricsConfig_Create(&configParams);
+        if (nvpaStatus)
+        {
+            return nullptr;
+        }
+
+        return configParams.pRawMetricsConfig;
+    }
+
+    inline bool VulkanIsGpuSupported(VkInstance instance, VkPhysicalDevice physicalDevice, VkDevice device, size_t sliIndex = 0)
+    {
+        const size_t deviceIndex = VulkanGetNvperfDeviceIndex(instance, physicalDevice, device, sliIndex);
+
+        NVPW_VK_Profiler_IsGpuSupported_Params params = { NVPW_VK_Profiler_IsGpuSupported_Params_STRUCT_SIZE };
+        params.deviceIndex = deviceIndex;
+        NVPA_Status nvpaStatus = NVPW_VK_Profiler_IsGpuSupported(&params);
+        if (nvpaStatus)
+        {
+            NV_PERF_LOG_ERR(10, "NVPW_VK_Profiler_IsGpuSupported failed on %s\n", VulkanGetDeviceName(physicalDevice).c_str());
+            return false;
+        }
+
+        if (!params.isSupported)
+        {
+            NV_PERF_LOG_ERR(10, "%s is not supported\n", VulkanGetDeviceName(physicalDevice).c_str());
+            if (params.gpuArchitectureSupportLevel != NVPW_GPU_ARCHITECTURE_SUPPORT_LEVEL_SUPPORTED)
+            {
+                const DeviceIdentifiers deviceIdentifiers = VulkanGetDeviceIdentifiers(instance, physicalDevice, device, sliIndex);
+                NV_PERF_LOG_ERR(10, "Unsupported GPU architecture %s\n", deviceIdentifiers.pChipName);
+            }
+            if (params.sliSupportLevel == NVPW_SLI_SUPPORT_LEVEL_UNSUPPORTED)
+            {
+                NV_PERF_LOG_ERR(10, "Devices in SLI configuration are not supported.\n");
+            }
+            return false;
+        }
+
+        return true;
+    }
+
+    inline bool VulkanPushRange(VkCommandBuffer commandBuffer, const char* pRangeName)
+    {
+        NVPW_VK_Profiler_CommandBuffer_PushRange_Params pushRangeParams = { NVPW_VK_Profiler_CommandBuffer_PushRange_Params_STRUCT_SIZE };
+        pushRangeParams.pRangeName = pRangeName;
+        pushRangeParams.rangeNameLength = 0;
+        pushRangeParams.commandBuffer = commandBuffer;
+        NVPA_Status nvpaStatus = NVPW_VK_Profiler_CommandBuffer_PushRange(&pushRangeParams);
+        if (nvpaStatus)
+        {
+            NV_PERF_LOG_ERR(50, "NVPW_VK_Profiler_CommandBuffer_PushRange failed\n");
+            return false;
+        }
+        return true;
+    }
+    inline bool VulkanPopRange(VkCommandBuffer commandBuffer)
+    {
+        NVPW_VK_Profiler_CommandBuffer_PopRange_Params popParams = { NVPW_VK_Profiler_CommandBuffer_PopRange_Params_STRUCT_SIZE };
+        popParams.commandBuffer = commandBuffer;
+        NVPA_Status nvpaStatus = NVPW_VK_Profiler_CommandBuffer_PopRange(&popParams);
+        if (nvpaStatus)
+        {
+            NV_PERF_LOG_ERR(50, "NVPW_VK_Profiler_CommandBuffer_PopRange failed\n");
+            return false;
+        }
+        return true;
+    }
+
+    inline bool VulkanPushRange_Nop(VkCommandBuffer commandBuffer, const char* pRangeName)
+    {
+        return false;
+    }
+    inline bool VulkanPopRange_Nop(VkCommandBuffer commandBuffer)
+    {
+        return false;
+    }
+
+    // Dispatch table for range commands: real NvPerf calls on NVIDIA devices, no-ops elsewhere.
+    struct VulkanRangeCommands
+    {
+        bool isNvidiaDevice;
+        bool(*PushRange)(VkCommandBuffer commandBuffer, const char* pRangeName);
+        bool(*PopRange)(VkCommandBuffer commandBuffer);
+
+    public:
+        VulkanRangeCommands()
+            : isNvidiaDevice(false)
+            , PushRange(&VulkanPushRange_Nop)
+            , PopRange(&VulkanPopRange_Nop)
+        {
+        }
+
+        void Initialize(bool isNvidiaDevice_)
+        {
+            isNvidiaDevice = isNvidiaDevice_;
+            if (isNvidiaDevice_)
+            {
+                PushRange = &VulkanPushRange;
+                PopRange = &VulkanPopRange;
+            }
+            else
+            {
+                PushRange = &VulkanPushRange_Nop;
+                PopRange = &VulkanPopRange_Nop;
+            }
+        }
+
+        void Initialize(VkPhysicalDevice physicalDevice)
+        {
+            const bool isNvidiaDevice_ = VulkanIsNvidiaDevice(physicalDevice);
+            return Initialize(isNvidiaDevice_);
+        }
+    };
+
+}}}
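The `VulkanRangeCommands` struct above swaps function pointers between real and no-op implementations so that call sites never branch on the device vendor. The following is a minimal, self-contained sketch of that pattern only. `FakeCommandBuffer`, `RecordPushRange`, `RecordPopRange`, `RangeCommands`, and `RecordFrame` are hypothetical stand-ins invented for illustration so the code builds without Vulkan or NvPerf; they are not part of the header.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for VkCommandBuffer so the sketch builds without Vulkan.
struct FakeCommandBuffer
{
    std::vector<std::string> events; // what a "supported" device would have recorded
};

// Stand-ins for the real range calls: they record the markers and report success.
inline bool RecordPushRange(FakeCommandBuffer& commandBuffer, const char* pRangeName)
{
    commandBuffer.events.push_back(std::string("push:") + pRangeName);
    return true;
}
inline bool RecordPopRange(FakeCommandBuffer& commandBuffer)
{
    commandBuffer.events.push_back("pop");
    return true;
}

// No-op variants: touch nothing and return false, mirroring the *_Nop functions above.
inline bool PushRange_Nop(FakeCommandBuffer&, const char*) { return false; }
inline bool PopRange_Nop(FakeCommandBuffer&) { return false; }

// Function-pointer table that defaults to the no-ops until Initialize() is called.
struct RangeCommands
{
    bool isSupportedDevice = false;
    bool (*PushRange)(FakeCommandBuffer&, const char*) = &PushRange_Nop;
    bool (*PopRange)(FakeCommandBuffer&) = &PopRange_Nop;

    void Initialize(bool isSupportedDevice_)
    {
        isSupportedDevice = isSupportedDevice_;
        PushRange = isSupportedDevice_ ? &RecordPushRange : &PushRange_Nop;
        PopRange = isSupportedDevice_ ? &RecordPopRange : &PopRange_Nop;
    }
};

// The call site is identical for both kinds of device; it returns the number of recorded events.
inline std::size_t RecordFrame(const RangeCommands& commands, FakeCommandBuffer& commandBuffer)
{
    assert(commands.PushRange != nullptr && commands.PopRange != nullptr);
    commands.PushRange(commandBuffer, "Frame");
    commands.PopRange(commandBuffer);
    return commandBuffer.events.size();
}
```

With a default-constructed `RangeCommands`, `RecordFrame` leaves the command buffer untouched; after `Initialize(true)` the same call records a push and a pop. The trade-off of this design is one indirect call per range in exchange for branch-free call sites.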
--- a/ruins64k/tools/NvPerfUtility/tools/CMakeLists.txt
+++ b/ruins64k/tools/NvPerfUtility/tools/CMakeLists.txt
@@ -0,0 +1,30 @@
+#[[
+* Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+]]
+
+cmake_minimum_required(VERSION 3.7 FATAL_ERROR)
+cmake_policy(VERSION 3.7)
+
+project(NvPerfTools)
+
+find_package(NvPerf        REQUIRED PATHS ${CMAKE_CURRENT_SOURCE_DIR}/../../cmake)
+find_package(NvPerfUtility REQUIRED PATHS ${CMAKE_CURRENT_SOURCE_DIR}/../../cmake)
+find_package(Vulkan        REQUIRED) # requires cmake 3.7
+
+set(CMAKE_RUNTIME_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/bin/")
+set(NvPerfUtilityImportsDir ${CMAKE_CURRENT_SOURCE_DIR}/../imports)
+
+add_subdirectory(ClockControl)
+add_subdirectory(GpuDiag)
--- a/ruins64k/tools/NvPerfUtility/tools/ClockControl/CMakeLists.txt
+++ b/ruins64k/tools/NvPerfUtility/tools/ClockControl/CMakeLists.txt
@@ -0,0 +1,44 @@
+#[[
+* Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+]]
+
+set(SOURCES ClockControl.cpp)
+
+set(NVPERF_FILES)
+# Add NvPerf API
+foreach(NvPerf_INCLUDE_DIR IN LISTS NvPerf_INCLUDE_DIRS)
+    file(GLOB FILES "${NvPerf_INCLUDE_DIR}/*.h")
+    list(APPEND NVPERF_FILES ${FILES})
+endforeach()
+source_group("NvPerf" FILES ${NVPERF_FILES})
+
+set(NVPERF_UTILITY_FILES)
+# Add NvPerf Utility
+file(GLOB NVPERF_UTILITY_FILES "${NvPerfUtility_INCLUDE_DIRS}/*.h")
+source_group("NvPerfUtility" FILES ${NVPERF_UTILITY_FILES})
+
+set(VULKAN_FILES)
+file(GLOB VULKAN_FILES "${Vulkan_INCLUDE_DIRS}/vulkan/*.h")
+source_group("Vulkan" FILES ${VULKAN_FILES})
+
+add_executable(ClockControl ${SOURCES} ${NVPERF_FILES} ${NVPERF_UTILITY_FILES} ${VULKAN_FILES})
+target_include_directories(ClockControl PRIVATE ${Vulkan_INCLUDE_DIRS})
+target_link_libraries(ClockControl PRIVATE NvPerf NvPerfUtility ${Vulkan_LIBRARY})
+
+if(NOT WIN32)
+    target_link_libraries(ClockControl PRIVATE dl)
+endif()
+
+DeployNvPerf(ClockControl NvPerf)
--- a/ruins64k/tools/NvPerfUtility/tools/ClockControl/ClockControl.cpp
+++ b/ruins64k/tools/NvPerfUtility/tools/ClockControl/ClockControl.cpp
@@ -0,0 +1,306 @@
+/*
+* Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <cstdint>
+#include <cstring>
+
+#include <vulkan/vulkan.h>
+
+#include <nvperf_host_impl.h>
+#include <NvPerfInit.h>
+#include <NvPerfVulkan.h>
+#include <nvperf_target.h>
+
+using namespace nv::perf;
+
+enum class Command
+{
+    Invalid,
+    Status,
+    Lock,
+    Unlock
+};
+
+struct ClockControlState
+{
+    Command command;
+    uint32_t deviceIdx;
+    uint32_t numDevices;
+
+    // Vulkan State
+    VkInstance instance;
+    std::vector<VkPhysicalDevice> physicalDevices;
+    std::vector<VkDevice> logicalDevices;
+};
+
+bool Initialize(ClockControlState& clockControlState)
+{
+    bool nvperfStatus = InitializeNvPerf();
+    if (!nvperfStatus)
+    {
+        return false;
+    }
+
+    // *LoadDriver must be called before using the NVPW device enumeration API.  Any graphics API will do.
+    //   We choose Vulkan here because it is cross-platform.
+    VkApplicationInfo applicationInfo = {VK_STRUCTURE_TYPE_APPLICATION_INFO};
+    applicationInfo.pApplicationName = "ClockControl";
+    applicationInfo.applicationVersion = 1;
+    applicationInfo.apiVersion = VK_API_VERSION_1_0;
+
+    VkInstanceCreateInfo instanceCreateInfo = {VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO};
+    instanceCreateInfo.pApplicationInfo = &applicationInfo;
+
+    VkResult vulkanStatus = vkCreateInstance(&instanceCreateInfo, nullptr, &clockControlState.instance);
+    if (vulkanStatus != VK_SUCCESS)
+    {
+        NV_PERF_LOG_ERR(10, "vkCreateInstance failed!\n");
+        return false;
+    }
+
+    nvperfStatus = VulkanLoadDriver(clockControlState.instance);
+    if (!nvperfStatus)
+    {
+        return false;
+    }
+
+    vulkanStatus = vkEnumeratePhysicalDevices(clockControlState.instance, &clockControlState.numDevices, nullptr);
+    if (vulkanStatus != VK_SUCCESS)
+    {
+        NV_PERF_LOG_ERR(10, "Using vkEnumeratePhysicalDevices to retrieve numDevices failed!\n");
+        return false;
+    }
+
+    clockControlState.physicalDevices.resize(clockControlState.numDevices);
+    clockControlState.logicalDevices.resize(clockControlState.numDevices);
+
+    vulkanStatus = vkEnumeratePhysicalDevices(clockControlState.instance, &clockControlState.numDevices, clockControlState.physicalDevices.data());
+    if (vulkanStatus != VK_SUCCESS)
+    {
+        NV_PERF_LOG_ERR(10, "Using vkEnumeratePhysicalDevices to retrieve VkPhysicalDevices failed!\n");
+        return false;
+    }
+
+    VkDeviceCreateInfo deviceCreateInfo = {VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO};
+    for (uint32_t deviceIdx = 0; deviceIdx < clockControlState.numDevices; ++deviceIdx)
+    {
+        vulkanStatus = vkCreateDevice(clockControlState.physicalDevices[deviceIdx], &deviceCreateInfo, nullptr, &clockControlState.logicalDevices[deviceIdx]);
+        if (vulkanStatus != VK_SUCCESS)
+        {
+            NV_PERF_LOG_ERR(10, "vkCreateDevice failed for device index %u!\n", deviceIdx);
+            return false;
+        }
+    }
+
+    return true;
+}
+
+void Cleanup(ClockControlState& clockControlState)
+{
+    for (uint32_t deviceIdx = 0; deviceIdx < clockControlState.numDevices; ++deviceIdx)
+    {
+        vkDestroyDevice(clockControlState.logicalDevices[deviceIdx], nullptr);
+    }
+
+    vkDestroyInstance(clockControlState.instance, nullptr);
+}
+
+bool DoStatus(ClockControlState& clockControlState)
+{
+    for (uint32_t deviceIdx = 0; deviceIdx < clockControlState.numDevices; ++deviceIdx)
+    {
+        if (clockControlState.deviceIdx != (uint32_t)-1 && clockControlState.deviceIdx != deviceIdx)
+        {
+            continue;
+        }
+
+        bool isNvidiaDevice = VulkanIsNvidiaDevice(clockControlState.physicalDevices[deviceIdx]);
+        if (!isNvidiaDevice)
+        {
+            const std::string deviceName = VulkanGetDeviceName(clockControlState.physicalDevices[deviceIdx]);
+            printf("[%u] %s - Not an NVIDIA device!\n", deviceIdx, deviceName.c_str());
+        }
+        else
+        {
+            DeviceIdentifiers deviceIdentifiers = VulkanGetDeviceIdentifiers(
+                clockControlState.instance,
+                clockControlState.physicalDevices[deviceIdx],
+                clockControlState.logicalDevices[deviceIdx]);
+
+            NVPW_Device_ClockStatus clockStatus = VulkanGetDeviceClockState(
+                clockControlState.instance,
+                clockControlState.physicalDevices[deviceIdx],
+                clockControlState.logicalDevices[deviceIdx]);
+
+            printf("[%u] %-17s - %s\n", deviceIdx, deviceIdentifiers.pDeviceName, ToCString(clockStatus));
+        }
+    }
+
+    return true;
+}
+
+bool DoLockUnlock(ClockControlState& clockControlState)
+{
+    NVPW_Device_ClockSetting clockSetting = NVPW_DEVICE_CLOCK_SETTING_INVALID;
+    std::string clockSettingStr = "invalid";
+    if (clockControlState.command == Command::Lock)
+    {
+        clockSetting = NVPW_DEVICE_CLOCK_SETTING_LOCK_TO_RATED_TDP;
+        clockSettingStr = "Locked to rated TDP";
+    }
+    else if (clockControlState.command == Command::Unlock)
+    {
+        clockSetting = NVPW_DEVICE_CLOCK_SETTING_DEFAULT;
+        clockSettingStr = "Unlocked";
+    }
+    else
+    {
+        NV_PERF_LOG_ERR(10, "Invalid command while trying to lock/unlock clock!\n");
+        return false;
+    }
+
+    for (uint32_t deviceIdx = 0; deviceIdx < clockControlState.numDevices; ++deviceIdx)
+    {
+        if (clockControlState.deviceIdx != (uint32_t)-1 && clockControlState.deviceIdx != deviceIdx)
+        {
+            continue;
+        }
+
+        bool isNvidiaDevice = VulkanIsNvidiaDevice(clockControlState.physicalDevices[deviceIdx]);
+        if (!isNvidiaDevice)
+        {
+            const std::string deviceName = VulkanGetDeviceName(clockControlState.physicalDevices[deviceIdx]);
+            printf("[%u] %s - Not an NVIDIA device!\n", deviceIdx, deviceName.c_str());
+        }
+        else
+        {
+            DeviceIdentifiers deviceIdentifiers = VulkanGetDeviceIdentifiers(
+                clockControlState.instance,
+                clockControlState.physicalDevices[deviceIdx],
+                clockControlState.logicalDevices[deviceIdx]);
+
+
+            bool success = VulkanSetDeviceClockState(
+                clockControlState.instance,
+                clockControlState.physicalDevices[deviceIdx],
+                clockControlState.logicalDevices[deviceIdx],
+                clockSetting);
+
+            printf("[%u] %-17s - %s\n", deviceIdx, deviceIdentifiers.pDeviceName, success ? clockSettingStr.c_str() : "Failed to set clock state");
+        }
+    }
+    return true;
+}
+
+void PrintUsage()
+{
+    printf("Usage: ClockControl <command> [deviceIdx]\n");
+    printf("\n");
+    printf("Allowed values for <command>:\n");
+    printf("  status        - display current clock setting per requested device\n");
+    printf("  lock          - lock the clock per requested device\n");
+    printf("  unlock        - unlock the clock per requested device\n");
+    printf("\n");
+    printf("Optional arguments:\n");
+    printf("  deviceIdx     - device index to set/get; defaults to all devices\n");
+    printf("\n");
+}
+
+bool ParseArguments(const int argc, const char* argv[], ClockControlState& clockControlState)
+{
+    clockControlState.command = Command::Invalid;
+    clockControlState.deviceIdx = (uint32_t)-1; // set all devices
+
+    if (argc < 2)
+    {
+        NV_PERF_LOG_ERR(10, "Missing <command> selection!\n");
+        PrintUsage();
+        return false;
+    }
+
+    for(uint32_t argidx = 1; argidx < (uint32_t)argc; ++argidx)
+    {
+        if (!strcmp(argv[argidx], "-h") || !strcmp(argv[argidx], "--help"))
+        {
+            PrintUsage();
+            exit(0);
+        }
+    }
+
+    if (!strcmp(argv[1], "status"))
+    {
+        clockControlState.command = Command::Status;
+    }
+    else if (!strcmp(argv[1], "lock"))
+    {
+        clockControlState.command = Command::Lock;
+    }
+    else if (!strcmp(argv[1], "unlock"))
+    {
+        clockControlState.command = Command::Unlock;
+    }
+    else
+    {
+        NV_PERF_LOG_ERR(10, "Invalid command selected.\n");
+        PrintUsage();
+        return false;
+    }
+
+    if (argc < 3)
+    {
+        // no device index set == set all
+        return true;
+    }
+
+    clockControlState.deviceIdx = (uint32_t)atoi(argv[2]);
+
+    return true;
+}
+
+int main(const int argc, const char* argv[])
+{
+    ClockControlState clockControlState;
+    if (!ParseArguments(argc, argv, clockControlState))
+    {
+        return 1; // invalid command line; usage has already been printed
+    }
+
+    if (!Initialize(clockControlState))
+    {
+        return 1; // NvPerf or Vulkan initialization failed
+    }
+
+    switch (clockControlState.command)
+    {
+        case Command::Status:
+            DoStatus(clockControlState);
+            break;
+        case Command::Lock:
+        case Command::Unlock:
+            DoLockUnlock(clockControlState);
+            break;
+        default:
+            NV_PERF_LOG_ERR(10, "Invalid command set!\n");
+            PrintUsage();
+            return 1;
+    }
+
+    Cleanup(clockControlState);
+
+    return 0;
+}
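`ParseArguments` above converts the optional device index with `atoi`, which returns 0 for non-numeric input, so an argument such as `abc` silently selects device 0. The following is a minimal sketch of a stricter parse, assuming only decimal indices are wanted. `ParseDeviceIndex` is a hypothetical helper invented for illustration, not part of ClockControl.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <limits>

// Hypothetical helper: parses a plain decimal device index.
// Returns false for NULL, empty input, a leading sign or space, trailing characters,
// or a value that does not fit below (uint32_t)-1. deviceIdx is written only on success.
inline bool ParseDeviceIndex(const char* pText, uint32_t& deviceIdx)
{
    if (!pText || *pText < '0' || *pText > '9')
    {
        return false; // must start with a digit
    }

    errno = 0;
    char* pEnd = nullptr;
    const unsigned long long value = std::strtoull(pText, &pEnd, 10);
    assert(pEnd != nullptr); // strtoull always stores the end position
    if (errno == ERANGE || *pEnd != '\0')
    {
        return false; // overflowed or had trailing characters such as "3x"
    }

    // ClockControl reserves (uint32_t)-1 to mean "all devices", so exclude it here.
    if (value >= std::numeric_limits<uint32_t>::max())
    {
        return false;
    }

    deviceIdx = static_cast<uint32_t>(value);
    return true;
}
```

A caller could report an error and print the usage text when this returns false, instead of falling through to device 0.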
--- a/ruins64k/tools/NvPerfUtility/tools/GpuDiag/CMakeLists.txt
+++ b/ruins64k/tools/NvPerfUtility/tools/GpuDiag/CMakeLists.txt
@@ -0,0 +1,70 @@
+#[[
+* Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+]]
+
+set(NVPERF_FILES)
+# Add NvPerf API
+foreach(NvPerf_INCLUDE_DIR IN LISTS NvPerf_INCLUDE_DIRS)
+    file(GLOB FILES "${NvPerf_INCLUDE_DIR}/*.h")
+    list(APPEND NVPERF_FILES ${FILES})
+endforeach()
+source_group("NvPerf" FILES ${NVPERF_FILES})
+
+set(NVPERF_UTILITY_FILES)
+# Add NvPerf Utility
+file(GLOB NVPERF_UTILITY_FILES "${NvPerfUtility_INCLUDE_DIRS}/*.h")
+source_group("NvPerfUtility" FILES ${NVPERF_UTILITY_FILES})
+
+set(VULKAN_FILES)
+file(GLOB VULKAN_FILES "${Vulkan_INCLUDE_DIRS}/vulkan/*.h")
+source_group("Vulkan" FILES ${VULKAN_FILES})
+
+set(SOURCES
+    GpuDiag.cpp
+    GpuDiagHtmlTemplate.cpp
+)
+
+add_executable(GpuDiag
+    ${SOURCES}
+    ${NVPERF_FILES}
+    ${NVPERF_UTILITY_FILES}
+    ${VULKAN_FILES}
+)
+target_include_directories(GpuDiag
+    PRIVATE
+        ${Vulkan_INCLUDE_DIRS}
+        ${NvPerfUtilityImportsDir}/json-3.9.1
+)
+target_link_libraries(GpuDiag
+    PRIVATE
+        NvPerf
+        NvPerfUtility
+        ${Vulkan_LIBRARY}
+)
+
+if(WIN32)
+    target_link_libraries(GpuDiag
+        PRIVATE
+            d3d12
+            dxgi
+    )
+else()
+    target_link_libraries(GpuDiag
+        PRIVATE
+            dl
+    )
+endif()
+
+DeployNvPerf(GpuDiag NvPerf)
--- a/ruins64k/tools/NvPerfUtility/tools/GpuDiag/GpuDiag.cpp
+++ b/ruins64k/tools/NvPerfUtility/tools/GpuDiag/GpuDiag.cpp
@@ -0,0 +1,303 @@
+/*
+* Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <cstdint>
+#include <cstring>
+#include <iostream>
+#include <fstream>
+#include <cassert>
+
+#include <nvperf_host_impl.h>
+#include <NvPerfInit.h>
+#include <nvperf_target.h>
+#include "GpuDiagGApi_VK.h"
+#if defined(_WIN32)
+#include "GpuDiagGApi_DX.h"
+#include "GpuDiagOS_Windows.h"
+#elif defined(__linux__)
+#include "GpuDiagOS_Linux.h"
+#endif
+
+namespace nv { namespace perf { namespace tool {
+
+    using namespace nlohmann;
+
+    extern const std::string HtmlTemplate;
+    const char* DEFAULT_HTML_OUTPUT_PATH = "GpuDiag.html";
+
+    struct Options
+    {
+        enum class Output
+        {
+            json,
+            html
+        };
+        Output output = Output::json;
+        std::string htmlPath = DEFAULT_HTML_OUTPUT_PATH;
+    };
+
+    struct GpuDiagState
+    {
+        vk::State vkState;
+#if defined(_WIN32)
+        dx::State dxState;
+        windows::State winState;
+#elif defined(__linux__)
+        linux_::State linuxState;
+#endif
+    };
+
+    static void PrintUsage()
+    {
+        printf("Usage: GpuDiag [--html [path_to_html_file]]\n");
+        printf("\n");
+        printf("By default it will print JSON to the console.\n");
+        printf("Use \"--html path_to_html_file\" to generate an HTML file.\n");
+        printf("The default \"path_to_html_file\" is %s in the current working directory.\n", DEFAULT_HTML_OUTPUT_PATH);
+    }
+
+    static bool ParseArguments(const int argc, const char* argv[], Options& options)
+    {
+        for(uint32_t argIdx = 1; argIdx < (uint32_t)argc; ++argIdx)
+        {
+            if (!strcmp(argv[argIdx], "-h") || !strcmp(argv[argIdx], "--help"))
+            {
+                PrintUsage();
+                exit(0);
+            }
+        }
+
+        options = Options{};
+        if (argc == 1)
+        {
+            return true;
+        }
+
+        if (!strcmp(argv[1], "--html"))
+        {
+            options.output = Options::Output::html;
+            if (argc >= 3)
+            {
+                options.htmlPath = argv[2];
+            }
+        }
+        else
+        {
+            NV_PERF_LOG_ERR(10, "Unknown argument specified: %s\n", argv[1]);
+            PrintUsage();
+            return false;
+        }
+
+        return true;
+    }
+
+    static size_t GetDeviceCount()
+    {
+        NVPW_GetDeviceCount_Params getDeviceCountParams = { NVPW_GetDeviceCount_Params_STRUCT_SIZE };
+        NVPA_Status nvpaStatus = NVPW_GetDeviceCount(&getDeviceCountParams);
+        if (nvpaStatus)
+        {
+            NV_PERF_LOG_ERR(50, "Failed NVPW_GetDeviceCount: %u\n", nvpaStatus);
+            return ~size_t(0);
+        }
+        return getDeviceCountParams.numDevices;
+    }
+
+    static void AppendGlobalState(const GpuDiagState& state, ordered_json& node)
+    {
+        {
+            node["GraphicsDriverVersion"] = nullptr;
+            std::string driverVersion;
+            bool success = vk::GetDriverVersion(state.vkState, driverVersion);
+            if (success)
+            {
+                node["GraphicsDriverVersion"] = driverVersion;
+            }
+        }
+
+        // gpus
+        node["GPUs"] = ordered_json::array();
+        const size_t numDevices = GetDeviceCount();
+        if (numDevices == ~size_t(0))
+        {
+            NV_PERF_LOG_ERR(50, "Failed GetDeviceCount\n");
+        }
+        else 
+        {
+            auto& gpusNode = node["GPUs"];
+            for (size_t nvpwDeviceIndex = 0; nvpwDeviceIndex < numDevices; ++nvpwDeviceIndex)
+            {
+                gpusNode.emplace_back(ordered_json());
+                auto& gpu = gpusNode.back();
+
+                gpu["ProfilerDeviceIndex"] = nvpwDeviceIndex;
+                const DeviceIdentifiers deviceIdentifiers = GetDeviceIdentifiers(nvpwDeviceIndex);
+                gpu["DeviceName"] = deviceIdentifiers.pDeviceName;
+                gpu["ChipName"] = deviceIdentifiers.pChipName;
+
+                // use VK as a cross-platform way for querying vRamSize/ClockStatus etc
+                // first, map nvpw device index to vk device index
+                size_t vkDeviceIndex = 0;
+                for (; vkDeviceIndex < state.vkState.devices.size(); ++vkDeviceIndex)
+                {
+                    if (state.vkState.devices[vkDeviceIndex].nvpwDeviceIndex == nvpwDeviceIndex)
+                    {
+                        break;
+                    }
+                }
+
+                gpu["VideoMemorySize"] = nullptr;
+                gpu["ClockStatus"] = nullptr;
+                if (vkDeviceIndex == state.vkState.devices.size())
+                {
+                    NV_PERF_LOG_ERR(10, "Unable to find vkDeviceIndex for nvpwDeviceIndex: %zu\n", nvpwDeviceIndex);
+                    continue;
+                }
+                const VkPhysicalDevice physicalDevice = state.vkState.devices[vkDeviceIndex].physical;
+                gpu["VideoMemorySize"] = vk::SizeToString(vk::GetVRamSize(physicalDevice));
+                gpu["ClockStatus"] = nv::perf::ToCString(GetDeviceClockState(nvpwDeviceIndex));
+            }
+        }
+    }
+
+    bool InitializeState(GpuDiagState& globalState)
+    {
+        bool nvperfStatus = InitializeNvPerf();
+        if (!nvperfStatus)
+        {
+            NV_PERF_LOG_ERR(10, "InitializeNvPerf failed!\n");
+            return false;
+        }
+        if (!vk::InitializeState(globalState.vkState))
+        {
+            NV_PERF_LOG_ERR(10, "vk::InitializeState failed!\n");
+            return false;
+        }
+#if defined(_WIN32)
+        if (!dx::InitializeState(globalState.dxState))
+        {
+            NV_PERF_LOG_ERR(10, "dx::InitializeState failed!\n");
+            return false;
+        }
+        if (!windows::InitializeState(globalState.winState))
+        {
+            NV_PERF_LOG_ERR(10, "windows::InitializeState failed!\n");
+            return false;
+        }
+#elif defined(__linux__)
+        if (!linux_::InitializeState(globalState.linuxState))
+        {
+            NV_PERF_LOG_ERR(10, "linux_::InitializeState failed!\n");
+            return false;
+        }
+#endif
+        return true;
+    }
+
+    void AppendState(const GpuDiagState& globalState, ordered_json& root)
+    {
+#if defined(_WIN32)
+        windows::AppendState(globalState.winState, root["Windows"]);
+#elif defined(__linux__)
+        linux_::AppendState(globalState.linuxState, root["Linux"]);
+#endif
+        AppendGlobalState(globalState, root["Global"]);
+        vk::AppendState(globalState.vkState, root["Vulkan"]);
+#if defined(_WIN32)
+        dx::AppendState(globalState.dxState, root["D3D"]);
+#endif
+    }
+
+    void CleanupState(GpuDiagState& globalState)
+    {
+        vk::CleanupState(globalState.vkState);
+#if defined(_WIN32)
+        dx::CleanupState(globalState.dxState);
+        windows::CleanupState(globalState.winState);
+#elif defined(__linux__)
+        linux_::CleanupState(globalState.linuxState);
+#endif
+    }
+
+    bool Output(const Options& options, const ordered_json& root)
+    {
+        const int indent = 4;
+        const std::string jsonStr = root.dump(indent);
+        if (options.output == Options::Output::json)
+        {
+            std::cout << jsonStr << std::endl;
+        }
+        else if (options.output == Options::Output::html)
+        {
+            std::ofstream html(options.htmlPath);
+            NV_PERF_LOG_INF(10, "Writing an HTML report to %s\n", options.htmlPath.c_str());
+            if (!html.is_open())
+            {
+                NV_PERF_LOG_ERR(10, "Failed to open file: %s\n", options.htmlPath.c_str());
+                return false;
+            }
+
+            const char* pJsonReplacementMarker = "/***JSON_DATA_HERE***/";
+            const size_t insertPoint = HtmlTemplate.find(pJsonReplacementMarker);
+            if (insertPoint == std::string::npos)
+            {
+                NV_PERF_LOG_ERR(10, "Invalid HTML template!\n");
+                assert(!"Invalid HTML template!");
+                return false;
+            }
+            html << HtmlTemplate.substr(0, insertPoint);
+            html << jsonStr;
+            html << HtmlTemplate.substr(insertPoint + strlen(pJsonReplacementMarker));
+        }
+        return true;
+    }
+
+}}} // nv::perf::tool
+
+int main(const int argc, const char* argv[])
+{
+    using namespace nv::perf;
+    using namespace nv::perf::tool;
+
+    Options options;
+    if (!ParseArguments(argc, argv, options))
+    {
+        NV_PERF_LOG_ERR(10, "Failed ParseArguments\n");
+        return -1;
+    }
+
+    nlohmann::ordered_json root;
+    {
+        GpuDiagState globalState;
+        if (!InitializeState(globalState))
+        {
+            NV_PERF_LOG_ERR(10, "Failed InitializeState\n");
+            return -1;
+        }
+        AppendState(globalState, root);
+        CleanupState(globalState);
+    }
+
+    if (!Output(options, root))
+    {
+        NV_PERF_LOG_ERR(10, "Failed Output\n");
+        return -1;
+    }
+
+    return 0;
+}
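Note on `Output()` above: the HTML report is produced by cutting the embedded template at the `/***JSON_DATA_HERE***/` marker and writing the JSON dump between the two halves. The following is a minimal stand-alone sketch of that splice, not the tool's own code; `SpliceJsonIntoTemplate` and `SpliceExample` are hypothetical names introduced for illustration, and the template string in the example is made up.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of GpuDiag): replace the first occurrence of
// `marker` in `tmpl` with `json`. Returns false when the marker is absent,
// which corresponds to the "Invalid HTML template!" path in Output().
inline bool SpliceJsonIntoTemplate(const std::string& tmpl,
                                   const std::string& json,
                                   std::string& out,
                                   const std::string& marker = "/***JSON_DATA_HERE***/")
{
    const std::string::size_type insertPoint = tmpl.find(marker);
    if (insertPoint == std::string::npos)
    {
        return false;
    }
    out = tmpl.substr(0, insertPoint);                // everything before the marker
    out += json;                                      // the JSON payload
    out += tmpl.substr(insertPoint + marker.size());  // everything after the marker
    return true;
}

// Usage example with a made-up one-line template.
inline void SpliceExample()
{
    std::string page;
    const bool ok = SpliceJsonIntoTemplate(
        "<script>var data = /***JSON_DATA_HERE***/;</script>", "{}", page);
    assert(ok && page == "<script>var data = {};</script>");
    (void)ok;
}
```

Building the result in a string first, as this sketch does, would let a caller validate the page before opening the output file; the tool itself streams the three pieces directly to the `std::ofstream`.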
--- a/ruins64k/tools/NvPerfUtility/tools/GpuDiag/GpuDiagCommon.h
+++ b/ruins64k/tools/NvPerfUtility/tools/GpuDiag/GpuDiagCommon.h
@@ -0,0 +1,62 @@
+/*
+* Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+#pragma once
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <cstdint>
+#include <cstring>
+#include <string>
+#include <sstream>
+#include <iomanip>
+
+namespace nv { namespace perf { namespace tool {
+
+    template <typename T>
+    inline std::string SizeToString(T size)
+    {
+        const char SIZE_UNITS[5][4] = { "B", "KiB", "MiB", "GiB", "TiB" };
+        double size_ = static_cast<double>(size);
+        size_t unitIdx = 0;
+        while (size_ >= 1024.0 && unitIdx + 1 < sizeof(SIZE_UNITS) / sizeof(SIZE_UNITS[0])) // clamp at the largest unit
+        {
+            size_ /= 1024.0;
+            ++unitIdx;
+        }
+        std::stringstream ss;
+        ss.precision(2);
+        ss << std::fixed << size_ << " " << SIZE_UNITS[unitIdx];
+        return ss.str();
+    }
+
+    template <size_t ArraySize>
+    inline std::string IdToString(const uint8_t id[ArraySize])
+    {
+        std::stringstream ss;
+        ss << std::hex << std::setfill('0');
+        for (size_t ii = 0; ii < ArraySize; ++ii)
+        {
+            if (ii && !(ii % 2))
+            {
+                ss << "-";
+            }
+            ss << std::setw(2) << static_cast<uint32_t>(id[ii]);
+        }
+        return ss.str();
+    }
+
+}}} // nv::perf::tool
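Note on GpuDiagCommon.h above: `SizeToString` scales a byte count by powers of 1024 and prints it with two decimals, and `IdToString` prints a byte array as lowercase hex with a dash after every two bytes. Below is a hedged, stand-alone sketch of both behaviours. The `*Sketch` names are hypothetical; the size helper stops at the last unit so the index cannot run past the table, and the ID helper takes the array by reference so its length is deduced, which is a design variation and not the signature used in the header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical stand-alone variant of SizeToString, for illustration only.
template <typename T>
std::string SizeToStringSketch(T size)
{
    static const char* const kUnits[] = { "B", "KiB", "MiB", "GiB", "TiB" };
    const std::size_t kNumUnits = sizeof(kUnits) / sizeof(kUnits[0]);
    double value = static_cast<double>(size);
    std::size_t unitIdx = 0;
    // Stop at the last unit so unitIdx always stays inside kUnits.
    while (value >= 1024.0 && unitIdx + 1 < kNumUnits)
    {
        value /= 1024.0;
        ++unitIdx;
    }
    std::stringstream ss;
    ss << std::fixed << std::setprecision(2) << value << " " << kUnits[unitIdx];
    return ss.str();
}

// Hypothetical variant of IdToString; ArraySize is deduced from the argument.
template <std::size_t ArraySize>
std::string IdToStringSketch(const std::uint8_t (&id)[ArraySize])
{
    std::stringstream ss;
    ss << std::hex << std::setfill('0');
    for (std::size_t ii = 0; ii < ArraySize; ++ii)
    {
        if (ii && !(ii % 2))
        {
            ss << "-"; // separator after every two bytes
        }
        ss << std::setw(2) << static_cast<std::uint32_t>(id[ii]);
    }
    return ss.str();
}
```

For example, `SizeToStringSketch(1536)` yields `1.50 KiB`, and a four-byte array `{0xde, 0xad, 0xbe, 0xef}` formats as `dead-beef`.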
--- a/ruins64k/tools/NvPerfUtility/tools/GpuDiag/GpuDiagGApi_DX.h
+++ b/ruins64k/tools/NvPerfUtility/tools/GpuDiag/GpuDiagGApi_DX.h
@@ -0,0 +1,342 @@
+/*
+* Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+#pragma once
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <cstdint>
+#include <cstring>
+#include <sstream>
+#include <iomanip>
+#include <vector>
+#include <cmath>
+#ifndef WIN32_LEAN_AND_MEAN
+#define WIN32_LEAN_AND_MEAN             // Exclude rarely-used stuff from Windows headers.
+#endif
+#include <windows.h>
+#include <d3d12.h>
+#include <dxgi1_6.h>
+#include <D3Dcompiler.h>
+#include <DirectXMath.h>
+#include <wrl.h>
+#include <shellapi.h>
+
+#include <json/json.hpp>
+
+#include <NvPerfInit.h>
+#include <NvPerfD3D12.h>
+
+#include "GpuDiagCommon.h"
+
+namespace nv { namespace perf { namespace tool { namespace dx {
+
+    using namespace nv::perf::tool;
+    using namespace nlohmann;
+
+    struct State
+    {
+        struct Device
+        {
+            size_t adapterIndex = ~0;
+            size_t nvpwDeviceIndex = ~0;
+            CComPtr<IDXGIAdapter1> pAdapter;
+            CComPtr<ID3D12Device> pDevice;
+            DXGI_ADAPTER_DESC1 adapterDesc;
+            CComPtr<ID3D12CommandQueue> pCommandQueue;
+        };
+        std::vector<Device> devices;
+        bool isDriverLoaded = false;
+    };
+
+    inline bool IsDebugLayerEnabled(ID3D12Device* pDevice)
+    {
+        CComPtr<ID3D12DebugDevice> pDebugDevice;
+        HRESULT hr = pDevice->QueryInterface(IID_PPV_ARGS(&pDebugDevice));
+        if (SUCCEEDED(hr))
+        {
+            return true;
+        }
+        return false;
+    }
+
+    inline NVPA_Status ProfilerSessionSupported(ID3D12CommandQueue* pCommandQueue)
+    {
+        NVPW_D3D12_Profiler_CalcTraceBufferSize_Params calcTraceBufferSizeParam = { NVPW_D3D12_Profiler_CalcTraceBufferSize_Params_STRUCT_SIZE };
+        calcTraceBufferSizeParam.maxRangesPerPass = 1;
+        calcTraceBufferSizeParam.avgRangeNameLength = 256;
+        NVPA_Status nvpaStatus = NVPW_D3D12_Profiler_CalcTraceBufferSize(&calcTraceBufferSizeParam);
+        if (nvpaStatus)
+        {
+            NV_PERF_LOG_ERR(10, "NVPW_D3D12_Profiler_CalcTraceBufferSize failed\n");
+            return nvpaStatus;
+        }
+
+        NVPW_D3D12_Profiler_Queue_BeginSession_Params beginSessionParams = { NVPW_D3D12_Profiler_Queue_BeginSession_Params_STRUCT_SIZE };
+        beginSessionParams.pCommandQueue = pCommandQueue;
+        beginSessionParams.numTraceBuffers = 2;
+        beginSessionParams.traceBufferSize = calcTraceBufferSizeParam.traceBufferSize;
+        beginSessionParams.maxRangesPerPass = 1;
+        beginSessionParams.maxLaunchesPerPass = 1;
+        nvpaStatus = NVPW_D3D12_Profiler_Queue_BeginSession(&beginSessionParams);
+        if (nvpaStatus)
+        {
+            NV_PERF_LOG_ERR(10, "NVPW_D3D12_Profiler_Queue_BeginSession failed\n");
+            return nvpaStatus;
+        }
+
+        NVPW_D3D12_Profiler_Queue_EndSession_Params endSessionParams = { NVPW_D3D12_Profiler_Queue_EndSession_Params_STRUCT_SIZE };
+        endSessionParams.pCommandQueue = pCommandQueue;
+        endSessionParams.timeout = INFINITE;
+        NVPA_Status endSessionStatus = NVPW_D3D12_Profiler_Queue_EndSession(&endSessionParams);
+        if (endSessionStatus)
+        {
+            NV_PERF_LOG_ERR(10, "NVPW_D3D12_Profiler_Queue_EndSession failed\n");
+        }
+        return nvpaStatus; // BeginSession status; an EndSession failure is only logged
+    }
+
+    // if any of the DX calls fail, the function fails;
+    // but the NVPW calls are not required to succeed, given the purpose of this program
+    inline bool InitializeState(State& state)
+    {
+        HRESULT result = S_OK;
+        CComPtr<IDXGIFactory4> pFactory;
+        result = CreateDXGIFactory2(0, IID_PPV_ARGS(&pFactory));
+        if (result != S_OK)
+        {
+            NV_PERF_LOG_ERR(10, "CreateDXGIFactory2 failed!\n");
+            return false;
+        }
+
+        CComPtr<IDXGIAdapter1> pAdapter;
+        for (UINT adapterIndex = 0; DXGI_ERROR_NOT_FOUND != pFactory->EnumAdapters1(adapterIndex, &pAdapter); ++adapterIndex)
+        {
+            DXGI_ADAPTER_DESC1 adapterDesc = {};
+            result = pAdapter->GetDesc1(&adapterDesc);
+            if (result != S_OK)
+            {
+                NV_PERF_LOG_ERR(50, "pAdapter->GetDesc1 failed for adapter index %u!\n", adapterIndex);
+                return false;
+            }
+
+            State::Device device;
+            device.adapterIndex = adapterIndex;
+            device.adapterDesc = adapterDesc;
+            device.pAdapter = std::move(pAdapter);
+            pAdapter = nullptr;
+            result = D3D12CreateDevice(device.pAdapter, D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(&device.pDevice));
+            if (result != S_OK)
+            {
+                NV_PERF_LOG_ERR(10, "D3D12CreateDevice failed for adapter index %u!\n", adapterIndex);
+                return false;
+            }
+
+            D3D12_COMMAND_QUEUE_DESC queueDesc = {};
+            queueDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
+            queueDesc.Flags = D3D12_COMMAND_QUEUE_FLAG_NONE;
+            result = device.pDevice->CreateCommandQueue(&queueDesc, IID_PPV_ARGS(&device.pCommandQueue));
+            if (result != S_OK)
+            {
+                NV_PERF_LOG_ERR(10, "Create a direct queue failed for adapter index %u!\n", adapterIndex);
+                // continue execution
+            }
+            state.devices.emplace_back(std::move(device));
+        }
+
+        // profiler-specific
+        bool success = true;
+        success = D3D12LoadDriver();
+        if (!success)
+        {
+            NV_PERF_LOG_ERR(10, "D3D12LoadDriver failed!\n");
+        }
+        else
+        {
+            state.isDriverLoaded = true;
+            for (size_t deviceIndex = 0; deviceIndex < state.devices.size(); ++deviceIndex)
+            {
+                State::Device& device = state.devices[deviceIndex];
+                if (D3D12IsNvidiaDevice(device.pDevice))
+                {
+                    const size_t sliIndex = 0;
+                    device.nvpwDeviceIndex = D3DGetNvperfDeviceIndex(device.pAdapter, sliIndex);
+                    if (device.nvpwDeviceIndex == ~size_t(0))
+                    {
+                        NV_PERF_LOG_ERR(50, "D3DGetNvperfDeviceIndex failed for adapter index %zu!\n", deviceIndex);
+                    }
+                }
+            }
+        }
+
+        return true;
+    }
+
+    inline void AppendDeviceState(const State& state, size_t deviceIndex, ordered_json& node)
+    {
+        const State::Device& currentDevice = state.devices[deviceIndex];
+        const DXGI_ADAPTER_DESC1& adapterDesc = currentDevice.adapterDesc;
+        node["DXGIAdapterIndex"] = deviceIndex;
+        const std::string adapterName(adapterDesc.Description, adapterDesc.Description + wcslen(adapterDesc.Description));
+        node["Name"] = adapterName;
+        node["VendorId"] = adapterDesc.VendorId;
+        node["DeviceId"] = adapterDesc.DeviceId;
+        {
+            uint8_t luid[sizeof(adapterDesc.AdapterLuid)] = {};
+            memcpy(luid, &adapterDesc.AdapterLuid.LowPart, sizeof(adapterDesc.AdapterLuid.LowPart));
+            memcpy(luid + sizeof(adapterDesc.AdapterLuid.LowPart), &adapterDesc.AdapterLuid.HighPart, sizeof(adapterDesc.AdapterLuid.HighPart));
+            node["DeviceLUID"] = IdToString<sizeof(adapterDesc.AdapterLuid)>(luid);
+        }
+        node["DedicatedVideoMemory"] = SizeToString(adapterDesc.DedicatedVideoMemory);
+        node["DedicatedSystemMemory"] = SizeToString(adapterDesc.DedicatedSystemMemory);
+        node["SharedSystemMemory"] = SizeToString(adapterDesc.SharedSystemMemory);
+        node["IsDebugLayerForcedOn"] = IsDebugLayerEnabled(currentDevice.pDevice);
+
+        // displays
+        HRESULT hr = S_OK;
+        node["Displays"] = ordered_json::array();
+        auto& displays = node["Displays"];
+        for (uint32_t outputIdx = 0; ; ++outputIdx)
+        {
+            CComPtr<IDXGIOutput> pOutput; // released automatically on every path, including the continue below
+            hr = currentDevice.pAdapter->EnumOutputs(outputIdx, &pOutput);
+            if (SUCCEEDED(hr))
+            {
+                DXGI_OUTPUT_DESC outputDesc;
+                hr = pOutput->GetDesc(&outputDesc);
+                if (!SUCCEEDED(hr))
+                {
+                    NV_PERF_LOG_ERR(10, "pOutput->GetDesc failed for outputIdx: %u!\n", outputIdx);
+                    continue;
+                }
+                auto display = ordered_json();
+                display["OutputIndex"] = outputIdx;
+                {
+                    // nlohmann/json doesn't yet support wchar, so convert it to std::string
+                    // https://github.com/nlohmann/json/issues/2453
+                    // this narrowing copy is lossy for non-ASCII characters
+                    std::wstring wstr(outputDesc.DeviceName);
+                    display["Description"] = std::string(wstr.begin(), wstr.end());
+                }
+                display["Left"] = outputDesc.DesktopCoordinates.left;
+                display["Top"] = outputDesc.DesktopCoordinates.top;
+                display["Width"] = std::abs(outputDesc.DesktopCoordinates.right - outputDesc.DesktopCoordinates.left);
+                display["Height"] = std::abs(outputDesc.DesktopCoordinates.bottom - outputDesc.DesktopCoordinates.top);
+                display["AttachedToDesktop"] = !!outputDesc.AttachedToDesktop;
+                displays.emplace_back(std::move(display));
+            }
+            else if (hr == DXGI_ERROR_NOT_FOUND)
+            {
+                break; // no more outputs to enumerate
+            }
+            else
+            {
+                break;
+            }
+        }
+
+        // NV-specific
+        const bool isNvidiaDevice = (adapterDesc.VendorId == 0x10de);
+        node["IsNvidiaDevice"] = isNvidiaDevice;
+        node["ProfilerDeviceIndex"] = nullptr;
+        node["ProfilerIsGpuSupported"]["IsSupported"] = false;
+        node["ProfilerIsGpuSupported"]["GpuArchitectureSupported"] = nullptr;
+        node["ProfilerIsGpuSupported"]["SliSupportLevel"] = nullptr;
+        node["ProfilerIsGpuSupported"]["Advice"] = "Unrecognized device";
+        node["ProfilerIsSessionSupported"]["IsSupported"] = false;
+        node["ProfilerIsSessionSupported"]["Advice"] = "Unsupported Gpu";
+        auto success = [&]() {
+            if (!isNvidiaDevice || currentDevice.nvpwDeviceIndex == ~0)
+            {
+                return false;
+            }
+            node["ProfilerDeviceIndex"] = currentDevice.nvpwDeviceIndex;
+
+            NVPW_D3D12_Profiler_IsGpuSupported_Params isGpuSupportedParams = { NVPW_D3D12_Profiler_IsGpuSupported_Params_STRUCT_SIZE };
+            isGpuSupportedParams.deviceIndex = currentDevice.nvpwDeviceIndex;
+            NVPA_Status nvpaStatus = NVPW_D3D12_Profiler_IsGpuSupported(&isGpuSupportedParams);
+            if (nvpaStatus)
+            {
+                NV_PERF_LOG_ERR(10, "NVPW_D3D12_Profiler_IsGpuSupported failed\n");
+                return false;
+            }
+            node["ProfilerIsGpuSupported"]["GpuArchitectureSupported"] = true;
+            node["ProfilerIsGpuSupported"]["SliSupportLevel"] = true;
+            if (!isGpuSupportedParams.isSupported)
+            {
+                std::string unsupportedReason = "";
+                if (isGpuSupportedParams.gpuArchitectureSupportLevel != NVPW_GPU_ARCHITECTURE_SUPPORT_LEVEL_SUPPORTED)
+                {
+                    node["ProfilerIsGpuSupported"]["GpuArchitectureSupported"] = false;
+                    unsupportedReason += "Unsupported GPU architecture;";
+                }
+                if (isGpuSupportedParams.sliSupportLevel == NVPW_SLI_SUPPORT_LEVEL_UNSUPPORTED)
+                {
+                    node["ProfilerIsGpuSupported"]["SliSupportLevel"] = false;
+                    unsupportedReason += "Devices in SLI configuration are not supported;";
+                }
+                node["ProfilerIsGpuSupported"]["Advice"] = unsupportedReason;
+                return false;
+            }
+            node["ProfilerIsGpuSupported"]["IsSupported"] = true;
+            node["ProfilerIsGpuSupported"]["Advice"] = "";
+
+            // test if we can start a profiler session on this device
+            nvpaStatus = ProfilerSessionSupported(currentDevice.pCommandQueue);
+            if (nvpaStatus)
+            {
+                NV_PERF_LOG_ERR(10, "ProfilerSessionSupported failed\n");
+                std::string unsupportedReason;
+                if (nvpaStatus == NVPA_STATUS_INSUFFICIENT_PRIVILEGE)
+                {
+                    unsupportedReason = "Profiling permissions not enabled. Please follow these instructions: https://developer.nvidia.com/nvidia-development-tools-solutions-ERR_NVGPUCTRPERM-permission-issue-performance-counters";
+                }
+                else if (nvpaStatus == NVPA_STATUS_INSUFFICIENT_DRIVER_VERSION)
+                {
+                    unsupportedReason = "Insufficient driver version. Please install the latest NVIDIA driver from https://www.nvidia.com";
+                }
+                else
+                {
+                    unsupportedReason = "Unknown error";
+                }
+                node["ProfilerIsSessionSupported"]["Advice"] = unsupportedReason;
+                return false;
+            }
+            node["ProfilerIsSessionSupported"]["IsSupported"] = true;
+            node["ProfilerIsSessionSupported"]["Advice"] = "";
+            return true;
+        }();
+    }
+
+    inline void AppendState(const State& state, ordered_json& node)
+    {
+        node["ProfilerDriverLoaded"] = state.isDriverLoaded;
+        node["Devices"] = ordered_json::array(); // emit [] rather than null when no adapter is found
+        auto& devices = node["Devices"];
+        for (size_t deviceIndex = 0; deviceIndex < state.devices.size(); ++deviceIndex)
+        {
+            devices.emplace_back(ordered_json());
+            auto& device = devices.back();
+            AppendDeviceState(state, deviceIndex, device);
+        }
+    }
+
+    inline void CleanupState(State& state)
+    {
+        state = State();
+    }
+
+}}}} // nv::perf::tool::dx
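Note on the driver-version macros in GpuDiagGApi_VK.h, whose diff follows: `VkPhysicalDeviceProperties::driverVersion` is a vendor-defined 32-bit value, and the `NV_DRIVER_VERSION_*` macros unpack it by shifting and masking. The sketch below mirrors exactly the shifts and masks of those three macros and makes no claim beyond them about how the driver packs the value. `DecodeNvDriverVersion` and `PackNvDriverVersion` are hypothetical names; the pack function exists only to build test inputs.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Bit layout copied from the NV_DRIVER_VERSION_* macros:
// major in bits 22 and up, minor in bits 14..21, patch in bits 0..13.
inline std::string DecodeNvDriverVersion(std::uint32_t v)
{
    std::ostringstream ss;
    ss << (v >> 22) << "." << ((v >> 14) & 0xFFu) << "." << (v & 0x3FFFu);
    return ss.str();
}

// Hypothetical inverse of the decode above, used only to construct examples.
inline std::uint32_t PackNvDriverVersion(std::uint32_t major, std::uint32_t minor, std::uint32_t patch)
{
    return (major << 22) | ((minor & 0xFFu) << 14) | (patch & 0x3FFFu);
}
```

Under this layout, `DecodeNvDriverVersion(PackNvDriverVersion(470, 82, 1))` gives `470.82.1`. The same arithmetic applied to another vendor's `driverVersion` would print meaningless numbers, which is why the header restricts its helpers to NVIDIA devices.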
--- a/ruins64k/tools/NvPerfUtility/tools/GpuDiag/GpuDiagGApi_VK.h
+++ b/ruins64k/tools/NvPerfUtility/tools/GpuDiag/GpuDiagGApi_VK.h
@@ -0,0 +1,639 @@
+/*
+* Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+#pragma once
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <cstdint>
+#include <cstring>
+#include <sstream>
+#include <iomanip>
+#include <vector>
+
+#include <vulkan/vulkan.h>
+#include <json/json.hpp>
+
+#include <NvPerfInit.h>
+#include <NvPerfVulkan.h>
+
+#include "GpuDiagCommon.h"
+
+#define NV_DRIVER_VERSION_MAJOR(vkDriverVersion) ((uint32_t)(vkDriverVersion) >> 22)
+#define NV_DRIVER_VERSION_MINOR(vkDriverVersion) (((uint32_t)(vkDriverVersion) >> 14) & 0xFF)
+#define NV_DRIVER_VERSION_PATCH(vkDriverVersion) ((uint32_t)(vkDriverVersion) & 0x3fff)
+
+namespace nv { namespace perf { namespace tool { namespace vk {
+
+    using namespace nv::perf::tool;
+    using namespace nlohmann;
+
+    struct State
+    {
+        struct Device
+        {
+            size_t vkDeviceIndex      = ~0;
+            size_t nvpwDeviceIndex    = ~0;
+            VkPhysicalDevice physical = nullptr;
+            VkDevice logical          = nullptr;
+            VkQueue queue             = nullptr;
+        };
+        VkInstance instance           = nullptr;
+        std::vector<Device> devices;
+        bool isDriverLoaded           = false;
+    };
+
+    inline const char* ToCString(VkPhysicalDeviceType deviceType)
+    {
+        switch (deviceType)
+        {
+            case VK_PHYSICAL_DEVICE_TYPE_OTHER:
+                return "Other";
+            case VK_PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU:
+                return "Integrated Gpu";
+            case VK_PHYSICAL_DEVICE_TYPE_DISCRETE_GPU:
+                return "Discrete Gpu";
+            case VK_PHYSICAL_DEVICE_TYPE_VIRTUAL_GPU:
+                return "Virtual Gpu";
+            case VK_PHYSICAL_DEVICE_TYPE_CPU:
+                return "Cpu";
+            default:
+                return "Unknown";
+        }
+    }
+
+    inline std::string GetApiVersion(const VkPhysicalDeviceProperties& properties)
+    {
+        std::stringstream ss;
+        ss << VK_VERSION_MAJOR(properties.apiVersion) << "." << VK_VERSION_MINOR(properties.apiVersion) << "." << VK_VERSION_PATCH(properties.apiVersion);
+        return ss.str();
+    }
+
+    // only valid for NVIDIA devices: the driverVersion encoding is vendor-specific
+    inline std::string GetDriverVersion(const VkPhysicalDeviceProperties& properties)
+    {
+        std::stringstream ss;
+        ss << NV_DRIVER_VERSION_MAJOR(properties.driverVersion) << "." << NV_DRIVER_VERSION_MINOR(properties.driverVersion) << "." << NV_DRIVER_VERSION_PATCH(properties.driverVersion);
+        return ss.str();
+    }
+
+    // only valid for NVIDIA devices: the driverVersion encoding is vendor-specific
+    inline std::string GetDriverVersion(VkPhysicalDevice physicalDevice)
+    {
+        VkPhysicalDeviceProperties properties;
+        vkGetPhysicalDeviceProperties(physicalDevice, &properties);
+        return GetDriverVersion(properties);
+    }
+
+    inline bool GetDriverVersion(const State& state, std::string& driverVersion)
+    {
+        for (const auto& device : state.devices)
+        {
+            if (VulkanIsNvidiaDevice(device.physical))
+            {
+                driverVersion = GetDriverVersion(device.physical);
+                return true;
+            }
+        }
+        return false;
+    }
+
+    inline size_t GetVRamSize(VkPhysicalDevice physicalDevice)
+    {
+        VkPhysicalDeviceMemoryProperties memoryProperties;
+        vkGetPhysicalDeviceMemoryProperties(physicalDevice, &memoryProperties);
+        for (size_t memoryHeapIndex = 0; memoryHeapIndex < memoryProperties.memoryHeapCount; ++memoryHeapIndex)
+        {
+            if (memoryProperties.memoryHeaps[memoryHeapIndex].flags & VK_MEMORY_HEAP_DEVICE_LOCAL_BIT)
+            {
+                const size_t vramSize = memoryProperties.memoryHeaps[memoryHeapIndex].size;
+                return vramSize;
+            }
+        }
+        return 0;
+    }
+
+    inline VkPhysicalDeviceIDProperties GetDeviceIdProperties(VkPhysicalDevice physicalDevice)
+    {
+        VkPhysicalDeviceIDProperties deviceIdProperties;
+        deviceIdProperties.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_ID_PROPERTIES;
+        deviceIdProperties.pNext = nullptr;
+
+        VkPhysicalDeviceProperties2 deviceProperties;
+        deviceProperties.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2;
+        deviceProperties.pNext = &deviceIdProperties;
+        vkGetPhysicalDeviceProperties2(physicalDevice, &deviceProperties);
+        return deviceIdProperties;
+    }
+
+    inline std::string GetDeviceUUID(const VkPhysicalDeviceIDProperties& deviceIdProperties)
+    {
+        const std::string deviceUUID = IdToString<VK_UUID_SIZE>(deviceIdProperties.deviceUUID);
+        return deviceUUID;
+    }
+
+    inline std::string GetDeviceLUID(const VkPhysicalDeviceIDProperties& deviceIdProperties)
+    {
+        if (!deviceIdProperties.deviceLUIDValid)
+        {
+            return "Unknown";
+        }
+        const std::string deviceLUID = IdToString<VK_LUID_SIZE>(deviceIdProperties.deviceLUID);
+        return deviceLUID;
+    }
+
+    inline std::string GetDriverUUID(const VkPhysicalDeviceIDProperties& deviceIdProperties)
+    {
+        const std::string driverUUID = IdToString<VK_UUID_SIZE>(deviceIdProperties.driverUUID);
+        return driverUUID;
+    }
+
+    inline bool GetDeviceNodeMask(const VkPhysicalDeviceIDProperties& deviceIdProperties, uint32_t& nodeMask)
+    {
+        if (!deviceIdProperties.deviceLUIDValid)
+        {
+            return false;
+        }
+        nodeMask = deviceIdProperties.deviceNodeMask;
+        return true;
+    }
+
+    inline bool GetAvailableInstanceLayerProperties(std::vector<VkLayerProperties>& properties)
+    {
+        uint32_t propertyCount;
+        VkResult vulkanStatus = vkEnumerateInstanceLayerProperties(&propertyCount, nullptr);
+        if (vulkanStatus != VK_SUCCESS)
+        {
+            NV_PERF_LOG_ERR(50, "vkEnumerateInstanceLayerProperties failed to retrieve the number of properties!\n");
+            return false;
+        }
+
+        properties.resize(propertyCount);
+        vulkanStatus = vkEnumerateInstanceLayerProperties(&propertyCount, properties.data());
+        if (vulkanStatus != VK_SUCCESS)
+        {
+            NV_PERF_LOG_ERR(50, "vkEnumerateInstanceLayerProperties failed to retrieve properties!\n");
+            return false;
+        }
+        return true;
+    }
+
+    inline uint32_t GetGraphicsOrComputeQueueFamilyIndex(VkPhysicalDevice physicalDevice)
+    {
+        uint32_t queueFamilyPropertyCount = 0;
+        vkGetPhysicalDeviceQueueFamilyProperties(physicalDevice, &queueFamilyPropertyCount, nullptr);
+        std::vector<VkQueueFamilyProperties> queueFamilyProperties(queueFamilyPropertyCount);
+        vkGetPhysicalDeviceQueueFamilyProperties(physicalDevice, &queueFamilyPropertyCount, queueFamilyProperties.data());
+        for (uint32_t familyIndex = 0; familyIndex < queueFamilyProperties.size(); familyIndex++)
+        {
+            const VkQueueFlags queueFlags = queueFamilyProperties[familyIndex].queueFlags;
+            if ((queueFlags & VK_QUEUE_GRAPHICS_BIT) || (queueFlags & VK_QUEUE_COMPUTE_BIT))
+            {
+                return familyIndex;
+            }
+        }
+        NV_PERF_LOG_ERR(50, "Failed to find a supported queue family!\n");
+        return (uint32_t)~0;
+    }
+
+    inline bool GetRequiredDeviceExtensionSupportStatus(VkInstance instance, VkPhysicalDevice physicalDevice, std::map<const char*, bool>& requiredExtensionSupportStatus)
+    {
+        std::vector<const char*> requiredDeviceExtensionNames;
+        bool success = VulkanAppendDeviceRequiredExtensions(instance, physicalDevice, (void*)vkGetInstanceProcAddr, requiredDeviceExtensionNames);
+        if (!success)
+        {
+            NV_PERF_LOG_ERR(50, "VulkanAppendDeviceRequiredExtensions failed!\n");
+            return false;
+        }
+
+        uint32_t extCount = 0;
+        VkResult vulkanStatus = vkEnumerateDeviceExtensionProperties(physicalDevice, nullptr, &extCount, nullptr);
+        if (vulkanStatus != VK_SUCCESS)
+        {
+            NV_PERF_LOG_ERR(50, "vkEnumerateDeviceExtensionProperties failed!\n");
+            return false;
+        }
+
+        std::vector<VkExtensionProperties> supportedExtensions(extCount);
+        vulkanStatus = vkEnumerateDeviceExtensionProperties(physicalDevice, nullptr, &extCount, supportedExtensions.data());
+        if (vulkanStatus != VK_SUCCESS)
+        {
+            NV_PERF_LOG_ERR(50, "vkEnumerateDeviceExtensionProperties failed!\n");
+            return false;
+        }
+
+        for (auto required : requiredDeviceExtensionNames)
+        {
+            bool supported = false;
+            for (const auto& ext : supportedExtensions)
+            {
+                if (!strcmp(required, ext.extensionName))
+                {
+                    supported = true;
+                    break;
+                }
+            }
+            requiredExtensionSupportStatus[required] = supported;
+        }
+        return true;
+    }
+
+    inline bool GetRequiredInstanceExtensionSupportStatus(std::map<const char*, bool>& requiredExtensionSupportStatus)
+    {
+        std::vector<const char*> requiredInstanceExtensionNames;
+        bool success = VulkanAppendInstanceRequiredExtensions(requiredInstanceExtensionNames);
+        if (!success)
+        {
+            NV_PERF_LOG_ERR(50, "VulkanAppendInstanceRequiredExtensions failed!\n");
+            return false;
+        }
+
+        uint32_t propertyCount = 0;
+        // set `pLayerName` to nullptr to enumerate extensions provided by the Vulkan implementation components,
+        // including the loader, implicit layers, and ICDs
+        VkResult vulkanStatus = vkEnumerateInstanceExtensionProperties(nullptr, &propertyCount, nullptr);
+        if (vulkanStatus != VK_SUCCESS)
+        {
+            NV_PERF_LOG_ERR(10, "Using vkEnumerateInstanceExtensionProperties to retrieve propertyCount failed!\n");
+            return false;
+        }
+
+        std::vector<VkExtensionProperties> supportedExtensions(propertyCount);
+        vulkanStatus = vkEnumerateInstanceExtensionProperties(nullptr, &propertyCount, supportedExtensions.data());
+        if (vulkanStatus != VK_SUCCESS)
+        {
+            NV_PERF_LOG_ERR(10, "Using vkEnumerateInstanceExtensionProperties to retrieve properties failed!\n");
+            return false;
+        }
+
+        for (auto required : requiredInstanceExtensionNames)
+        {
+            bool supported = false;
+            for (const auto& ext : supportedExtensions)
+            {
+                if (!strcmp(required, ext.extensionName))
+                {
+                    supported = true;
+                    break;
+                }
+            }
+            requiredExtensionSupportStatus[required] = supported;
+        }
+        return true;
+    }
+
+    inline NVPA_Status ProfilerSessionSupported(VkInstance instance, VkPhysicalDevice physicalDevice, VkDevice logicalDevice, VkQueue queue)
+    {
+        NVPW_VK_Profiler_CalcTraceBufferSize_Params calcTraceBufferSizeParam = { NVPW_VK_Profiler_CalcTraceBufferSize_Params_STRUCT_SIZE };
+        calcTraceBufferSizeParam.maxRangesPerPass = 1;
+        calcTraceBufferSizeParam.avgRangeNameLength = 256;
+        NVPA_Status nvpaStatus = NVPW_VK_Profiler_CalcTraceBufferSize(&calcTraceBufferSizeParam);
+        if (nvpaStatus)
+        {
+            NV_PERF_LOG_ERR(10, "NVPW_VK_Profiler_CalcTraceBufferSize failed on %s\n", VulkanGetDeviceName(physicalDevice).c_str());
+            return nvpaStatus;
+        }
+
+        NVPW_VK_Profiler_Queue_BeginSession_Params beginSessionParams = { NVPW_VK_Profiler_Queue_BeginSession_Params_STRUCT_SIZE };
+        beginSessionParams.instance = instance;
+        beginSessionParams.physicalDevice = physicalDevice;
+        beginSessionParams.device = logicalDevice;
+        beginSessionParams.queue = queue;
+        beginSessionParams.pfnGetInstanceProcAddr = (void*)vkGetInstanceProcAddr;
+        beginSessionParams.pfnGetDeviceProcAddr = (void*)vkGetDeviceProcAddr;
+        beginSessionParams.numTraceBuffers = 2;
+        beginSessionParams.traceBufferSize = calcTraceBufferSizeParam.traceBufferSize;
+        beginSessionParams.maxRangesPerPass = 1;
+        beginSessionParams.maxLaunchesPerPass = 1;
+        nvpaStatus = NVPW_VK_Profiler_Queue_BeginSession(&beginSessionParams);
+        if (nvpaStatus)
+        {
+            NV_PERF_LOG_ERR(10, "NVPW_VK_Profiler_Queue_BeginSession failed on %s\n", VulkanGetDeviceName(physicalDevice).c_str());
+            return nvpaStatus;
+        }
+
+        NVPW_VK_Profiler_Queue_EndSession_Params endSessionParams = { NVPW_VK_Profiler_Queue_EndSession_Params_STRUCT_SIZE };
+        endSessionParams.queue = queue;
+        endSessionParams.timeout = 0xFFFFFFFF;
+        NVPA_Status endSessionStatus = NVPW_VK_Profiler_Queue_EndSession(&endSessionParams);
+        if (endSessionStatus)
+        {
+            NV_PERF_LOG_ERR(10, "NVPW_VK_Profiler_Queue_EndSession failed on %s\n", VulkanGetDeviceName(physicalDevice).c_str());
+        }
+        return nvpaStatus;
+    }
+
+    // if any of the VK calls fails, the function fails;
+    // the NVPW calls, however, are not required to succeed, given the purpose of this program
+    inline bool InitializeState(State& state)
+    {
+        VkResult vulkanStatus = VK_SUCCESS;
+        // instance
+        {
+            VkApplicationInfo applicationInfo = { VK_STRUCTURE_TYPE_APPLICATION_INFO };
+            applicationInfo.pApplicationName = "GpuDiag";
+            applicationInfo.applicationVersion = 1;
+            applicationInfo.apiVersion = VulkanGetInstanceApiVersion();
+            VkInstanceCreateInfo instanceCreateInfo = { VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO };
+            instanceCreateInfo.pApplicationInfo = &applicationInfo;
+
+            vulkanStatus = vkCreateInstance(&instanceCreateInfo, nullptr, &state.instance);
+            if (vulkanStatus != VK_SUCCESS)
+            {
+                NV_PERF_LOG_ERR(10, "vkCreateInstance failed!\n");
+                return false;
+            }
+        }
+
+        // physical devices
+        {
+            uint32_t numPhysicalDevices = 0;
+            vulkanStatus = vkEnumeratePhysicalDevices(state.instance, &numPhysicalDevices, nullptr);
+            if (vulkanStatus != VK_SUCCESS)
+            {
+                NV_PERF_LOG_ERR(10, "Using vkEnumeratePhysicalDevices to retrieve numDevices failed!\n");
+                return false;
+            }
+
+            state.devices.resize(numPhysicalDevices);
+            std::vector<VkPhysicalDevice> physicalDevices(numPhysicalDevices);
+            vulkanStatus = vkEnumeratePhysicalDevices(state.instance, &numPhysicalDevices, physicalDevices.data());
+            if (vulkanStatus != VK_SUCCESS)
+            {
+                NV_PERF_LOG_ERR(50, "Using vkEnumeratePhysicalDevices to retrieve VkPhysicalDevices failed!\n");
+                return false;
+            }
+            for (uint32_t deviceIndex = 0; deviceIndex < numPhysicalDevices; ++deviceIndex)
+            {
+                State::Device& device = state.devices[deviceIndex];
+                device.vkDeviceIndex = deviceIndex;
+                device.physical = physicalDevices[deviceIndex];
+            }
+        }
+
+        // logical devices
+        {
+            VkDeviceQueueCreateInfo queueInfo = { VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO };
+            float priority = 0.0;
+            queueInfo.pQueuePriorities = &priority;
+            queueInfo.queueCount = 1;
+            VkDeviceCreateInfo deviceCreateInfo = { VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO };
+            for (uint32_t deviceIndex = 0; deviceIndex < state.devices.size(); ++deviceIndex)
+            {
+                State::Device& device = state.devices[deviceIndex];
+                const uint32_t queueFamilyIndex = GetGraphicsOrComputeQueueFamilyIndex(device.physical);
+                if (queueFamilyIndex != uint32_t(~0))
+                {
+                    queueInfo.queueFamilyIndex = queueFamilyIndex;
+                    deviceCreateInfo.queueCreateInfoCount = 1;
+                    deviceCreateInfo.pQueueCreateInfos = &queueInfo;
+                }
+
+                std::vector<const char*> requiredDeviceExtensionNames;
+                bool success = VulkanAppendDeviceRequiredExtensions(state.instance, device.physical, (void*)vkGetInstanceProcAddr, requiredDeviceExtensionNames);
+                if (!success)
+                {
+                    NV_PERF_LOG_ERR(50, "VulkanAppendDeviceRequiredExtensions failed!\n");
+                }
+                deviceCreateInfo.enabledExtensionCount = static_cast<uint32_t>(requiredDeviceExtensionNames.size());
+                deviceCreateInfo.ppEnabledExtensionNames = requiredDeviceExtensionNames.data();
+                vulkanStatus = vkCreateDevice(device.physical, &deviceCreateInfo, nullptr, &device.logical);
+                if (vulkanStatus != VK_SUCCESS)
+                {
+                    NV_PERF_LOG_ERR(50, "vkCreateDevice failed for device index %u with profiler required extensions enabled!\n", deviceIndex);
+                    // try to create a device without any required extensions enabled
+                    deviceCreateInfo.enabledExtensionCount = 0;
+                    deviceCreateInfo.ppEnabledExtensionNames = nullptr;
+                    vulkanStatus = vkCreateDevice(device.physical, &deviceCreateInfo, nullptr, &device.logical);
+                    if (vulkanStatus != VK_SUCCESS)
+                    {
+                        NV_PERF_LOG_ERR(50, "vkCreateDevice failed for device index %u without any profiler required extensions enabled!\n", deviceIndex);
+                        return false;
+                    }
+                }
+
+                if (queueFamilyIndex != uint32_t(~0))
+                {
+                    vkGetDeviceQueue(device.logical, queueFamilyIndex, 0, &device.queue);
+                }
+            }
+        }
+
+        // profiler-specific
+        {
+            bool nvperfStatus = true;
+            nvperfStatus = VulkanLoadDriver(state.instance);
+            if (!nvperfStatus)
+            {
+                NV_PERF_LOG_ERR(10, "VulkanLoadDriver failed!\n");
+            }
+            else
+            {
+                state.isDriverLoaded = true;
+                for (uint32_t deviceIndex = 0; deviceIndex < state.devices.size(); ++deviceIndex)
+                {
+                    State::Device& device = state.devices[deviceIndex];
+                    if (VulkanIsNvidiaDevice(device.physical))
+                    {
+                        device.nvpwDeviceIndex = VulkanGetNvperfDeviceIndex(state.instance, device.physical, device.logical);
+                        if (device.nvpwDeviceIndex == ~size_t(0))
+                        {
+                            NV_PERF_LOG_ERR(50, "VulkanGetNvperfDeviceIndex failed for device index %u!\n", deviceIndex);
+                        }
+                    }
+                }
+            }
+        }
+
+        return true;
+    }
+
+    inline void AppendInstanceState(ordered_json& node)
+    {
+        // layers
+        {
+            node["AvailableInstanceLayers"] = ordered_json::array(); // we only have instance layers, "device layers" are now deprecated
+            std::vector<VkLayerProperties> properties;
+            bool success = GetAvailableInstanceLayerProperties(properties);
+            if (success)
+            {
+                auto& layers = node["AvailableInstanceLayers"];
+                for (const VkLayerProperties& vkLayerProperty : properties)
+                {
+                    auto property = ordered_json();
+                    property["Name"] = vkLayerProperty.layerName;
+                    property["Description"] = vkLayerProperty.description;
+                    property["SpecVersion"] = vkLayerProperty.specVersion;
+                    property["ImplementationVersion"] = vkLayerProperty.implementationVersion;
+                    layers.emplace_back(std::move(property));
+                }
+            }
+        }
+        // TODO: any easy way to retrieve the list of "implicit layers" enabled?
+        // profiler required instance extensions
+        {
+            node["ProfilerRequiredInstanceExtensionsSupported"] = nullptr;
+            std::map<const char*, bool> requiredInstanceExtensionSupportStatus;
+            if (GetRequiredInstanceExtensionSupportStatus(requiredInstanceExtensionSupportStatus))
+            {
+                node["ProfilerRequiredInstanceExtensionsSupported"] = requiredInstanceExtensionSupportStatus;
+            }
+        }
+    }
+
+    inline void AppendDeviceState(const State& state, size_t deviceIndex, ordered_json& node)
+    {
+        const State::Device& currentDevice = state.devices[deviceIndex];
+        VkPhysicalDevice physicalDevice = currentDevice.physical;
+        VkDevice logicalDevice = currentDevice.logical;
+        node["VKDeviceIndex"] = deviceIndex;
+        bool isNvidiaDevice = false;
+
+        // physical properties
+        {
+            VkPhysicalDeviceProperties properties;
+            vkGetPhysicalDeviceProperties(physicalDevice, &properties);
+            node["Name"] = properties.deviceName;
+            node["Type"] = ToCString(properties.deviceType);
+            node["VendorId"] = properties.vendorID;
+            isNvidiaDevice = VulkanIsNvidiaDevice(physicalDevice);
+            node["DeviceId"] = properties.deviceID;
+            node["ApiVersion"] = GetApiVersion(properties);
+        }
+
+        // device id properties
+        {
+            const VkPhysicalDeviceIDProperties idProperties = GetDeviceIdProperties(physicalDevice);
+            node["DeviceUUID"] = GetDeviceUUID(idProperties);
+            node["DeviceLUID"] = GetDeviceLUID(idProperties);
+            node["DeviceNodeMask"] = nullptr;
+            uint32_t nodeMask = 0;
+            if (GetDeviceNodeMask(idProperties, nodeMask))
+            {
+                node["DeviceNodeMask"] = nodeMask;
+            }
+        }
+
+        // NV-specific properties
+        node["IsNvidiaDevice"] = isNvidiaDevice;
+        node["ProfilerDeviceIndex"] = nullptr;
+        node["ProfilerIsGpuSupported"]["IsSupported"] = false;
+        node["ProfilerIsGpuSupported"]["GpuArchitectureSupported"] = nullptr;
+        node["ProfilerIsGpuSupported"]["SliSupportLevel"] = nullptr;
+        node["ProfilerIsGpuSupported"]["Advice"] = "Unrecognized device";
+        node["ProfilerIsSessionSupported"]["IsSupported"] = false;
+        node["ProfilerIsSessionSupported"]["Advice"] = "Unsupported Gpu";
+        node["ProfilerRequiredDeviceExtensionsSupported"] = nullptr;
+        auto success = [&]() {
+            if (!isNvidiaDevice || currentDevice.nvpwDeviceIndex == ~size_t(0))
+            {
+                return false;
+            }
+            node["ProfilerDeviceIndex"] = currentDevice.nvpwDeviceIndex;
+
+            NVPW_VK_Profiler_IsGpuSupported_Params isGpuSupportedParams = { NVPW_VK_Profiler_IsGpuSupported_Params_STRUCT_SIZE };
+            isGpuSupportedParams.deviceIndex = currentDevice.nvpwDeviceIndex;
+            NVPA_Status nvpaStatus = NVPW_VK_Profiler_IsGpuSupported(&isGpuSupportedParams);
+            if (nvpaStatus)
+            {
+                NV_PERF_LOG_ERR(10, "NVPW_VK_Profiler_IsGpuSupported failed on %s\n", VulkanGetDeviceName(physicalDevice).c_str());
+                return false;
+            }
+            node["ProfilerIsGpuSupported"]["GpuArchitectureSupported"] = true;
+            node["ProfilerIsGpuSupported"]["SliSupportLevel"] = true;
+            if (!isGpuSupportedParams.isSupported)
+            {
+                std::string unsupportedReason = "";
+                if (isGpuSupportedParams.gpuArchitectureSupportLevel != NVPW_GPU_ARCHITECTURE_SUPPORT_LEVEL_SUPPORTED)
+                {
+                    node["ProfilerIsGpuSupported"]["GpuArchitectureSupported"] = false;
+                    unsupportedReason += "Unsupported GPU architecture;";
+                }
+                if (isGpuSupportedParams.sliSupportLevel == NVPW_SLI_SUPPORT_LEVEL_UNSUPPORTED)
+                {
+                    node["ProfilerIsGpuSupported"]["SliSupportLevel"] = false;
+                    unsupportedReason += "Devices in SLI configuration are not supported;";
+                }
+                node["ProfilerIsGpuSupported"]["Advice"] = unsupportedReason;
+                return false;
+            }
+            node["ProfilerIsGpuSupported"]["IsSupported"] = true;
+            node["ProfilerIsGpuSupported"]["Advice"] = "";
+
+            // profiler required device extensions
+            std::map<const char*, bool> requiredDeviceExtensionSupportStatus;
+            if (GetRequiredDeviceExtensionSupportStatus(state.instance, physicalDevice, requiredDeviceExtensionSupportStatus))
+            {
+                node["ProfilerRequiredDeviceExtensionsSupported"] = requiredDeviceExtensionSupportStatus;
+            }
+
+            // test if we can start a profiler session on this device
+            nvpaStatus = ProfilerSessionSupported(state.instance, physicalDevice, logicalDevice, currentDevice.queue);
+            if (nvpaStatus)
+            {
+                NV_PERF_LOG_ERR(10, "ProfilerSessionSupported failed on %s\n", VulkanGetDeviceName(physicalDevice).c_str());
+                std::string unsupportedReason;
+                if (nvpaStatus == NVPA_STATUS_INSUFFICIENT_PRIVILEGE)
+                {
+                    unsupportedReason = "Profiling permissions not enabled. Please follow these instructions: https://developer.nvidia.com/nvidia-development-tools-solutions-ERR_NVGPUCTRPERM-permission-issue-performance-counters";
+                }
+                else if (nvpaStatus == NVPA_STATUS_INSUFFICIENT_DRIVER_VERSION)
+                {
+                    unsupportedReason = "Insufficient driver version. Please install the latest NVIDIA driver from https://www.nvidia.com";
+                }
+                else
+                {
+                    unsupportedReason = "Unknown error";
+                }
+                node["ProfilerIsSessionSupported"]["Advice"] = unsupportedReason;
+                return false;
+            }
+            node["ProfilerIsSessionSupported"]["IsSupported"] = true;
+            node["ProfilerIsSessionSupported"]["Advice"] = "";
+            return true;
+        }();
+    }
+
+    inline void AppendState(const State& state, ordered_json& node)
+    {
+        // append instance state
+        AppendInstanceState(node);
+
+        // append other global info
+        node["ProfilerDriverLoaded"] = state.isDriverLoaded;
+
+        // append per-device state
+        auto deviceArray = ordered_json::array();
+        for (size_t deviceIndex = 0; deviceIndex < state.devices.size(); ++deviceIndex)
+        {
+            auto device = ordered_json();
+            AppendDeviceState(state, deviceIndex, device);
+            deviceArray.emplace_back(std::move(device));
+        }
+        node["Devices"] = deviceArray;
+    }
+
+    inline void CleanupState(State& state)
+    {
+        for (State::Device& device : state.devices)
+        {
+            vkDestroyDevice(device.logical, nullptr);
+        }
+        vkDestroyInstance(state.instance, nullptr);
+        state = State();
+    }
+
+}}}} // nv::perf::tool::vk
--- a/ruins64k/tools/NvPerfUtility/tools/GpuDiag/GpuDiagHtmlTemplate.cpp
+++ b/ruins64k/tools/NvPerfUtility/tools/GpuDiag/GpuDiagHtmlTemplate.cpp
@@ -0,0 +1,144 @@
+/*
+* Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+#include <string>
+
+namespace nv { namespace perf { namespace tool {
+
+    extern const std::string HtmlTemplate = R"(
+<html>
+  <head>
+    <meta charset="utf-8"/>
+    <meta name="viewport" content="width=device-width, initial-scale=1"/>
+
+    <title>GpuDiagnostics</title>
+    <style id="ReportStyle">
+      .titlearea {
+        display: flex;
+        align-items: center;
+        color: white;
+        font-family: verdana;
+      }
+
+      .titlebar {
+        margin-left: 0;
+        margin-right: auto;
+      }
+
+      .title {
+        font-size: 28px;
+        margin-left: 10px;
+      }
+
+      .section {
+        border-radius: 15px;
+        padding: 10px;
+        background: #FFFFFF;
+        margin: 10px;
+      }
+
+      .section_title {
+        font-family: verdana;
+        font-weight: bold;
+        color: black;
+      }
+
+      summary {
+        padding: 2px 6px;
+        background-color: #fff;
+        border-radius: 15px;
+        box-shadow: 1px 1px 2px black;
+        cursor: pointer;
+      }
+
+      details > summary:only-child::-webkit-details-marker {
+        display: none;
+      }
+
+      details > details {
+        margin-left: 22px;
+      }
+
+      .value {
+        color: #228b22;
+        text-align: right;
+      }
+    </style>
+
+    <script type="text/JavaScript">
+      function appendNodeRecursively(key, obj, domNode) {
+        let summary = document.createElement('summary');
+        summary.innerText = key;
+        // exclude dummy root
+        if (key !== "") {
+          domNode.appendChild(summary);
+        }
+
+        // if it's a leaf node
+        if (typeof(obj) != 'object') {
+          if (obj != null) {
+            let span = document.createElement('span'); // wrap the value in a span so we can customize its style
+            span.className = 'value';
+            span.innerText = obj.toString();
+            summary.innerText = summary.innerText + ': ';
+            summary.appendChild(span);
+          }
+          return;
+        }
+
+        // for non-leaf nodes
+        for (var child in obj) {
+          let childNode = document.createElement('details');
+          childNode.open = true;
+          appendNodeRecursively(child, obj[child], childNode);
+          domNode.appendChild(childNode);
+        }
+      }
+
+      function onBodyLoaded() {
+        let main = document.getElementById('main');
+        appendNodeRecursively('', g_json, main);
+      }
+    </script>
+  </head>
+
+
+  <body onload="onBodyLoaded()" style="background-color:#202020;">
+    <noscript>
+      <p>Enable JavaScript to see the report contents</p>
+    </noscript>
+
+    <div>
+      <div class="titlearea">
+        <div class="titlebar">
+          <img src="https://developer.nvidia.com/sites/all/themes/devzone_new/nvidia_logo.png"/>
+          <span class="title" id="titlebar_text">Nsight Perf SDK GPU Diagnostics Report</span>
+        </div>
+      </div>
+    </div>
+
+    <div class="section" id="main">
+    </div>
+
+    <script>
+      g_json = /***JSON_DATA_HERE***/;
+    </script>
+  </body>
+
+</html>
+)";
+
+}}} // nv::perf::tool
--- a/ruins64k/tools/NvPerfUtility/tools/GpuDiag/GpuDiagOS_Linux.h
+++ b/ruins64k/tools/NvPerfUtility/tools/GpuDiag/GpuDiagOS_Linux.h
@@ -0,0 +1,198 @@
+/*
+* Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+#pragma once
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <cstdint>
+#include <cstring>
+#include <sstream>
+#include <iomanip>
+#include <vector>
+
+#include <sys/utsname.h>
+#include <errno.h>
+
+#include <json/json.hpp>
+
+#include "GpuDiagCommon.h"
+
+namespace nv { namespace perf { namespace tool { namespace linux_ {
+
+    using namespace nv::perf::tool;
+    using namespace nlohmann;
+
+    struct State
+    {
+    };
+
+    // a simple RAII wrapper
+    class Pipe
+    {
+    private:
+        FILE* m_pPipe;
+        std::string m_cmd; // DFD: only used in the error message when pclose fails
+    public:
+        Pipe()
+            : m_pPipe()
+        {
+        }
+        Pipe(FILE* pPipe, const std::string& cmd)
+            : m_pPipe(pPipe)
+            , m_cmd(cmd)
+        {
+        }
+        Pipe(Pipe&& pipe)
+            : m_pPipe()
+        {
+            *this = static_cast<Pipe&&>(pipe); // move-assign so the source gives up ownership and pclose runs only once
+        }
+        Pipe& operator=(Pipe&& rhs)
+        {
+            m_pPipe = rhs.m_pPipe;
+            m_cmd = rhs.m_cmd;
+            rhs.m_pPipe = nullptr;
+            rhs.m_cmd = std::string();
+            return *this;
+        }
+        ~Pipe()
+        {
+            if (m_pPipe)
+            {
+                int ret = pclose(m_pPipe);
+                if (ret)
+                {
+                    NV_PERF_LOG_ERR(50, "Failed pclose for cmd: %s\nError: %s\n", m_cmd.c_str(), strerror(errno));
+                }
+            }
+        }
+    private:
+        Pipe(const Pipe& pipe);
+        Pipe& operator=(const Pipe& rhs);
+    };
+
+    inline bool ReadFromCmd(const char* pCmd, std::string& output)
+    {
+        FILE* pPipe = popen(pCmd, "r");
+        if (!pPipe)
+        {
+            NV_PERF_LOG_ERR(50, "Failed popen for cmd: %s\nError: %s\n", pCmd, strerror(errno));
+            return false;
+        }
+        Pipe pipe(pPipe, pCmd);
+
+        std::stringstream ss;
+        const size_t BUFFER_SIZE = 4096;
+        char buffer[BUFFER_SIZE] = {};
+        while (true)
+        {
+            std::fgets(buffer, BUFFER_SIZE, pPipe);
+            const size_t length = strlen(buffer);
+            if (length)
+            {
+                if (buffer[length - 1] == '\n')
+                {
+                    buffer[length - 1] = '\0';
+                }
+                ss << buffer;
+                buffer[0] = '\0';
+            }
+            if (feof(pPipe))
+            {
+                output = ss.str();
+                return true;
+            }
+            if (ferror(pPipe))
+            {
+                NV_PERF_LOG_ERR(50, "Error detected for cmd: %s\nError: %s\n", pCmd, strerror(errno));
+                return false;
+            }
+        }
+        return true;
+    }
+
+    inline std::string ReadFromCmd(const char* pCmd)
+    {
+        std::string output;
+        bool success = ReadFromCmd(pCmd, output);
+        if (!success)
+        {
+            NV_PERF_LOG_ERR(50, "Failed ReadFromCmd for cmd: %s\n", pCmd);
+            return "Unknown";
+        }
+        return output;
+    }
+
+    inline std::string GetOSNameFromUName()
+    {
+        utsname name;
+        int ret = uname(&name);
+        if (ret)
+        {
+            NV_PERF_LOG_ERR(10, "Failed uname: %s\n", strerror(errno));
+            return "Unknown";
+        }
+
+        std::stringstream ss;
+        ss << name.sysname << "(" << name.machine << ") " << name.release;
+        // TODO: if it's Ubuntu, we can further convert the kernel versions to Ubuntu versions:
+        // https://askubuntu.com/questions/517136/list-of-ubuntu-versions-with-corresponding-linux-kernel-version/517140#517140
+        return ss.str();
+    }
+
+    inline std::string GetOSName(const State& state)
+    {
+        std::string osVerStr;
+        bool success = ReadFromCmd("lsb_release -ds", osVerStr);
+        if (success)
+        {
+            return osVerStr;
+        }
+        NV_PERF_LOG_ERR(10, "Reading OS version from lsb_release failed, falling back to uname\n");
+        return GetOSNameFromUName();
+    }
+
+    inline std::string GetPhysicalMemorySize()
+    {
+        std::string sizeStr;
+        bool success = ReadFromCmd("awk '/MemTotal/ { print $2 }' /proc/meminfo", sizeStr);
+        if (success)
+        {
+            return sizeStr + " kB";
+        }
+        return "Unknown";
+    }
+
+    inline bool InitializeState(State& state)
+    {
+        return true;
+    }
+
+    inline void AppendState(const State& state, ordered_json& node)
+    {
+        node["OS"] = GetOSName(state);
+        node["Processor"] = ReadFromCmd("cat /proc/cpuinfo | grep \"model name\" | cut -d \":\" -f2 | head -1");
+        node["NumberOfProcessors"] = ReadFromCmd("nproc --all"); // both "nproc" & "grep -c ^processor /proc/cpuinfo" may not work with hyperthreading
+        node["PhysicalMemory"] = GetPhysicalMemorySize();
+    }
+
+    inline void CleanupState(State& state)
+    {
+        state = State();
+    }
+
+}}}} // nv::perf::tool::linux_
--- a/ruins64k/tools/NvPerfUtility/tools/GpuDiag/GpuDiagOS_Windows.h
+++ b/ruins64k/tools/NvPerfUtility/tools/GpuDiag/GpuDiagOS_Windows.h
@@ -0,0 +1,281 @@
+/*
+* Copyright 2014-2021 NVIDIA Corporation.  All rights reserved.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+#pragma once
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <cstdint>
+#include <cstring>
+#include <sstream>
+#include <iomanip>
+#ifndef WIN32_LEAN_AND_MEAN
+#define WIN32_LEAN_AND_MEAN             // Exclude rarely-used stuff from Windows headers.
+#endif
+#include <windows.h>
+#include <winternl.h>
+
+#include <json/json.hpp>
+
+#include "GpuDiagCommon.h"
+
+namespace nv { namespace perf { namespace tool { namespace windows {
+
+    using namespace nv::perf::tool;
+    using namespace nlohmann;
+
+    enum class WinVersion
+    {
+        Unrecognized,
+        Win7_64bit,
+        Win7_32bit,
+        Win8_64bit,
+        Win8_Arm_32bit,
+        Win8_32bit,
+        Win81_64bit,
+        Win81_Arm_32bit,
+        Win81_32bit,
+        Win10_64bit,
+        Win10_Arm_32bit,
+        Win10_32bit,
+    };
+
+    struct State
+    {
+        SYSTEM_INFO sysInfo;
+
+        OSVERSIONINFOEXW osInfo;
+        bool isOsInfoValid = false;
+
+        MEMORYSTATUSEX memoryInfo;
+        bool isMemoryInfoValid = false;
+
+        std::string cpuName = "Unknown";
+    };
+
+    inline const char* ToCString(WinVersion winVer)
+    {
+        switch (winVer)
+        {
+            case WinVersion::Win7_64bit:        return "Windows 7 (64 bit)";
+            case WinVersion::Win7_32bit:        return "Windows 7 (32 bit)";
+            case WinVersion::Win8_64bit:        return "Windows 8 (64 bit)";
+            case WinVersion::Win8_Arm_32bit:    return "Windows 8 (Arm 32 bit)";
+            case WinVersion::Win8_32bit:        return "Windows 8 (32 bit)";
+            case WinVersion::Win81_64bit:       return "Windows 8.1 (64 bit)";
+            case WinVersion::Win81_Arm_32bit:   return "Windows 8.1 (Arm 32 bit)";
+            case WinVersion::Win81_32bit:       return "Windows 8.1 (32 bit)";
+            case WinVersion::Win10_64bit:       return "Windows 10 (64 bit)";
+            case WinVersion::Win10_Arm_32bit:   return "Windows 10 (Arm 32 bit)";
+            case WinVersion::Win10_32bit:       return "Windows 10 (32 bit)";
+            default:                            return "Unrecognized";
+        }
+    }
+
+    inline const char* GetProcessorArchitecture(const SYSTEM_INFO& sysInfo)
+    {
+        switch(sysInfo.wProcessorArchitecture)
+        {
+            case PROCESSOR_ARCHITECTURE_UNKNOWN:        return "Unknown";
+            case PROCESSOR_ARCHITECTURE_INTEL:          return "Intel";
+            case PROCESSOR_ARCHITECTURE_MIPS:           return "Mips";
+            case PROCESSOR_ARCHITECTURE_ALPHA:          return "Alpha";
+            case PROCESSOR_ARCHITECTURE_ALPHA64:        return "Alpha64";
+            case PROCESSOR_ARCHITECTURE_PPC:            return "PowerPC";
+            case PROCESSOR_ARCHITECTURE_SHX:            return "SHX";
+            case PROCESSOR_ARCHITECTURE_ARM:            return "ARM";
+            case PROCESSOR_ARCHITECTURE_IA64:           return "IA64";
+            case PROCESSOR_ARCHITECTURE_IA32_ON_WIN64:  return "IA32 on WIN64";
+            case PROCESSOR_ARCHITECTURE_AMD64:          return "AMD64";
+            case PROCESSOR_ARCHITECTURE_MSIL:           return "MSIL";
+            default:                                    return "Unrecognized";
+        }
+    }
+
+    inline WinVersion GetOsVersion(const OSVERSIONINFOEXW& osInfo, const SYSTEM_INFO& sysInfo)
+    {
+        const uint32_t osMajorVersion = osInfo.dwMajorVersion;
+        const uint32_t osMinorVersion = osInfo.dwMinorVersion;
+        // http://msdn.microsoft.com/en-us/library/windows/desktop/ms724832%28v=vs.85%29.aspx
+        if (osMajorVersion == 6)
+        {
+            switch (osMinorVersion)
+            {
+            case 1:
+                if (sysInfo.wProcessorArchitecture == PROCESSOR_ARCHITECTURE_AMD64)
+                {
+                    return WinVersion::Win7_64bit;
+                }
+                else
+                {
+                    return WinVersion::Win7_32bit;
+                }
+                break;
+            case 2:
+                if (sysInfo.wProcessorArchitecture == PROCESSOR_ARCHITECTURE_AMD64)
+                {
+                    return WinVersion::Win8_64bit;
+                }
+                else if (sysInfo.wProcessorArchitecture == PROCESSOR_ARCHITECTURE_ARM)
+                {
+                    return WinVersion::Win8_Arm_32bit;
+                }
+                else
+                {
+                    return WinVersion::Win8_32bit;
+                }
+                break;
+            case 3:
+                if (sysInfo.wProcessorArchitecture == PROCESSOR_ARCHITECTURE_AMD64)
+                {
+                    return WinVersion::Win81_64bit;
+                }
+                else if (sysInfo.wProcessorArchitecture == PROCESSOR_ARCHITECTURE_ARM)
+                {
+                    return WinVersion::Win81_Arm_32bit;
+                }
+                else
+                {
+                    return WinVersion::Win81_32bit;
+                }
+                break;
+            default:
+                break;
+            }
+        }
+        else if (osMajorVersion == 10)
+        {
+            if (osMinorVersion == 0)
+            {
+                if (sysInfo.wProcessorArchitecture == PROCESSOR_ARCHITECTURE_AMD64)
+                {
+                    return WinVersion::Win10_64bit;
+                }
+                else if (sysInfo.wProcessorArchitecture == PROCESSOR_ARCHITECTURE_ARM)
+                {
+                    return WinVersion::Win10_Arm_32bit;
+                }
+                else
+                {
+                    return WinVersion::Win10_32bit;
+                }
+            }
+        }
+        NV_PERF_LOG_ERR(50, "Unrecognized OS version. Major = %u, Minor = %u, ProcessorArchitecture = %u\n", osMajorVersion, osMinorVersion, sysInfo.wProcessorArchitecture);
+        return WinVersion::Unrecognized;
+    }
+
+    inline std::string GetOSString(const OSVERSIONINFOEXW& osInfo, const SYSTEM_INFO& sysInfo)
+    {
+        std::stringstream ss;
+        ss << ToCString(GetOsVersion(osInfo, sysInfo));
+        ss << " Build " << osInfo.dwBuildNumber;
+        return ss.str();
+    }
+
+    inline bool InitializeState(State& state)
+    {
+        state = State();
+
+        // OS version info
+        {
+            // GetVersion/GetVersionEx are deprecated starting with Windows 8.1.
+            // VerifyVersionInfo() and IsWindows10OrGreater() don't work either without a properly versioned manifest file.
+            // https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-verifyversioninfoa
+            // https://stackoverflow.com/questions/32115255/c-how-to-detect-windows-10
+            NTSTATUS(WINAPI * RtlGetVersion)(LPOSVERSIONINFOEXW);
+            *(FARPROC*)&RtlGetVersion = GetProcAddress(GetModuleHandleA("ntdll"), "RtlGetVersion");
+            if (!RtlGetVersion)
+            {
+                NV_PERF_LOG_ERR(10, "Unable to get RtlGetVersion's address\n");
+            }
+            else
+            {
+                state.osInfo.dwOSVersionInfoSize = sizeof(state.osInfo);
+                if (!NT_SUCCESS(RtlGetVersion(&state.osInfo)))
+                {
+                    NV_PERF_LOG_ERR(10, "RtlGetVersion failed\n");
+                }
+                else
+                {
+                    state.isOsInfoValid = true;
+                }
+            }
+        }
+
+        // system info
+        GetSystemInfo(&state.sysInfo);
+
+        // memory info
+        {
+            state.memoryInfo.dwLength = sizeof(state.memoryInfo);
+            if (!GlobalMemoryStatusEx(&state.memoryInfo))
+            {
+                NV_PERF_LOG_ERR(10, "GlobalMemoryStatusEx failed\n");
+            }
+            else
+            {
+                state.isMemoryInfoValid = true;
+            }
+        }
+
+        // CPU name
+        {
+            // https://docs.microsoft.com/en-us/cpp/intrinsics/cpuid-cpuidex?view=msvc-160
+            // Some processors support Extended Function CPUID information. When it's supported, function_id values from 0x80000000 might be used to return information.
+            // Calling __cpuid with 0x80000000 as the function_id argument gets the number of the highest valid extended ID.
+            int cpuInfo[4] = {};
+            __cpuid(cpuInfo, 0x80000000);
+            const uint32_t brandStrBeginPos = 0x80000002u;
+            const uint32_t highestValidExtendedId = static_cast<uint32_t>(cpuInfo[0]); // unsigned, because extended ids have the top bit set
+            const uint32_t brandStrEndPos = highestValidExtendedId > 0x80000004u ? 0x80000004u : highestValidExtendedId; // the brand string occupies leaves 0x80000002..0x80000004
+            char brandStr[0x40] = {}; // at most 0x30 bytes are written, so the string stays NUL-terminated
+            size_t offset = 0;
+            for (uint32_t ii = brandStrBeginPos; ii <= brandStrEndPos; ++ii)
+            {
+                __cpuid(cpuInfo, static_cast<int>(ii));
+                memcpy(brandStr + offset, cpuInfo, sizeof(cpuInfo)); // re-interpreted as a list of chars
+                offset += sizeof(cpuInfo);
+            }
+            state.cpuName = brandStr;
+        }
+
+        return true;
+    }
+
+    inline void AppendState(const State& state, ordered_json& node)
+    {
+        node["OS"] = nullptr;
+        if (state.isOsInfoValid)
+        {
+            node["OS"] = GetOSString(state.osInfo, state.sysInfo);
+        }
+        node["Processor"] = state.cpuName;
+        node["ProcessorArchitecture"] = GetProcessorArchitecture(state.sysInfo);
+        node["NumberOfProcessors"] = state.sysInfo.dwNumberOfProcessors;
+        node["PhysicalMemory"] = nullptr;
+        if (state.isMemoryInfoValid)
+        {
+            node["PhysicalMemory"] = SizeToString(state.memoryInfo.ullTotalPhys);
+        }
+    }
+
+    inline void CleanupState(State& state)
+    {
+        state = State();
+    }
+
+}}}} // nv::perf::tool::windows
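The CPU name lookup in `InitializeState` treats the register values returned by the three brand-string CPUID leaves (0x80000002 through 0x80000004) as raw characters. The sketch below isolates that step so it can be exercised without the `__cpuid` intrinsic. `CpuidRegs`, `kLeafBytes`, and `BrandStringFromLeaves` are hypothetical names introduced for illustration only; the register order (EAX, EBX, ECX, EDX) matches what `__cpuid` writes into its output array.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// One CPUID result: EAX, EBX, ECX, EDX in that order.
using CpuidRegs = std::array<uint32_t, 4>;

// Each leaf contributes 4 registers of 4 bytes, i.e. 16 characters.
constexpr size_t kLeafBytes = 4 * sizeof(uint32_t);

// Hypothetical helper: concatenates the raw register bytes of the three
// brand-string leaves and returns everything up to the first NUL.
inline std::string BrandStringFromLeaves(const std::array<CpuidRegs, 3>& leaves)
{
    char brand[3 * kLeafBytes + 1] = {}; // 48 bytes of text + guaranteed terminator
    size_t offset = 0;
    for (const CpuidRegs& regs : leaves)
    {
        std::memcpy(brand + offset, regs.data(), kLeafBytes);
        offset += kLeafBytes;
    }
    return std::string(brand);
}
```

Sizing the buffer as 48 bytes plus one terminator keeps the result NUL-terminated even when a processor fills all 48 bytes of the brand string.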
--- a/ruins64k/tools/NvPerfUtility/tools/GpuDiag/GpuDiagSchema.json
+++ b/ruins64k/tools/NvPerfUtility/tools/GpuDiag/GpuDiagSchema.json
@@ -0,0 +1,320 @@
+{
+  "$schema": "https://json-schema.org/draft/2020-12/schema",
+  "title": "GpuDiag",
+  "description": "GPU and Environment Info Collection Tool",
+  "type": "object",
+  "properties": {
+    "Windows": {
+      "description": "Windows System Info",
+      "type": "object",
+      "properties": {
+        "OS": {
+          "type": [ "string", "null" ]
+        },
+        "Processor": {
+          "type": "string"
+        },
+        "ProcessorArchitecture": {
+          "type": "string"
+        },
+        "NumberOfProcessors": {
+          "type": "number",
+          "minimum": 1
+        },
+        "PhysicalMemory": {
+          "type": [ "string", "null" ]
+        }
+      },
+      "required": [ "OS", "Processor", "ProcessorArchitecture", "NumberOfProcessors", "PhysicalMemory" ],
+      "additionalProperties": false
+    },
+    "Linux": {
+      "description": "Linux System Info",
+      "type": "object",
+      "properties": {
+        "OS": {
+          "type": "string"
+        },
+        "Processor": {
+          "type": "string"
+        },
+        "NumberOfProcessors": {
+          "type": "string"
+        },
+        "PhysicalMemory": {
+          "type": "string"
+        }
+      },
+      "required": [ "OS", "Processor", "NumberOfProcessors", "PhysicalMemory" ],
+      "additionalProperties": false
+    },
+    "Global": {
+      "description": "Global Info",
+      "type": "object",
+      "properties": {
+        "GraphicsDriverVersion": {
+          "type": "string"
+        },
+        "GPUs": {
+          "type": "array",
+          "items": {
+            "type": "object",
+            "properties": {
+              "ProfilerDeviceIndex": {
+                "type": "number"
+              },
+              "DeviceName": {
+                "type": "string"
+              },
+              "ChipName": {
+                "type": "string"
+              },
+              "VideoMemorySize": {
+                "type": [ "string", "null" ]
+              },
+              "ClockStatus": {
+                "type": [ "string", "null" ]
+              }
+            },
+            "required": [ "ProfilerDeviceIndex", "DeviceName", "ChipName", "VideoMemorySize", "ClockStatus" ],
+            "additionalProperties": false
+          }
+        }
+      },
+      "required": [ "GraphicsDriverVersion", "GPUs" ],
+      "additionalProperties": false
+    },
+    "Vulkan": {
+      "description": "Vulkan Info",
+      "type": "object",
+      "properties": {
+        "AvailableInstanceLayers": {
+          "type": "array",
+          "items": {
+            "type": "object",
+            "properties": {
+              "Name": {
+                "type": "string"
+              },
+              "Description": {
+                "type": "string"
+              },
+              "SpecVersion": {
+                "type": "number"
+              },
+              "ImplementationVersion": {
+                "type": "number"
+              }
+            },
+            "required": [ "Name", "Description", "SpecVersion", "ImplementationVersion" ],
+            "additionalProperties": false
+          }
+        },
+        "ProfilerRequiredInstanceExtensionsSupported": {
+          "type": [ "object", "null" ]
+        },
+        "ProfilerDriverLoaded": {
+          "type": "boolean"
+        },
+        "Devices": {
+          "type": "array",
+          "items": {
+            "type": "object",
+            "properties": {
+              "VKDeviceIndex": {
+                "type": "number"
+              },
+              "Name": {
+                "type": "string"
+              },
+              "Type": {
+                "type": "string"
+              },
+              "VendorId": {
+                "type": "number"
+              },
+              "DeviceId": {
+                "type": "number"
+              },
+              "ApiVersion": {
+                "type": "string"
+              },
+              "DeviceUUID": {
+                "type": "string"
+              },
+              "DeviceLUID": {
+                "type": "string"
+              },
+              "DeviceNodeMask": {
+                "type": [ "number", "null" ]
+              },
+              "IsNvidiaDevice": {
+                "type": "boolean"
+              },
+              "ProfilerIsGpuSupported": {
+                "type": "object",
+                "properties": {
+                  "IsSupported": {
+                    "type": "boolean"
+                  },
+                  "GpuArchitectureSupported": {
+                    "type": [ "boolean", "null" ]
+                  },
+                  "SliSupportLevel": {
+                    "type": [ "boolean", "null" ]
+                  },
+                  "Advice": {
+                    "type": "string"
+                  }
+                },
+                "required": [ "IsSupported", "GpuArchitectureSupported", "SliSupportLevel", "Advice" ],
+                "additionalProperties": false
+              },
+              "ProfilerIsSessionSupported": {
+                "type": "object",
+                "properties": {
+                  "IsSupported": {
+                    "type": "boolean"
+                  },
+                  "Advice": {
+                    "type": "string"
+                  }
+                },
+                "required": [ "IsSupported", "Advice" ],
+                "additionalProperties": false
+              },
+              "ProfilerDeviceIndex": {
+                "type": [ "number", "null" ]
+              },
+              "ProfilerRequiredDeviceExtensionsSupported": {
+                "type": [ "object", "null" ]
+              }
+            },
+            "required": [ "VKDeviceIndex", "Name", "Type", "VendorId", "DeviceId", "ApiVersion", "DeviceUUID", "DeviceLUID", "DeviceNodeMask", "IsNvidiaDevice", "ProfilerIsGpuSupported", "ProfilerIsSessionSupported", "ProfilerDeviceIndex", "ProfilerRequiredDeviceExtensionsSupported" ],
+            "additionalProperties": false
+          }
+        }
+      },
+      "required": [ "AvailableInstanceLayers", "ProfilerRequiredInstanceExtensionsSupported", "ProfilerDriverLoaded", "Devices" ],
+      "additionalProperties": false
+    },
+    "D3D": {
+      "description": "D3D Info",
+      "type": "object",
+      "properties": {
+        "ProfilerDriverLoaded": {
+          "type": "boolean"
+        },
+        "Devices": {
+          "type": "array",
+          "items": {
+            "type": "object",
+            "properties": {
+              "DXGIAdapterIndex": {
+                "type": "number"
+              },
+              "Name": {
+                "type": "string"
+              },
+              "VendorId": {
+                "type": "number"
+              },
+              "DeviceId": {
+                "type": "number"
+              },
+              "DedicatedVideoMemory": {
+                "type": "string"
+              },
+              "DedicatedSystemMemory": {
+                "type": "string"
+              },
+              "SharedSystemMemory": {
+                "type": "string"
+              },
+              "DeviceLUID": {
+                "type": "string"
+              },
+              "IsDebugLayerForcedOn": {
+                "type": "boolean"
+              },
+              "IsNvidiaDevice": {
+                "type": "boolean"
+              },
+              "ProfilerIsGpuSupported": {
+                "type": "object",
+                "properties": {
+                  "IsSupported": {
+                    "type": "boolean"
+                  },
+                  "GpuArchitectureSupported": {
+                    "type": [ "boolean", "null" ]
+                  },
+                  "SliSupportLevel": {
+                    "type": [ "boolean", "null" ]
+                  },
+                  "Advice": {
+                    "type": "string"
+                  }
+                },
+                "required": [ "IsSupported", "GpuArchitectureSupported", "SliSupportLevel", "Advice" ],
+                "additionalProperties": false
+              },
+              "ProfilerIsSessionSupported": {
+                "type": "object",
+                "properties": {
+                  "IsSupported": {
+                    "type": "boolean"
+                  },
+                  "Advice": {
+                    "type": "string"
+                  }
+                },
+                "required": [ "IsSupported", "Advice" ],
+                "additionalProperties": false
+              },
+              "ProfilerDeviceIndex": {
+                "type": [ "number", "null" ]
+              },
+              "Displays": {
+                "type": "array",
+                "items": {
+                  "type": "object",
+                  "properties": {
+                    "OutputIndex": {
+                      "type": "number"
+                    },
+                    "Description": {
+                      "type": "string"
+                    },
+                    "Left": {
+                      "type": "number"
+                    },
+                    "Top": {
+                      "type": "number"
+                    },
+                    "Width": {
+                      "type": "number"
+                    },
+                    "Height": {
+                      "type": "number"
+                    },
+                    "AttachedToDesktop": {
+                      "type": "boolean"
+                    }
+                  },
+                  "required": [ "OutputIndex", "Description", "Left", "Top", "Width", "Height", "AttachedToDesktop" ],
+                  "additionalProperties": false
+                }
+              }
+            },
+            "required": [ "DXGIAdapterIndex", "Name", "VendorId", "DeviceId", "DedicatedVideoMemory", "DedicatedSystemMemory", "SharedSystemMemory", "DeviceLUID", "IsDebugLayerForcedOn", "IsNvidiaDevice", "ProfilerIsGpuSupported", "ProfilerIsSessionSupported", "ProfilerDeviceIndex", "Displays" ],
+            "additionalProperties": false
+          }
+        }
+      },
+      "required": [ "ProfilerDriverLoaded", "Devices" ],
+      "additionalProperties": false
+    }
+  },
+  "required": [ "Global", "Vulkan" ]
+}
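For the "Linux" object, the schema lists every declared property under `required` and sets `additionalProperties` to false, so a conforming node must carry exactly those four keys. The sketch below spells out that consequence in plain C++. It is not a JSON Schema validator and does not check value types; `LinuxRequiredKeys` and `HasExactlyRequiredKeys` are hypothetical helpers introduced only for illustration.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Keys the schema lists under "required" for the "Linux" object.
inline const std::vector<std::string>& LinuxRequiredKeys()
{
    static const std::vector<std::string> keys = { "OS", "Processor", "NumberOfProcessors", "PhysicalMemory" };
    return keys;
}

// Hypothetical check: when all declared properties are required and
// additional properties are forbidden, the key set must match exactly.
inline bool HasExactlyRequiredKeys(const std::set<std::string>& present, const std::vector<std::string>& required)
{
    const std::set<std::string> expected(required.begin(), required.end());
    return present == expected;
}
```

The same reasoning does not carry over to the top level of the schema, which requires only "Global" and "Vulkan" and does not forbid additional properties.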