port from perforce

2026-04-18 22:31:51 +02:00
commit 8d0ab5b7cc
8409 changed files with 3972376 additions and 0 deletions
--- a/hgplus/las/framework-dx11-nasm/tools/nvcc.txt
+++ b/hgplus/las/framework-dx11-nasm/tools/nvcc.txt
@@ -0,0 +1,431 @@
+
+Usage  : nvcc [options] <inputfile>
+
+Options for specifying the compilation phase
+============================================
+More exactly, this option specifies up to which stage the input files must be 
+compiled, according to the following compilation trajectories for different 
+input file types:
+        .c/.cc/.cpp/.cxx : preprocess, compile, link
+        .o               : link
+        .i/.ii           : compile, link
+        .cu              : preprocess, cuda frontend, ptxassemble,
+                           merge with host C code, compile, link
+        .gpu             : cicc compile into cubin
+        .ptx             : ptxassemble into cubin.
+
+--cuda  (-cuda)                           
+        Compile all .cu input files to .cu.cpp.ii output.
+
+--cubin (-cubin)                          
+        Compile all .cu/.ptx/.gpu input files to device-only .cubin files. 
+        This step discards the host code for each .cu input file.
+
+--fatbin (-fatbin)                        
+        Compile all .cu/.ptx/.gpu input files to ptx or device-only .cubin 
+        files (depending on the values specified for options '-arch' and/or 
+        '-code') and place the result into the fat binary file specified with 
+        option -o.
+        This step discards the host code for each .cu input file.
+
+--ptx   (-ptx)                            
+        Compile all .cu/.gpu input files to device-only .ptx files. This step 
+        discards the host code for each of these input files.
+
+--gpu   (-gpu)                            
+        Compile all .cu input files to device-only .gpu files. This step 
+        discards the host code for each .cu input file.
+
+--preprocess                                (-E)                              
+        Preprocess all .c/.cc/.cpp/.cxx/.cu input files.
+
+--generate-dependencies                     (-M)                              
+        Generate for the one .c/.cc/.cpp/.cxx/.cu input file (more than one 
+        input file is not allowed in this mode) a dependency file that can be 
+        included in a make file.
+
+--compile                                   (-c)                              
+        Compile each .c/.cc/.cpp/.cxx/.cu input file into an object file.
+
+--device-c                                  (-dc)                             
+        Compile each .c/.cc/.cpp/.cxx/.cu input file into an object file that 
+        contains relocatable device code. It is equivalent to 
+        '--relocatable-device-code=true --compile'.
+
+--device-w                                  (-dw)                             
+        Compile each .c/.cc/.cpp/.cxx/.cu input file into an object file that 
+        contains executable device code. It is equivalent to 
+        '--relocatable-device-code=false --compile'.
+
+--device-link                               (-dlink)                          
+        Link object files with relocatable device code and .ptx/.cubin/.fatbin 
+        files into an object file with executable device code, which can be 
+        passed to the host linker.
+
+--link  (-link)                           
+        This option specifies the default behavior: compile and link all
+        inputs.
+
+--no-device-link                            (-nodlink)                        
+        Skip the device link step when linking object files.
+
+--lib   (-lib)                            
+        Compile all inputs into object files (if necessary) and add the results 
+        to the specified output library file.
+
+--run   (-run)                            
+        This option compiles and links all inputs into an executable, and 
+        executes it. Or, when the input is a single executable, it is executed 
+        without any compilation or linking. This step is intended for 
+        developers who do not want to be bothered with setting the necessary 
+        cuda dll search paths (these will be set temporarily by nvcc).
+
+
+File and path specifications
+============================
+
+--x     (-x)                              
+        Explicitly specify the language for the input files, rather than 
+        letting the compiler choose a default based on the file name suffix.
+        Allowed values for this option:  'c','c++','cu'.
+
+--output-file <file>                        (-o)                              
+        Specify name and location of the output file. Only a single input file 
+        is allowed when this option is present in nvcc non-linking/archiving 
+        mode.
+
+--pre-include <include-file>,...            (-include)                        
+        Specify header files that must be preincluded during preprocessing.
+
+--library <library>,...                     (-l)                              
+        Specify libraries to be used in the linking stage without the library 
+        file extension. The libraries are searched for on the library search 
+        paths that have been specified using option '-L'.
+
+--define-macro <macrodef>,...               (-D)                              
+        Specify macro definitions to define for use during preprocessing or 
+        compilation.
+
+--undefine-macro <macrodef>,...             (-U)                              
+        Specify macro definitions to undefine for use during preprocessing or 
+        compilation.
+
+--include-path <include-path>,...           (-I)                              
+        Specify include search paths.
+
+--system-include <include-path>,...         (-isystem)                        
+        Specify system include search paths.
+
+--library-path <library-path>,...           (-L)                              
+        Specify library search paths.
+
+--output-directory <directory>              (-odir)                           
+        Specify the directory of the output file. This option is intended for 
+        letting the dependency generation step (option 
+        '--generate-dependencies') generate a rule that defines the target 
+        object file in the proper directory.
+
+--compiler-bindir <path>                    (-ccbin)                          
+        Specify the directory in which the compiler executable (Microsoft 
+        Visual Studio cl, or a gcc derivative) resides. By default, this 
+        executable is expected in the current executable search path. For a 
+        different compiler, or to specify these compilers with a different 
+        executable name, specify the path to the compiler including the 
+        executable name.
+
+--cudart (-cudart)                        
+        Specify the type of CUDA runtime library to be used: static CUDA 
+        runtime library, shared/dynamic CUDA runtime library, or no CUDA 
+        runtime library. By default, the static CUDA runtime library is used.
+        Allowed values for this option:  'none','shared','static'.
+        Default value:  'static'.
+
+--cl-version <cl-version-number>            --cl-version <cl-version-number>  
+        Specify the version of Microsoft Visual Studio installation. Note: this 
+        option is to be used in conjunction with '--use-local-env', and is 
+        ignored when '--use-local-env' is not specified.
+        Allowed values for this option:  2008,2010,2012.
+
+--use-local-env                             --use-local-env                   
+        Specify whether the environment is already set up for the host
+        compiler.
+
+--libdevice-directory <directory>           (-ldir)                           
+        Specify the directory that contains the libdevice library files when 
+        option '--dont-use-profile' is used. Libdevice library files are 
+        located in the 'nvvm/libdevice' directory in the CUDA toolkit.
+
+
+Options for specifying behaviour of compiler/linker
+===================================================
+
+--profile                                   (-pg)                             
+        Instrument generated code/executable for use by gprof (Linux only).
+
+--debug (-g)                              
+        Generate debug information for host code.
+
+--device-debug                              (-G)                              
+        Generate debug information for device code.
+
+--generate-line-info                        (-lineinfo)                       
+        Generate line-number information for device code.
+
+--optimize <level>                          (-O)                              
+        Specify optimization level for host code.
+
+--shared (-shared)                        
+        Generate a shared library during linking. Note: when other linker 
+        options are required for controlling dll generation, use option 
+        -Xlinker.
+
+--machine <bits>                            (-m)                              
+        Specify 32 vs 64 bit architecture.
+        Allowed values for this option:  32,64.
+        Default value:  64.
+
+
+Options for passing specific phase options
+==========================================
+These allow for passing options directly to the intended compilation phase. 
+Using these, users have the ability to pass options to the lower level 
+compilation tools, without the need for nvcc to know about each and every such 
+option.
+
+--compiler-options <options>,...            (-Xcompiler)                      
+        Specify options directly to the compiler/preprocessor.
+
+--linker-options <options>,...              (-Xlinker)                        
+        Specify options directly to the host linker.
+
+--archive-options <options>,...             (-Xarchive)                       
+        Specify options directly to library manager.
+
+--cudafe-options <options>,...              (-Xcudafe)                        
+        Specify options directly to cudafe.
+
+--ptxas-options <options>,...               (-Xptxas)                         
+        Specify options directly to the ptx optimizing assembler.
+
+--nvlink-options <options>,...              (-Xnvlink)                        
+        Specify options directly to nvlink.
+
+
+Miscellaneous options for guiding the compiler driver
+=====================================================
+
+--dont-use-profile                          (-noprof)                         
+        Nvcc uses the nvcc.profiles file for compilation. When specifying this 
+        option, the profile file is not used.
+
+--dryrun (-dryrun)                        
+        Do not execute the compilation commands generated by nvcc. Instead, 
+        list them.
+
+--verbose                                   (-v)                              
+        List the compilation commands generated by this compiler driver, but do 
+        not suppress their execution.
+
+--keep  (-keep)                           
+        Keep all intermediate files that are generated during internal 
+        compilation steps.
+
+--keep-dir                                  (-keep-dir)                       
+        Keep all intermediate files that are generated during internal 
+        compilation steps in this directory.
+
+--save-temps                                (-save-temps)                     
+        This option is an alias of '--keep'.
+
+--clean-targets                             (-clean)                          
+        This option reverses the behaviour of nvcc. When specified, none of the 
+        compilation phases will be executed. Instead, all of the non-temporary 
+        files that nvcc would otherwise create will be deleted.
+
+--run-args <arguments>,...                  (-run-args)                       
+        Used in combination with option -run, to specify command line arguments 
+        for the executable.
+
+--input-drive-prefix <prefix>               (-idp)                            
+        On Windows platforms, all command line arguments that refer to file 
+        names must be converted to Windows native format before they are passed 
+        to pure Windows executables. This option specifies how the 'current' 
+        development environment represents absolute paths. Use '-idp /cygwin/' 
+        for CygWin build environments, and '-idp /' for Mingw.
+
+--dependency-drive-prefix <prefix>          (-ddp)                            
+        On Windows platforms, when generating dependency files (option -M), all 
+        file names must be converted to whatever the used instance of 'make' 
+        will recognize. Some instances of 'make' have trouble with the colon in 
+        absolute paths in native Windows format, which depends on the 
+        environment in which this 'make' instance has been compiled. Use '-ddp 
+        /cygwin/' for a CygWin make, and '-ddp /' for Mingw. Or leave these 
+        file names in native Windows format by specifying nothing.
+
+--dependency-target-name <target>           (-MT)                             
+        Specify the target name of the generated rule when generating a 
+        dependency file (option -M).
+
+--drive-prefix <prefix>                     (-dp)                             
+        Specifies <prefix> as both input-drive-prefix and 
+        dependency-drive-prefix.
+
+--no-align-double                           --no-align-double                 
+        Specifies that -malign-double should not be passed as a compiler 
+        argument on 32-bit platforms. WARNING: this makes the ABI incompatible 
+        with the CUDA kernel ABI for certain 64-bit types.
+
+
+Options for steering GPU code generation
+========================================
+
+--gpu-architecture <gpu architecture name>  (-arch)                           
+        Specify the name of the class of nVidia GPU architectures for which the 
+        cuda input files must be compiled.
+        With the exception as described for the shorthand below, the 
+        architecture specified with this option must be a virtual architecture 
+        (such as compute_10), and it will be the assumed architecture during 
+        the cicc compilation stage.
+        This option will cause no code to be generated (that is the role of 
+        nvcc option '--gpu-code', see below); rather, its purpose is to steer 
+        the cicc stage, influencing the architecture of the generated ptx 
+        intermediate.
+        For convenience in case of simple nvcc compilations the following 
+        shorthand is supported: if no value for option '--gpu-code' is 
+        specified, then the value of '--gpu-code' defaults to the value of 
+        '--gpu-architecture'. In this situation, as the only exception to the 
+        description above, the value specified for '--gpu-architecture' may be 
+        a 'real' architecture (such as sm_13), in which case nvcc uses the 
+        specified real architecture and its closest virtual architecture as 
+        effective architecture values. For example, 'nvcc -arch=sm_13' is 
+        equivalent to 'nvcc -arch=compute_13 -code=sm_13,compute_13'.
+        Allowed values for this option:  'compute_10','compute_11','compute_12',
+        'compute_13','compute_20','compute_30','compute_35','sm_10','sm_11',
+        'sm_12','sm_13','sm_20','sm_21','sm_30','sm_35'.
+
+--gpu-code <gpu architecture name>,...      (-code)                           
+        Specify the names of nVidia gpus to generate code for.
+        nvcc will embed a compiled code image in the resulting executable for 
+        each specified 'code' architecture. This code image will be a true 
+        binary load image for each 'real' architecture (such as sm_13), and 
+        ptx intermediate code for each virtual architecture (such as 
+        compute_10). During runtime, in case no better binary load image is 
+        found, and provided that the ptx architecture is compatible with the 
+        'current' GPU, such embedded ptx code will be dynamically translated 
+        for this current GPU by the cuda runtime system.
+        Architectures specified for this option can be virtual as well as real, 
+        but each of these 'code' architectures must be compatible with the 
+        architecture specified with option '--gpu-architecture'.
+        For instance, 'arch'=compute_13 is not compatible with 'code'=sm_10, 
+        because the generated ptx code will assume the availability of 
+        compute_13 features that are not present on sm_10.
+        Allowed values for this option:  'compute_10','compute_11','compute_12',
+        'compute_13','compute_20','compute_30','compute_35','sm_10','sm_11',
+        'sm_12','sm_13','sm_20','sm_21','sm_30','sm_35'.
+
+--generate-code                             (-gencode)                        
+        This option provides a generalization of the '--gpu-architecture=<arch> 
+        --gpu-code=code,...' option combination for specifying nvcc behavior 
+        with respect to code generation. Where use of the previous options 
+        generates different code for a fixed virtual architecture, option 
+        '--generate-code' allows multiple cicc invocations, iterating over 
+        different virtual architectures. In fact, 
+                '--gpu-architecture=<arch> --gpu-code=<code>,...'
+        is equivalent to
+                '--generate-code arch=<arch>,code=<code>,...'.
+        '--generate-code' options may be repeated for different virtual 
+        architectures.
+        Allowed keywords for this option:  'arch','code'.
+
+--maxrregcount <N>                          (-maxrregcount)                   
+        Specify the maximum amount of registers that GPU functions can use. 
+        Up to a function-specific limit, a higher value will generally 
+        increase the performance of individual GPU threads that execute this 
+        function. However, because thread registers are allocated from a global 
+        register pool on each GPU, a higher value of this option will also 
+        reduce the maximum thread block size, thereby reducing the amount of 
+        thread parallelism. Hence, a good maxrregcount value is the result of a 
+        trade-off.
+        If this option is not specified, then no maximum is assumed.
+        A value less than the minimum number of registers required by the ABI 
+        will be raised by the compiler to the ABI minimum.
+
+--ftz [true,false]                          (-ftz)                            
+        When performing single-precision floating-point operations, flush 
+        denormal values to zero or preserve denormal values. -use_fast_math 
+        implies --ftz=true.
+        Default value:  0.
+
+--prec-div [true,false]                     (-prec-div)                       
+        For single-precision floating-point division and reciprocals, use IEEE 
+        round-to-nearest mode or use a faster approximation. -use_fast_math 
+        implies --prec-div=false.
+        Default value:  1.
+
+--prec-sqrt [true,false]                    (-prec-sqrt)                      
+        For single-precision floating-point square root, use IEEE 
+        round-to-nearest mode or use a faster approximation. -use_fast_math 
+        implies --prec-sqrt=false.
+        Default value:  1.
+
+--fmad [true,false]                         (-fmad)                           
+        Enables (disables) the contraction of floating-point multiplies and 
+        adds/subtracts into floating-point multiply-add operations (FMAD, FFMA, 
+        or DFMA). This option is supported only when '--gpu-architecture' is 
+        set with compute_20, sm_20, or higher. For other architecture classes, 
+        the contraction is always enabled. -use_fast_math implies --fmad=true.
+        Default value:  1.
+
+--relocatable-device-code [true,false]      (-rdc)                            
+        Enable (disable) the generation of relocatable device code. If 
+        disabled, executable device code is generated.
+        Default value:  0.
+
+
+Options for steering cuda compilation
+=====================================
+
+--use_fast_math                             (-use_fast_math)                  
+        Make use of fast math library. -use_fast_math implies -ftz=true 
+        -prec-div=false -prec-sqrt=false.
+
+--entries entry,...                         (-e)                              
+        In case of compilation of ptx or gpu files to cubin: specify the global 
+        entry functions for which code must be generated. By default, code will 
+        be generated for all entry functions.
+
+
+Generic tool options
+====================
+
+--disable-warnings                          (-w)                              
+        Inhibit all warning messages.
+
+--source-in-ptx                             (-src-in-ptx)                     
+        Interleave source in ptx.
+
+--restrict                                  (-restrict)                       
+        Programmer assertion that all kernel pointer parameters are restrict 
+        pointers.
+
+--Werror <kind>,...                        (-Werror)                         
+        Make warnings of the specified kinds into errors. The following is the 
+        list of warning kinds accepted by this option:
+                
+        cross-execution-space-call
+                Be more strict about unsupported cross execution space calls.
+                The compiler will generate an error instead of a warning for a
+                call from a __host__ __device__ function to a __host__ function.
+                
+        Allowed values for this option:  'cross-execution-space-call'.
+
+--help  (-h)                              
+        Print this help information on this tool.
+
+--version                                   (-V)                              
+        Print version information on this tool.
+
+--options-file <file>,...                   (-optf)                           
+        Include command line options from specified file.
+
+
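The '-arch' shorthand described under '--gpu-architecture' in the help text above can be sketched in Python. This is an illustration only, not nvcc's actual implementation. The helper names closest_virtual and expand_arch_shorthand are hypothetical, and "closest virtual architecture" is assumed here to mean the highest listed compute_XY that does not exceed the given sm_XY.

```python
from typing import List

# Virtual architecture numbers taken from the "Allowed values" list in the
# help text (compute_10 ... compute_35).
VIRTUAL_ARCHS = [10, 11, 12, 13, 20, 30, 35]


def closest_virtual(real_arch: str) -> str:
    """Map a real 'sm_XY' name to a virtual 'compute_XY' name.

    Assumption: "closest" means the highest listed virtual architecture
    whose number does not exceed the real one.
    """
    if not real_arch.startswith("sm_"):
        raise ValueError(f"not a real architecture: {real_arch!r}")
    number = int(real_arch[len("sm_"):])
    candidates = [v for v in VIRTUAL_ARCHS if v <= number]
    if not candidates:
        raise ValueError(f"no virtual architecture for {real_arch!r}")
    return f"compute_{max(candidates)}"


def expand_arch_shorthand(arch: str) -> List[str]:
    """Expand '-arch=<arch>' given without '-code' into explicit options."""
    if arch.startswith("compute_"):
        # With no '-code' value, '-code' defaults to the '-arch' value.
        return [f"-arch={arch}", f"-code={arch}"]
    # A real architecture selects itself plus its closest virtual one.
    virtual = closest_virtual(arch)
    return [f"-arch={virtual}", f"-code={arch},{virtual}"]


if __name__ == "__main__":
    print(" ".join(["nvcc"] + expand_arch_shorthand("sm_13")))
```

For 'sm_13' the sketch yields '-arch=compute_13 -code=sm_13,compute_13', which is the equivalence stated in the help text. Other inputs follow the assumption above and should be checked against 'nvcc --dryrun' on a real toolchain.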