port from perforce
This commit is contained in:
431
hgplus/las/framework-dx11-nasm/tools/nvcc.txt
Normal file
431
hgplus/las/framework-dx11-nasm/tools/nvcc.txt
Normal file
@@ -0,0 +1,431 @@
|
||||
|
||||
Usage : nvcc [options] <inputfile>
|
||||
|
||||
Options for specifying the compilation phase
|
||||
============================================
|
||||
More exactly, this option specifies up to which stage the input files must be
|
||||
compiled, according to the following compilation trajectories for different
|
||||
input file types:
|
||||
.c/.cc/.cpp/.cxx : preprocess, compile, link
|
||||
.o : link
|
||||
.i/.ii : compile, link
|
||||
.cu : preprocess, cuda frontend, ptxassemble,
|
||||
merge with host C code, compile, link
|
||||
.gpu : cicc compile into cubin
|
||||
.ptx : ptxassemble into cubin.
|
||||
|
||||
--cuda (-cuda)
|
||||
Compile all .cu input files to .cu.cpp.ii output.
|
||||
|
||||
--cubin (-cubin)
|
||||
Compile all .cu/.ptx/.gpu input files to device- only .cubin files.
|
||||
This step discards the host code for each .cu input file.
|
||||
|
||||
--fatbin(-fatbin)
|
||||
Compile all .cu/.ptx/.gpu input files to ptx or device- only .cubin
|
||||
files (depending on the values specified for options '-arch' and/or
|
||||
'-code') and place the result into the fat binary file specified with
|
||||
option -o.
|
||||
This step discards the host code for each .cu input file.
|
||||
|
||||
--ptx (-ptx)
|
||||
Compile all .cu/.gpu input files to device- only .ptx files. This step
|
||||
discards the host code for each of these input file.
|
||||
|
||||
--gpu (-gpu)
|
||||
Compile all .cu input files to device-only .gpu files. This step
|
||||
discards the host code for each .cu input file.
|
||||
|
||||
--preprocess (-E)
|
||||
Preprocess all .c/.cc/.cpp/.cxx/.cu input files.
|
||||
|
||||
--generate-dependencies (-M)
|
||||
Generate for the one .c/.cc/.cpp/.cxx/.cu input file (more than one
|
||||
input file is not allowed in this mode) a dependency file that can be
|
||||
included in a make file.
|
||||
|
||||
--compile (-c)
|
||||
Compile each .c/.cc/.cpp/.cxx/.cu input file into an object file.
|
||||
|
||||
--device-c (-dc)
|
||||
Compile each .c/.cc/.cpp/.cxx/.cu input file into an object file that
|
||||
contains relocatable device code. It is equivalent to
|
||||
'--relocatable-device-code=true --compile'.
|
||||
|
||||
--device-w (-dw)
|
||||
Compile each .c/.cc/.cpp/.cxx/.cu input file into an object file that
|
||||
contains executable device code. It is equivalent to
|
||||
'--relocatable-device-code=false --compile'.
|
||||
|
||||
--device-link (-dlink)
|
||||
Link object files with relocatable device code and .ptx/.cubin/.fatbin
|
||||
files into an object file with executable device code, which can be
|
||||
passed to the host linker.
|
||||
|
||||
--link (-link)
|
||||
This option specifies the default behavior: compile and link all inputs
|
||||
.
|
||||
|
||||
--no-device-link (-nodlink)
|
||||
Skip the device link step when linking object files.
|
||||
|
||||
--lib (-lib)
|
||||
Compile all inputs into object files (if necessary) and add the results
|
||||
to the specified output library file.
|
||||
|
||||
--run (-run)
|
||||
This option compiles and links all inputs into an executable, and
|
||||
executes it. Or, when the input is a single executable, it is executed
|
||||
without any compilation or linking. This step is intended for
|
||||
developers who do not want to be bothered with setting the necessary
|
||||
cuda dll search paths (these will be set temporarily by nvcc).
|
||||
|
||||
|
||||
File and path specifications
|
||||
============================
|
||||
|
||||
--x (-x)
|
||||
Explicitly specify the language for the input files, rather than
|
||||
letting the compiler choose a default based on the file name suffix.
|
||||
Allowed values for this option: 'c','c++','cu'.
|
||||
|
||||
--output-file <file> (-o)
|
||||
Specify name and location of the output file. Only a single input file
|
||||
is allowed when this option is present in nvcc non- linking/archiving
|
||||
mode.
|
||||
|
||||
--pre-include <include-file>,... (-include)
|
||||
Specify header files that must be preincluded during preprocessing.
|
||||
|
||||
--library <library>,... (-l)
|
||||
Specify libraries to be used in the linking stage without the library
|
||||
file extension. The libraries are searched for on the library search
|
||||
paths that have been specified using option '-L'.
|
||||
|
||||
--define-macro <macrodef>,... (-D)
|
||||
Specify macro definitions to define for use during preprocessing or
|
||||
compilation.
|
||||
|
||||
--undefine-macro <macrodef>,... (-U)
|
||||
Specify macro definitions to undefine for use during preprocessing or
|
||||
compilation.
|
||||
|
||||
--include-path <include-path>,... (-I)
|
||||
Specify include search paths.
|
||||
|
||||
--system-include <include-path>,... (-isystem)
|
||||
Specify system include search paths.
|
||||
|
||||
--library-path <library-path>,... (-L)
|
||||
Specify library search paths.
|
||||
|
||||
--output-directory <directory> (-odir)
|
||||
Specify the directory of the output file. This option is intended for
|
||||
letting the dependency generation step (option
|
||||
'--generate-dependencies') generate a rule that defines the target
|
||||
object file in the proper directory.
|
||||
|
||||
--compiler-bindir <path> (-ccbin)
|
||||
Specify the directory in which the compiler executable (Microsoft
|
||||
Visual Studio cl, or a gcc derivative) resides. By default, this
|
||||
executable is expected in the current executable search path. For a
|
||||
different compiler, or to specify these compilers with a different
|
||||
executable name, specify the path to the compiler including the
|
||||
executable name.
|
||||
|
||||
--cudart(-cudart)
|
||||
Specify the type of CUDA runtime library to be used: static CUDA
|
||||
runtime library, shared/dynamic CUDA runtime library, or no CUDA
|
||||
runtime library. By default, the static CUDA runtime library is used.
|
||||
Allowed values for this option: 'none','shared','static'.
|
||||
Default value: 'static'.
|
||||
|
||||
--cl-version <cl-version-number> --cl-version <cl-version-number>
|
||||
Specify the version of Microsoft Visual Studio installation. Note: this
|
||||
option is to be used in conjunction with '--use-local-env', and is
|
||||
ignored when '--use-local-env' is not specified.
|
||||
Allowed values for this option: 2008,2010,2012.
|
||||
|
||||
--use-local-env --use-local-env
|
||||
Specify whether the environment is already set up for the host compiler
|
||||
.
|
||||
|
||||
--libdevice-directory <directory> (-ldir)
|
||||
Specify the directory that contains the libdevice library files when
|
||||
option '--dont-use-profile' is used. Libdevice library files are
|
||||
located in the 'nvvm/libdevice' directory in the CUDA toolkit.
|
||||
|
||||
|
||||
Options for specifying behaviour of compiler/linker
|
||||
===================================================
|
||||
|
||||
--profile (-pg)
|
||||
Instrument generated code/executable for use by gprof (Linux only).
|
||||
|
||||
--debug (-g)
|
||||
Generate debug information for host code.
|
||||
|
||||
--device-debug (-G)
|
||||
Generate debug information for device code.
|
||||
|
||||
--generate-line-info (-lineinfo)
|
||||
Generate line-number information for device code.
|
||||
|
||||
--optimize <level> (-O)
|
||||
Specify optimization level for host code.
|
||||
|
||||
--shared(-shared)
|
||||
Generate a shared library during linking. Note: when other linker
|
||||
options are required for controlling dll generation, use option
|
||||
-Xlinker.
|
||||
|
||||
--machine <bits> (-m)
|
||||
Specify 32 vs 64 bit architecture.
|
||||
Allowed values for this option: 32,64.
|
||||
Default value: 64.
|
||||
|
||||
|
||||
Options for passing specific phase options
|
||||
==========================================
|
||||
These allow for passing options directly to the intended compilation phase.
|
||||
Using these, users have the ability to pass options to the lower level
|
||||
compilation tools, without the need for nvcc to know about each and every such
|
||||
option.
|
||||
|
||||
--compiler-options <options>,... (-Xcompiler)
|
||||
Specify options directly to the compiler/preprocessor.
|
||||
|
||||
--linker-options <options>,... (-Xlinker)
|
||||
Specify options directly to the host linker.
|
||||
|
||||
--archive-options <options>,... (-Xarchive)
|
||||
Specify options directly to library manager.
|
||||
|
||||
--cudafe-options <options>,... (-Xcudafe)
|
||||
Specify options directly to cudafe.
|
||||
|
||||
--ptxas-options <options>,... (-Xptxas)
|
||||
Specify options directly to the ptx optimizing assembler.
|
||||
|
||||
--nvlink-options <options>,... (-Xnvlink)
|
||||
Specify options directly to nvlink.
|
||||
|
||||
|
||||
Miscellaneous options for guiding the compiler driver
|
||||
=====================================================
|
||||
|
||||
--dont-use-profile (-noprof)
|
||||
Nvcc uses the nvcc.profiles file for compilation. When specifying this
|
||||
option, the profile file is not used.
|
||||
|
||||
--dryrun(-dryrun)
|
||||
Do not execute the compilation commands generated by nvcc. Instead,
|
||||
list them.
|
||||
|
||||
--verbose (-v)
|
||||
List the compilation commands generated by this compiler driver, but do
|
||||
not suppress their execution.
|
||||
|
||||
--keep (-keep)
|
||||
Keep all intermediate files that are generated during internal
|
||||
compilation steps.
|
||||
|
||||
--keep-dir (-keep-dir)
|
||||
Keep all intermediate files that are generated during internal
|
||||
compilation steps in this directory.
|
||||
|
||||
--save-temps (-save-temps)
|
||||
This option is an alias of '--keep'.
|
||||
|
||||
--clean-targets (-clean)
|
||||
This option reverses the behaviour of nvcc. When specified, none of the
|
||||
compilation phases will be executed. Instead, all of the non- temporary
|
||||
files that nvcc would otherwise create will be deleted.
|
||||
|
||||
--run-args <arguments>,... (-run-args)
|
||||
Used in combination with option -R, to specify command line arguments
|
||||
for the executable.
|
||||
|
||||
--input-drive-prefix <prefix> (-idp)
|
||||
On Windows platforms, all command line arguments that refer to file
|
||||
names must be converted to Windows native format before they are passed
|
||||
to pure Windows executables. This option specifies how the 'current'
|
||||
development environment represents absolute paths. Use '-idp /cygwin/'
|
||||
for CygWin build environments, and '-idp /' for Mingw.
|
||||
|
||||
--dependency-drive-prefix <prefix> (-ddp)
|
||||
On Windows platforms, when generating dependency files (option -M), all
|
||||
file names must be converted to whatever the used instance of 'make'
|
||||
will recognize. Some instances of 'make' have trouble with the colon in
|
||||
absolute paths in native Windows format, which depends on the
|
||||
environment in which this 'make' instance has been compiled. Use '-ddp
|
||||
/cygwin/' for a CygWin make, and '-ddp /' for Mingw. Or leave these
|
||||
file names in native Windows format by specifying nothing.
|
||||
|
||||
--dependency-target-name <target> (-MT)
|
||||
Specify the target name of the generated rule when generating a
|
||||
dependency file (option -M).
|
||||
|
||||
--drive-prefix <prefix> (-dp)
|
||||
Specifies <prefix> as both input-drive-prefix and
|
||||
dependency-drive-prefix.
|
||||
|
||||
--no-align-double --no-align-double
|
||||
Specifies that -malign-double should not be passed as a compiler
|
||||
argument on 32-bit platforms. WARNING: this makes the ABI incompatible
|
||||
with the cuda's kernel ABI for certain 64-bit types.
|
||||
|
||||
|
||||
Options for steering GPU code generation
|
||||
========================================
|
||||
|
||||
--gpu-architecture <gpu architecture name> (-arch)
|
||||
Specify the name of the class of nVidia GPU architectures for which the
|
||||
cuda input files must be compiled.
|
||||
With the exception as described for the shorthand below, the
|
||||
architecture specified with this option must be a virtual architecture
|
||||
(such as compute_10), and it will be the assumed architecture during
|
||||
the cicc compilation stage.
|
||||
This option will cause no code to be generated (that is the role of
|
||||
nvcc option '--gpu-code', see below); rather, its purpose is to steer
|
||||
the cicc stage, influencing the architecture of the generated ptx
|
||||
intermediate.
|
||||
For convenience in case of simple nvcc compilations the following
|
||||
shorthand is supported: if no value for option '--gpu-code' is
|
||||
specified, then the value of this option defaults to the value of
|
||||
'--gpu-architecture'. In this situation, as only exception to the
|
||||
description above, the value specified for '--gpu-architecture' may be
|
||||
a 'real' architecture (such as a sm_13), in which case nvcc uses the
|
||||
specified real architecture and its closest virtual architecture as
|
||||
effective architecture values. For example, 'nvcc -arch=sm_13' is
|
||||
equivalent to 'nvcc -arch=compute_13 -code=sm_13,compute_13'.
|
||||
Allowed values for this option: 'compute_10','compute_11','compute_12',
|
||||
'compute_13','compute_20','compute_30','compute_35','sm_10','sm_11',
|
||||
'sm_12','sm_13','sm_20','sm_21','sm_30','sm_35'.
|
||||
|
||||
--gpu-code <gpu architecture name>,... (-code)
|
||||
Specify the names of nVidia gpus to generate code for.
|
||||
nvcc will embed a compiled code image in the resulting executable for
|
||||
each specified 'code' architecture. This code image will be a true
|
||||
binary load image for each 'real' architecture (such as a sm_13), and
|
||||
ptx intermediate code for each virtual architecture (such as
|
||||
compute_10). During runtime, in case no better binary load image is
|
||||
found, and provided that the ptx architecture is compatible with the
|
||||
'current' GPU, such embedded ptx code will be dynamically translated
|
||||
for this current GPU by the cuda runtime system.
|
||||
Architectures specified for this option can be virtual as well as real,
|
||||
but each of these 'code' architectures must be compatible with the
|
||||
architecture specified with option '--gpu-architecture'.
|
||||
For instance, 'arch'=compute_13 is not compatible with 'code'=sm_10,
|
||||
because the generated ptx code will assume the availability of
|
||||
compute_13 features that are not present on sm_10.
|
||||
Allowed values for this option: 'compute_10','compute_11','compute_12',
|
||||
'compute_13','compute_20','compute_30','compute_35','sm_10','sm_11',
|
||||
'sm_12','sm_13','sm_20','sm_21','sm_30','sm_35'.
|
||||
|
||||
--generate-code (-gencode)
|
||||
This option provides a generalization of the '--gpu-architecture=<arch>
|
||||
--gpu-code=code,...' option combination for specifying nvcc behavior
|
||||
with respect to code generation. Where use of the previous options
|
||||
generates different code for a fixed virtual architecture, option
|
||||
'--generate-code' allows multiple cicc invocations, iterating over
|
||||
different virtual architectures. In fact,
|
||||
'--gpu-architecture=<arch> --gpu-code=<code>,...'
|
||||
is equivalent to
|
||||
'--generate-code arch=<arch>,code=<code>,...'.
|
||||
'--generate-code' options may be repeated for different virtual
|
||||
architectures.
|
||||
Allowed keywords for this option: 'arch','code'.
|
||||
|
||||
--maxrregcount <N> (-maxrregcount)
|
||||
Specify the maximum amount of registers that GPU functions can use.
|
||||
Until a function- specific limit, a higher value will generally
|
||||
increase the performance of individual GPU threads that execute this
|
||||
function. However, because thread registers are allocated from a global
|
||||
register pool on each GPU, a higher value of this option will also
|
||||
reduce the maximum thread block size, thereby reducing the amount of
|
||||
thread parallelism. Hence, a good maxrregcount value is the result of a
|
||||
trade-off.
|
||||
If this option is not specified, then no maximum is assumed.
|
||||
Value less than the minimum registers required by ABI will be bumped up
|
||||
by the compiler to ABI minimum limit.
|
||||
|
||||
--ftz [true,false] (-ftz)
|
||||
When performing single-precision floating-point operations, flush
|
||||
denormal values to zero or preserve denormal values. -use_fast_math
|
||||
implies --ftz=true.
|
||||
Default value: 0.
|
||||
|
||||
--prec-div [true,false] (-prec-div)
|
||||
For single-precision floating-point division and reciprocals, use IEEE
|
||||
round-to-nearest mode or use a faster approximation. -use_fast_math
|
||||
implies --prec-div=false.
|
||||
Default value: 1.
|
||||
|
||||
--prec-sqrt [true,false] (-prec-sqrt)
|
||||
For single-precision floating-point square root, use IEEE
|
||||
round-to-nearest mode or use a faster approximation. -use_fast_math
|
||||
implies --prec-sqrt=false.
|
||||
Default value: 1.
|
||||
|
||||
--fmad [true,false] (-fmad)
|
||||
Enables (disables) the contraction of floating-point multiplies and
|
||||
adds/subtracts into floating-point multiply-add operations (FMAD, FFMA,
|
||||
or DFMA). This option is supported only when '--gpu-architecture' is
|
||||
set with compute_20, sm_20, or higher. For other architecture classes,
|
||||
the contraction is always enabled. -use_fast_math implies --fmad=true.
|
||||
Default value: 1.
|
||||
|
||||
--relocatable-device-code [true,false] (-rdc)
|
||||
Enable (disable) the generation of relocatable device code. If
|
||||
disabled, executable device code is generated.
|
||||
Default value: 0.
|
||||
|
||||
|
||||
Options for steering cuda compilation
|
||||
=====================================
|
||||
|
||||
--use_fast_math (-use_fast_math)
|
||||
Make use of fast math library. -use_fast_math implies -ftz=true
|
||||
-prec-div=false -prec-sqrt=false.
|
||||
|
||||
--entries entry,... (-e)
|
||||
In case of compilation of ptx or gpu files to cubin: specify the global
|
||||
entry functions for which code must be generated. By default, code will
|
||||
be generated for all entry functions.
|
||||
|
||||
|
||||
Generic tool options
|
||||
====================
|
||||
|
||||
--disable-warnings (-w)
|
||||
Inhibit all warning messages.
|
||||
|
||||
--source-in-ptx (-src-in-ptx)
|
||||
Interleave source in ptx.
|
||||
|
||||
--restrict (-restrict)
|
||||
Programmer assertion that all kernel pointer parameters are restrict
|
||||
pointers.
|
||||
|
||||
--Werror<kind>,... (-Werror)
|
||||
Make warnings of the specified kinds into errors. The following is the
|
||||
list of warning kinds accepted by this option:
|
||||
|
||||
cross-execution-space-call
|
||||
Be more strict about unsupported cross execution space calls.
|
||||
The compiler will generate an error instead of a warning for a
|
||||
call from a __host__ __device__ to a __host__ function.
|
||||
|
||||
Allowed values for this option: 'cross-execution-space-call'.
|
||||
|
||||
--help (-h)
|
||||
Print this help information on this tool.
|
||||
|
||||
--version (-V)
|
||||
Print version information on this tool.
|
||||
|
||||
--options-file <file>,... (-optf)
|
||||
Include command line options from specified file.
|
||||
|
||||
|
||||
Reference in New Issue
Block a user