Usage : nvcc [options] <inputfile>

Options for specifying the compilation phase
============================================
These options specify up to which stage the input files must be compiled,
according to the following compilation trajectories for different input
file types:
        .c/.cc/.cpp/.cxx : preprocess, compile, link
        .o               : link
        .i/.ii           : compile, link
        .cu              : preprocess, cuda frontend, ptx assemble,
                           merge with host C code, compile, link
        .gpu             : cicc compile into cubin
        .ptx             : ptx assemble into cubin.
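
The trajectory table above can be read as a lookup from file suffix to an ordered list of phases. The following is a minimal sketch of that lookup, not nvcc's actual implementation; the helper name trajectory and the file name kernel.cu are hypothetical.

```python
import os

# Phase lists transcribed from the trajectory table above.
TRAJECTORIES = {
    (".c", ".cc", ".cpp", ".cxx"): ["preprocess", "compile", "link"],
    (".o",): ["link"],
    (".i", ".ii"): ["compile", "link"],
    (".cu",): ["preprocess", "cuda frontend", "ptx assemble",
               "merge with host C code", "compile", "link"],
    (".gpu",): ["cicc compile into cubin"],
    (".ptx",): ["ptx assemble into cubin"],
}


def trajectory(filename):
    """Return the phases for a file, chosen by suffix (hypothetical helper)."""
    suffix = os.path.splitext(filename)[1]
    for suffixes, phases in TRAJECTORIES.items():
        if suffix in suffixes:
            return phases
    raise ValueError(f"unrecognized input suffix: {suffix!r}")


# kernel.cu is a made-up file name used only for illustration.
print(trajectory("kernel.cu"))
```

A phase option such as --compile then corresponds to stopping partway through the returned list instead of running it to the end.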

--cuda (-cuda)
        Compile all .cu input files to .cu.cpp.ii output.

--cubin (-cubin)
        Compile all .cu/.ptx/.gpu input files to device-only .cubin files.
        This step discards the host code for each .cu input file.

--fatbin (-fatbin)
        Compile all .cu/.ptx/.gpu input files to ptx or device-only .cubin
        files (depending on the values specified for options '-arch' and/or
        '-code') and place the result into the fat binary file specified with
        option -o.
        This step discards the host code for each .cu input file.

--ptx (-ptx)
        Compile all .cu/.gpu input files to device-only .ptx files. This step
        discards the host code for each of these input files.

--gpu (-gpu)
        Compile all .cu input files to device-only .gpu files. This step
        discards the host code for each .cu input file.

--preprocess (-E)
        Preprocess all .c/.cc/.cpp/.cxx/.cu input files.

--generate-dependencies (-M)
        Generate a dependency file, which can be included in a make file, for
        the single .c/.cc/.cpp/.cxx/.cu input file (more than one input file
        is not allowed in this mode).

--compile (-c)
        Compile each .c/.cc/.cpp/.cxx/.cu input file into an object file.

--device-c (-dc)
        Compile each .c/.cc/.cpp/.cxx/.cu input file into an object file that
        contains relocatable device code. It is equivalent to
        '--relocatable-device-code=true --compile'.

--device-w (-dw)
        Compile each .c/.cc/.cpp/.cxx/.cu input file into an object file that
        contains executable device code. It is equivalent to
        '--relocatable-device-code=false --compile'.

--device-link (-dlink)
        Link object files with relocatable device code and .ptx/.cubin/.fatbin
        files into an object file with executable device code, which can be
        passed to the host linker.

--link (-link)
        This option specifies the default behavior: compile and link all
        inputs.

--no-device-link (-nodlink)
        Skip the device link step when linking object files.

--lib (-lib)
        Compile all inputs into object files (if necessary) and add the results
        to the specified output library file.

--run (-run)
        This option compiles and links all inputs into an executable, and
        executes it. Or, when the input is a single executable, it is executed
        without any compilation or linking. This step is intended for
        developers who do not want to be bothered with setting the necessary
        cuda dll search paths (these will be set temporarily by nvcc).


File and path specifications
============================

--x (-x)
        Explicitly specify the language for the input files, rather than
        letting the compiler choose a default based on the file name suffix.
        Allowed values for this option: 'c','c++','cu'.

--output-file <file> (-o)
        Specify name and location of the output file. Only a single input file
        is allowed when this option is present in nvcc non-linking/archiving
        mode.

--pre-include <include-file>,... (-include)
        Specify header files that must be preincluded during preprocessing.

--library <library>,... (-l)
        Specify libraries to be used in the linking stage without the library
        file extension. The libraries are searched for on the library search
        paths that have been specified using option '-L'.

--define-macro <macrodef>,... (-D)
        Specify macro definitions to define for use during preprocessing or
        compilation.

--undefine-macro <macrodef>,... (-U)
        Specify macro definitions to undefine for use during preprocessing or
        compilation.

--include-path <include-path>,... (-I)
        Specify include search paths.

--system-include <include-path>,... (-isystem)
        Specify system include search paths.

--library-path <library-path>,... (-L)
        Specify library search paths.

--output-directory <directory> (-odir)
        Specify the directory of the output file. This option is intended for
        letting the dependency generation step (option
        '--generate-dependencies') generate a rule that defines the target
        object file in the proper directory.

--compiler-bindir <path> (-ccbin)
        Specify the directory in which the compiler executable (Microsoft
        Visual Studio cl, or a gcc derivative) resides. By default, this
        executable is expected in the current executable search path. For a
        different compiler, or to specify these compilers with a different
        executable name, specify the path to the compiler including the
        executable name.

--cudart (-cudart)
        Specify the type of CUDA runtime library to be used: static CUDA
        runtime library, shared/dynamic CUDA runtime library, or no CUDA
        runtime library. By default, the static CUDA runtime library is used.
        Allowed values for this option: 'none','shared','static'.
        Default value: 'static'.

--cl-version <cl-version-number>
        Specify the version of Microsoft Visual Studio installation. Note: this
        option is to be used in conjunction with '--use-local-env', and is
        ignored when '--use-local-env' is not specified.
        Allowed values for this option: 2008,2010,2012.

--use-local-env
        Specify whether the environment is already set up for the host
        compiler.

--libdevice-directory <directory> (-ldir)
        Specify the directory that contains the libdevice library files when
        option '--dont-use-profile' is used. Libdevice library files are
        located in the 'nvvm/libdevice' directory in the CUDA toolkit.


Options for specifying behaviour of compiler/linker
===================================================

--profile (-pg)
        Instrument generated code/executable for use by gprof (Linux only).

--debug (-g)
        Generate debug information for host code.

--device-debug (-G)
        Generate debug information for device code.

--generate-line-info (-lineinfo)
        Generate line-number information for device code.

--optimize <level> (-O)
        Specify optimization level for host code.

--shared (-shared)
        Generate a shared library during linking. Note: when other linker
        options are required for controlling dll generation, use option
        -Xlinker.

--machine <bits> (-m)
        Specify 32 vs 64 bit architecture.
        Allowed values for this option: 32,64.
        Default value: 64.


Options for passing specific phase options
==========================================
These allow for passing options directly to the intended compilation phase.
Using these, users can pass options to the lower level compilation tools
without the need for nvcc to know about each and every such option.

--compiler-options <options>,... (-Xcompiler)
        Specify options directly to the compiler/preprocessor.

--linker-options <options>,... (-Xlinker)
        Specify options directly to the host linker.

--archive-options <options>,... (-Xarchive)
        Specify options directly to the library manager.

--cudafe-options <options>,... (-Xcudafe)
        Specify options directly to cudafe.

--ptxas-options <options>,... (-Xptxas)
        Specify options directly to the ptx optimizing assembler.

--nvlink-options <options>,... (-Xnvlink)
        Specify options directly to nvlink.


Miscellaneous options for guiding the compiler driver
=====================================================

--dont-use-profile (-noprof)
        Nvcc uses the nvcc.profiles file for compilation. When this option is
        specified, the profile file is not used.

--dryrun (-dryrun)
        Do not execute the compilation commands generated by nvcc. Instead,
        list them.

--verbose (-v)
        List the compilation commands generated by this compiler driver, but do
        not suppress their execution.

--keep (-keep)
        Keep all intermediate files that are generated during internal
        compilation steps.

--keep-dir (-keep-dir)
        Keep all intermediate files that are generated during internal
        compilation steps in this directory.

--save-temps (-save-temps)
        This option is an alias of '--keep'.

--clean-targets (-clean)
        This option reverses the behaviour of nvcc. When specified, none of the
        compilation phases will be executed. Instead, all of the non-temporary
        files that nvcc would otherwise create will be deleted.

--run-args <arguments>,... (-run-args)
        Used in combination with option '--run', to specify command line
        arguments for the executable.

--input-drive-prefix <prefix> (-idp)
        On Windows platforms, all command line arguments that refer to file
        names must be converted to Windows native format before they are passed
        to pure Windows executables. This option specifies how the 'current'
        development environment represents absolute paths. Use '-idp /cygwin/'
        for CygWin build environments, and '-idp /' for Mingw.

--dependency-drive-prefix <prefix> (-ddp)
        On Windows platforms, when generating dependency files (option -M), all
        file names must be converted to whatever the used instance of 'make'
        will recognize. Some instances of 'make' have trouble with the colon in
        absolute paths in native Windows format, which depends on the
        environment in which this 'make' instance has been compiled. Use
        '-ddp /cygwin/' for a CygWin make, and '-ddp /' for Mingw. Or leave
        these file names in native Windows format by specifying nothing.

--dependency-target-name <target> (-MT)
        Specify the target name of the generated rule when generating a
        dependency file (option -M).

--drive-prefix <prefix> (-dp)
        Specifies <prefix> as both input-drive-prefix and
        dependency-drive-prefix.

--no-align-double
        Specifies that -malign-double should not be passed as a compiler
        argument on 32-bit platforms. WARNING: this makes the ABI incompatible
        with the cuda kernel ABI for certain 64-bit types.


Options for steering GPU code generation
========================================

--gpu-architecture <gpu architecture name> (-arch)
        Specify the name of the class of NVIDIA GPU architectures for which the
        cuda input files must be compiled.
        With the exception described for the shorthand below, the architecture
        specified with this option must be a virtual architecture (such as
        compute_10), and it will be the assumed architecture during the cicc
        compilation stage.
        This option will cause no code to be generated (that is the role of
        nvcc option '--gpu-code', see below); rather, its purpose is to steer
        the cicc stage, influencing the architecture of the generated ptx
        intermediate.
        For convenience in case of simple nvcc compilations the following
        shorthand is supported: if no value for option '--gpu-code' is
        specified, then '--gpu-code' defaults to the value of
        '--gpu-architecture'. In this situation, as the only exception to the
        description above, the value specified for '--gpu-architecture' may be
        a 'real' architecture (such as sm_13), in which case nvcc uses the
        specified real architecture and its closest virtual architecture as
        effective architecture values. For example, 'nvcc -arch=sm_13' is
        equivalent to 'nvcc -arch=compute_13 -code=sm_13,compute_13'.
        Allowed values for this option: 'compute_10','compute_11','compute_12',
        'compute_13','compute_20','compute_30','compute_35','sm_10','sm_11',
        'sm_12','sm_13','sm_20','sm_21','sm_30','sm_35'.
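
The '-arch' shorthand described above can be sketched as a small expansion function. This is an illustration rather than nvcc's own logic: the function names are hypothetical, and "closest virtual architecture" is assumed to mean the highest allowed compute_XY whose number does not exceed that of the real architecture.

```python
# Virtual architectures copied from the "Allowed values" list above.
VIRTUAL = ["compute_10", "compute_11", "compute_12", "compute_13",
           "compute_20", "compute_30", "compute_35"]


def version(arch):
    """Numeric part of an architecture name, e.g. 'sm_21' -> 21."""
    return int(arch.split("_")[1])


def closest_virtual(real_arch):
    """ASSUMPTION: 'closest' is the highest virtual architecture whose
    number does not exceed the real one (so sm_21 maps to compute_20)."""
    candidates = [v for v in VIRTUAL if version(v) <= version(real_arch)]
    return max(candidates, key=version)


def expand_arch_shorthand(arch):
    """Expand '-arch=<arch>' given without '-code' (hypothetical helper)."""
    if arch.startswith("compute_"):
        # Virtual architecture: '-code' defaults to the same value.
        return [f"-arch={arch}", f"-code={arch}"]
    virtual = closest_virtual(arch)
    return [f"-arch={virtual}", f"-code={arch},{virtual}"]


print(" ".join(expand_arch_shorthand("sm_13")))
```

For sm_13 this reproduces the equivalence given in the option description, '-arch=compute_13 -code=sm_13,compute_13'.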

--gpu-code <gpu architecture name>,... (-code)
        Specify the names of NVIDIA GPUs to generate code for.
        nvcc will embed a compiled code image in the resulting executable for
        each specified 'code' architecture. This code image will be a true
        binary load image for each 'real' architecture (such as sm_13), and
        ptx intermediate code for each virtual architecture (such as
        compute_10). At runtime, in case no better binary load image is
        found, and provided that the ptx architecture is compatible with the
        'current' GPU, such embedded ptx code will be dynamically translated
        for this current GPU by the cuda runtime system.
        Architectures specified for this option can be virtual as well as real,
        but each of these 'code' architectures must be compatible with the
        architecture specified with option '--gpu-architecture'.
        For instance, 'arch'=compute_13 is not compatible with 'code'=sm_10,
        because the generated ptx code will assume the availability of
        compute_13 features that are not present on sm_10.
        Allowed values for this option: 'compute_10','compute_11','compute_12',
        'compute_13','compute_20','compute_30','compute_35','sm_10','sm_11',
        'sm_12','sm_13','sm_20','sm_21','sm_30','sm_35'.
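
The compatibility rule between 'arch' and 'code' can be approximated with a numeric comparison. The sketch below assumes, as a simplification not stated in the text, that a 'code' architecture is compatible exactly when its number is at least that of the virtual 'arch'; the helper names are hypothetical.

```python
def version(arch):
    """Numeric part of an architecture name, e.g. 'compute_13' -> 13."""
    return int(arch.split("_")[1])


def incompatible_codes(arch, codes):
    """Return the 'code' entries that are older than the virtual 'arch'.

    ASSUMPTION: compatibility is modeled as code number >= arch number.
    """
    return [code for code in codes if version(code) < version(arch)]


# sm_10 lacks compute_13 features, so it is reported as incompatible.
print(incompatible_codes("compute_13", ["sm_10", "sm_13", "sm_20"]))
```

With 'arch'=compute_13 the check flags sm_10, matching the example in the option description.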

--generate-code (-gencode)
        This option provides a generalization of the
        '--gpu-architecture=<arch> --gpu-code=<code>,...' option combination
        for specifying nvcc behavior with respect to code generation. Where use
        of the previous options generates different code for a fixed virtual
        architecture, option '--generate-code' allows multiple cicc
        invocations, iterating over different virtual architectures. In fact,
                '--gpu-architecture=<arch> --gpu-code=<code>,...'
        is equivalent to
                '--generate-code arch=<arch>,code=<code>,...'.
        '--generate-code' options may be repeated for different virtual
        architectures.
        Allowed keywords for this option: 'arch','code'.
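
Repeating '--generate-code' for several virtual architectures can be sketched by building the argument list programmatically. The helper name, the target list, and the file name kernel.cu are hypothetical; each option carries a single 'code' value here so that no quoting of code lists is needed.

```python
def generate_code_flags(targets):
    """Build one '--generate-code' option per (arch, code) pair.

    Hypothetical helper: it only formats strings and does not invoke nvcc.
    """
    flags = []
    for arch, code in targets:
        flags += ["--generate-code", f"arch={arch},code={code}"]
    return flags


# Illustrative targets: two real architectures plus ptx for compute_35.
targets = [
    ("compute_20", "sm_20"),
    ("compute_30", "sm_30"),
    ("compute_35", "compute_35"),
]

# kernel.cu is a made-up input file name.
command = ["nvcc", "--compile", "kernel.cu"] + generate_code_flags(targets)
print(" ".join(command))
```

Each pair leads to its own cicc invocation for the named virtual architecture, which is what distinguishes this form from a single '--gpu-architecture' with several '--gpu-code' values.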

--maxrregcount <N> (-maxrregcount)
        Specify the maximum number of registers that GPU functions can use.
        Up to a function-specific limit, a higher value will generally
        increase the performance of individual GPU threads that execute this
        function. However, because thread registers are allocated from a global
        register pool on each GPU, a higher value of this option will also
        reduce the maximum thread block size, thereby reducing the amount of
        thread parallelism. Hence, a good maxrregcount value is the result of a
        trade-off.
        If this option is not specified, then no maximum is assumed.
        A value less than the minimum number of registers required by the ABI
        will be raised by the compiler to the ABI minimum.
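
The trade-off behind maxrregcount can be illustrated with simple arithmetic. The pool size below is a hypothetical round number chosen for the example, not a figure from this text, and the model ignores allocation granularity and every other hardware limit.

```python
# HYPOTHETICAL register pool size; real pool sizes depend on the GPU.
REGISTER_POOL = 65536


def max_threads(regs_per_thread, pool=REGISTER_POOL):
    """Upper bound on resident threads if each one uses regs_per_thread
    registers from a shared pool (simplified model)."""
    return pool // regs_per_thread


for regs in (16, 32, 64):
    print(f"maxrregcount={regs}: at most {max_threads(regs)} threads")
```

Doubling the per-thread register budget halves the bound on the number of threads, which is the loss of thread parallelism the option description warns about.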

--ftz [true,false] (-ftz)
        When performing single-precision floating-point operations, flush
        denormal values to zero (true) or preserve denormal values (false).
        -use_fast_math implies --ftz=true.
        Default value: false.

--prec-div [true,false] (-prec-div)
        For single-precision floating-point division and reciprocals, use IEEE
        round-to-nearest mode (true) or use a faster approximation (false).
        -use_fast_math implies --prec-div=false.
        Default value: true.

--prec-sqrt [true,false] (-prec-sqrt)
        For single-precision floating-point square root, use IEEE
        round-to-nearest mode (true) or use a faster approximation (false).
        -use_fast_math implies --prec-sqrt=false.
        Default value: true.

--fmad [true,false] (-fmad)
        Enables (disables) the contraction of floating-point multiplies and
        adds/subtracts into floating-point multiply-add operations (FMAD, FFMA,
        or DFMA). This option is supported only when '--gpu-architecture' is
        set to compute_20, sm_20, or higher. For other architecture classes,
        the contraction is always enabled. -use_fast_math implies --fmad=true.
        Default value: true.

--relocatable-device-code [true,false] (-rdc)
        Enable (disable) the generation of relocatable device code. If
        disabled, executable device code is generated.
        Default value: false.


Options for steering cuda compilation
=====================================

--use_fast_math (-use_fast_math)
        Make use of fast math library. -use_fast_math implies -ftz=true
        -prec-div=false -prec-sqrt=false -fmad=true.
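
The effect of -use_fast_math on the floating-point options can be sketched as a dictionary update over the documented defaults. The sketch combines the implications listed above with the one stated under '--fmad'; the function name is hypothetical, and how nvcc resolves a conflicting explicit setting is not modeled.

```python
# Defaults as documented for each option in this help text.
DEFAULTS = {"ftz": "false", "prec-div": "true",
            "prec-sqrt": "true", "fmad": "true"}

# Settings implied by -use_fast_math according to the option descriptions.
IMPLIED_BY_FAST_MATH = {"ftz": "true", "prec-div": "false",
                        "prec-sqrt": "false", "fmad": "true"}


def effective_math_flags(use_fast_math=False):
    """Return the effective floating-point flags (hypothetical helper)."""
    settings = dict(DEFAULTS)
    if use_fast_math:
        settings.update(IMPLIED_BY_FAST_MATH)
    return [f"--{name}={value}" for name, value in settings.items()]


print(effective_math_flags(use_fast_math=True))
```

Only ftz, prec-div, and prec-sqrt actually change relative to the defaults, since fmad is already enabled by default.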

--entries entry,... (-e)
        In case of compilation of ptx or gpu files to cubin: specify the global
        entry functions for which code must be generated. By default, code will
        be generated for all entry functions.


Generic tool options
====================

--disable-warnings (-w)
        Inhibit all warning messages.

--source-in-ptx (-src-in-ptx)
        Interleave source in ptx.

--restrict (-restrict)
        Programmer assertion that all kernel pointer parameters are restrict
        pointers.

--Werror <kind>,... (-Werror)
        Make warnings of the specified kinds into errors. The following is the
        list of warning kinds accepted by this option:

        cross-execution-space-call
                Be more strict about unsupported cross execution space calls.
                The compiler will generate an error instead of a warning for a
                call from a __host__ __device__ function to a __host__
                function.

        Allowed values for this option: 'cross-execution-space-call'.

--help (-h)
        Print this help information on this tool.

--version (-V)
        Print version information on this tool.

--options-file <file>,... (-optf)
        Include command line options from specified file.