Writing TC operations
=====================

.. automodule:: tensor_comprehensions

This document focuses on writing TC operations using the high-level API.
For examples of using the low-level API, see the Python API documentation.

To create a CUDA kernel implementing an operation backed by TC, one should:

1. Create a callable TC object by calling :func:`define`
2. Create input PyTorch Tensors
3. Call the TC object with the input PyTorch Tensors
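
The calling pattern behind these steps can be sketched in plain Python. This is a mock for illustration only; :code:`MiniTC`, this :code:`define` and the toy kernel are invented names, not the real :code:`tensor_comprehensions` API:

```python
# Mock of the define/call pattern described above (NOT the real TC API):
# a "compiled kernel" is built lazily and memoized per input-size signature.

class MiniTC:
    def __init__(self, name, kernel_builder):
        self.name = name
        self.kernel_builder = kernel_builder
        self._compiled = {}  # input sizes -> compiled kernel

    def __call__(self, *inputs):
        sizes = tuple(len(x) for x in inputs)
        if sizes not in self._compiled:  # compile once per size signature
            self._compiled[sizes] = self.kernel_builder(sizes)
        return self._compiled[sizes](*inputs)

def define(name, kernel_builder):
    """Step 1: create a callable TC-like object."""
    return MiniTC(name, kernel_builder)

# A toy "kernel": elementwise addition of two equal-length lists.
add = define("add", lambda sizes: lambda a, b: [x + y for x, y in zip(a, b)])

out = add([1, 2, 3], [10, 20, 30])  # steps 2 and 3: make inputs, call the object
print(out)                 # [11, 22, 33]
print(len(add._compiled))  # 1 -- compiled and memoized for sizes (3, 3)
```

Calling :code:`add` again with other 3-element lists reuses the memoized "compilation", mirroring the size-based memoization the backend performs.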

When running, the backend ensures the TC is compiled and memoized for the
given input tensor sizes (see the documentation for :func:`define` for more details).
Calling the object returned by :func:`define` executes the
corresponding operation and returns a list of outputs.
If the operation has already been compiled, in the following runs, the TC
backend reuses the compiled version and runs the operation directly.

Example
-------

The following example demonstrates the steps above.
We use the :func:`make_naive_options_factory` builder function to provide
naive :class:`~tclib.MappingOptions`. Naive options result in poor performance.
At this time, there is no notion of a default :class:`~tclib.MappingOptions`.
Instead one should use the autotuner to perform an evolutionary search
starting from an initial :class:`~tclib.MappingOptions` object and return a better
:class:`~tclib.MappingOptions` object for a given TC function and sizes (more on this
below).

.. code-block:: python

    import tensor_comprehensions as tc
    import torch

    group_normalization = """
    def group_normalization(
        float(N, G, D, H, W) I, float(G, D) gamma, float(G, D) beta)
    -> (Sum, SumSq, O)
    {
        Sum(n, g) +=! I(n, g, r_d, r_h, r_w)
        SumSq(n, g) +=! I(n, g, r_d, r_h, r_w) * I(n, g, r_d, r_h, r_w)
        O(n, g, d, h, w) = gamma(g, d)
            * ( I(n, g, d, h, w) - Sum(n, g) / (D * H * W) )
            * rsqrt( (SumSq(n, g) - Sum(n, g) * Sum(n, g) / (D * H * W))
                     / (D * H * W)
                     + 1e-5 )
            + beta(g, d)
    }
    """

    N, G, D, H, W = 32, 32, 4, 56, 56
    T = tc.define(
        group_normalization,
        tc.make_naive_options_factory())
    I, gamma, beta = (
        torch.randn(N, G, D, H, W, device='cuda'),
        torch.randn(G, D, device='cuda'),
        torch.randn(G, D, device='cuda'))
    Sum, SumSq, O = T.group_normalization(I, gamma, beta)

Specifying MappingOptions
-------------------------

There are three ways to construct :class:`~tclib.MappingOptions` when defining a TC:

* **Naive MappingOptions**:

  * :code:`naive`: this is provided to create a basic GPU mapping strategy with
    3-D tiling by 32x32x32, mapping to 256x256 blocks of 32x8 threads. This
    should by no means be considered a good baseline, but just a point to
    get started using TC. Once a correct TC is written, we recommend either
    using options loaded from a :class:`~tclib.MappingOptionsCache` or resulting from
    a tuning run. One can also modify a :class:`~tclib.MappingOptions` object
    programmatically (see the API documentation).

* **Loading from MappingOptionsCache**: a :class:`~tclib.MappingOptionsCache` provides
  a simple interface to load the best options from a previous tuning run.

* **Autotuning**: A kernel can be autotuned for fixed input tensor sizes.
Loading from cache
------------------

Loading the best options from a previously serialized :class:`~tclib.MappingOptionsCache`
can be achieved by making a factory function with
:func:`make_load_from_cache_options_factory` and passing it as an argument to the
:func:`define` function:

.. code-block:: python

    T = tc.define(
        group_normalization,
        tc.make_load_from_cache_options_factory(cache_filename))
    I, gamma, beta = (
        torch.randn(N, G, D, H, W, device='cuda'),
        torch.randn(G, D, device='cuda'),
        torch.randn(G, D, device='cuda'))
    Sum, SumSq, O = T.group_normalization(I, gamma, beta)

One can also use the low-level :class:`~tclib.MappingOptionsCache`.

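The idea behind such a cache can be sketched in plain Python. The class below is an invented stand-in keyed on the TC name and input sizes, not the real :code:`MappingOptionsCache`:

```python
# Invented stand-in for an options cache (NOT the real MappingOptionsCache):
# maps (tc_name, input_sizes) to the best-performing options seen so far.

class MiniOptionsCache:
    def __init__(self):
        self._best = {}  # (name, sizes) -> (runtime, options)

    def store(self, name, sizes, options, runtime):
        """Keep an entry only if it beats the best runtime recorded so far."""
        key = (name, tuple(sizes))
        current = self._best.get(key)
        if current is None or runtime < current[0]:
            self._best[key] = (runtime, options)

    def load(self, name, sizes):
        """Return the best options for these sizes, or None if never tuned."""
        entry = self._best.get((name, tuple(sizes)))
        return entry[1] if entry else None

cache = MiniOptionsCache()
cache.store("group_normalization", (32, 32, 4), "tile=32,32,32", runtime=4.2)
cache.store("group_normalization", (32, 32, 4), "tile=16,16,64", runtime=2.7)
best = cache.load("group_normalization", (32, 32, 4))
print(best)  # tile=16,16,64 -- the faster of the two stored options
```
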
Autotuning
----------

Tuning can be achieved by making a factory function with
:func:`make_autotuned_options_factory` and passing it as an argument to the
:func:`define` function. Tuning can be interrupted at any point by pressing
:code:`Ctrl+C`; in that case, the compilation and evaluation jobs currently
in flight will be flushed, but no new compilation job will be created. Once
the jobs in flight are flushed, saving to cache occurs (if requested) and
the best :class:`~tclib.MappingOptions` found so far will be returned.

Tuning behavior can be modified by defining the TC with an optional
:class:`~tclib.TunerConfig` parameter constructed as follows:
:code:`tuner_config=tc.TunerConfig().threads(5).generations(3).pop_size(5)`.

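The chained-setter style of :code:`TunerConfig`, and the generational search it parameterizes, can be mimicked with a toy builder and a seeded random search. All names and the cost function here are invented for illustration; the real tuner evaluates candidate :code:`MappingOptions` by compiling and timing kernels:

```python
import random

class MiniTunerConfig:
    """Toy builder mirroring TunerConfig's chained setters (invented mock)."""
    def __init__(self):
        self._threads, self._generations, self._pop_size = 1, 2, 10

    def threads(self, n):
        self._threads = n
        return self  # returning self enables .threads(...).generations(...)

    def generations(self, n):
        self._generations = n
        return self

    def pop_size(self, n):
        self._pop_size = n
        return self

def toy_tune(config, cost):
    """Evolution-flavored search: keep the best candidate across generations."""
    rng = random.Random(0)  # seeded for reproducibility
    best = None
    for _ in range(config._generations):
        population = [rng.randint(1, 64) for _ in range(config._pop_size)]
        candidate = min(population, key=cost)
        if best is None or cost(candidate) < cost(best):
            best = candidate
    return best

config = MiniTunerConfig().threads(5).generations(3).pop_size(5)
best = toy_tune(config, cost=lambda tile: abs(tile - 32))  # toy cost: prefer tile 32
print(config._threads, config._generations, config._pop_size)  # 5 3 5
```
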