@@ -31,7 +31,7 @@ using the first form.
 In this tutorial we will highlight both use cases in separate parts.
 
 !!! note
-
+
     If you're looking for GPU-accelerated neural networks inside of nonlinear solvers,
     check out [DeepEquilibriumNetworks.jl](https://docs.sciml.ai/DeepEquilibriumNetworks/stable/).
 
@@ -59,7 +59,7 @@ f(u, p) = u .* u .- p
 u0 = CUDA.cu(ones(1000))
 p = CUDA.cu(collect(1:1000))
 prob = NLS.NonlinearProblem(f, u0, p)
-sol = NLS.solve(prob, NLS.NewtonRaphson(), abstol=1f-4)
+sol = NLS.solve(prob, NLS.NewtonRaphson(), abstol = 1.0f-4)
 ```
 
 Notice a few things here. One, nothing is different except the input array types. But
@@ -95,7 +95,7 @@ import AMDGPU # For if you have an AMD GPU
 import Metal # For if you have a Mac M-series device and want to use the built-in GPU
 import OneAPI # For if you have an Intel GPU
 
-@KernelAbstractions.kernel function parallel_nonlinearsolve_kernel!(result, @Const(prob), @Const(alg))
+KernelAbstractions.@kernel function parallel_nonlinearsolve_kernel!(result, @Const(prob), @Const(alg))
     i = @index(Global)
     prob_i = SciMLBase.remake(prob; p = prob.p[i])
     sol = NLS.solve(prob_i, alg)
@@ -109,7 +109,7 @@ is saying, "for the ith call, get the i'th parameter set and solve with these pa
 The ith result is then this solution".
 
 !!! note
-
+
     Because kernel code needs to be able to be compiled to a GPU kernel, it has very strict
     specifications of what's allowed because GPU cores are not as flexible as CPU cores.
     In general, this means that you need to avoid any runtime operations in kernel code,
@@ -140,16 +140,16 @@ Now let's build a nonlinear system to test it on.
     out2 = sqrt(p[2]) * (x[3] - x[4])
     out3 = (x[2] - p[3] * x[3])^2
     out4 = sqrt(p[4]) * (x[1] - x[4]) * (x[1] - x[4])
-    StaticArrays.SA[out1,out2,out3,out4]
+    StaticArrays.SA[out1, out2, out3, out4]
 end
 
 p = StaticArrays.@SVector [StaticArrays.@SVector(rand(Float32, 4)) for _ in 1:1024]
-u0 = StaticArrays.SA[1f0, 2f0, 3f0, 4f0]
+u0 = StaticArrays.SA[1.0f0, 2.0f0, 3.0f0, 4.0f0]
 prob = SciMLBase.ImmutableNonlinearProblem{false}(p2_f, u0, p)
 ```
 
 !!! note
-
+
     Because the custom kernel is going to need to embed the the code for our nonlinear
     problem into the kernel, it also must be written to be GPU compatible.
     In general, this means that you need to avoid any runtime operations in kernel code,
@@ -176,4 +176,5 @@ vectorized_solve(prob, NLS.SimpleNewtonRaphson(); backend = Metal.MetalBackend()
 ```
 
 !!! warn
+
     The GPU-based calls will only work on your machine if you have a compatible GPU!