SimpleChains
Documentation for SimpleChains.
SimpleChains.ADAM
SimpleChains.AbsoluteLoss
SimpleChains.AbstractPenalty
SimpleChains.Activation
SimpleChains.Conv
SimpleChains.Dropout
SimpleChains.Flatten
SimpleChains.FrontLastPenalty
SimpleChains.L1Penalty
SimpleChains.L2Penalty
SimpleChains.LogitCrossEntropyLoss
SimpleChains.MaxPool
SimpleChains.SimpleChain
SimpleChains.SquaredLoss
SimpleChains.TurboDense
Base.front
SimpleChains.add_loss
SimpleChains.alloc_threaded_grad
SimpleChains.biases
SimpleChains.init_params
SimpleChains.init_params!
SimpleChains.numparam
SimpleChains.params
SimpleChains.pullback_arg!
SimpleChains.train_batched!
SimpleChains.train_unbatched!
SimpleChains.valgrad!
SimpleChains.weights
SimpleChains.ADAM — Type
ADAM(η = 0.001, β = (0.9, 0.999))
ADAM optimizer.
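For example, constructing optimizers with default or custom hyperparameters (the positional arguments follow the signature above):

opt = SimpleChains.ADAM()       # η = 0.001, β = (0.9, 0.999)
opt2 = SimpleChains.ADAM(3e-4)  # smaller learning rate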
SimpleChains.AbsoluteLoss — Type
AbsoluteLoss
Calculates the mean absolute loss of the target.
SimpleChains.AbstractPenalty — Type
AbstractPenalty
The AbstractPenalty interface requires supporting the following methods:
- getchain(::AbstractPenalty)::SimpleChain returns a SimpleChain if it is carrying one.
- apply_penalty(::AbstractPenalty, params)::Number returns the penalty.
- apply_penalty!(grad, ::AbstractPenalty, params)::Number returns the penalty and updates grad to add the gradient.
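A minimal sketch of how these methods are used, taking the built-in L2Penalty and assuming they accept a plain parameter vector:

pen = SimpleChains.L2Penalty(0.1)
θ = Float32[0.5, -1.0, 2.0]
SimpleChains.apply_penalty(pen, θ)      # the penalty value
g = zeros(Float32, 3)
SimpleChains.apply_penalty!(g, pen, θ)  # same value; also accumulates the gradient into g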
SimpleChains.Activation — Type
Activation(activation)
Applies the activation function elementwise.
SimpleChains.Conv — Type
Conv(activation, dims::Tuple{Vararg{Integer}}, outputdim::Integer)
Performs a convolution with kernel size dims, mapping the input to outputdim output channels, then adds a bias (one per output channel) and applies activation elementwise.
E.g., Conv(relu, (5, 5), 16) performs a 5 × 5 convolution and maps the input channels to 16 output channels, before adding a bias and applying relu.
Weights are randomly initialized using the (Xavier) Glorot uniform distribution; the bias is zero-initialized.
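For instance, a LeNet-style sketch following the package's MNIST example, assuming 28×28 single-channel input images:

using SimpleChains

lenet = SimpleChain(
  (static(28), static(28), static(1)),
  SimpleChains.Conv(SimpleChains.relu, (5, 5), 6),   # 5×5 kernels, 6 output channels
  SimpleChains.MaxPool((2, 2)),
  SimpleChains.Conv(SimpleChains.relu, (5, 5), 16),
  SimpleChains.MaxPool((2, 2)),
  Flatten{3}(),
  TurboDense(SimpleChains.relu, 120),
  TurboDense(identity, 10),
)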
SimpleChains.Dropout — Type
Dropout(p) # 0 < p < 1
Dropout layer.
When evaluated without gradients, it multiplies inputs by (1 - p). When evaluated with gradients, it randomly zeroes a proportion p of the inputs.
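For example, a minimal sketch placing dropout between two dense layers:

model = SimpleChain(
  static(4),
  TurboDense(tanh, 32),
  SimpleChains.Dropout(0.2),  # zero 20% of activations during gradient evaluation
  TurboDense(identity, 1),
)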
SimpleChains.Flatten — Type
Flatten{N}()
Flattens the first N dimensions. E.g.,
julia> Flatten{2}()(rand(2, 3, 4))
6×4 Matrix{Float64}:
0.0609115 0.597285 0.279899 0.888223
0.0667422 0.315741 0.351003 0.805629
0.678297 0.350817 0.984215 0.399418
0.125801 0.566696 0.96873 0.57744
0.331961 0.350742 0.59598 0.741998
0.26345 0.144635 0.076433 0.330475
SimpleChains.FrontLastPenalty — Type
FrontLastPenalty(SimpleChain, frontpen(λ₁...), lastpen(λ₂...))
Applies frontpen to all but the last layer, applying lastpen to the last layer instead. "Last layer" here ignores the loss function, i.e. if the last element of the chain is a loss layer, then lastpen applies to the layer preceding it.
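For example, following the package's MNIST example (here lenetloss is assumed to be a chain ending in a loss layer, with g, p, and xtrain as in the training entries below):

penalized = SimpleChains.FrontLastPenalty(
  lenetloss,
  SimpleChains.L2Penalty(3.3e-4),  # applied to all but the last layer
  SimpleChains.L2Penalty(4.5e-5),  # applied to the last layer
)
SimpleChains.train_batched!(g, p, penalized, xtrain, SimpleChains.ADAM(3e-4), 10)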
SimpleChains.L1Penalty — Type
L1Penalty(λ)
Applies an L1 penalty of λ to the parameters, i.e. penalizes them by their absolute values.
SimpleChains.L2Penalty — Type
L2Penalty(λ)
Applies an L2 penalty of λ to the parameters, i.e. penalizes them by their squares.
SimpleChains.LogitCrossEntropyLoss — Type
LogitCrossEntropyLoss
Calculates the mean logit cross-entropy loss.
SimpleChains.MaxPool — Type
MaxPool(dims::Tuple{Vararg{Integer}})
Calculates the maximum of pools of size dims.
SimpleChains.SimpleChain — Type
SimpleChain([inputdim::Union{Integer,Tuple{Vararg{Integer}}},] layers)
Construct a SimpleChain. The optional inputdim argument allows SimpleChains to check the size of inputs. Making these static will allow SimpleChains to infer size and loop bounds at compile time. The batch size generally should not be included in inputdim. If inputdim is not specified, some methods, e.g. init_params, will require passing the size as an additional argument, because the number of parameters may be a function of the input size (e.g., for a TurboDense layer).
The layers argument holds various SimpleChains layers, e.g. TurboDense, Conv, Activation, Flatten, Dropout, or MaxPool. It may optionally terminate in an AbstractLoss layer.
These objects are callable, e.g.
c = SimpleChain(...);
p = SimpleChains.init_params(c);
c(X, p) # X are the independent variables, and `p` the parameter vector.
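A fuller sketch, following the dimensions used in the package README:

using SimpleChains

mlpd = SimpleChain(
  static(4),                 # input dimension known at compile time
  TurboDense(tanh, 32),
  TurboDense(tanh, 16),
  TurboDense(identity, 4),
)
p = SimpleChains.init_params(mlpd)
X = rand(Float32, 4, 200)    # 200 samples; the batch size is not part of inputdim
mlpd(X, p)                   # forward pass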
SimpleChains.SquaredLoss — Type
SquaredLoss(target)
Calculates half of the mean squared loss of the target.
SimpleChains.TurboDense — Type
TurboDense{B=true}(activation, outputdim::Integer)
Linear (dense) layer.
- B specifies whether the layer includes a bias term.
- The activation function is applied elementwise to the result.
- outputdim indicates how many dimensions the input is mapped to.
Weights are randomly initialized using the (Xavier) Glorot normal distribution; the bias is zero-initialized.
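For example, a minimal sketch of the two forms:

dense = TurboDense(tanh, 8)                    # 8 outputs, with bias, tanh activation
dense_nobias = TurboDense{false}(identity, 8)  # same mapping, no bias term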
Base.front — Method
Base.front(c::SimpleChain)
Useful for popping off a loss layer.
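A sketch (chain and y are assumed to be defined):

chainwithloss = SimpleChains.add_loss(chain, SimpleChains.SquaredLoss(y))
Base.front(chainwithloss)  # the chain without its loss layer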
SimpleChains.add_loss — Method
add_loss(chn, l::AbstractLoss)
Add the loss function l to the simple chain. The loss function should hold the target you're trying to fit.
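For example, attaching a squared loss over hypothetical targets Y (mlpd is the chain from the SimpleChain example above):

Y = rand(Float32, 4, 200)
mlpdloss = SimpleChains.add_loss(mlpd, SimpleChains.SquaredLoss(Y))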
SimpleChains.alloc_threaded_grad — Method
alloc_threaded_grad(chn, id = nothing, ::Type{T} = Float32; numthreads = min(Threads.nthreads(), SimpleChains.num_cores()))
Returns a preallocated array for writing gradients, for use with train_batched! and train_unbatched!. If Julia was started with multiple threads, returns a matrix with one column per thread, so they may accumulate gradients in parallel.
Note that the memory is aligned to avoid false sharing.
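For example, continuing the add_loss sketch above:

g = SimpleChains.alloc_threaded_grad(mlpdloss)  # one gradient column per thread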
SimpleChains.biases — Function
biases(sc::SimpleChain, p::AbstractVector, inputdim = nothing)
Returns a tuple of the biases of the SimpleChain sc, as a view of the parameter vector p.
SimpleChains.init_params! — Function
SimpleChains.init_params!(chn, p, id = nothing)
Randomly initializes the parameter vector p with input dim id. The input dim does not need to be specified if it was provided to the chain object itself. See the documentation of the individual layers for how they are initialized; it is generally via (Xavier) Glorot uniform or normal distributions.
SimpleChains.init_params — Method
SimpleChains.init_params(chn[, id = nothing][, ::Type{T} = Float32])
Creates a parameter vector of element type T, with size determined by id (this argument is not required if it was provided to the chain object itself). See the documentation of the individual layers for how they are initialized; it is generally via (Xavier) Glorot uniform or normal distributions.
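For example:

p = SimpleChains.init_params(mlpd)             # Float32 by default
p64 = SimpleChains.init_params(mlpd, Float64)  # explicit element type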
SimpleChains.numparam — Method
numparam(d::Layer, inputdim::Tuple)
Returns a Tuple{Int,S}. The first element is the number of parameters required by the layer given an input of size inputdim. The second element is the size of the object returned by the layer, which can be fed into numparam of the following layer.
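For example, a sketch for a dense layer on a length-4 input (the exact form of the returned size S is an assumption here):

SimpleChains.numparam(TurboDense(tanh, 8), (4,))  # (40, (8,)): 4*8 weights + 8 biases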
SimpleChains.params — Function
params(sc::SimpleChain, p::AbstractVector, inputdim = nothing)
Returns a tuple of the parameters of the SimpleChain sc, as a view of the parameter vector p.
SimpleChains.pullback_arg! — Method
pullback_arg!(dest, layer, C̄, A, p, pu, pu2)
Computes the pullback of layer with respect to A and C̄, storing the result in dest.
pullback_arg!(layer, C̄, A, p, pu, pu2)
Computes the pullback of layer with respect to A and C̄, storing the result in A.
SimpleChains.train_batched! — Method
train_batched!(g::AbstractVecOrMat, p, chn, X, opt, iters; batchsize = nothing)
Train while batching arguments.
Arguments:
- g, a pre-allocated gradient buffer. Can be allocated with similar(p) (if you want to run single-threaded), or alloc_threaded_grad(chn, size(X)) (the size(X) argument is only necessary if the input dimension was not specified when constructing the chain). If a matrix, the number of columns gives how many threads to use. Do not use more threads than the batch size would allow.
- p, the parameter vector. It is updated in place. It should be pre-initialized, e.g. with init_params/init_params!. This is to allow calling train_batched! several times to train in increments.
- chn, the SimpleChain. It must include a loss (see SimpleChains.add_loss) containing the target information (dependent variables) you're trying to fit.
- X, the training data input argument (independent variables).
- opt, the optimizer. Currently, only SimpleChains.ADAM is supported.
- iters, how many iterations to train for.
- batchsize keyword argument: the size of the batches to use. If batchsize = nothing, it'll try to do a half-decent job of picking the batch size for you. However, this is not well optimized at the moment.
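For example, an end-to-end sketch following the package README (names are illustrative):

using SimpleChains

model = SimpleChain(
  static(4),
  TurboDense(tanh, 32),
  TurboDense(identity, 4),
)
X = rand(Float32, 4, 1000)
Y = rand(Float32, 4, 1000)
loss = SimpleChains.add_loss(model, SimpleChains.SquaredLoss(Y))

p = SimpleChains.init_params(model)
g = SimpleChains.alloc_threaded_grad(loss)
SimpleChains.train_batched!(g, p, loss, X, SimpleChains.ADAM(), 1_000)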
SimpleChains.train_unbatched! — Method
train_unbatched!([g::AbstractVecOrMat, ]p, chn, X, opt, iters)
Train without batching inputs.
Arguments:
- g, a pre-allocated gradient buffer. Can be allocated with similar(p) (if you want to run single-threaded), or alloc_threaded_grad(chn, size(X)) (the size(X) argument is only necessary if the input dimension was not specified when constructing the chain). If a matrix, the number of columns gives how many threads to use. Do not use more threads than the batch size would allow. This argument is optional. If excluded, it will run multithreaded (assuming you started Julia with multiple threads).
- p, the parameter vector. It is updated in place. It should be pre-initialized, e.g. with init_params/init_params!. This is to allow calling train_unbatched! several times to train in increments.
- chn, the SimpleChain. It must include a loss (see SimpleChains.add_loss) containing the target information (dependent variables) you're trying to fit.
- X, the training data input argument (independent variables).
- opt, the optimizer. Currently, only SimpleChains.ADAM is supported.
- iters, how many iterations to train for.
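Usage mirrors train_batched!; e.g., continuing the sketch above:

SimpleChains.train_unbatched!(g, p, loss, X, SimpleChains.ADAM(), 1_000)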
SimpleChains.valgrad! — Method
valgrad!(g, c::SimpleChain, arg, params)
g can be either an AbstractVector with the same size as params, or a Tuple{A,G}. If g is a tuple, the first element is the gradient with respect to arg, and should either be nothing (to skip taking this gradient) or have the same size as arg. The second element is the gradient with respect to params, and should likewise either be nothing or have the same size as params.
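For example, continuing the training sketch above:

g = similar(p)
val = SimpleChains.valgrad!(g, loss, X, p)  # loss value; gradient written into g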
Allowed destruction:
- valgrad_layer!: accepts the return of the previous layer (B) and returns an output C. If an internal layer, it is allowed to destroy B (e.g. a dropout layer).
- pullback!: accepts the adjoint of its return (C̄). It is allowed to destroy this. It is also allowed to destroy the previous layer's return B to produce B̄ (which the previous layer's pullback receives as its C̄). Thus, the pullback is not allowed to depend on C, as it may have been destroyed in producing C̄.
SimpleChains.weights — Function
weights(sc::SimpleChain, p::AbstractVector, inputdim = nothing)
Returns a tuple of the weights (parameters other than biases) of the SimpleChain sc, as a view of the parameter vector p.
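For example, a sketch using mlpd and p from the SimpleChain example above (since these are views, mutating them mutates p):

W = SimpleChains.weights(mlpd, p)  # tuple of weight arrays, one per parameterized layer
b = SimpleChains.biases(mlpd, p)   # tuple of bias vectors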