SimpleChains
Documentation for SimpleChains.
SimpleChains.ADAM
SimpleChains.AbsoluteLoss
SimpleChains.AbstractPenalty
SimpleChains.Activation
SimpleChains.Conv
SimpleChains.Dropout
SimpleChains.Flatten
SimpleChains.FrontLastPenalty
SimpleChains.L1Penalty
SimpleChains.L2Penalty
SimpleChains.LogitCrossEntropyLoss
SimpleChains.MaxPool
SimpleChains.SimpleChain
SimpleChains.SquaredLoss
SimpleChains.TurboDense
Base.front
SimpleChains.add_loss
SimpleChains.alloc_threaded_grad
SimpleChains.biases
SimpleChains.init_params
SimpleChains.init_params!
SimpleChains.numparam
SimpleChains.params
SimpleChains.pullback_arg!
SimpleChains.train_batched!
SimpleChains.train_unbatched!
SimpleChains.valgrad!
SimpleChains.weights
SimpleChains.ADAM — Type
ADAM(η = 0.001, β = (0.9, 0.999))
ADAM optimizer.
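For example, constructing optimizers with default or custom hyperparameters (the positional arguments follow the signature above):

opt = SimpleChains.ADAM()       # η = 0.001, β = (0.9, 0.999)
opt2 = SimpleChains.ADAM(3e-4)  # smaller learning rate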
SimpleChains.AbsoluteLoss — Type
AbsoluteLoss
Calculates the mean absolute loss of the target.
SimpleChains.AbstractPenalty — Type
AbstractPenalty
The AbstractPenalty interface requires supporting the following methods:
- getchain(::AbstractPenalty)::SimpleChain returns a SimpleChain if it is carrying one.
- apply_penalty(::AbstractPenalty, params)::Number returns the penalty.
- apply_penalty!(grad, ::AbstractPenalty, params)::Number returns the penalty and updates grad to add the gradient.
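A minimal sketch of how these methods are used, taking the built-in L2Penalty and assuming they accept a plain parameter vector:

pen = SimpleChains.L2Penalty(0.1)
θ = Float32[0.5, -1.0, 2.0]
SimpleChains.apply_penalty(pen, θ)      # the penalty value
g = zeros(Float32, 3)
SimpleChains.apply_penalty!(g, pen, θ)  # same value; also accumulates the gradient into g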
SimpleChains.Activation — Type
Activation(activation)
Applies the activation function elementwise.
SimpleChains.Conv — Type
Conv(activation, dims::Tuple{Vararg{Integer}}, outputdim::Integer)
Performs a convolution with kernel size dims, mapping the input to outputdim output channels, then adds a bias (one per output channel) and applies activation elementwise.
E.g., Conv(relu, (5, 5), 16) performs a 5 × 5 convolution and maps the input channels to 16 output channels, before adding a bias and applying relu.
Weights are randomly initialized using the (Xavier) Glorot uniform distribution; the bias is zero-initialized.
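For instance, a LeNet-style sketch following the package's MNIST example, assuming 28×28 single-channel input images:

using SimpleChains

lenet = SimpleChain(
  (static(28), static(28), static(1)),
  SimpleChains.Conv(SimpleChains.relu, (5, 5), 6),   # 5×5 kernels, 6 output channels
  SimpleChains.MaxPool((2, 2)),
  SimpleChains.Conv(SimpleChains.relu, (5, 5), 16),
  SimpleChains.MaxPool((2, 2)),
  Flatten{3}(),
  TurboDense(SimpleChains.relu, 120),
  TurboDense(identity, 10),
)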
SimpleChains.Dropout — Type
Dropout(p) # 0 < p < 1
Dropout layer.
When evaluated without gradients, it multiplies inputs by (1 - p). When evaluated with gradients, it randomly zeroes a proportion p of the inputs.
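For example, a minimal sketch placing dropout between two dense layers:

model = SimpleChain(
  static(4),
  TurboDense(tanh, 32),
  SimpleChains.Dropout(0.2),  # zero 20% of activations during gradient evaluation
  TurboDense(identity, 1),
)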
SimpleChains.Flatten — Type
Flatten{N}()
Flattens the first N dimensions. E.g.,
julia> Flatten{2}()(rand(2, 3, 4))
6×4 Matrix{Float64}:
0.0609115 0.597285 0.279899 0.888223
0.0667422 0.315741 0.351003 0.805629
0.678297 0.350817 0.984215 0.399418
0.125801 0.566696 0.96873 0.57744
0.331961 0.350742 0.59598 0.741998
0.26345 0.144635 0.076433 0.330475
SimpleChains.FrontLastPenalty — Type
FrontLastPenalty(SimpleChain, frontpen(λ₁...), lastpen(λ₂...))
Applies frontpen to all but the last layer, applying lastpen to the last layer instead. "Last layer" here ignores the loss function, i.e. if the last element of the chain is a loss layer, then lastpen applies to the layer preceding it.
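For example, following the package's MNIST example (here lenetloss is assumed to be a chain ending in a loss layer, with g, p, and xtrain as in the training entries below):

penalized = SimpleChains.FrontLastPenalty(
  lenetloss,
  SimpleChains.L2Penalty(3.3e-4),  # applied to all but the last layer
  SimpleChains.L2Penalty(4.5e-5),  # applied to the last layer
)
SimpleChains.train_batched!(g, p, penalized, xtrain, SimpleChains.ADAM(3e-4), 10)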
SimpleChains.L1Penalty — Type
L1Penalty(λ)
Applies an L1 penalty of λ to the parameters, i.e. penalizes them by their absolute values.
SimpleChains.L2Penalty — Type
L2Penalty(λ)
Applies an L2 penalty of λ to the parameters, i.e. penalizes them by their squares.
SimpleChains.LogitCrossEntropyLoss — Type
LogitCrossEntropyLoss
Calculates the mean logit cross-entropy loss.
SimpleChains.MaxPool — Type
MaxPool(dims::Tuple{Vararg{Integer}})
Calculates the maximum of pools of size dims.
SimpleChains.SimpleChain — Type
SimpleChain([inputdim::Union{Integer,Tuple{Vararg{Integer}}},] layers)
Construct a SimpleChain. The optional inputdim argument allows SimpleChains to check the size of inputs. Making these static will allow SimpleChains to infer size and loop bounds at compile time. The batch size generally should not be included in inputdim. If inputdim is not specified, some methods, e.g. init_params, will require passing the size as an additional argument, because the number of parameters may be a function of the input size (e.g., for a TurboDense layer).
The layers argument holds various SimpleChains layers, e.g. TurboDense, Conv, Activation, Flatten, Dropout, or MaxPool. It may optionally terminate in an AbstractLoss layer.
These objects are callable, e.g.
c = SimpleChain(...);
p = SimpleChains.init_params(c);
c(X, p) # X are the independent variables, and `p` the parameter vector.
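A fuller sketch, following the dimensions used in the package README:

using SimpleChains

mlpd = SimpleChain(
  static(4),                 # input dimension known at compile time
  TurboDense(tanh, 32),
  TurboDense(tanh, 16),
  TurboDense(identity, 4),
)
p = SimpleChains.init_params(mlpd)
X = rand(Float32, 4, 200)    # 200 samples; the batch size is not part of inputdim
mlpd(X, p)                   # forward pass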
SimpleChains.SquaredLoss — Type
SquaredLoss(target)
Calculates half of the mean squared loss of the target.
SimpleChains.TurboDense — Type
TurboDense{B=true}(activation, outputdim::Integer)
Linear (dense) layer.
- B specifies whether the layer includes a bias term.
- The activation function is applied elementwise to the result.
- outputdim indicates how many dimensions the input is mapped to.
Weights are randomly initialized using the (Xavier) Glorot normal distribution; the bias is zero-initialized.
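For example, a minimal sketch of the two forms:

dense = TurboDense(tanh, 8)                    # 8 outputs, with bias, tanh activation
dense_nobias = TurboDense{false}(identity, 8)  # same mapping, no bias term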
Base.front — Method
Base.front(c::SimpleChain)
Useful for popping off a loss layer.
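A sketch (chain and y are assumed to be defined):

chainwithloss = SimpleChains.add_loss(chain, SimpleChains.SquaredLoss(y))
Base.front(chainwithloss)  # the chain without its loss layer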
SimpleChains.add_loss — Method
add_loss(chn, l::AbstractLoss)
Add the loss function l to the simple chain. The loss function should hold the target you're trying to fit.
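For example, attaching a squared loss over hypothetical targets Y (mlpd is the chain from the SimpleChain example above):

Y = rand(Float32, 4, 200)
mlpdloss = SimpleChains.add_loss(mlpd, SimpleChains.SquaredLoss(Y))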
SimpleChains.alloc_threaded_grad — Method
alloc_threaded_grad(chn, id = nothing, ::Type{T} = Float32; numthreads = min(Threads.nthreads(), SimpleChains.num_cores()))
Returns a preallocated array for writing gradients, for use with train_batched! and train_unbatched!. If Julia was started with multiple threads, returns a matrix with one column per thread, so they may accumulate gradients in parallel.
Note that the memory is aligned to avoid false sharing.
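For example, continuing the add_loss sketch above:

g = SimpleChains.alloc_threaded_grad(mlpdloss)  # one gradient column per thread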
SimpleChains.biases — Function
biases(sc::SimpleChain, p::AbstractVector, inputdim = nothing)
Returns a tuple of the biases of the SimpleChain sc, as a view of the parameter vector p.
SimpleChains.init_params! — Function
SimpleChains.init_params!(chn, p, id = nothing)
Randomly initializes the parameter vector p with input dim id. The input dim does not need to be specified if it was provided to the chain object itself. See the documentation of the individual layers for how they are initialized; it is generally via (Xavier) Glorot uniform or normal distributions.
SimpleChains.init_params — Method
SimpleChains.init_params(chn[, id = nothing][, ::Type{T} = Float32])
Creates a parameter vector of element type T, with size determined by id (this argument is not required if it was provided to the chain object itself). See the documentation of the individual layers for how they are initialized; it is generally via (Xavier) Glorot uniform or normal distributions.
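For example:

p = SimpleChains.init_params(mlpd)             # Float32 by default
p64 = SimpleChains.init_params(mlpd, Float64)  # explicit element type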
SimpleChains.numparam — Method
numparam(d::Layer, inputdim::Tuple)
Returns a Tuple{Int,S}. The first element is the number of parameters required by the layer given an input of size inputdim. The second element is the size of the object returned by the layer, which can be fed into numparam of the following layer.
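For example, a sketch for a dense layer on a length-4 input (the exact form of the returned size S is an assumption here):

SimpleChains.numparam(TurboDense(tanh, 8), (4,))  # (40, (8,)): 4*8 weights + 8 biases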
SimpleChains.params — Function
params(sc::SimpleChain, p::AbstractVector, inputdim = nothing)
Returns a tuple of the parameters of the SimpleChain sc, as a view of the parameter vector p.
SimpleChains.pullback_arg! — Method
pullback_arg!(dest, layer, C̄, A, p, pu, pu2)
Computes the pullback of layer with respect to A and C̄, storing the result in dest.
pullback_arg!(layer, C̄, A, p, pu, pu2)
Computes the pullback of layer with respect to A and C̄, storing the result in A.
SimpleChains.train_batched! — Method
train_batched!(g::AbstractVecOrMat, p, chn, X, opt, iters; batchsize = nothing)
Train while batching arguments.
Arguments:
- g, a pre-allocated gradient buffer. Can be allocated with similar(p) (if you want to run single-threaded), or alloc_threaded_grad(chn, size(X)) (the size(X) argument is only necessary if the input dimension was not specified when constructing the chain). If a matrix, the number of columns gives how many threads to use. Do not use more threads than the batch size would allow.
- p, the parameter vector. It is updated in place. It should be pre-initialized, e.g. with init_params/init_params!. This is to allow calling train_batched! several times to train in increments.
- chn, the SimpleChain. It must include a loss (see SimpleChains.add_loss) containing the target information (dependent variables) you're trying to fit.
- X, the training data input argument (independent variables).
- opt, the optimizer. Currently, only SimpleChains.ADAM is supported.
- iters, how many iterations to train for.
- batchsize keyword argument: the size of the batches to use. If batchsize = nothing, it'll try to do a half-decent job of picking the batch size for you. However, this is not well optimized at the moment.
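For example, an end-to-end sketch following the package README (names are illustrative):

using SimpleChains

model = SimpleChain(
  static(4),
  TurboDense(tanh, 32),
  TurboDense(identity, 4),
)
X = rand(Float32, 4, 1000)
Y = rand(Float32, 4, 1000)
loss = SimpleChains.add_loss(model, SimpleChains.SquaredLoss(Y))

p = SimpleChains.init_params(model)
g = SimpleChains.alloc_threaded_grad(loss)
SimpleChains.train_batched!(g, p, loss, X, SimpleChains.ADAM(), 1_000)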
SimpleChains.train_unbatched! — Method
train_unbatched!([g::AbstractVecOrMat, ]p, chn, X, opt, iters)
Train without batching inputs.
Arguments:
- g, a pre-allocated gradient buffer. Can be allocated with similar(p) (if you want to run single-threaded), or alloc_threaded_grad(chn, size(X)) (the size(X) argument is only necessary if the input dimension was not specified when constructing the chain). If a matrix, the number of columns gives how many threads to use. Do not use more threads than the batch size would allow. This argument is optional. If excluded, it will run multithreaded (assuming you started Julia with multiple threads).
- p, the parameter vector. It is updated in place. It should be pre-initialized, e.g. with init_params/init_params!. This is to allow calling train_unbatched! several times to train in increments.
- chn, the SimpleChain. It must include a loss (see SimpleChains.add_loss) containing the target information (dependent variables) you're trying to fit.
- X, the training data input argument (independent variables).
- opt, the optimizer. Currently, only SimpleChains.ADAM is supported.
- iters, how many iterations to train for.
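Usage mirrors train_batched!; e.g., continuing the sketch above:

SimpleChains.train_unbatched!(g, p, loss, X, SimpleChains.ADAM(), 1_000)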
SimpleChains.valgrad! — Method
valgrad!(g, c::SimpleChain, arg, params)
g can be either an AbstractVector with the same size as params, or a Tuple{A,G}. If g is a tuple, the first element is the gradient with respect to arg, and should either be nothing (to skip taking this gradient) or have the same size as arg. The second element is the gradient with respect to params, and should likewise either be nothing or have the same size as params.
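For example, continuing the training sketch above:

g = similar(p)
val = SimpleChains.valgrad!(g, loss, X, p)  # loss value; gradient written into g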
Allowed destruction:
- valgrad_layer!: accepts the return of the previous layer (B) and returns an output C. If an internal layer, it is allowed to destroy B (e.g. a dropout layer).
- pullback!: accepts the adjoint of its return (C̄). It is allowed to destroy this. It is also allowed to destroy the previous layer's return B to produce B̄ (which the previous layer's pullback receives as its C̄). Thus, the pullback is not allowed to depend on C, as it may have been destroyed in producing C̄.
SimpleChains.weights — Function
weights(sc::SimpleChain, p::AbstractVector, inputdim = nothing)
Returns a tuple of the weights (parameters other than biases) of the SimpleChain sc, as a view of the parameter vector p.
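For example, a sketch using mlpd and p from the SimpleChain example above (since these are views, mutating them mutates p):

W = SimpleChains.weights(mlpd, p)  # tuple of weight arrays, one per parameterized layer
b = SimpleChains.biases(mlpd, p)   # tuple of bias vectors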