gluonts.mx.model.transformer.layers module#

class gluonts.mx.model.transformer.layers.InputLayer(model_size: int = 64, **kwargs)[source]#

Bases: mxnet.gluon.block.HybridBlock

Transforms the input vector to model_size with a one-layer MLP, i.e., (batch_size, time_length, input_dim) -> (batch_size, time_length, model_size).

hybrid_forward(F, data: Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol], *args)[source]#

Overrides to construct symbolic graph for this Block.

Parameters
  • data (Symbol or NDArray) – The first input tensor.

  • *args (list of Symbol or list of NDArray) – Additional input tensors.
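
A minimal imperative usage sketch (assuming the standard MXNet Gluon workflow of initializing parameters and then calling the block on an NDArray; shapes are illustrative):

    import mxnet as mx
    from gluonts.mx.model.transformer.layers import InputLayer

    layer = InputLayer(model_size=64)
    layer.initialize()  # standard Gluon parameter initialization

    # (batch_size, time_length, input_dim) -> (batch_size, time_length, model_size)
    data = mx.nd.random.normal(shape=(8, 24, 5))
    out = layer(data)
    print(out.shape)  # expected: (8, 24, 64)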

class gluonts.mx.model.transformer.layers.LayerNormalization(scale_init: str = 'ones', shift_init: str = 'zeros', eps: float = 1e-06, **kwargs)[source]#

Bases: mxnet.gluon.block.HybridBlock

Implements layer normalization as proposed in [BKH16].

hybrid_forward(F, data: Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol]) Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol][source]#

Normalizes hidden units of data as follows:

data = scale * (data - mean) / sqrt(var + eps) + shift

Normalization is performed over the last dimension of the input data.

Parameters

data – Data to normalize of shape (d0, …, dn, num_hidden)

Returns

Normalized inputs of shape (d0, …, dn, num_hidden)

Return type

Tensor
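
A minimal NumPy sketch of the formula above (illustrative only; in the actual block, scale and shift are learned parameters):

    import numpy as np

    def layer_norm(data, scale, shift, eps=1e-6):
        # Normalize over the last dimension, then apply scale and shift.
        mean = data.mean(axis=-1, keepdims=True)
        var = data.var(axis=-1, keepdims=True)
        return scale * (data - mean) / np.sqrt(var + eps) + shift

    x = np.random.randn(2, 5, 8)  # (d0, d1, num_hidden)
    y = layer_norm(x, scale=np.ones(8), shift=np.zeros(8))
    print(y.shape)  # (2, 5, 8)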

class gluonts.mx.model.transformer.layers.MultiHeadAttention(att_dim_in: int = 32, heads: int = 8, att_dim_out: int = 32, dropout: float = 0.0, **kwargs)[source]#

Bases: gluonts.mx.model.transformer.layers.MultiHeadAttentionBase

Multi-head attention layer for queries independent from keys/values.

Parameters
  • att_dim_in – Attention dimension (number of hidden units)

  • heads – Number of attention heads

  • att_dim_out – Output dimension (number of output units)

  • dropout – Dropout rate on attention scores

hybrid_forward(F, queries: Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol], memory: Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol], mask: Optional[Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol]] = None) Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol][source]#

Computes multi-head attention for queries given a memory tensor. If a mask tensor is provided, it is used to mask the attention scores. Returns a tensor of shape (batch_size, query_max_length, att_dim_out).

Parameters
  • queries – Queries tensor of shape (batch_size, query_max_length, att_dim_in)

  • memory – Memory tensor to attend to of shape (batch_size, memory_max_length, att_dim_in)

  • mask – Optional tensor to mask attention scores

Return type

Tensor of shape (batch_size, query_seq_len, att_dim_out)
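
An imperative usage sketch (assuming standard Gluon initialization; shapes are illustrative):

    import mxnet as mx
    from gluonts.mx.model.transformer.layers import MultiHeadAttention

    att = MultiHeadAttention(att_dim_in=32, heads=8, att_dim_out=32, dropout=0.1)
    att.initialize()

    queries = mx.nd.random.normal(shape=(4, 10, 32))  # (batch_size, query_max_length, att_dim_in)
    memory = mx.nd.random.normal(shape=(4, 20, 32))   # (batch_size, memory_max_length, att_dim_in)
    out = att(queries, memory)  # mask omitted
    print(out.shape)  # expected: (4, 10, 32)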

class gluonts.mx.model.transformer.layers.MultiHeadAttentionBase(att_dim_in: int = 32, heads: int = 8, att_dim_out: int = 32, dropout: float = 0.0, **kwargs)[source]#

Bases: mxnet.gluon.block.HybridBlock

Base class for Multi-head attention.

Parameters
  • att_dim_in – Attention dimension (number of hidden units)

  • heads – Number of attention heads

  • att_dim_out – Output dimension (number of output units)

  • dropout – Dropout rate on attention scores

class gluonts.mx.model.transformer.layers.MultiHeadSelfAttention(att_dim_in: int = 32, heads: int = 8, att_dim_out: int = 32, dropout: float = 0.0, **kwargs)[source]#

Bases: gluonts.mx.model.transformer.layers.MultiHeadAttentionBase

Multi-head self-attention. Independent linear projections of inputs serve as queries, keys, and values for the attention.

Parameters
  • att_dim_in – Attention dimension (number of hidden units)

  • heads – Number of attention heads

  • att_dim_out – Output dimension (number of output units)

  • dropout – Dropout rate on attention scores

hybrid_forward(F, inputs: Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol], mask: Optional[Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol]] = None, cache: Optional[Dict[str, Optional[Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol]]]] = None) Tuple[Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol], Optional[Dict]][source]#

Computes multi-head self-attention on the inputs, which serve as queries, keys, and values. If a mask tensor is provided, it is used to mask the attention scores. A cache of previously computed keys and values may be provided to support incremental decoding.

Parameters
  • inputs – Input data of shape (batch_size, max_length, att_dim_in)

  • mask – Optional tensor to mask attention scores

  • cache – Optional dictionary of previously computed keys and values

Returns

A tensor of shape (batch_size, max_length, att_dim_out) and the updated cache, if one was provided

Return type

Tuple of Tensor and Optional[Dict]
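
An imperative usage sketch (assuming standard Gluon initialization; note that the block returns a tuple of the output tensor and the optional cache):

    import mxnet as mx
    from gluonts.mx.model.transformer.layers import MultiHeadSelfAttention

    self_att = MultiHeadSelfAttention(att_dim_in=32, heads=8, att_dim_out=32)
    self_att.initialize()

    inputs = mx.nd.random.normal(shape=(4, 10, 32))  # (batch_size, max_length, att_dim_in)
    out, cache = self_att(inputs)  # mask and cache are optional
    print(out.shape)  # expected: (4, 10, 32)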

class gluonts.mx.model.transformer.layers.TransformerFeedForward(inner_dim: int = 32, out_dim: int = 32, act_type: str = 'softrelu', dropout: float = 0.0, **kwargs)[source]#

Bases: mxnet.gluon.block.HybridBlock

Position-wise feed-forward network with activation.

\[activation(XW_1 + b_1)W_2 + b_2\]

\(W_1\): (in_dim, inner_dim), \(W_2\): (inner_dim, out_dim)

hybrid_forward(F, x: Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol], *args) Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol][source]#

Position-wise feed-forward network with activation.

Parameters

x – Tensor of shape (batch_size, d, in_dim)

Return type

Tensor of shape (batch_size, d, out_dim)
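
An imperative usage sketch (assuming standard Gluon initialization; shapes are illustrative):

    import mxnet as mx
    from gluonts.mx.model.transformer.layers import TransformerFeedForward

    ffn = TransformerFeedForward(inner_dim=128, out_dim=32, act_type="softrelu", dropout=0.1)
    ffn.initialize()

    x = mx.nd.random.normal(shape=(4, 10, 32))  # (batch_size, d, in_dim)
    y = ffn(x)
    print(y.shape)  # expected: (4, 10, 32)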

class gluonts.mx.model.transformer.layers.TransformerProcessBlock(sequence: str, dropout: float, **kwargs)[source]#

Bases: mxnet.gluon.block.HybridBlock

Block to perform pre/post processing on layer inputs.

The processing steps are determined by the sequence argument, which may contain any combination of the following operations:
  • n: layer normalization

  • r: residual connection

  • d: dropout

hybrid_forward(F, data: Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol], prev: Optional[Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol]] = None) Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol][source]#

Apply processing sequence to data with optional previous input.

Parameters
  • data – Input data of shape: (batch_size, length, num_hidden)

  • prev – Previous data of shape (batch_size, length, num_hidden)

Return type

Processed data of shape (batch_size, length, num_hidden).
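
An imperative usage sketch (the sequence string "drn" below is illustrative and assumes the operations are applied in the order of its characters: dropout, residual connection, layer normalization):

    import mxnet as mx
    from gluonts.mx.model.transformer.layers import TransformerProcessBlock

    post_process = TransformerProcessBlock(sequence="drn", dropout=0.1)
    post_process.initialize()

    data = mx.nd.random.normal(shape=(4, 10, 32))  # layer output
    prev = mx.nd.random.normal(shape=(4, 10, 32))  # residual input
    out = post_process(data, prev)
    print(out.shape)  # expected: (4, 10, 32)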

gluonts.mx.model.transformer.layers.combine_heads(F, x: Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol], dim_per_head: int, heads: int) Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol][source]#
Parameters
  • x – Tensor of shape (batch_size * heads, time_length, dim_per_head)

  • dim_per_head – Dimension per head

  • heads – Number of heads

Return type

Tensor of shape (batch_size, time_length, dim)

gluonts.mx.model.transformer.layers.dot_attention(F, queries: Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol], keys: Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol], values: Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol], mask: Optional[Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol]] = None, dropout: float = 0.0) Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol][source]#
Parameters
  • queries – Attention queries of shape (n, lq, d)

  • keys – Attention keys of shape (n, lk, d)

  • values – Attention values of shape (n, lk, dv)

  • mask – Optional mask tensor

  • dropout – Dropout rate

Return type

‘Context’ vectors for each query of shape (n, lq, dv)
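
A minimal NumPy sketch consistent with the documented shapes (illustrative only; whether the actual function scales the scores, e.g. by 1/sqrt(d), and how it applies the mask are implementation details not stated here):

    import numpy as np

    def dot_attention_sketch(queries, keys, values, mask=None):
        # queries: (n, lq, d), keys: (n, lk, d), values: (n, lk, dv)
        scores = queries @ keys.transpose(0, 2, 1)  # (n, lq, lk)
        if mask is not None:
            scores = np.where(mask, scores, -1e9)  # illustrative mask semantics
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
        return weights @ values  # (n, lq, dv)

    q = np.random.randn(2, 3, 4)
    k = np.random.randn(2, 5, 4)
    v = np.random.randn(2, 5, 6)
    print(dot_attention_sketch(q, k, v).shape)  # (2, 3, 6)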

gluonts.mx.model.transformer.layers.split_heads(F, x: Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol], dim_per_head: int, heads: int) Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol][source]#

Returns a tensor with head dimension folded into batch and last dimension divided by the number of heads.

Parameters
  • x – Tensor of shape (batch_size, time_length, dim).

  • dim_per_head – Dimension per head

  • heads – Number of heads

Return type

Tensor of shape (batch_size * heads, time_length, dim_per_head).
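
A shape round trip through split_heads and combine_heads, passing mx.nd as F for imperative use (an assumption consistent with the documented signatures):

    import mxnet as mx
    from gluonts.mx.model.transformer.layers import combine_heads, split_heads

    heads, dim_per_head = 8, 4
    x = mx.nd.random.normal(shape=(2, 10, heads * dim_per_head))  # (batch_size, time_length, dim)

    y = split_heads(mx.nd, x, dim_per_head, heads)
    print(y.shape)  # expected: (16, 10, 4), i.e. (batch_size * heads, time_length, dim_per_head)

    z = combine_heads(mx.nd, y, dim_per_head, heads)
    print(z.shape)  # expected: (2, 10, 32), recovering the original shape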