gluonts.mx.model.transformer.layers module#
- class gluonts.mx.model.transformer.layers.InputLayer(model_size: int = 64, **kwargs)[source]#
Bases:
mxnet.gluon.block.HybridBlock
Transforms the input to model_size with a one-layer MLP, i.e., (batch_size, time_length, input_dim) -> (batch_size, time_length, model_size)
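A minimal usage sketch (the batch size, time length, and input dimension below are illustrative assumptions, not values from the library documentation):

```python
import mxnet as mx

from gluonts.mx.model.transformer.layers import InputLayer

# Project 5-dimensional inputs to model_size=64; batch/time sizes are arbitrary.
layer = InputLayer(model_size=64)
layer.initialize()

x = mx.nd.random.normal(shape=(8, 24, 5))  # (batch_size, time_length, input_dim)
y = layer(x)                               # expected shape: (8, 24, 64)
```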
- class gluonts.mx.model.transformer.layers.LayerNormalization(scale_init: str = 'ones', shift_init: str = 'zeros', eps: float = 1e-06, **kwargs)[source]#
Bases:
mxnet.gluon.block.HybridBlock
Implements layer normalization as proposed in [BKH16].
- hybrid_forward(F, data: Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol]) Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol] [source]#
Normalizes hidden units of data as follows:
data = scale * (data - mean) / sqrt(var + eps) + shift
Normalization is performed over the last dimension of the input data.
- Parameters
data – Data to normalize of shape (d0, …, dn, num_hidden)
- Returns
Normalized inputs
- Return type
Tensor of shape (d0, …, dn, num_hidden)
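A short usage sketch (the input shape is an illustrative assumption):

```python
import mxnet as mx

from gluonts.mx.model.transformer.layers import LayerNormalization

ln = LayerNormalization()  # default scale/shift initializers, eps=1e-6
ln.initialize()

data = mx.nd.random.normal(shape=(8, 24, 64))  # (d0, d1, num_hidden)
out = ln(data)  # normalized over the last dimension, same shape as data
```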
- class gluonts.mx.model.transformer.layers.MultiHeadAttention(att_dim_in: int = 32, heads: int = 8, att_dim_out: int = 32, dropout: float = 0.0, **kwargs)[source]#
Bases:
gluonts.mx.model.transformer.layers.MultiHeadAttentionBase
Multi-head attention layer for queries that are independent of the keys/values.
- Parameters
att_dim_in – Attention dimension (number of hidden units)
heads – Number of attention heads
att_dim_out – Output dimension (number of output units)
dropout – Dropout rate on attention scores
- hybrid_forward(F, queries: Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol], memory: Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol], mask: Optional[Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol]] = None) Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol] [source]#
Computes multi-head attention for queries given a memory tensor. If a mask tensor is provided, it is used to mask the attention scores. Returns a tensor of shape (batch_size, query_max_length, att_dim_out).
- Parameters
queries – Queries tensor of shape (batch_size, query_max_length, att_dim_in)
memory – Memory tensor to attend to of shape (batch_size, memory_max_length, att_dim_in)
mask – Optional tensor to mask attention scores
- Return type
Tensor of shape (batch_size, query_max_length, att_dim_out)
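A hedged usage sketch with separate query and memory tensors (shapes are illustrative):

```python
import mxnet as mx

from gluonts.mx.model.transformer.layers import MultiHeadAttention

attn = MultiHeadAttention(att_dim_in=32, heads=8, att_dim_out=32, dropout=0.1)
attn.initialize()

queries = mx.nd.random.normal(shape=(4, 10, 32))  # (batch_size, query_max_length, att_dim_in)
memory = mx.nd.random.normal(shape=(4, 24, 32))   # (batch_size, memory_max_length, att_dim_in)
out = attn(queries, memory)                       # expected shape: (4, 10, 32)
```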
- class gluonts.mx.model.transformer.layers.MultiHeadAttentionBase(att_dim_in: int = 32, heads: int = 8, att_dim_out: int = 32, dropout: float = 0.0, **kwargs)[source]#
Bases:
mxnet.gluon.block.HybridBlock
Base class for multi-head attention.
- Parameters
att_dim_in – Attention dimension (number of hidden units)
heads – Number of attention heads
att_dim_out – Output dimension (number of output units)
dropout – Dropout rate on attention scores
- class gluonts.mx.model.transformer.layers.MultiHeadSelfAttention(att_dim_in: int = 32, heads: int = 8, att_dim_out: int = 32, dropout: float = 0.0, **kwargs)[source]#
Bases:
gluonts.mx.model.transformer.layers.MultiHeadAttentionBase
Multi-head self-attention. Independent linear projections of inputs serve as queries, keys, and values for the attention.
- Parameters
att_dim_in – Attention dimension (number of hidden units)
heads – Number of attention heads
att_dim_out – Output dimension (number of output units)
dropout – Dropout rate on attention scores
- hybrid_forward(F, inputs: Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol], mask: Optional[Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol]] = None, cache: Optional[Dict[str, Optional[Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol]]]] = None) Tuple[Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol], Optional[Dict]] [source]#
Computes multi-head self-attention on the inputs, which serve as queries, keys, and values. If a mask tensor is provided, it is used to mask the attention scores. A cache of previously computed keys and values may be passed to support incremental decoding.
- Parameters
inputs – Input data of shape (batch_size, max_length, att_dim_in)
mask – Optional tensor to mask attention scores
cache – Optional dictionary of previously computed keys and values
- Returns
A tensor of shape (batch_size, max_length, att_dim_out) and the (possibly updated) cache dictionary
- Return type
Tuple of (Tensor, Optional[Dict])
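A hedged sketch of self-attention over a single input tensor; the returned cache is only relevant for incremental decoding (shapes are illustrative):

```python
import mxnet as mx

from gluonts.mx.model.transformer.layers import MultiHeadSelfAttention

self_attn = MultiHeadSelfAttention(att_dim_in=32, heads=8, att_dim_out=32, dropout=0.1)
self_attn.initialize()

inputs = mx.nd.random.normal(shape=(4, 24, 32))  # (batch_size, max_length, att_dim_in)
out, cache = self_attn(inputs)                   # out shape: (4, 24, 32); cache unused here
```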
- class gluonts.mx.model.transformer.layers.TransformerFeedForward(inner_dim: int = 32, out_dim: int = 32, act_type: str = 'softrelu', dropout: float = 0.0, **kwargs)[source]#
Bases:
mxnet.gluon.block.HybridBlock
Position-wise feed-forward network with activation.
\[\mathrm{activation}(XW_1 + b_1)W_2 + b_2\]
where \(W_1\) has shape (batch_size, d, inner_dim) and \(W_2\) has shape (batch_size, inner_dim, out_dim).
- hybrid_forward(F, x: Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol], *args) Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol] [source]#
Position-wise feed-forward network with activation.
- Parameters
x – Tensor of shape (batch_size, d, in_dim)
- Return type
Tensor of shape (batch_size, d, out_dim)
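A minimal sketch (dimensions are illustrative; 'softrelu' is the default activation):

```python
import mxnet as mx

from gluonts.mx.model.transformer.layers import TransformerFeedForward

ffn = TransformerFeedForward(inner_dim=128, out_dim=32, act_type="softrelu", dropout=0.1)
ffn.initialize()

x = mx.nd.random.normal(shape=(4, 24, 32))  # (batch_size, d, in_dim)
y = ffn(x)                                  # expected shape: (4, 24, 32)
```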
- class gluonts.mx.model.transformer.layers.TransformerProcessBlock(sequence: str, dropout: float, **kwargs)[source]#
Bases:
mxnet.gluon.block.HybridBlock
Block to perform pre/post processing on layer inputs.
The processing steps are determined by the sequence argument, which may contain any combination of the following three operations:
- n: layer normalization
- r: residual connection
- d: dropout
- hybrid_forward(F, data: Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol], prev: Optional[Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol]] = None) Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol] [source]#
Apply processing sequence to data with optional previous input.
- Parameters
data – Input data of shape: (batch_size, length, num_hidden)
prev – Previous data of shape (batch_size, length, num_hidden)
- Return type
Processed data of shape (batch_size, length, num_hidden).
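A hedged sketch assuming the common post-processing sequence "drn" (dropout, then residual connection with prev, then layer normalization); shapes are illustrative:

```python
import mxnet as mx

from gluonts.mx.model.transformer.layers import TransformerProcessBlock

post_process = TransformerProcessBlock(sequence="drn", dropout=0.1)
post_process.initialize()

data = mx.nd.random.normal(shape=(4, 24, 64))  # e.g. a sub-layer output
prev = mx.nd.random.normal(shape=(4, 24, 64))  # e.g. the sub-layer input (residual branch)
out = post_process(data, prev)                 # expected shape: (4, 24, 64)
```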
- gluonts.mx.model.transformer.layers.combine_heads(F, x: Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol], dim_per_head: int, heads: int) Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol] [source]#
- Parameters
x – Tensor of shape (batch_size * heads, time_length, dim_per_head)
dim_per_head – Dimension per head
heads – Number of heads
- Return type
Tensor of shape (batch_size, time_length, dim), where dim = heads * dim_per_head
- gluonts.mx.model.transformer.layers.dot_attention(F, queries: Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol], keys: Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol], values: Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol], mask: Optional[Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol]] = None, dropout: float = 0.0) Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol] [source]#
- Parameters
queries – Attention queries of shape (n, lq, d)
keys – Attention keys of shape (n, lk, d)
values – Attention values of shape (n, lk, dv)
mask – Optional mask tensor
dropout – Dropout rate
- Return type
‘Context’ vectors for each query of shape (n, lq, dv)
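A hedged sketch of calling the function directly in imperative mode, passing mxnet.ndarray as F (shapes are illustrative):

```python
import mxnet as mx

from gluonts.mx.model.transformer.layers import dot_attention

n, lq, lk, d, dv = 16, 10, 12, 32, 32
queries = mx.nd.random.normal(shape=(n, lq, d))
keys = mx.nd.random.normal(shape=(n, lk, d))
values = mx.nd.random.normal(shape=(n, lk, dv))

context = dot_attention(mx.nd, queries, keys, values, mask=None, dropout=0.0)
# expected shape: (n, lq, dv)
```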
- gluonts.mx.model.transformer.layers.split_heads(F, x: Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol], dim_per_head: int, heads: int) Union[mxnet.ndarray.ndarray.NDArray, mxnet.symbol.symbol.Symbol] [source]#
Returns a tensor with head dimension folded into batch and last dimension divided by the number of heads.
- Parameters
x – Tensor of shape (batch_size, time_length, dim).
dim_per_head – Dimension per head
heads – Number of heads
- Return type
Tensor of shape (batch_size * heads, time_length, dim_per_head).
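A hedged sketch of the split_heads/combine_heads round trip, again with mxnet.ndarray as F; here dim = heads * dim_per_head = 32 and the other shapes are illustrative:

```python
import mxnet as mx

from gluonts.mx.model.transformer.layers import combine_heads, split_heads

x = mx.nd.random.normal(shape=(4, 24, 32))                           # (batch_size, time_length, dim)
per_head = split_heads(mx.nd, x, dim_per_head=4, heads=8)            # (4 * 8, 24, 4)
restored = combine_heads(mx.nd, per_head, dim_per_head=4, heads=8)   # (4, 24, 32)
```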