Explicitly initialize RNN/LSTM/GRU states #2314

Merged
merged 5 commits into d2l-ai:master from gvtulder:f-improve-rnn-initial-states on Nov 24, 2022

Conversation

gvtulder
Contributor

Description of changes:

The current from-scratch implementations of the RNN, LSTM and GRU do some complicated juggling with the initial states of the recurrent module.

Example for the current LSTM:

@d2l.add_to_class(LSTMScratch)
def forward(self, inputs, H_C=None):
    # H and C are initialized lazily, inside the loop below
    H, C = (None, None) if H_C is None else H_C
    outputs = []
    for X in inputs:
        I = d2l.sigmoid(d2l.matmul(X, self.W_xi) + (
            d2l.matmul(H, self.W_hi) if H is not None else 0) + self.b_i)
        if H is None:
            H, C = d2l.zeros_like(I), d2l.zeros_like(I)
        F = d2l.sigmoid(d2l.matmul(X, self.W_xf) +
                        d2l.matmul(H, self.W_hf) + self.b_f)
        O = d2l.sigmoid(d2l.matmul(X, self.W_xo) +
                        d2l.matmul(H, self.W_ho) + self.b_o)
        C_tilde = d2l.tanh(d2l.matmul(X, self.W_xc) +
                           d2l.matmul(H, self.W_hc) + self.b_c)
        C = F * C + I * C_tilde
        H = O * d2l.tanh(C)
        outputs.append(H)
    return outputs, (H, C)

While this is clever and avoids a little bit of code, it has a few downsides:

  • The initial state is created inside the for loop, which requires an if H is None check that runs at every time step but only does something at the first step. From a programming-style perspective, this check belongs outside the loop.
  • The lazy initialization obscures the fact that the state of the RNN has to be initialized to something before the first time step. I think this makes the model harder to understand than necessary. (For example, this initialization caused some confusion in a discussion I had with a student.)

With this pull request, I'd like to suggest some changes to initialize the states explicitly before the for loop. This requires one or two extra lines of code, but I think it would look cleaner and might make the models easier to follow.

Note that I added a self.num_hiddens attribute to the RNN/LSTM/GRU classes. This isn't absolutely necessary (you could also get it from the shape of the weights), but I think giving it a name might help.
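
For illustration, here is roughly what the explicit initialization could look like for the LSTM. This is only a sketch based on the description above; the state shape, device handling, and use of d2l.zeros are assumptions and need not match the merged code exactly:

@d2l.add_to_class(LSTMScratch)
def forward(self, inputs, H_C=None):
    if H_C is None:
        # Explicit initial state of shape (batch_size, num_hiddens),
        # created once before the loop instead of lazily inside it
        H = d2l.zeros((inputs.shape[1], self.num_hiddens), device=inputs.device)
        C = d2l.zeros((inputs.shape[1], self.num_hiddens), device=inputs.device)
    else:
        H, C = H_C
    outputs = []
    for X in inputs:
        I = d2l.sigmoid(d2l.matmul(X, self.W_xi) +
                        d2l.matmul(H, self.W_hi) + self.b_i)
        F = d2l.sigmoid(d2l.matmul(X, self.W_xf) +
                        d2l.matmul(H, self.W_hf) + self.b_f)
        O = d2l.sigmoid(d2l.matmul(X, self.W_xo) +
                        d2l.matmul(H, self.W_ho) + self.b_o)
        C_tilde = d2l.tanh(d2l.matmul(X, self.W_xc) +
                           d2l.matmul(H, self.W_hc) + self.b_c)
        C = F * C + I * C_tilde
        H = O * d2l.tanh(C)
        outputs.append(H)
    return outputs, (H, C)

With the state created up front, every iteration of the loop does exactly the same work, and the requirement that the recurrent state exists before the first time step is visible in the code.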


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@d2l-bot
Member

d2l-bot commented Sep 30, 2022

Job d2l-en/PR-2314/1 is complete.
Check the results at http://preview.d2l.ai/d2l-en/PR-2314/

@astonzhang
Member

astonzhang left a comment

Thanks. I removed self.num_hiddens = num_hiddens because it's automatically done by self.save_hyperparameters()
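
For context, d2l's save_hyperparameters() records the constructor arguments as attributes on the instance, which is why the explicit assignment is redundant. A minimal sketch of that behavior (constructor signature simplified, weight setup omitted):

class LSTMScratch(d2l.Module):
    def __init__(self, num_inputs, num_hiddens, sigma=0.01):
        super().__init__()
        # Stores num_inputs, num_hiddens, and sigma as attributes,
        # so self.num_hiddens is available without assigning it by hand
        self.save_hyperparameters()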

Resolved review threads:
  • chapter_recurrent-neural-networks/rnn-scratch.md
  • chapter_recurrent-modern/gru.md
  • chapter_recurrent-modern/lstm.md
@astonzhang astonzhang merged commit c5a555b into d2l-ai:master Nov 24, 2022
@astonzhang
Member

@gvtulder We've added your name in our acknowledgement https://github.com/d2l-ai/d2l-en/blob/master/chapter_preface/index.md
Feel free to send another PR if there's any mistake in the name :)

@gvtulder gvtulder deleted the f-improve-rnn-initial-states branch November 25, 2022 19:49
@astonzhang astonzhang mentioned this pull request Dec 1, 2022