Explicitly initialize RNN/LSTM/GRU states #2314

Merged
merged 5 commits into d2l-ai:master from gvtulder:f-improve-rnn-initial-states on Nov 24, 2022

Conversation

gvtulder
Contributor

Description of changes:

The current from-scratch implementations of the RNN, LSTM and GRU do some complicated juggling with the initial states of the recurrent module.

Example for the current LSTM:

@d2l.add_to_class(LSTMScratch)
def forward(self, inputs, H_C=None):
    # H and C are initialized lazily, inside the loop below
    H, C = (None, None) if H_C is None else H_C
    outputs = []
    for X in inputs:
        I = d2l.sigmoid(d2l.matmul(X, self.W_xi) + (
            d2l.matmul(H, self.W_hi) if H is not None else 0) + self.b_i)
        if H is None:
            H, C = d2l.zeros_like(I), d2l.zeros_like(I)
        F = d2l.sigmoid(d2l.matmul(X, self.W_xf) +
                        d2l.matmul(H, self.W_hf) + self.b_f)
        O = d2l.sigmoid(d2l.matmul(X, self.W_xo) +
                        d2l.matmul(H, self.W_ho) + self.b_o)
        C_tilde = d2l.tanh(d2l.matmul(X, self.W_xc) +
                           d2l.matmul(H, self.W_hc) + self.b_c)
        C = F * C + I * C_tilde
        H = O * d2l.tanh(C)
        outputs.append(H)
    return outputs, (H, C)

While this is clever and avoids a little bit of code, it has a few downsides:

  • The initial state is created inside the for loop, which requires an if H is None check that runs at every time step but only does something at the first step. From a programming-style perspective, this check belongs outside the loop.
  • The lazy initialization obscures the fact that the state of the RNN has to be initialized to something before the first time step. I think this makes the model harder to understand than necessary. (For example, this initialization caused some confusion in a discussion I had with a student.)

With this pull request, I'd like to suggest some changes to initialize the states explicitly before the for loop. This requires one or two extra lines of code, but I think it would look cleaner and might make the models easier to follow.

Note that I added a self.num_hiddens attribute to the RNN/LSTM/GRU classes. This isn't absolutely necessary (you could also get it from the shape of the weights), but I think giving it a name might help.
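
For illustration, here is roughly what the explicit initialization could look like for the LSTM. This is only a sketch based on the description above; the state shape, device handling, and use of d2l.zeros are assumptions and need not match the merged code exactly:

@d2l.add_to_class(LSTMScratch)
def forward(self, inputs, H_C=None):
    if H_C is None:
        # Explicit initial state of shape (batch_size, num_hiddens),
        # created once before the loop instead of lazily inside it
        H = d2l.zeros((inputs.shape[1], self.num_hiddens), device=inputs.device)
        C = d2l.zeros((inputs.shape[1], self.num_hiddens), device=inputs.device)
    else:
        H, C = H_C
    outputs = []
    for X in inputs:
        I = d2l.sigmoid(d2l.matmul(X, self.W_xi) +
                        d2l.matmul(H, self.W_hi) + self.b_i)
        F = d2l.sigmoid(d2l.matmul(X, self.W_xf) +
                        d2l.matmul(H, self.W_hf) + self.b_f)
        O = d2l.sigmoid(d2l.matmul(X, self.W_xo) +
                        d2l.matmul(H, self.W_ho) + self.b_o)
        C_tilde = d2l.tanh(d2l.matmul(X, self.W_xc) +
                           d2l.matmul(H, self.W_hc) + self.b_c)
        C = F * C + I * C_tilde
        H = O * d2l.tanh(C)
        outputs.append(H)
    return outputs, (H, C)

With the state created up front, every iteration of the loop does exactly the same work, and the requirement that the recurrent state exists before the first time step is visible in the code.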


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@d2l-bot
Member

d2l-bot commented Sep 30, 2022

Job d2l-en/PR-2314/1 is complete.
Check the results at http://preview.d2l.ai/d2l-en/PR-2314/

@astonzhang
Member

astonzhang left a comment

Thanks. I removed self.num_hiddens = num_hiddens because it's automatically done by self.save_hyperparameters()
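
For context, d2l's save_hyperparameters() records the constructor arguments as attributes on the instance, which is why the explicit assignment is redundant. A minimal sketch of that behavior (constructor signature simplified, weight setup omitted):

class LSTMScratch(d2l.Module):
    def __init__(self, num_inputs, num_hiddens, sigma=0.01):
        super().__init__()
        # Stores num_inputs, num_hiddens, and sigma as attributes,
        # so self.num_hiddens is available without assigning it by hand
        self.save_hyperparameters()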

Resolved review threads:
  • chapter_recurrent-neural-networks/rnn-scratch.md
  • chapter_recurrent-modern/gru.md
  • chapter_recurrent-modern/lstm.md
@astonzhang astonzhang merged commit c5a555b into d2l-ai:master Nov 24, 2022
@astonzhang
Member

@gvtulder We've added your name in our acknowledgement https://github.com/d2l-ai/d2l-en/blob/master/chapter_preface/index.md
Feel free to send another PR if there's any mistake in the name :)

@gvtulder gvtulder deleted the f-improve-rnn-initial-states branch November 25, 2022 19:49
@astonzhang astonzhang mentioned this pull request Dec 1, 2022