Does DeepSpeed's Pipeline-Parallelism optimizer support skip connections? #932

Open
RoyMahlab opened this issue Oct 17, 2024 · 0 comments

Comments

@RoyMahlab
In your example you convert the AlexNet into a list of layers:

def join_layers(vision_model):
    # Flatten the model into one ordered list of layers (callables).
    layers = [
        *vision_model.features,
        vision_model.avgpool,
        lambda x: torch.flatten(x, 1),
        *vision_model.classifier,
    ]
    return layers

which is later passed to PipelineModule:

net = AlexNet(num_classes=10)
net = PipelineModule(layers=join_layers(net),
                     loss_fn=torch.nn.CrossEntropyLoss(),
                     num_stages=args.pipeline_parallel_size,
                     partition_method=part,
                     activation_checkpoint_interval=0)

This seems to bypass the forward method defined in the AlexNet module: the pipeline only sees a flat list of layers and, as far as I can tell, applies them one after another. That makes me wonder whether a model with skip connections can be used with DeepSpeed's pipeline parallelism.
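To make the concern concrete, here is a minimal sketch that uses plain Python callables as stand-ins for layers. It assumes the layer list is simply applied in order; `run_sequential`, `Residual`, `save_skip`, `on_main`, and `add_skip` are hypothetical names for illustration, not DeepSpeed APIs. It shows how a flat list drops the skip path, plus two possible workarounds: wrapping the whole residual block in one layer, or carrying the skip value along as a tuple.

```python
def run_sequential(layers, x):
    """Apply each layer in order, as a flat layer list would be executed."""
    for layer in layers:
        x = layer(x)
    return x


class Residual:
    """Hypothetical wrapper: computes x + body(x) inside a single layer."""

    def __init__(self, *body):
        self.body = body

    def __call__(self, x):
        return x + run_sequential(self.body, x)


# Stand-ins for real layers (assumption: any callable works the same way).
def double(x):
    return 2 * x


def add_one(x):
    return x + 1


# 1) Flat list: the skip connection from a custom forward() is lost.
flat = [double, add_one]

# 2) Workaround A: keep the skip inside one layer object.
wrapped = [Residual(double, add_one)]


# 3) Workaround B: thread the skip value through the list as a tuple.
def save_skip(x):
    return (x, x)  # (main path, saved input)


def on_main(fn):
    # Apply fn to the main path only and pass the saved input along.
    return lambda pair: (fn(pair[0]), pair[1])


def add_skip(pair):
    return pair[0] + pair[1]


threaded = [save_skip, on_main(double), on_main(add_one), add_skip]

print(run_sequential(flat, 3))      # no skip: 2*3 + 1
print(run_sequential(wrapped, 3))   # with skip: 3 + (2*3 + 1)
print(run_sequential(threaded, 3))  # same result as the wrapped version
```

Workaround A keeps the block on a single stage, since the list cannot be split inside the wrapper. Workaround B would allow a split between any two entries, but only if the pipeline accepts tuples passed from one layer to the next, which I have not verified.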

Many thanks!
