Description of the project: The GNU toolchain is widely used for building embedded targets. There is growing momentum in the Clang/LLVM community to improve the Clang toolchain's support for embedded targets. Using the Clang toolchain as an alternative can help us improve code quality, find and fix security bugs, improve the developer experience, and take advantage of the new ideas and momentum in the Clang/LLVM community around supporting embedded devices.

A non-exhaustive list of improvements that could be made to LLD:

  • “--print-memory-usage” support
    “--print-memory-usage” in the GNU linker (BFD ld) prints a breakdown of the memory used in each memory region defined in the linker script. Embedded developers use this flag to understand the memory impact of their changes. Embedded systems often define multiple memory regions, each with its own space constraints. Supporting this flag in the Clang toolchain will help projects that want to adopt that toolchain.

  • Linkmap
    Currently, LLD's linkmap output is not as rich as the BFD linker's. Reaching feature parity in linkmap output would make the binaries produced by LLD much easier to analyze. Further, emitting the linkmap in different formats (the current LLD format, BFD, and JSON) can help in building automation tools for investigating the artifacts produced by the linker.

  • “--print-gc-sections” improvement
    When the “--print-gc-sections” flag is enabled, LLD prints the sections that were discarded during linking. This output currently does not include the mapping between symbols and section groups, which is useful for debugging. Preserving this information through the link will require modifications to internal linker data structures.
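As a rough illustration of the “--print-memory-usage” item above, here is a minimal sketch that sums output-section sizes per memory region and prints a table loosely modeled on GNU ld's report. All names (`MemoryRegionUsage`, `OutputSectionInfo`, `computeUsage`, `formatMemoryUsage`) are hypothetical and are not LLD's internal API, and the column layout is only an approximation. A real implementation would also need to account for alignment padding and for sections whose load address (LMA) and run address (VMA) live in different regions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a MEMORY region from the linker script.
struct MemoryRegionUsage {
  std::string name;  // e.g. "FLASH" or "RAM"
  uint64_t length;   // LENGTH from the MEMORY command, in bytes
  uint64_t used;     // bytes assigned to this region so far
};

// Hypothetical output section record: which region it was placed in, and its size.
struct OutputSectionInfo {
  std::string name;
  std::string region;
  uint64_t size;
};

// Sum section sizes per region (ignores alignment gaps and LMA/VMA splits).
std::vector<MemoryRegionUsage> computeUsage(std::vector<MemoryRegionUsage> regions,
                                            const std::vector<OutputSectionInfo> &sections) {
  for (const OutputSectionInfo &sec : sections)
    for (MemoryRegionUsage &r : regions)
      if (r.name == sec.region)
        r.used += sec.size;
  return regions;
}

// Format a per-region table, loosely modeled on GNU ld's layout.
std::string formatMemoryUsage(const std::vector<MemoryRegionUsage> &regions) {
  std::string out = "Memory region         Used Size  Region Size  %age Used\n";
  char line[128];  // very long region names would be truncated; fine for a sketch
  for (const MemoryRegionUsage &r : regions) {
    assert(r.length > 0 && "a MEMORY region must have a non-zero LENGTH");
    double percent = 100.0 * static_cast<double>(r.used) / static_cast<double>(r.length);
    std::snprintf(line, sizeof(line), "%16s: %12llu B %12llu B %9.2f%%\n",
                  r.name.c_str(), static_cast<unsigned long long>(r.used),
                  static_cast<unsigned long long>(r.length), percent);
    out += line;
  }
  return out;
}
```

With a 1024-byte FLASH region holding `.text` and `.rodata` of 256 bytes each, `formatMemoryUsage(computeUsage(...))` reports FLASH as 50.00% used. Keeping the computation separate from the formatting would make it easy to reuse the same numbers for other output formats.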
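For the JSON linkmap idea above, a minimal sketch of serializing a section/symbol map might look like the following. The schema (a `sections` array whose entries carry `name`, `address`, `size`, and `symbols`) is an assumption made for illustration, and the types `MapSection` and `MapSymbol` and the functions `jsonEscape` and `toJsonMap` are hypothetical, not part of LLD.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical link map records.
struct MapSymbol {
  std::string name;
  uint64_t address;
};
struct MapSection {
  std::string name;
  uint64_t address;
  uint64_t size;
  std::vector<MapSymbol> symbols;
};

// Escape characters that may not appear raw inside a JSON string.
std::string jsonEscape(const std::string &s) {
  std::string out;
  for (char c : s) {
    switch (c) {
    case '"': out += "\\\""; break;
    case '\\': out += "\\\\"; break;
    case '\n': out += "\\n"; break;
    case '\t': out += "\\t"; break;
    default:
      if (static_cast<unsigned char>(c) < 0x20) {
        // Other control characters use the \u00XX form.
        char buf[8];
        std::snprintf(buf, sizeof(buf), "\\u%04x",
                      static_cast<unsigned>(static_cast<unsigned char>(c)));
        out += buf;
      } else {
        out += c;
      }
    }
  }
  return out;
}

// Serialize sections and their symbols as compact JSON (hypothetical schema).
std::string toJsonMap(const std::vector<MapSection> &sections) {
  std::string out = "{\"sections\":[";
  for (size_t i = 0; i < sections.size(); ++i) {
    const MapSection &sec = sections[i];
    if (i) out += ",";
    out += "{\"name\":\"" + jsonEscape(sec.name) + "\",\"address\":" +
           std::to_string(sec.address) + ",\"size\":" + std::to_string(sec.size) +
           ",\"symbols\":[";
    for (size_t j = 0; j < sec.symbols.size(); ++j) {
      const MapSymbol &sym = sec.symbols[j];
      // A symbol should lie inside (or at the end of) its section.
      assert(sym.address >= sec.address && sym.address <= sec.address + sec.size);
      if (j) out += ",";
      out += "{\"name\":\"" + jsonEscape(sym.name) + "\",\"address\":" +
             std::to_string(sym.address) + "}";
    }
    out += "]}";
  }
  out += "]}";
  return out;
}
```

Addresses are emitted as plain JSON numbers here. Because some JSON consumers (for example JavaScript) lose precision above 2^53, a real format might prefer hex strings for 64-bit addresses.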
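To make the “--print-gc-sections” improvement above more concrete, the sketch below keeps a symbol-to-section mapping until the report is printed, so each discarded section can be listed together with the symbols defined in it. The structures (`SectionInfo`, `SymbolInfo`), the helper names, and the message wording are hypothetical simplifications, not LLD's actual data structures or exact output, and COMDAT section groups are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical input section after the garbage collector's mark phase.
struct SectionInfo {
  std::string file;  // object file that contributed the section
  std::string name;  // e.g. ".text.unused"
  bool live;         // true if reachable from the GC roots
};

// Hypothetical defined symbol, recording which input section it lives in.
struct SymbolInfo {
  std::string name;
  size_t sectionIndex;
};

// Build the section -> defined-symbols mapping that the report needs.
std::map<size_t, std::vector<std::string>> symbolsBySection(const std::vector<SymbolInfo> &syms,
                                                            size_t numSections) {
  std::map<size_t, std::vector<std::string>> bySection;
  for (const SymbolInfo &s : syms) {
    assert(s.sectionIndex < numSections && "symbol refers to a non-existent section");
    bySection[s.sectionIndex].push_back(s.name);
  }
  return bySection;
}

// Print every discarded section, followed by the symbols it defined.
std::string reportGcSections(const std::vector<SectionInfo> &sections,
                             const std::vector<SymbolInfo> &syms) {
  std::map<size_t, std::vector<std::string>> bySection = symbolsBySection(syms, sections.size());
  std::string out;
  for (size_t i = 0; i < sections.size(); ++i) {
    const SectionInfo &sec = sections[i];
    if (sec.live) continue;
    out += "removing unused section " + sec.file + ":(" + sec.name + ")";
    auto it = bySection.find(i);
    if (it != bySection.end()) {
      out += " [symbols:";
      for (const std::string &name : it->second) out += " " + name;
      out += "]";
    }
    out += "\n";
  }
  return out;
}
```

The design choice here is simply to retain the symbol-to-section relationship until reporting time rather than dropping it after marking; inside a real linker, where that information lives is exactly the data-structure question the item above raises.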

Project size: 240 hrs

Difficulty: Medium/Hard

Skills: C++

Expected outcomes:

  • Implementation of the “--print-memory-usage” flag.
  • Support for new linkmap output formats: BFD and JSON.
  • Improved “--print-gc-sections” output that includes information about the surviving symbols.

Mentors:


This might be a better place to post

I moved the post, thanks for noticing this!

Hi, I am interested in contributing to this. I am a student with intermediate knowledge of C++ and compilers. Could you guide me on where to learn more about LLD, and where I can find the LLD codebase?

Can you explain which BFD linker feature is to be implemented in the LLD linker?
Thanks,

Are you looking for introductory LLD information as well as information on the specific feature? The codebase is the top-level “lld” folder in the LLVM monorepo https://github.com/llvm/llvm-project, along with other files in the repo. More information is at https://lld.llvm.org . Personally, I found Levine’s “Linkers & Loaders” book useful for background knowledge and the ELF format, and tracing through the code at run time useful for seeing how LLD handles each object file.


Hi @nigelp-xmos,
Thank you for the helpful information.

Hi Sahil! Glad you are interested in compilers and linkers. Learning about them is a long-term process. Nigel gave you some places to begin. If you’d like to dive deep into linkers, here’s Ian Lance Taylor’s popular 20-part linker series: A ToC of the 20 part linker essay [LWN.net]

As far as this project goes, knowledge of compilers and linkers would be nice but is not necessary. The initial improvements proposed can be carried out by anyone with reasonable C++ expertise and software engineering skills.

Linkmap – The link map is an artifact produced by the linker. The BFD and LLD linkers produce different link map formats. A good place to start would be this earlier effort, D63190 “Add -gnu-map option to emit a map file in the GNU-style format”, which tried to produce the BFD output format from LLD.


Hi @prabhuk,
Thanks for the guidance.

I have experience writing C++ code.

After learning more about linkers from these resources, I will get back to you.

Can you explain this in a bit more detail?

Hello,

My name is Ruturaj and I am a PhD student at the University of Kansas. I am really interested in working on this project. The goal of my PhD thesis is to apply compiler-level security techniques at the binary level and assess the challenges in doing so. I have experience in binary analysis and reverse engineering. Previously, I interned at Meta, where I worked on a program instrumentation framework built on an LLVM pass (so I have some experience with Phabricator). I have also used LLVM LTO passes to recover information to use as ground truth in my research, and I have worked as a TA for the compilers course at my school.

I don’t have development experience in any open source compiler project, nor have I ever built a linker. However, I would really like to help the community. I would like to ask the mentors whether my experience is a good match for this project.

I am reading the resources that @nigelp-xmos and @Prabhuk have provided, and I am also reading some sources on contributing to the llvm-project.

Does it provide only static memory usage?

Linkmap –
Is the BFD linker output the same as the gold output? And do you have an example of the JSON format?

Hi @Prabhuk @petrhosek,
I am writing a proposal for this project. Where can we discuss it?

Hi @Prabhuk @petrhosek,
I am interested in the project and would like to contribute to it.
I have intermediate C++ knowledge, and I am getting started with compilers.
Please guide me on how to get started with the project.

@Prabhuk
I am writing a draft proposal for GSoC. Should I publish it here? Or should we communicate via email?

Hello, I am a Computer Engineering student at the University of Illinois. I was wondering how to get in contact with the people involved in the project to discuss a proposal. I am interested in compiling for embedded systems with a focus on security.

Hello,
I am a recently graduated computer systems engineer, and I have contributed to LLVM by fixing several issues.
I have good knowledge of compilers and linkers.
Is this project still open?

Hi Mohamed! This position was for 2023 and it is not open anymore.