LFCS: ARM, control groups, and the next 20 years
The recently held Linux Foundation Collaboration Summit (LFCS) had its traditional kernel panel on April 6 at which Andrew Morton, Arnd Bergmann, James Bottomley, and Thomas Gleixner sat down to discuss the kernel with moderator Jonathan Corbet. Several topics were covered, but the current struggles in the ARM community were clearly at the forefront of the minds of participants and audience members alike.
Each of the kernel hackers introduced themselves, some with tongue planted firmly in cheek, such as Bottomley with a declaration that he was on the panel "to meet famous kernel developers", and Morton who said he spent most of his time trying to figure out what the other kernel hackers are doing to the memory management subsystem. Bergmann was a bit modest about his contributions, so Gleixner pointed out that Bergmann had done the last chunk of work required to remove the big kernel lock, which was greeted with a big round of applause. For his part, Gleixner was a bit surprised to find out that he manages bug reports for NANA flash (based on a typo on the giant slides on either side of the stage), but noted that he specialized in "impossible tasks" like getting the realtime preemption patches into the mainline piecewise.
There is a "high-level architectural issue" that Corbet wanted the panel to tackle first, and that was the current problems in the ARM world. It is "one of our more important architectures", he said, without which we wouldn't have all these different Android phones to play with. So, it is "discouraging to see that there is a mess" in the ARM kernel community right now. What's the situation, he asked, and how can we improve things?
For a long time, the problem in the ARM community was convincing system-on-chip (SoC) and board vendors to get their code upstream, Bergmann said, but now there is a new problem in that they all "have their own subtrees that don't work very well together". Each of those trees goes its own way, which means that core and driver code gets copied "five times or twenty times" into different SoC trees.
Corbet asked how the kernel community can do better with respect to ARM. Gleixner noted that ARM maintainer Russell King tries to push back on bad code coming in, "but he simply doesn't scale". There are 70 different sub-architectures and 500 different SoCs in the ARM tree, he said. In addition, "people have been pushing sub-arch trees directly to Linus", Bergmann said, so King does not have any control over those. It is a consequence of the "tension between cleanliness and time-to-market", Bottomley said.
Gleixner thinks that the larger kernel community should be providing the ARM vendors with "proper abstractions", and that, lacking a big-picture view, those vendors cannot be expected to come up with those themselves. By and large the ARM vendor community has a different mindset that comes from other operating systems, where changes to the core code were impossible, so 500-line workarounds in drivers were the norm. Bergmann suggested that the vendors get their code reviewed and upstream before shipping products with that code. Morton said that, as the "price of admission", vendors need to be asked to maintain various pieces horizontally across the ARM trees. Actually motivating them to do that is difficult, he said.
From the audience, Wolfram Sang asked whether more code review for the ARM patches would help. All agreed that more code review is good, but Bottomley expressed some reservations because there are generally only a few reviewers that a subsystem maintainer can trust to spot important issues, so all code review is not created equal. Morton suggested a "review economy", where one patch submitter needs to review the code of another and vice versa. That would allow developers to justify the time spent reviewing code to their managers. But, Bottomley said, "collaborating with competitors" is a hard concept for organizations that are new to open source development.
If a driver looks like one that is already in the tree, it should not be merged; instead, someone needs to get the developers to work with the existing driver, Bergmann said. There is a lot of reuse of IP blocks in SoCs, but the developers aren't aware of it because different teams work on the different SoCs, Gleixner said. The kernel community needs people that can figure that out, he said. Bottomley observed that "the first question should be: did anyone do it before and can I 'steal' it?".
In response to an audience question about the panel's thoughts on Linaro, Bergmann, who works with Linaro, said "I think it's great" with a smile. He went on to say that Linaro is doing work that is closely related to the ARM problems that had been discussed. Getting different SoC vendors to work together is a big part of what Linaro is doing, and "everyone suffers" if that collaboration doesn't happen. "ARM is one of the places where it [collaboration] is needed most", he said.
Control groups
The discussion soon shifted to control groups, with Corbet noting that they are becoming more pervasive in the kernel, but that lots of kernel hackers hate them. It will soon be difficult to get a distribution to boot and run without control groups, he said, and wondered if adding them to the kernel was the right move: "did we make a mistake?" Gleixner said that there is nothing wrong with control groups conceptually, "just that the code is a horror". Bottomley lamented the code that is getting "grafted onto the side of control groups", as each resource in the kernel that is getting controlled requires reaching into multiple subsystems in rather intrusive ways.
As with "everything that sucks" in the kernel, control groups need to be cleaned up by someone who looks at the problem from a global perspective; that person will have to "reimplement it and radically modify it", Gleixner said. That is difficult to do because it is both a technical and a political problem, Bottomley said. The technical part is getting the interactions right, while the political part is that it is difficult to make changes across subsystem boundaries in the kernel.
But Morton said that he hadn't seen much in the way of specific complaints about control groups cross his desk. Conceptually, they extend what an operating system should do in terms of limiting resources. "If it's messy, it's because of how it was developed" on top of a production kernel that gets updated every three months. Bottomley said that the problem with doing cross-subsystem work is often just a matter of communication, but it also requires someone to take ownership and talk to all of the affected subsystems rather than just picking the "weakest subsystem" and getting changes in through there.
Corbet wondered if the independence of kernel subsystems, something that was very helpful in allowing development to scale, was changing. The panel didn't see much of an issue there: while control groups cross a lot of boundaries, it would be hard to name five things like that in the kernel, as Bottomley pointed out.
Twenty years ahead
With the 20th anniversary of Linux being celebrated this year, Jon Masters asked from the audience what things would be like 20 years from now. Bottomley promptly replied that four-fifths of the panel would be retired, but Gleixner expected that the 2038 bug would have brought them all back out of retirement. Morton said that unless some kind of quantum computer came along to make Linux obsolete, it would still be there in 20 years. He also expected that the first thing to be done with any new quantum computer would be to add an x86 emulation layer.
When Corbet posited that perhaps the realtime preemption code would be merged by then, Gleixner made one of his strongest predictions yet for merging that code: "I am planning to be done with it before I retire". More seriously, he said that it is on a good track, he has talked to the relevant subsystem maintainers, and he is optimistic about getting it all merged, eventually.
In 20 years, the kernel will still be supporting the existing user-space interfaces, Corbet said. He quoted Morton from a recent kernel mailing list post: "Our hammer is kernel patches and all problems look like nails", and wondered whether there was a problem with how the kernel hackers developed user-space interfaces. Morton noted that the quote was about doing more pretty printing inside the kernel, which he is generally opposed to. It has been done in the past because it was difficult for the kernel hackers to ship user-space code that would stay in sync with kernel changes. But perf has demonstrated that the kernel can ship user-space code, which could be a way forward.
Gleixner noted that there was quite a bit of resistance to shipping perf, but that it worked out pretty well as a way to "keep the strict connection between the kernel and user space". Perf is meant to be a simple tool that lets users try out perf events gathering, he said, and people are building more full-blown tools on top of it. Having tools shipped with the kernel allows more freedom to experiment with the ABI, Bottomley said. Morton said that there needs to be a middle ground, noting that Google had a patch that exported a procfs file with a shell script inside.
Ingo Molnar recently pointed out that FreeBSD is getting Linux-like quality with a much smaller development community and suggested that it was because the user space and kernel are developed together. Corbet asked whether Linux was holding itself back by not taking that route. Bottomley thought that Molnar was "both right and wrong", and that FreeBSD has an entire distribution in its kernel tree. "I hope Linux never gets to that", he said.
From perf to control groups, FreeBSD to ARM, as usual, the panel ranged over a number of topics in the hour allotted. The format and participants vary from year to year, but it is always interesting to hear what kernel developers are thinking about issues that Linux is facing.
Index entries for this article:
Conference: Collaboration Summit/2011
Posted Apr 13, 2011 16:57 UTC (Wed)
by cpeterso (guest, #305)
[Link] (11 responses)
Perhaps the cleanest and/or most popular ARM sub-architecture should be blessed as the primary platform code (like the x86 32/64-bit merge). If the other sub-architectures don't want to play, they can be developed out-of-tree. If a sub-architecture is only used by its developer, then who benefits from it being included in the Linus tree?
Posted Apr 13, 2011 19:04 UTC (Wed)
by mrfredsmoothie (guest, #3100)
[Link]
Posted Apr 13, 2011 21:17 UTC (Wed)
by drdabbles (guest, #48755)
[Link] (5 responses)
In the arm space, there are loads of custom parts. Custom CPUs, custom boards, etc. Very little is the same across multiple devices, which makes it very difficult to have a generic platform to bless as your standard. In fact, there are still ARM chips in heavy use that have no MMU.
This can most easily be seen by building a defconfig for an x86 system and booting it, then trying to do the same for an ARM device.
Posted Apr 14, 2011 6:06 UTC (Thu)
by james (subscriber, #1325)
[Link]
Posted Apr 14, 2011 13:24 UTC (Thu)
by BenHutchings (subscriber, #37955)
[Link] (3 responses)
Most platforms have basic devices like an interrupt controller and interval timer that can just be assumed to be present. On x86 you get a PIC or APICs and a PIT or HPET. The newer versions even have backward-compatible modes. On ARM there are a wide variety of interrupt controllers and timers, and no reasonable way to probe for them. The boot loader is supposed to tell the kernel which type of machine it's booting on, and sometimes the system vendor even gets this right... but AFAIK the kernel isn't yet able to initialise the right set of drivers based solely on that.
Posted Apr 14, 2011 18:54 UTC (Thu)
by dlang (guest, #313)
[Link] (1 responses)
Based on this, I don't believe that it works that way today.
Posted Apr 14, 2011 23:28 UTC (Thu)
by BenHutchings (subscriber, #37955)
[Link]
The current discussion is about having the boot loader pass in a fuller machine description (the Device Tree). This may simply be tacked onto the end of the kernel image in some way, so there is no need to modify existing boot loaders.
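The "tacked onto the end of the kernel image" idea can be sketched in a single command. This is a sketch only; the file names are placeholders, and the stand-in contents exist purely so the command has something to act on:

```shell
# Sketch of the appended-DTB idea: the compiled device tree blob is simply
# concatenated onto the end of the kernel image, so an unmodified boot
# loader loads both as one blob and the kernel finds the DTB at its own end.
printf 'KERNELIMAGE' > zImage      # stand-in for a real kernel image
printf 'DTBLOB'      > board.dtb   # stand-in for a compiled device tree
cat zImage board.dtb > zImage-dtb  # the actual "tack it on the end" step
```

The appeal is that only the image-preparation step changes; the boot loader itself never has to learn what a device tree is.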
Posted Apr 15, 2011 18:11 UTC (Fri)
by p2mate (subscriber, #51563)
[Link]
Posted Apr 14, 2011 2:26 UTC (Thu)
by xxiao (guest, #9631)
[Link] (3 responses)
I hope Linus works at Linaro to achieve this shift. LF is basically an Intel shop, though I could be wrong.
Posted Apr 14, 2011 16:20 UTC (Thu)
by cpeterso (guest, #305)
[Link] (2 responses)
Posted Apr 15, 2011 14:20 UTC (Fri)
by mrfredsmoothie (guest, #3100)
[Link]
Posted Apr 16, 2011 14:38 UTC (Sat)
by oak (guest, #2786)
[Link]
Posted Apr 13, 2011 20:56 UTC (Wed)
by patrick_g (subscriber, #44470)
[Link] (8 responses)
Posted Apr 13, 2011 20:58 UTC (Wed)
by corbet (editor, #1)
[Link]
Posted Apr 14, 2011 2:52 UTC (Thu)
by elanthis (guest, #6227)
[Link] (6 responses)
One or two fantastic developers are worth 20 mediocre developers, and each of those mediocre developers is worth 100 idiot developers. (And the really bad developers are basically just negative worth and cancel out the value of better developers on the same team.)
Linux -- and much FOSS -- is praised for a many-eyes, scratch-an-itch approach to development. Anyone with real experience with software development knows better. A small, skilled team that has a focused set of real-world goals is going to accomplish much higher quality work than what a throng of "newbs" can pull off. Linux has a lot of skilled devs, but so does FreeBSD, and the masses of amateur Linux contributors just don't amount to much in comparison.
Posted Apr 14, 2011 6:56 UTC (Thu)
by jthill (subscriber, #56558)
[Link]
Posted Apr 14, 2011 9:43 UTC (Thu)
by error27 (subscriber, #8346)
[Link]
You do see people learn as well. You reject their patches and they start breaking them up properly and writing better changelogs. It sounds stupid but it makes a big difference.
Posted Apr 15, 2011 21:02 UTC (Fri)
by rilder (guest, #59804)
[Link] (3 responses)
The 'numbers' that people point out wrt. FOSS needn't be just the core developers; they can also be testers, distro packagers, bug reporters, and users. So more people scrutinizing the code naturally means more eyes. What FreeBSD lacks and Linux has is the concept of diverse distros, from Gentoo and Exherbo to Ubuntu, which provide a whole spectrum of feedback to upstream developers. That feedback is, IMO, quite as important as anything else.
People may be quick to point out the fragmentation this may have caused, but a richer, more diverse user and development base also contributes to richer products in the long term; the fittest stay and sustain (think of genetic algorithms).
OTOH, FreeBSD, as has been emphatically pointed out, is *NOT* up to the mark on the desktop, though it is quite good in servers. It still uses an O(1) scheduler like the one Linux used eons back. A quick peek at its GSoC ideas page quickly reveals a stark difference between features in Linux and FreeBSD. I can point out specifics but don't want to, since I love both and am aware of their fields of application.
Posted Apr 16, 2011 10:07 UTC (Sat)
by WolfWings (subscriber, #56790)
[Link] (2 responses)
And many of the largest users of Linux have forward-ported the O(1) scheduler *BACK* into Linux as of 2.6.26 at least, and some users are using the tree w/ the BrainFuck scheduler instead, which is another O(1) scheduler that works all the way up to 2.6.38 already.
http://lwn.net/Articles/357658/
http://ck.kolivas.org/patches/bfs/
So, just because the 'official' mainline Linux scheduler is no longer the 'old O(1) scheduler' doesn't mean that O(1) schedulers are dead, they're quite actively used, and many people prefer them to the recent attempts to add more brains to the scheduler. Much like trying to add filesystem-aware processing to hard drives, I don't think adding deeply program-aware tuning to schedulers without trusting the programs to tell the kernel about themselves will work, long-term.
Posted Apr 16, 2011 13:46 UTC (Sat)
by corbet (editor, #1)
[Link]
Posted Apr 16, 2011 20:39 UTC (Sat)
by rilder (guest, #59804)
[Link]
Posted Apr 14, 2011 11:20 UTC (Thu)
by psankar (guest, #68004)
[Link]
>>> Ingo Molnar recently pointed out that FreeBSD is getting Linux-like quality with a much smaller development community and suggested that it was because the user space and kernel are developed together.
Do you have a link to the post from Ingo ?
Over here. It was a QOTW last week.
People's skills change over time, mostly with growing experience. Where will the next batch of good devs come from, and where will they go?
>>> A small, skilled team that has a focused set of real-world goals is going to accomplish much higher quality work than what a throng of "newbs" can pull off.
This is one of the mistakes made by many development teams -- a closed-system model with cathedral-style development. Fortunately, a "small team with real-world goals" often produces software which only they can use and end up using. Surprisingly, smugness is also something which never fades in the community.
Umm... the current CFS scheduler was written (as was Con's deadline scheduler before it) to get all of those interactivity heuristics out of the kernel. They are still mostly gone. Con's BFS (which is not an O(1) scheduler) sticks with the fairness approach. Things have improved since the O(1) days.