Sunday, November 15, 2015

And the winner is: RT-PREEMPT

Many might remember that one of the key contributions of the unified build branch of LinuxCNC - which eventually turned into Machinekit - was support for multiple realtime kernels. At the time, LinuxCNC could only make use of RTAI, and distributed a well-aged version of it.

RTAI still yields the best latency figures. But that comes at a huge cost: having application code run in-kernel is not only unsafe at any speed, but also brings enormous build complexity, kernel version dependency and recurring maintenance chores. That has not changed - nor has the restriction that RTAI runs on Intel architectures only. Moreover, the future of the RTAI project has become clouded: it has always been a bit of a one-man show, lacking a healthy community around it to take over just in case.

The alternatives supported by Machinekit are Xenomai and RT-PREEMPT.

Xenomai shares some history with RTAI - both are hypervisor kernels: the idea is to have a minimal, RT-capable scheduler and interrupt handler underneath the actual Linux kernel. RT threads use this hypervisor to achieve better timing than is possible with a vanilla Linux kernel. Unlike RTAI - where RT applications need to run in-kernel as modules, similar to device drivers - Xenomai supports a threading model which looks almost like normal POSIX threads, except that such a thread can make only rather restricted use of the Linux API. Xenomai does support a wide range of architectures, which is why it is the mainstay of running Machinekit on ARM platforms. Again this comes at a cost: Xenomai still requires a rather intrusive kernel patch and is available for a limited range of underlying Linux kernel versions. And since many embedded manufacturers choose a rather specific, sometimes outdated kernel version, and sometimes do not upstream their patches into the Torvalds mainline Linux kernel, the chances of getting a working Xenomai kernel for such platforms are pretty low. That is the main reason why both RTAI and Xenomai kernels are "well aged" on most platforms.

The third alternative is RT-PREEMPT - a set of patches to the standard Linux kernel to improve its timing behavior, but without introducing a separate API or a hypervisor. This project has been over a decade in the making, and at times there were doubts whether it would make it into the mainline Linux kernel. Its latency was initially substantially higher than that of the hypervisor breed, but over the last year or so huge progress has been made in terms of performance, on Intel platforms in particular but also on ARM platforms: I recently tried an RT-PREEMPT kernel on a Raspberry-2 and it delivers slightly better latency than Xenomai on the Beaglebone, so it's getting pretty close.
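
To illustrate the "no separate API" point: under RT-PREEMPT a realtime task is an ordinary process that asks for a realtime scheduling class through the standard POSIX calls. The sketch below is only in the spirit of a wake-up latency test. The function name, its parameters and the default values are made up for this example, and because it runs in the Python interpreter its numbers are not comparable with the output of a dedicated C tool.

```python
import os
import time


def measure_wakeup_latency(interval_us=1000, loops=200, rt_priority=None):
    """Return (min, avg, max) wake-up lateness in microseconds.

    Hypothetical helper for illustration only, not part of Machinekit.
    """
    if rt_priority is not None:
        try:
            # Standard POSIX scheduling call; usually needs root or CAP_SYS_NICE.
            os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(rt_priority))
        except (AttributeError, OSError):
            # Not on Linux, or not privileged: keep the default scheduler.
            pass

    interval_ns = interval_us * 1000
    lateness_us = []
    next_wakeup = time.monotonic_ns() + interval_ns
    for _ in range(loops):
        delay_ns = next_wakeup - time.monotonic_ns()
        if delay_ns > 0:
            time.sleep(delay_ns / 1e9)
        # How far past the intended wake-up time did we actually wake up?
        lateness_us.append((time.monotonic_ns() - next_wakeup) / 1000.0)
        next_wakeup += interval_ns

    return min(lateness_us), sum(lateness_us) / len(lateness_us), max(lateness_us)
```

Calling `measure_wakeup_latency(rt_priority=80)` as root on an RT-PREEMPT kernel and again on a stock kernel, ideally under load, gives a rough feel for the difference in worst-case lateness.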

In the past, the argument for minimum latency has been its usefulness for software-based step generation and quadrature encoders - the actual servo cycle computations do not need latency that low. But the fact is that the PC's parallel port is an extinct piece of hardware (and even then RT-PREEMPT can deliver reasonable software step rates). Plus, inexpensive FPGA hardware can deliver higher performance if needed.
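
As a rough worked example of why latency matters for software stepping: the period of the fast thread has to be chosen longer than the worst-case latency, and that period in turn caps the step rate. The sketch below assumes, as a simplification not stated in this post, that one step needs at least one period with the pin high and one with it low. The function name and the sample periods are hypothetical.

```python
def max_step_rate_hz(base_period_ns):
    """Upper bound on the software step rate for a given fast-thread period.

    Assumption: two thread periods per step (one high, one low).
    """
    if base_period_ns <= 0:
        raise ValueError("base_period_ns must be positive")
    return 1e9 / (2 * base_period_ns)


# Hypothetical periods, chosen only for illustration.
for period_ns in (25_000, 50_000, 100_000):
    rate_khz = max_step_rate_hz(period_ns) / 1000
    print(f"{period_ns / 1000:.0f} us period -> at most {rate_khz:.0f} kHz")
```

Under this assumption a 25 us period allows at most 20 kHz of steps, and doubling the period halves the achievable rate.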

Picking one among the above choices is a decision that all users of realtime applications need to make, not just us. In the past, some funding for RT work has come from the financial industry for high-frequency trading, and the automotive industry has a healthy interest as well, demonstrated for instance by its participation in the Linux Realtime Workshop conference series. This revolves mostly around the autonomous driving theme.

There was some really good news recently: the Linux Foundation announced that it will adopt the RT-PREEMPT project with the goal of bringing it into the mainline kernel (note list of sponsors!). This assures the funding of the remaining work, and IMO it is reasonable to expect that in a few years - probably more than one, but certainly less than five - obtaining an RT-PREEMPT kernel will be just a build option of the mainline kernel. Even now, building RT-PREEMPT kernels is much, much simpler than either of the other options - and once that effort goes mainline and manufacturers actually support that kernel, it will be much simpler to obtain RT kernels for any platform, not just a few select ones.

It is interesting to note that the Xenomai3 effort provides a common API over both the hypervisor-style and RT-PREEMPT kernels. Since Xenomai seems to enjoy a healthy industrial user base, it is important to offer its users a migration path towards what - probably not only I - consider the winner of the RT kernels competition.

These developments have some far-reaching implications for the Machinekit project, some of which were discussed at the recent meetup.  Among those are:


  • The performance edge provided by RTAI has become so small that it is by far outweighed by the enormous build complexity it entails for Machinekit. We have therefore decided to end support for RTAI.
  • With RT-PREEMPT very likely to go mainline, hardware vendors are likely to put their efforts into supporting it rather than the other options, meaning that the range of supported hardware will widen, and that functionality and performance improvements will show up here first. It will take a while until this "sinks in" with all involved, but the direction is clear.
  • For the time being we'll retain the Xenomai2 builds; while Xenomai2 is in maintenance mode already, we do have stable kernels around which perform great. And the build process is easy and robust. But there is not much point in a Xenomai3 port to just run RT-PREEMPT underneath - the rt-preempt flavor already does that. Any Xenomai3 hypervisor flavor needs to be weighed against the performance edge it has over RT-PREEMPT, and it looks like this edge is shrinking.
  • Ending support for kernel threads enables replacing the absurdly complex legacy build system with more mainstream tools like cmake.
  • Using userland threads exclusively opens new options: so far HAL realtime code was restricted to C, as C++ is not supported for kernel modules. Lifting this restriction has potential both for simplifying HAL itself and for making it easier to integrate with C++-based systems like ROS and Orocos, or for bringing in improved solutions to old problems.



6 comments:

  1. One nit. It's called PREEMPT_RT not RT-PREEMPT.

  2. This sounds really promising! Although I have some experience with LinuxCNC, I discovered the Machinekit project only yesterday - and this seems to be the spot-on effort that was needed to make LinuxCNC fit for the new embedded hardware boards available everywhere now. This is really good news for hobbyists and professionals when it comes to easy deployment, where the hardware work alone takes up a good amount of effort.

    Thanks guys, I am looking forward to getting some hands-on experience once my Beagle Bone arrives ;-)

  3. "tried an RT-PREEMPT kernel on a Raspberry-2 and it delivers slightly better latency than Xenomai on the Beaglebone"
    Are you sure about this? I checked the latency on "Linux beaglebone 3.8.13-xenomai-r78" with "xeno latency" and the results are:
    User space: min 0.958us, avg 6.166us, max 23.29us
    Kernel space: min 0.194us, avg 2.633us, max 20.071us

    With Xenomai "dohell" (generating full CPU, I/O-load etc)
    User space: min: 1.583us, avg: 12.958us, max: 25.333us
    Kernel space: min: -1.804us, avg: 11.106us, max: 24.911us

    I also tried RT-PREEMPT with Robert Nelson's kernel 4.4.0-rc8-bone-rt-r1, using "cyclictest -t1 -p 80 -n -i 10000 -l 10000" (no load):
    Min: 23us, Avg: 40us Max: 91us

    Even if the kernel modules in Machinekit are phased out I think it would be nice to have Xenomai 3 support in the long run - making it possible to switch to Cobalt co-kernel if RT-PREEMPT-performance is poor on your target.

    Replies
    1. This comment has been removed by the author.

    2. Hi,

      You have checked each kernel with a different method.
      This is an unfair test...

      Thanks,
      Ran

  4. Charles - Nice writeup on the different realtime kernels. For the average LinuxCNC user, the requirements of each escape us.
