17 May 2018

feedKernel Planet

Pete Zaitcev: Amazon AI plz

Not being a native speaker, I sometimes get amusing results when searching on Amazon. For example, "floor scoop" brings up mostly fancy dresses. Apparently, a scoop is a type of dress, which can be floor-length, and so on. The correct request is actually "dust pan". Today, though, searching for "Peliton termite" ended with a bunch of bicycle saddles. Apparently, Amazon force-replaced it with "peloton", and I know of no syntax to force my spelling. I suspect that Peliton may have trouble selling their products at Amazon. This sort of thing is making me wary of Alexa. I don't see myself ever winning an argument with a robot who knows better, and is implemented in proprietary software that I cannot adjust.

17 May 2018 5:48pm GMT

09 May 2018

Pete Zaitcev: The space-based ADS-B

Today, I want to build a satellite that receives ADS-B signals from airplanes over the open ocean, far away from land. With a decent receiver and a simple antenna, it should be possible on a gravity-stabilized cubesat. I know about terrestrial receivers picking up signals 200..300 km out; surely, with care, one can do better. But I highly doubt that it's possible to finance such a toy, unless someone has already done that. I know that people somehow manage to finance AIS receivers, which are basically the same thing, only for ships. How do they do that?

09 May 2018 3:27am GMT

07 May 2018

Davidlohr Bueso: Linux v4.16: Performance Goodies

Linux v4.16 was released a few weeks ago and continues the mitigation of the Meltdown and Spectre bugs for x86-64, as well as for arm64 and IBM s390. While v4.16 is not the most exciting kernel version in terms of performance and scalability, the following is an unsorted and incomplete list of changes that went in, which I have cherry-picked. As always, the term 'performance' can be vague in that gains in one area can negatively affect another, so take everything with a grain of salt.

sched: reduce migrations and spreading of load to multiple CPUs

The scheduler's decisions are biased towards reducing the latency of searches, but they tend to spread load across an entire socket unnecessarily. At low CPU usage, this means the load on each individual CPU is low, which can be good, but cpufreq then decides that utilization on individual CPUs is too low to increase the P-state, and overall throughput suffers.

When a cpufreq driver is completely under the control of the OS, this can be compensated for. For example, intel_pstate can decide to boost apparent CPU utilization if a task recently slept on a CPU for idle. However, if hardware-based cpufreq is in play (e.g. hardware P-states, HWP), then very poor decisions can be made and the OS cannot do much about it. This only gets worse as HWP becomes more prevalent, sockets get larger and the P-state for individual cores can be controlled. Just setting the performance governor is not an answer, given that plenty of people really do worry about power utilization and still want a reasonable balance between performance and power. Experiments show performance benefits for network benchmarks running on localhost (about 10% on netperf RR for UDP and TCP, depending on the machine). Hackbench also shows some small improvements of ~6-11%, depending on machine and thread count.
[Commit 89a55f56fd1c, 3b76c4a33959, 806486c377e3, 32e839dda3ba]

printk: new locking scheme

Problems around the kernel's printk() call aren't new, and traditionally it has had to overcome issues with the console lock. Considering that printing to the console is a very generic operation which can be called from virtually anywhere at any time, relying on any sort of lock can cause deadlocks. Similarly, the call to printk() must proceed regardless of the availability of the console lock. As such, what would happen is that, upon contention, the task buffers its output for the console lock owner to flush when it releases the lock.

On large multi-core systems, this scheme can lead the console owner to pile up a lot of unbounded work before it can release the lock, triggering watchdog lockups. It was replaced with a new mechanism in which, upon contention, the task no longer leaves its work to the console lock owner and returns; instead, it stays around spinning until the lock is available. The heuristics imply a console owner and a waiter such that, if multiple CPUs are generating output, the console lock will circulate between them, and none will end up printing output for too long.
[Commit dbdda842fe96]

idr tree optimizations

With the extensions and improvements of the ID allocation API, there is a performance enhancement for ID numbering schemes that don't start at 0, which, according to the patch, account for ~20% of all the kernel users. So by using the new idr functions with the _base() suffix, users can immediately benefit from avoiding unnecessary iterations in the underlying radix tree.
[Commit 6ce711f27500]
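
As a rough illustration of what a base buys, the Python toy below stores each ID at index `id - base`, so that an allocator whose IDs start at 1 fills its storage from the very first slot. Everything here is hypothetical: `ToyIdAllocator` is not the kernel API, a flat list stands in for the radix tree, and the assumption that the saving comes from this kind of rebasing is mine rather than something stated above.

```python
class ToyIdAllocator:
    """Toy model only: a flat list stands in for the kernel's radix tree."""

    def __init__(self, base=0):
        self.base = base   # lowest ID this allocator hands out
        self.slots = []    # slot i holds the object for ID base + i

    def alloc(self, obj, start=0):
        """Store obj (must not be None) at the lowest free ID >= start."""
        index = max(start, self.base) - self.base
        # Skip over slots that are already taken.
        while index < len(self.slots) and self.slots[index] is not None:
            index += 1
        # Grow the storage, leaving any skipped low slots empty.
        while len(self.slots) <= index:
            self.slots.append(None)
        self.slots[index] = obj
        return self.base + index

    def find(self, id_):
        """Return the object stored under id_, or None."""
        index = id_ - self.base
        if 0 <= index < len(self.slots):
            return self.slots[index]
        return None


# IDs start at 1 in both cases; only the second allocator knows it.
unbased = ToyIdAllocator(base=0)
based = ToyIdAllocator(base=1)
ids_unbased = [unbased.alloc(f"obj{i}", start=1) for i in range(3)]
ids_based = [based.alloc(f"obj{i}", start=1) for i in range(3)]

print(ids_unbased, len(unbased.slots))  # slot 0 is never used
print(ids_based, len(based.slots))      # every slot is used
```

Both allocators hand out the same IDs, but the unbased one carries a permanently empty slot at index 0 that every search from the bottom has to step over.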

arm64: 52-bit physical address support

With ARMv8.2, the physical address space is extended from 48 to 52 bits, so up to 4 pebibytes (PiB) of physical memory can now be addressed.
[Commit fa2a8445b1d3, 193383043f14, 529c4b05a3cb, 787fd1d019b2]
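
The 4 PiB figure follows directly from the address width. A quick check in Python (the helper name `addressable_bytes` is just for illustration):

```python
TIB = 1 << 40  # bytes in one tebibyte
PIB = 1 << 50  # bytes in one pebibyte

def addressable_bytes(bits):
    """Number of distinct byte addresses with the given address width."""
    return 1 << bits

old = addressable_bytes(48)
new = addressable_bytes(52)

print(old // TIB, "TiB with 48-bit physical addresses")
print(new // PIB, "PiB with 52-bit physical addresses")
print("growth factor:", new // old)
```

Four extra address bits give a 16-fold increase, from 256 TiB to 4 PiB.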

07 May 2018 5:53pm GMT