02 Oct 2015

Rafael Goncalves Martins: blogc: helper tools

While users may be able to use blogc as is with the help of generic tools, some of these tools are really boring to set up.

With that in mind, I'm trying to develop some simple tools to help users get their blogs online. At this point I have two tools ready for usage:

Packages are available for Gentoo, and in my Copr repository for RHEL/CentOS/Fedora.


blogc-git-receiver is a login shell and a git pre-receive hook that can create git repositories and build/deploy your websites automatically. You just need to create a user and configure it to use blogc-git-receiver as its login shell; then, every time an authorized user pushes to a repository, it will create a local bare repository on the server, if needed, and if the push includes some change to the master branch, it will rebuild your website for you.
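For instance, a minimal setup could look like this (the user name, shell path, and repository name are illustrative assumptions; the first command runs as root on the server, the rest on your machine):

# useradd -m -s /usr/bin/blogc-git-receiver blogs

$ git remote add production blogs@example.com:my-blog.git
$ git push production master

On the first push, blogc-git-receiver creates the bare repository on the server; subsequent pushes touching master trigger a rebuild.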

blogc-git-receiver tries to be as atomic as possible, building the new version of the website in a separate directory, and using a symlink to point to the most recent version of the website, removing the old version only after a successful new build.

blogc-git-receiver creates a symlink inside the bare git repository, called htdocs, that points to the last successful build. Users just need to make their webservers use this symlink as document root for their virtual host, and make sure that the webserver can follow symlinks.

With this tool, users can create their own PaaS-like environment, using a cheap VPS to host lots of blogs. ;-)

This tool is one of the reasons why I wanted to make blogc as fast as possible: it rebuilds the whole website every time, not just the changes, for the sake of consistency.

This tool is also good sample code for people interested in understanding how a login shell and a git hook work.

The Gentoo package is dev-vcs/blogc-git-receiver, and the RHEL/CentOS/Fedora package is blogc-git-receiver.

Some simple documentation is available at: https://github.com/blogc/blogc-git-receiver


blogc-runserver is a simple HTTP server that comes with several pre-defined rules and tries to mimic the way most production servers work when serving static websites. Users just need to point blogc-runserver to the output directory where blogc created its result files.

A simple Makefile rule is able to run your website for testing:

serve: all
        blogc-runserver $(OUTPUT_DIR)

Yeah, it is really that simple! :)

Please keep in mind that this server should not be used in production. It is really simple, and developed for testing purposes.

The Gentoo package is www-servers/blogc-runserver, and the RHEL/CentOS/Fedora package is blogc-runserver.

Some simple documentation is available at: https://github.com/blogc/blogc-runserver

Other tools

I have more ideas for new tools, which I will probably explain in future posts, but if you have ideas for useful tools, please let me know.


02 Oct 2015 3:51am GMT

29 Sep 2015

Paweł Hajdan, Jr.: Gentoo's top-notch user community

One of the best parts of Gentoo for me is the community.

For example, I regularly receive email from people who volunteer to help with testing hard-masked www-client/chromium ebuilds. FWIW you don't need to email me or the Gentoo Chromium team - just start testing and filing bugs. On the other hand, I do appreciate it when people express interest in helping out.

Another example is helping with getting bugs resolved.

Bug #556812 looked rather mysterious - until Guillaume ZITTA found that "build" is a red herring, and that the "tracing" module being imported is in fact a different one (from dev-python/tracing, as opposed to the one in the chromium sources). It was an unfortunate name collision - once found, quite easy to fix.

In bug #553502, Johan Hovold found that we need to require the postproc USE flag for libvpx to avoid a crash.

See bug #551666 for some of Alexey Dobriyan's gdb magic, mostly based on a single-line segfault report from dmesg...

These are just a few examples.

By the way, the area where we could use more help is arm architecture support. Some specific bugs where help is wanted are #555702, #545368, #520180, and #483130. Please report whether you can reproduce them or not, and post emerge --info from your system and the compressed build log in case the build fails.

29 Sep 2015 7:39pm GMT

27 Sep 2015

Domen Kožar: Friends sometimes let friends curl to shell

Every now and then (actually quite often), people complain on Twitter that they're afraid of our simple bash installer for the Nix package manager:

$ bash <(curl https://nixos.org/nix/install)

Example (from today):

There are popular blog posts discouraging use of it.

Ask yourself a question: how would a package manager install itself? Via another package manager?

If we assume nixos.org is not compromised (a compromise is really hard to detect anyway), use TLS to secure the connection, and rely on our simple trick to prevent execution of a partially downloaded script (you haven't read the script yet, right?), what can really go wrong?
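The trick referred to is presumably the common pattern of wrapping the whole script body in a function that is only invoked on the very last line, so a connection cut mid-download leaves an unfinished function definition and nothing ever executes. A minimal sketch of the pattern (not the actual Nix installer):

main() {
    echo "downloading and unpacking Nix..."
    # ... all installation steps live in here ...
}
main "$@"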

It's the most transparent way to see how the package manager can be bootstrapped: read the source, Luke.

If you still have a reason why piping to shell is a bad idea, let me know.

27 Sep 2015 5:50pm GMT

23 Sep 2015

Alex Legler: Repackaging packages.gentoo.org

Gentoo has seen quite some refreshing updates this year: in April, our website was relaunched (twice ;)), and our main package repository is now finally powered by git. There is one service though that sort of relates to both but has not seen an update in quite some time: let's talk about packages.gentoo.org.

The fact that we now use git for what was before called gentoo-x86 requires a set of changes to our package website. Several repository URLs are different now, and especially the changelogs that are now taken from git log messages need new code to display properly. As many participants in our website survey a while back have also noted, the time is most ripe for a change of scenery on "pgo".

As a good package search is one of the most-requested features for the site, I've built a new site around elasticsearch (which also powers the new Gentoo mailing list archives). It is nearing completion as a minimum viable product, but is already available for you to test in a beta state.

So, head over to https://packagestest.gentoo.org/ and feel free to browse around. Some things are not available yet and the caching has not been fully enabled, so things might take a second to render. I'd be glad to hear your thoughts and ideas on the site, either as a comment here or, if you prefer, on freenode in #gentoo-www.

As a sneak peek, here's the site rendered on a small screen in case you don't have your $device handy:


23 Sep 2015 6:59pm GMT

20 Sep 2015

Rafael Goncalves Martins: blogc, a blog compiler

So, this is a follow-up to my previous post, about my "new blog".

I promised to write something about the "new blogging engine" that runs it, and my plan was to finish some documentation and release a first "stable" version of the project before writing this post. And surprisingly, it is ready!

After a few beta versions, mainly to verify build system and packaging, I released the first "stable" version a few days ago, with some simple documentation (man pages) and all the features required for basic usage.

This blog is powered by blogc, a blog compiler. It works like a usual source code compiler, converting input files written in a custom input language into a target file, based on a template. Depending on the template used, it can generate almost any resource file needed by a website/blog as the target file: posts, pages, feeds, sitemaps, etc.
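To give an idea, a single compiler call looks roughly like this (the file names and the variable are made up; see blogc(1) for the exact flags):

$ blogc -D SITE_TITLE='My Blog' -t post.tmpl -o output/post/index.html content/post.txt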

Hmm, but isn't this similar to what [put your favorite static site generator here] does?!

Yeah, it is, indeed. But blogc was developed to be:

  1. Stateless and idempotent: no global state is shared by the binary between compiler calls, and compiler calls with the same arguments should always generate the same output. The compiler should not need to manage what needs to be rebuilt; that is a job for a build tool like make (see the sketch after this list).
  2. Fast: the compiler should be able to generate a complex target file in a few milliseconds, making it easy to regenerate the full website/blog whenever needed. This is a feature required by another component of this project, which I'll introduce in another post.
  3. Simple: both the source and template syntaxes are simple enough to be learned quickly, and are specific to this use case.
  4. Easy to host: it produces static files that can be hosted anywhere.
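Because of points 1 and 2, the build can be driven by an ordinary Makefile rule such as this hypothetical one (paths and template names are made up):

OUTPUT_DIR = output

$(OUTPUT_DIR)/%/index.html: content/%.txt post.tmpl
	blogc -t post.tmpl -o $@ $<

make then decides from file timestamps alone which targets need regenerating, which only works because blogc itself keeps no state.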

And why create another blog-related project?!

As most of the people who read this blog know, I like to research and implement stuff related to personal content publishing, despite not being a frequent blogger. This time I wanted to put into practice some concepts I had in mind for some time, and play with handwritten parsers.

This project is built around three parsers, which are basically state machines. The parsers are heavily inspired by node.js' http parser, which is in turn inspired by nginx's http-parser. This "methodology" of building parsers makes it easy to create fast and memory-efficient parsers.

blogc is totally implemented in C99, without any external dependencies.

blogc comes with a somewhat complete unit test suite, which depends on cmocka. Obviously, users who don't want to run the test suite don't need to install cmocka.

I'll provide more implementation details in a future post.

OK, how can I install it?! :D

Gentoo users can install blogc from the main package tree. Please note that the package is keyworded for ~x86 and ~amd64:

# emerge -av app-text/blogc

Fedora users, and users of other Linux distributions that support the EPEL repository, can install it from my Copr repository.

After enabling this repository, just type:

# yum install blogc

I have plans to package it for Ubuntu/Debian, but this is low priority right now.

Users without access to prebuilt packages can install blogc from source. Just visit http://blogc.org, download a source tarball, extract it and build. It is an autotools project, that builds with the usual command combo:

$ ./configure
$ make
# make install

And how can I get started with blogc?

The website contains lots of useful information.

With blogc installed, you can read man pages: blogc(1), blogc-source(7) and blogc-template(7). They are also available in the website.

You can also use our example website as a base for your own website. It is very simple, but should be a good starting point:

$ git clone https://github.com/blogc/blogc-example.git my-blog
$ cd my-blog
$ rm -rf .git
$ git init .
$ git add .
$ git commit -m 'My new blog, initialized from blogc-example'

The example website uses a Makefile to build the website. Reading this Makefile is a good way to learn how blogc works. It only works with GNU Make, but users should be able to write a Makefile that works with any POSIX-compliant Make implementation, if wanted. Users can also use task runners like Grunt, if they know how to make them call blogc properly. :-)

What does the name "blogc" mean?

"blogc" stands for "blog compiler". The name has nothing to do with the programming language used to implement it. ;-)

And what about blohg? Was it abandoned?

I'm still maintaining it, and it seems stable enough for general usage at this point. Feel free to contact me if you find a bug in it.

That's it for now.

I hope you like blogc!

20 Sep 2015 1:20pm GMT

14 Sep 2015

Alexys Jacob: MongoDB 3.0 upgrade in production : step 4 victory !

In my last post, I explained the new hope we had in following some newly added recommended steps before trying to migrate our production cluster to mongoDB 3.0 WiredTiger.

The most demanding step was migrating all our production servers' data storage filesystems to XFS, which obviously required a resync of each node… But we got there pretty fast and were about to try again as 3.0.5 was getting ready, until we saw this bug coming !

I guess you can understand why we decided to wait for 3.0.6… which eventually got released with a more peaceful changelog this time.

The 3.0.6 crash test

We decided as usual to test the migration to WiredTiger in two phases.

  1. Migrate all our secondaries to the WiredTiger engine (full resync; a sketch of this follows the list). Wait a week to see if this has any effect on our cluster.
  2. Switch all the MMapv1 primary nodes to secondary and let our WiredTiger secondary nodes become the primary nodes of our cluster. Pray hard that this time it will not break under our workload.
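A hedged sketch of what step 1 looks like on each secondary, one node at a time (paths, replica set name, and flags are illustrative, not our exact procedure):

$ mongod --shutdown --dbpath /var/lib/mongodb
$ rm -rf /var/lib/mongodb/*
$ mongod --replSet rs0 --dbpath /var/lib/mongodb --storageEngine wiredTiger

Wiping the dbpath forces the restarted node to perform a full initial sync from the replica set, writing its data back in the WiredTiger format.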

Step 1 results were good, nothing major changed and even our mongo dumps were still working this time (yay!). One week later, everything was still working smoothly.

Step 2 was the big challenge, which had failed horribly last time. Needless to say, we were quite stressed when doing the switch. But it worked smoothly, nothing broke, and the performance gains were huge !

The results

Nothing speaks better than metrics, so I'll just comment on them quickly as they speak for themselves. I obviously can't disclose the scales, sorry.

Insert-only operations gained 25x performance


Upsert-heavy operations gained 5x performance


Disk I/O also eased up on overall disk usage. This is due to WiredTiger's superior caching and disk flushing mechanisms.


Disk usage decreased dramatically thanks to WiredTiger compression


The last and next step

As of today, we still run our secondaries with the MMapv1 engine and are waiting a few weeks to see if anything goes wrong in the long run. Should we need to roll back, we'd be able to do so very easily.

Then, when we get enough uptime using WiredTiger, we will make the final switch to a fully roaring production cluster !

14 Sep 2015 5:10pm GMT

10 Sep 2015

Sven Vermeulen: Custom CIL SELinux policies in Gentoo

In Gentoo, we have been supporting custom policy packages for a while now. Unlike most other distributions, which focus on binary packages, Gentoo has always supported source-based packages by default (although binary packages are supported as well).

A recent commit now also allows CIL files to be used.

Policy ebuilds, how they work

Gentoo provides its own SELinux policy, based on the reference policy, and provides per-module ebuilds (packages). For instance, the SELinux policy for the screen package is provided by the sec-policy/selinux-screen package.

The package itself is pretty straightforward:

# Copyright 1999-2015 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2
# $Id$


inherit selinux-policy-2

DESCRIPTION="SELinux policy for screen"

if [[ $PV == 9999* ]] ; then
        KEYWORDS="~amd64 ~x86"
fi

The real workhorse lies within a Gentoo eclass, something that can be seen as a library for ebuilds. It allows consolidation of functions and activities so that a large set of ebuilds can be simplified. The more ebuilds are standardized, the more development can be put inside an eclass instead of in the ebuilds. As a result, some ebuilds are extremely simple, and the SELinux policy ebuilds are a good example of this.

The eclass for SELinux policy ebuilds is called selinux-policy-2.eclass and holds a number of functionalities. One of these (the one we focus on right now) is to support custom SELinux policy modules.

Custom SELinux policy ebuilds

Whenever a user has a SELinux policy that is not part of the Gentoo policy repository, the user might still want to provide these policies through packages. This has the advantage that Portage (or whatever package manager is used) is aware of the policies on the system, and proper dependencies can be built in.

To use a custom policy, the user needs to create an ebuild which informs the eclass not only about the module name (through the MODS variable) but also about the policy files themselves. These files are put in the files/ location of the ebuild, and referred to through the POLICY_FILES variable:

# Copyright 1999-2015 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2
# $Id$

POLICY_FILES="oracle.te oracle.if oracle.fc"

inherit selinux-policy-2

DESCRIPTION="SELinux policy for oracle"

if [[ $PV == 9999* ]] ; then
        KEYWORDS="~amd64 ~x86"
fi

The eclass will generally try to build the policies, converting them into .pp files. With CIL, this is no longer needed. Instead, we copy the .cil files straight into the location where we would otherwise place the .pp files.

From that point onwards, managing the .cil files is similar to .pp files. They are loaded with semodule -i and unloaded with semodule -r when needed.
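For example, using the oracle module from the ebuild above:

# semodule -i oracle.cil
# semodule -l | grep oracle
# semodule -r oracle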

Enabling CIL in our ebuilds is a small improvement (after the heavy workload to support the 2.4 userspace) which allows Gentoo to stay ahead in the SELinux world.

10 Sep 2015 5:13am GMT

07 Sep 2015

Bernard Cafarelli: Testing clang 3.7.0 OpenMP support

Soon after llvm 3.7.0 reached the Gentoo tree, Jeremi Piotrowski opened two bugs to fix OpenMP support in our ebuilds. sys-devel/llvm-3.7.0-r1 (a revision bump that also installs the LLVM utilities) now has a post-install message with a short recap, but here is the blog version.

As detailed in upstream release notes, OpenMP in clang is disabled by default (and I kept this default in the Gentoo package), and the runtime is a separate package. So:
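Roughly, the recipe boils down to two steps; the package name and flag spelling below are my assumptions, and the post-install message has the authoritative recap:

# emerge --ask sys-libs/libomp

$ clang -fopenmp=libomp hello_omp.c -o hello_omp
$ ./hello_omp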

07 Sep 2015 3:57pm GMT

06 Sep 2015

Patrick Lauer: Printers in Linux ...

I was trying to make a Brother printer do some printing. (Optimistic, I know ... but ...)
Out of the box, CUPS can't handle it. Since it's CUPS, it doesn't do errors, so who knows why. Anyway, the autoconfigured stuff ends up being eh whut ur doin is naught gud.

The config file points at a hostname that is not even within our local network, so I think this won't print locally. Cloud printing? Cups can do!
There's a printer driver from Brother, which is horribly stupid, bad, stupid, and bad. I read through the 'installer' script trying to figure out what it would do, but then I realized that I'm not on an RPM distro so it's futile.

So then I figured "why not postscript?"

And guess what. All the documentation I found was needlessly wrong and broken, when all you have to do is configure an IPP printer as a generic PostScript one, and ... here comes instant good-quality colour duplex output from the printer.
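For reference, such a queue can be created with a single lpadmin call; the printer address below is made up, and the name of the generic PostScript PPD may vary with your CUPS version:

# lpadmin -p brother -E -v ipp://192.168.1.50/ipp/print -m drv:///sample.drv/generic.ppd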

I'm confused about many things here - the complexity of Brother's attempt at a driver that doesn't work, CUPS autoconfiguration that sees the printer and then derps out, and all the documentation that doesn't point out that all this weird layercake of workarounds is not needed because it's a standards-compliant postscript printer.
How exhausting.

06 Sep 2015 10:46am GMT

02 Sep 2015

Sven Vermeulen: Maintaining packages and backporting

A few days ago I committed a small update to policycoreutils, a SELinux-related package that provides most of the management utilities for SELinux systems. The fix was to get two patches (which are already committed upstream) into the existing release so that our users can benefit from the fixed issues without having to wait for a new release.

Getting the patches

To capture the patches, I used git together with the commit id:

~$ git format-patch -n -1 73b7ff41
~$ git format-patch -n -1 4fbc6623

The two generated patch files contain all information about the commit. Thanks to the epatch support in the eutils.eclass, these patch files are immediately usable within Gentoo's ebuilds.

Updating the ebuilds

The SELinux userspace ebuilds in Gentoo all have live ebuilds available which are immediately usable for releases. The idea with those live ebuilds is that we can simply copy them and commit in order to make a new release.

So, in the case of this patch backporting, the necessary patch files are first moved into the files/ subdirectory of the package. Then, the live ebuild is updated to use the new patches:

@@ -88,6 +85,8 @@ src_prepare() {
                epatch "${FILESDIR}/0070-remove-symlink-attempt-fails-with-gentoo-sandbox-approach.patch"
                epatch "${FILESDIR}/0110-build-mcstrans-bug-472912.patch"
                epatch "${FILESDIR}/0120-build-failure-for-mcscolor-for-CONTEXT__CONTAINS.patch"
+               epatch "${FILESDIR}/0130-Only-invoke-RPM-on-RPM-enabled-Linux-distributions-bug-534682.patch"
+               epatch "${FILESDIR}/0140-Set-self.sename-to-sename-after-calling-semanage-bug-557370.patch"

        # rlpkg is more useful than fixfiles

The patches are not applied to the live ebuilds themselves (they are ignored there), as we want the live ebuilds to be as close to the upstream project as possible. But because the ebuilds are immediately usable for releases, we add the necessary information there first.

Next, the new release is created:

~$ cp policycoreutils-9999.ebuild policycoreutils-2.4-r2.ebuild

Testing the changes

The new release is then tested. I have a couple of scripts that I use for automated testing. So first I update these scripts to also try out the functionality that was failing before. On existing systems, these tests should fail:

Running task semanage (Various semanage related operations).
    Executing step "perm_port_on   : Marking portage_t as a permissive domain                              " -> ok
    Executing step "perm_port_off  : Removing permissive mark from portage_t                               " -> ok
    Executing step "selogin_modify : Modifying a SELinux login definition                                  " -> failed

Then, on a test system where the new package has been installed, the same testset is executed (together with all other tests) to validate if the problem is fixed.

Pushing out the new release

Finally, with the fixes in and validated, the new release is pushed out (into ~arch first of course) and the bugs are marked as RESOLVED:TEST-REQUEST. Users can confirm that it works (which would move it to VERIFIED:TEST-REQUEST) or we stabilize it after the regular testing period is over (which moves it to RESOLVED:FIXED or VERIFIED:FIXED).

I do still have to get used to Gentoo using git as its repository now. Luckily, the workflow to use is documented, because I often find that the git push fails (due to changes to the tree since my last pull), so I need to run git pull --rebase=preserve followed by repoman full and then the push again, sufficiently quickly after each other.
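In practice, that recovery is just the following one-liner, straight from the flow described above:

~$ git pull --rebase=preserve && repoman full && git push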

This simple flow is easy to get used to. Thanks to the existing foundation for package maintenance (such as epatch for patching, live ebuilds that can be immediately used for releases, and the ability to just cherry-pick patches into our repository), we can serve updates with just a few minutes of work.

02 Sep 2015 6:33pm GMT

Bernard Cafarelli: LLVM 3.7.0 released, Gentoo package changes

As announced here, the new LLVM release is now available. If you are interested after reading the release notes (or the clang ones), I just added the corresponding packages in Gentoo, although still masked for some additional testing. mesa-11.0 release candidates compile fine with it though.
2015/09/04 edit: llvm 3.7.0 is now unmasked!

On the Gentoo side, this version bump has some nice changes:

On the TODO list: some tests are still failing on 32bit multilib (bug #558604), and ocaml doc compilation/installation has a strange bug (see the additional call to the docs/ocaml_doc target in the ebuild). And let's not forget adding other tools, like polly. Another recent suggestion: add llvm's OpenMP library, needed for OpenMP with clang.

Also, if people were still using the dragonegg GCC plugin, note that it is no longer released upstream. As the 3.6.x versions had compilation problems, this package received its last rites.

02 Sep 2015 11:53am GMT

28 Aug 2015

Michael Palimaka: There’s no such thing as KDE 5, and its status on Gentoo


For those who are not already aware, KDE's release structure has evolved. The familiar all-in-one release of KDE SC 4 has been split into three distinct components, each with its own release cycle - KDE Frameworks 5, KDE Plasma 5, and KDE Applications 5.

This means there's no such thing as KDE 5!

KDE Frameworks 5

KDE Frameworks 5 is a collection of libraries upon which Plasma and Applications are built. Each framework is distinct in terms of functionality, allowing consumers to depend on smaller individual libraries. This has driven adoption in other Qt-based projects such as LXQt as they no longer have to worry about "pulling in KDE".

We ship the latest version of KDE Frameworks 5 in the main tree, and plan to target it for stabilisation shortly.

KDE Plasma 5

KDE Plasma 5 is the next generation of the Plasma desktop environment. While some might not consider it as mature as Plasma 4, it is in a good state for general use and is shipped as stable by a number of other distributions.

We ship the latest version of KDE Plasma 5 in the main tree, and would expect to target it for stabilisation within a few months.

KDE Applications 5

KDE Applications 5 consists of the remaining applications and supporting libraries. Porting is a gradual process, with each new major release containing more KF5-based and fewer KDE 4-based packages.

Unfortunately, current Applications releases are not entirely coherent - some packages have features that require unreleased dependencies, and some cannot be installed at the same time as others. This situation is expected to improve in future releases as porting efforts progress.

Because of this, it's not possible to ship KDE Applications in its entirety. Rather, we are in the process of cherry-picking known-good packages into the main tree. We have not discussed any stabilisation plan yet.


As Frameworks are just libraries, they are automatically pulled in as required by consuming packages, and no user intervention is required.

To upgrade to Plasma 5, please follow the upgrade guide. Unfortunately it's not possible to have both Plasma 4 and Plasma 5 installed at the same time, due to an upstream design decision.

Applications appear in the tree as a regular version bump, so will upgrade automatically.

Ongoing KDE 4 support

Plasma 4 has reached end-of-life upstream, and no further releases are expected. As per usual, we will keep it for a reasonable time before removing it completely.

As each Applications upgrade should be invisible, there's less need to retain old versions. It is likely that the existing policy of removing old versions shortly after stabilisation will continue.

What is this 15.08.0 stuff, or, why is it upgrading to KDE 5?

As described above, Applications are now released separately from Plasma, following a yy.mm.xx versioning scheme. This means that, regardless of whether they are KDE 4 or KF5-based, they will work correctly in both Plasma 4 and Plasma 5, or any other desktop environment.

It is the natural upgrade path, and there is no longer a "special relationship" between Plasma and Applications the way there was in KDE SC 4.


As always, feedback is appreciated - especially during major transitions like this. Sharing your experience will help improve it for the next person, and substantial improvements have already been made thanks to the contributions of early testers.

Feel free to file a bug, send a mail, or drop by #gentoo-kde for a chat any time. Thanks for flying Gentoo KDE!

28 Aug 2015 8:06pm GMT

27 Aug 2015

Alexys Jacob: py3status v2.6

Ok, I was a bit too hasty in my legacy module support code cleanup and I broke quite a few things in the latest version 2.5 release, sorry ! :(


thanks !

27 Aug 2015 5:04pm GMT

25 Aug 2015

Sven Vermeulen: Slowly converting from GuideXML to HTML

Gentoo has removed its support for the older GuideXML format in favor of the Gentoo Wiki and a new content management system for the main site (or is it static pages? I don't have the faintest idea, to be honest). I do still have a few GuideXML pages in my development space, which I am going to move to HTML pretty soon.

In order to do so, I make use of the guidexml2wiki stylesheet I developed. But instead of migrating to wiki syntax, I want to end up with HTML.

So what I do is first convert the file from GuideXML to MediaWiki with xsltproc.

Next, I use pandoc to convert this to restructured text. The idea is that the main pages on my devpage are now restructured text based. I was hoping to use markdown, but the conversion from markdown to HTML is not what I hoped it was.

The restructured text is then converted to HTML using rst2html.py. In the end, I use the following function (for conversion, once):

# Convert GuideXML to reStructuredText and then to HTML
gxml2html() {
  # Derive the base file name from the input file (assumes a .xml suffix)
  local basefile=${1%.xml}

  # Convert to MediaWiki syntax
  xsltproc ~/dev-cvs/gentoo/xml/htdocs/xsl/guidexml2wiki.xsl $1 > ${basefile}.mediawiki

  if [ -f ${basefile}.mediawiki ] ; then
    # Convert to restructured text
    pandoc -f mediawiki -t rst -s -S -o ${basefile}.rst ${basefile}.mediawiki
  fi

  if [ -f ${basefile}.rst ] ; then
    # Use your own stylesheet links (use full https URLs for this)
    rst2html.py --stylesheet=link-to-bootstrap.min.css,link-to-tyrian.min.css --link-stylesheet ${basefile}.rst ${basefile}.html
  fi
}
Is it perfect? No, but it works.

25 Aug 2015 9:30am GMT

17 Aug 2015

Alexys Jacob: py3status v2.5

This new py3status comes with an amazing number of contributions and new modules !

24 files changed, 1900 insertions(+), 263 deletions(-)

I'm also glad to say that py3status was my first commit in the new git repository of Gentoo Linux !


Please note that this version has deprecated the legacy implicit module loading support, in favour of the generic i3status order += module loading/ordering !

New modules


As usual, full changelog is available here.


Along with all those who reported issues and helped fix them, a quick and surely not exhaustive list:

What's next ?

Well something tells me @Horgix is working hard on some standardization and on the core of py3status ! I'm sure some very interesting stuff will emerge from this, so thank you !

17 Aug 2015 3:05pm GMT

13 Aug 2015

Sven Vermeulen: Finding a good compression utility

I recently came across a wiki page written by Herman Brule which gives a quick benchmark of a couple of compression methods/algorithms. It gave me the idea of writing a quick script that tests out a wide number of compression utilities available in Gentoo (usually through the app-arch category), along with a number of options (when multiple options are possible).

The currently supported packages are:

app-arch/bloscpack      app-arch/bzip2          app-arch/freeze
app-arch/gzip           app-arch/lha            app-arch/lrzip
app-arch/lz4            app-arch/lzip           app-arch/lzma
app-arch/lzop           app-arch/mscompress     app-arch/p7zip
app-arch/pigz           app-arch/pixz           app-arch/plzip
app-arch/pxz            app-arch/rar            app-arch/rzip
app-arch/xar            app-arch/xz-utils       app-arch/zopfli

The script keeps track of the best compression found: its duration, compression ratio, and command, as well as the compressed file itself.

Finding the "best" compression

It is not my intention to find the most optimal compression, as that would require heuristic optimizations (which has triggered my interest in seeking such software, or writing it myself) while trying out various optimization parameters.

No, what I want is to find the "best" compression for a given file, with "best" being either the one that yields the smallest file (the highest compression ratio), or the one that achieves a good reduction in little time (the highest efficiency).

For me personally, I think I would use it for the various raw image files that I have through the photography hobby. Those image files are difficult to compress (the Nikon D3200 I use is an entry-level camera which already applies lossy compression to its raw files), but their total size is considerable, and it would allow me to make better use of the storage I have available, both on my laptop (which is SSD-only) and on the backup server.

But next to the best compression ratio, efficiency is also an important metric, as it shows how well the algorithm performs within a certain amount of time. If one compression method yields an 80% reduction in 5 minutes, and another one yields 80.5% in 45 minutes, then I might prefer the first one even though it does not compress best.

Although the script could be used to get the most compression (without resorting to an optimization algorithm for the compression commands) for each file, this is definitely not the use case. A single run can take hours for files that are compressed in a handful of seconds. But it can show the best algorithms for a particular file type (for instance, do a few runs on a couple of raw image files and see which method is most successful).

Another use case I'm currently looking into is how much improvement I can get when multiple files (all raw image files) are first grouped in a single archive (.tar). Theoretically, this should improve the compression, but by how much?
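Testing that is straightforward with the script described below: create the archive first, then feed it to the script.

~$ tar -cf photos.tar *.nef
~$ sw_comprbest -i photos.tar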

How the script works

The script does not contain much intelligence. It iterates over a wide set of compression commands that I tested out, checks the final compressed file size, and if it is better than a previous one it keeps this compressed file (and its statistics).

I tried to group some of the compressions together based on the algorithm used, but as I don't really know the details of the algorithms (it's based on manual pages and internet sites) and some of them combine multiple algorithms, it is more of a high-level selection than anything else.

The script can also only run the compressions of a single application (which I use when I'm fine-tuning the parameter runs).

A run shows something like the following:

Original file (test.nef) size 20958430 bytes
      package name                                                 command      duration                   size compr.Δ effic.:
      ------------                                                 -------      --------                   ---- ------- -------
app-arch/bloscpack                                               blpk -n 4           0.1               20947097 0.00054 0.00416
app-arch/bloscpack                                               blpk -n 8           0.1               20947097 0.00054 0.00492
app-arch/bloscpack                                              blpk -n 16           0.1               20947097 0.00054 0.00492
    app-arch/bzip2                                                   bzip2           2.0               19285616 0.07982 0.03991
    app-arch/bzip2                                                bzip2 -1           2.0               19881886 0.05137 0.02543
    app-arch/bzip2                                                bzip2 -2           1.9               19673083 0.06133 0.03211
    app-arch/p7zip                                      7za -tzip -mm=PPMd           5.9               19002882 0.09331 0.01592
    app-arch/p7zip                             7za -tzip -mm=PPMd -mmem=24           5.7               19002882 0.09331 0.01640
    app-arch/p7zip                             7za -tzip -mm=PPMd -mmem=25           6.4               18871933 0.09955 0.01551
    app-arch/p7zip                             7za -tzip -mm=PPMd -mmem=26           7.7               18771632 0.10434 0.01364
    app-arch/p7zip                             7za -tzip -mm=PPMd -mmem=27           9.0               18652402 0.11003 0.01224
    app-arch/p7zip                             7za -tzip -mm=PPMd -mmem=28          10.0               18521291 0.11628 0.01161
    app-arch/p7zip                                       7za -t7z -m0=PPMd           5.7               18999088 0.09349 0.01634
    app-arch/p7zip                                7za -t7z -m0=PPMd:mem=24           5.8               18999088 0.09349 0.01617
    app-arch/p7zip                                7za -t7z -m0=PPMd:mem=25           6.5               18868478 0.09972 0.01534
    app-arch/p7zip                                7za -t7z -m0=PPMd:mem=26           7.5               18770031 0.10442 0.01387
    app-arch/p7zip                                7za -t7z -m0=PPMd:mem=27           8.6               18651294 0.11008 0.01282
    app-arch/p7zip                                7za -t7z -m0=PPMd:mem=28          10.6               18518330 0.11643 0.01100
      app-arch/rar                                                     rar           0.9               20249470 0.03383 0.03980
      app-arch/rar                                                 rar -m0           0.0               20958497 -0.00000        -0.00008
      app-arch/rar                                                 rar -m1           0.2               20243598 0.03411 0.14829
      app-arch/rar                                                 rar -m2           0.8               20252266 0.03369 0.04433
      app-arch/rar                                                 rar -m3           0.8               20249470 0.03383 0.04027
      app-arch/rar                                                 rar -m4           0.9               20248859 0.03386 0.03983
      app-arch/rar                                                 rar -m5           0.8               20248577 0.03387 0.04181
    app-arch/lrzip                                                lrzip -z          13.1               19769417 0.05673 0.00432
     app-arch/zpaq                                                    zpaq           0.2               20970029 -0.00055        -0.00252
The best compression was found with 7za -t7z -m0=PPMd:mem=28.
The compression delta obtained was 0.11643 within 10.58 seconds.
This file is now available as test.nef.7z.

In the above example, the test file was around 20 MByte. The best compression command that the script found was:

~$ 7za -t7z -m0=PPMd:mem=28 a test.nef.7z test.nef

The resulting file (test.nef.7z) is 18 MByte, a reduction of 11.64%. The compression command took almost 11 seconds to do its thing, which gave an efficiency rating of 0.011, definitely not a fast one.

Some other algorithms don't do bad either with a better efficiency. For instance:

   app-arch/pbzip2                                                  pbzip2           0.6               19287402 0.07973 0.13071

In this case, the pbzip2 command got an almost 8% reduction in less than a second, which is considerably more efficient than the 11-second-long 7za run.

Want to try it out yourself?

I've pushed the script to my github location. Do a quick review of the code first (to see that I did not include anything malicious) and then execute it to see how it works:

~$ sw_comprbest -h
Usage: sw_comprbest --infile=<inputfile> [--family=<family>[,...]] [--command=<cmd>]
       sw_comprbest -i <inputfile> [-f <family>[,...]] [-c <cmd>]

Supported families: blosc bwt deflate lzma ppmd zpaq. These can be provided comma-separated.
Command is an additional filter - only the tests that use this base command are run.

The output shows
  - The package (in Gentoo) that the command belongs to
  - The command run
  - The duration (in seconds)
  - The size (in bytes) of the resulting file
  - The compression delta (percentage) showing how much is reduced (higher is better)
  - The efficiency ratio showing how much reduction (percentage) per second (higher is better)

When the command supports multithreading, we use the number of available cores on the system (as told by /proc/cpuinfo).

For instance, to try it out against a PDF file:

~$ sw_comprbest -i MEA6-Sven_Vermeulen-Research_Summary.pdf
Original file (MEA6-Sven_Vermeulen-Research_Summary.pdf) size 117763 bytes
The best compression was found with zopfli --deflate.
The compression delta obtained was 0.00982 within 0.19 seconds.
This file is now available as MEA6-Sven_Vermeulen-Research_Summary.pdf.deflate.

So in this case, the resulting file is hardly better compressed - the PDF itself is already compressed. Let's try it against the uncompressed PDF:

~$ pdftk MEA6-Sven_Vermeulen-Research_Summary.pdf output test.pdf uncompress
~$ sw_comprbest -i test.pdf
Original file (test.pdf) size 144670 bytes
The best compression was found with lrzip -z.
The compression delta obtained was 0.27739 within 0.18 seconds.
This file is now available as test.pdf.lrz.

This is somewhat better:

~$ ls -l MEA6-Sven_Vermeulen-Research_Summary.pdf* test.pdf*
-rw-r--r--. 1 swift swift 117763 Aug  7 14:32 MEA6-Sven_Vermeulen-Research_Summary.pdf
-rw-r--r--. 1 swift swift 116606 Aug  7 14:32 MEA6-Sven_Vermeulen-Research_Summary.pdf.deflate
-rw-r--r--. 1 swift swift 144670 Aug  7 14:34 test.pdf
-rw-r--r--. 1 swift swift 104540 Aug  7 14:35 test.pdf.lrz

The resulting file is 11.22% smaller than the original one.

13 Aug 2015 5:15pm GMT