26 Aug 2014
After I moved to a new OpenPGP key (see key transition statement) I have received comments about the short lifetime of my new key. When I created the key (see my GnuPG setup) I set it to expire after 100 days. Some people assumed that I would have to create a new key then, and therefore wondered what value there is in signing a key that will expire in two months. It doesn't work like that. Below I will explain how OpenPGP key expiration works, show how to extend the expiration time of your key, and argue why a relatively short validity period can be a good thing.
5.2.3.6. Key Expiration Time (4-octet time field) The validity period of the key. This is the number of seconds after the key creation time that the key expires. If this is not present or has a value of zero, the key never expires. This is found only on a self-signature.
You can print the sub-packets in your OpenPGP key with gpg --list-packets. See below an output for my key, and notice the "created 1403464490" (which is Unix time for 2014-06-22 21:14:50 at UTC+2) and the "subpkt 9 len 4 (key expires after 100d0h0m)" which adds up to an expiration on 2014-09-30. Don't confuse the creation time of the key ("created 1403464321") with when the signature was created ("created 1403464490").
jas@latte:~$ gpg --export 54265e8c | gpg --list-packets |head -20
:public key packet:
    version 4, algo 1, created 1403464321, expires 0
    pkey: [3744 bits]
    pkey: [17 bits]
:user ID packet: "Simon Josefsson "
:signature packet: algo 1, keyid 0664A76954265E8C
    version 4, created 1403464490, md5len 0, sigclass 0x13
    digest algo 10, begin of digest be 8e
    hashed subpkt 27 len 1 (key flags: 03)
    hashed subpkt 9 len 4 (key expires after 100d0h0m)
    hashed subpkt 11 len 7 (pref-sym-algos: 9 8 7 13 12 11 10)
    hashed subpkt 21 len 4 (pref-hash-algos: 10 9 8 11)
    hashed subpkt 30 len 1 (features: 01)
    hashed subpkt 23 len 1 (key server preferences: 80)
    hashed subpkt 2 len 4 (sig created 2014-06-22)
    hashed subpkt 25 len 1 (primary user ID)
    subpkt 16 len 8 (issuer key ID 0664A76954265E8C)
    data: [3743 bits]
:signature packet: algo 1, keyid EDA21E94B565716F
    version 4, created 1403466403, md5len 0, sigclass 0x10
jas@latte:~$
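The arithmetic behind the expiration sub-packet can be checked with a few lines of Python. This is only a sketch under stated assumptions: the function names are made up for illustration, the 4-octet field is built by hand rather than parsed from a real key, and it relies on OpenPGP storing scalar numbers big-endian. Only the two numbers (key creation time 1403464321 and a validity of 100 days) come from the output above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def decode_expiration_field(field: bytes) -> int:
    """Decode a 4-octet Key Expiration Time field (big-endian seconds).

    Hypothetical helper for illustration, not part of any OpenPGP library.
    """
    if len(field) != 4:
        raise ValueError("expected exactly 4 octets")
    return int.from_bytes(field, "big")


def key_expiry(key_created: int, valid_seconds: int) -> Optional[datetime]:
    """Return the expiry instant in UTC, or None if the key never expires."""
    if valid_seconds == 0:
        return None  # a value of zero means "never expires"
    created = datetime.fromtimestamp(key_created, tz=timezone.utc)
    # The offset counts from the KEY creation time, not the signature time.
    return created + timedelta(seconds=valid_seconds)


KEY_CREATED = 1403464321  # "created" value of the public key packet
HUNDRED_DAYS = (100 * 86400).to_bytes(4, "big")  # hand-built field, "100d0h0m"

valid_seconds = decode_expiration_field(HUNDRED_DAYS)
expiry = key_expiry(KEY_CREATED, valid_seconds)
print(valid_seconds, expiry.date())
```

The resulting date is the same 2014-09-30 that gpg --edit-key reports for the key before it is extended.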
So the key will simply stop being valid after that time? No. It is possible to update the key expiration time value, re-sign the key, and distribute the key either directly to the people you communicate with or indirectly through OpenPGP keyservers. Since that date is a couple of weeks away, now felt like the perfect opportunity to go through the exercise of taking out my offline master key, booting from a Debian LiveCD and extending its expiry time. See my earlier writeup for LiveCD and USB stick conventions.
user@debian:~$ export GNUPGHOME=/media/FA21-AE97/gnupghome
user@debian:~$ gpg --edit-key 54265e8c
gpg (GnuPG) 1.4.12; Copyright (C) 2012 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Secret key is available.

pub  3744R/54265E8C  created: 2014-06-22  expires: 2014-09-30  usage: SC
                     trust: ultimate      validity: ultimate
sub  2048R/32F8119D  created: 2014-06-22  expires: 2014-09-30  usage: S
sub  2048R/78ECD86B  created: 2014-06-22  expires: 2014-09-30  usage: E
sub  2048R/36BA8F9B  created: 2014-06-22  expires: 2014-09-30  usage: A
[ultimate] (1). Simon Josefsson
[ultimate] (2)  Simon Josefsson

gpg> expire
Changing expiration time for the primary key.
Please specify how long the key should be valid.
         0 = key does not expire
      <n>  = key expires in n days
      <n>w = key expires in n weeks
      <n>m = key expires in n months
      <n>y = key expires in n years
Key is valid for? (0) 150
Key expires at Fri 23 Jan 2015 02:47:48 PM UTC
Is this correct? (y/N) y

You need a passphrase to unlock the secret key for
user: "Simon Josefsson "
3744-bit RSA key, ID 54265E8C, created 2014-06-22

pub  3744R/54265E8C  created: 2014-06-22  expires: 2015-01-23  usage: SC
                     trust: ultimate      validity: ultimate
sub  2048R/32F8119D  created: 2014-06-22  expires: 2014-09-30  usage: S
sub  2048R/78ECD86B  created: 2014-06-22  expires: 2014-09-30  usage: E
sub  2048R/36BA8F9B  created: 2014-06-22  expires: 2014-09-30  usage: A
[ultimate] (1). Simon Josefsson
[ultimate] (2)  Simon Josefsson

gpg> key 1

pub  3744R/54265E8C  created: 2014-06-22  expires: 2015-01-23  usage: SC
                     trust: ultimate      validity: ultimate
sub* 2048R/32F8119D  created: 2014-06-22  expires: 2014-09-30  usage: S
sub  2048R/78ECD86B  created: 2014-06-22  expires: 2014-09-30  usage: E
sub  2048R/36BA8F9B  created: 2014-06-22  expires: 2014-09-30  usage: A
[ultimate] (1). Simon Josefsson
[ultimate] (2)  Simon Josefsson

gpg> expire
Changing expiration time for a subkey.
Please specify how long the key should be valid.
         0 = key does not expire
      <n>  = key expires in n days
      <n>w = key expires in n weeks
      <n>m = key expires in n months
      <n>y = key expires in n years
Key is valid for? (0) 150
Key expires at Fri 23 Jan 2015 02:48:05 PM UTC
Is this correct? (y/N) y

You need a passphrase to unlock the secret key for
user: "Simon Josefsson "
3744-bit RSA key, ID 54265E8C, created 2014-06-22

pub  3744R/54265E8C  created: 2014-06-22  expires: 2015-01-23  usage: SC
                     trust: ultimate      validity: ultimate
sub* 2048R/32F8119D  created: 2014-06-22  expires: 2015-01-23  usage: S
sub  2048R/78ECD86B  created: 2014-06-22  expires: 2014-09-30  usage: E
sub  2048R/36BA8F9B  created: 2014-06-22  expires: 2014-09-30  usage: A
[ultimate] (1). Simon Josefsson
[ultimate] (2)  Simon Josefsson

gpg> key 2

pub  3744R/54265E8C  created: 2014-06-22  expires: 2015-01-23  usage: SC
                     trust: ultimate      validity: ultimate
sub* 2048R/32F8119D  created: 2014-06-22  expires: 2015-01-23  usage: S
sub* 2048R/78ECD86B  created: 2014-06-22  expires: 2014-09-30  usage: E
sub  2048R/36BA8F9B  created: 2014-06-22  expires: 2014-09-30  usage: A
[ultimate] (1). Simon Josefsson
[ultimate] (2)  Simon Josefsson

gpg> key 1

pub  3744R/54265E8C  created: 2014-06-22  expires: 2015-01-23  usage: SC
                     trust: ultimate      validity: ultimate
sub  2048R/32F8119D  created: 2014-06-22  expires: 2015-01-23  usage: S
sub* 2048R/78ECD86B  created: 2014-06-22  expires: 2014-09-30  usage: E
sub  2048R/36BA8F9B  created: 2014-06-22  expires: 2014-09-30  usage: A
[ultimate] (1). Simon Josefsson
[ultimate] (2)  Simon Josefsson

gpg> expire
Changing expiration time for a subkey.
Please specify how long the key should be valid.
         0 = key does not expire
      <n>  = key expires in n days
      <n>w = key expires in n weeks
      <n>m = key expires in n months
      <n>y = key expires in n years
Key is valid for? (0) 150
Key expires at Fri 23 Jan 2015 02:48:14 PM UTC
Is this correct? (y/N) y

You need a passphrase to unlock the secret key for
user: "Simon Josefsson "
3744-bit RSA key, ID 54265E8C, created 2014-06-22

pub  3744R/54265E8C  created: 2014-06-22  expires: 2015-01-23  usage: SC
                     trust: ultimate      validity: ultimate
sub  2048R/32F8119D  created: 2014-06-22  expires: 2015-01-23  usage: S
sub* 2048R/78ECD86B  created: 2014-06-22  expires: 2015-01-23  usage: E
sub  2048R/36BA8F9B  created: 2014-06-22  expires: 2014-09-30  usage: A
[ultimate] (1). Simon Josefsson
[ultimate] (2)  Simon Josefsson

gpg> key 3

pub  3744R/54265E8C  created: 2014-06-22  expires: 2015-01-23  usage: SC
                     trust: ultimate      validity: ultimate
sub  2048R/32F8119D  created: 2014-06-22  expires: 2015-01-23  usage: S
sub* 2048R/78ECD86B  created: 2014-06-22  expires: 2015-01-23  usage: E
sub* 2048R/36BA8F9B  created: 2014-06-22  expires: 2014-09-30  usage: A
[ultimate] (1). Simon Josefsson
[ultimate] (2)  Simon Josefsson

gpg> key 2

pub  3744R/54265E8C  created: 2014-06-22  expires: 2015-01-23  usage: SC
                     trust: ultimate      validity: ultimate
sub  2048R/32F8119D  created: 2014-06-22  expires: 2015-01-23  usage: S
sub  2048R/78ECD86B  created: 2014-06-22  expires: 2015-01-23  usage: E
sub* 2048R/36BA8F9B  created: 2014-06-22  expires: 2014-09-30  usage: A
[ultimate] (1). Simon Josefsson
[ultimate] (2)  Simon Josefsson

gpg> expire
Changing expiration time for a subkey.
Please specify how long the key should be valid.
         0 = key does not expire
      <n>  = key expires in n days
      <n>w = key expires in n weeks
      <n>m = key expires in n months
      <n>y = key expires in n years
Key is valid for? (0) 150
Key expires at Fri 23 Jan 2015 02:48:23 PM UTC
Is this correct? (y/N) y

You need a passphrase to unlock the secret key for
user: "Simon Josefsson "
3744-bit RSA key, ID 54265E8C, created 2014-06-22

pub  3744R/54265E8C  created: 2014-06-22  expires: 2015-01-23  usage: SC
                     trust: ultimate      validity: ultimate
sub  2048R/32F8119D  created: 2014-06-22  expires: 2015-01-23  usage: S
sub  2048R/78ECD86B  created: 2014-06-22  expires: 2015-01-23  usage: E
sub* 2048R/36BA8F9B  created: 2014-06-22  expires: 2015-01-23  usage: A
[ultimate] (1). Simon Josefsson
[ultimate] (2)  Simon Josefsson

gpg> save
user@debian:~$ gpg -a --export 54265e8c > /media/KINGSTON/updated-key.txt
user@debian:~$
I remove the "transport" USB stick from the "offline" computer, and back on my laptop I can inspect the new updated key. Let's use the same command as before. The key creation time is the same ("created 1403464321"), of course, but the signature packet has a new time ("created 1409064478") since it was signed now. Notice "created 1409064478" and "subpkt 9 len 4 (key expires after 214d19h35m)". The expiration time is computed based on when the key was generated, not when the signature packet was generated. You may want to double-check the pref-sym-algos, pref-hash-algos and other sub-packets so that you don't accidentally change anything else. (Btw, re-signing your key is also how you would modify those preferences over time.)
jas@latte:~$ cat /media/KINGSTON/updated-key.txt |gpg --list-packets | head -20
:public key packet:
    version 4, algo 1, created 1403464321, expires 0
    pkey: [3744 bits]
    pkey: [17 bits]
:user ID packet: "Simon Josefsson "
:signature packet: algo 1, keyid 0664A76954265E8C
    version 4, created 1409064478, md5len 0, sigclass 0x13
    digest algo 10, begin of digest 5c b2
    hashed subpkt 27 len 1 (key flags: 03)
    hashed subpkt 11 len 7 (pref-sym-algos: 9 8 7 13 12 11 10)
    hashed subpkt 21 len 4 (pref-hash-algos: 10 9 8 11)
    hashed subpkt 30 len 1 (features: 01)
    hashed subpkt 23 len 1 (key server preferences: 80)
    hashed subpkt 25 len 1 (primary user ID)
    hashed subpkt 2 len 4 (sig created 2014-08-26)
    hashed subpkt 9 len 4 (key expires after 214d19h35m)
    subpkt 16 len 8 (issuer key ID 0664A76954265E8C)
    data: [3744 bits]
:user ID packet: "Simon Josefsson "
:signature packet: algo 1, keyid 0664A76954265E8C
jas@latte:~$
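Where does "214d19h35m" come from when I asked for 150 days? A small Python sketch reproduces it, assuming that the 150 days are counted from the moment the new self-signature was made and then expressed relative to the key creation time. The function names and the minute-truncating formatter are my own illustration, not GnuPG code; only the two timestamps and the 150 days come from the session above.

```python
from datetime import timedelta

KEY_CREATED = 1403464321  # public key packet "created"
SIG_CREATED = 1409064478  # new self-signature "created"
EXTENSION = timedelta(days=150)  # answer given at the "Key is valid for?" prompt


def validity_from_key_creation(key_created: int, sig_created: int,
                               extension: timedelta) -> timedelta:
    """Validity period stored in the sub-packet, relative to key creation.

    Assumption: expiry = signature time + extension.
    """
    key_age = timedelta(seconds=sig_created - key_created)
    return key_age + extension


def format_days_hours_minutes(delta: timedelta) -> str:
    """Render a duration as '<d>d<h>h<m>m', dropping leftover seconds."""
    total_minutes = int(delta.total_seconds()) // 60
    days, rest = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(rest, 60)
    return f"{days}d{hours}h{minutes}m"


validity = validity_from_key_creation(KEY_CREATED, SIG_CREATED, EXTENSION)
print(format_days_hours_minutes(validity))
```

The key was a little under 65 days old when it was re-signed, so 150 more days gives a validity period of a little under 215 days counted from key creation.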
Being happy with the new key, I import it and send it to key servers out there.
jas@latte:~$ gpg --import /media/KINGSTON/updated-key.txt
gpg: key 54265E8C: "Simon Josefsson " 5 new signatures
gpg: Total number processed: 1
gpg:         new signatures: 5
jas@latte:~$ gpg --send-keys 54265e8c
gpg: sending key 54265E8C to hkp server keys.gnupg.net
jas@latte:~$ gpg --keyserver keyring.debian.org --send-keys 54265e8c
gpg: sending key 54265E8C to hkp server keyring.debian.org
jas@latte:~$
Finally: why go through this hassle, rather than set the key to expire in 50 years? Some reasons for this are:
- I don't trust myself to keep track of a private key (or revocation cert) for 50 years.
- I want people to notice my revocation certificate as quickly as possible.
- I want people to notice other changes to my key (e.g., cipher preferences) as quickly as possible.
Let's look into the first reason a bit more. What would happen if I lose both the master key and the revocation cert, for a key that's valid 50 years? I would start from scratch and create a new key that I upload to keyservers. Then there would be two keys out there that are valid and identify me, and both will have a set of signatures on them. Neither of them will be revoked. If I happen to lose the new key again, there will be three valid keys out there with signatures on them. You may argue that this shouldn't be a problem, and that nobody should use any other key than the latest one I want to be used, but that's a technical argument - and at this point we have moved into usability, and that's a trickier area. Asking users to choose among several keys for me that all appear valid is simply not going to work well.
The second is more subtle, but considerably more important. If people retrieve my key from keyservers today, and it expires in 50 years, there will be no need to refresh it from key servers. If for some reason I have to publish my revocation certificate, there will be people that won't see it. If instead I set a short validity period, people will have to refresh my key once in a while, and will then either get an updated expiration time, or will get the revocation certificate. This amounts to a CRL/OCSP-like model.
The third is similar to the second, but deserves to be mentioned on its own. Because the cipher preferences are expressed (and signed) in my key, and ciphers come and go, I expect that I will modify those during the lifetime of my long-term key. If my key had a long validity period, people would not refresh it from key servers, and would encrypt messages to me with ciphers I may no longer want to be used.
The downside of having a short validity period is that I have to do slightly more work to get out the offline master key once in a while (which I have to do once in a while anyway because I'm signing other people's keys) and that others need to refresh my key from the key servers. Can anyone identify other disadvantages? Also, having to explain why I'm using a short validity period used to be a downside, but with this writeup posted that won't be the case any more.
26 Aug 2014 6:11pm GMT
This year I mentored two students doing work in support of Debian and free software (as well as those I mentored for Ganglia).
Both of them are presenting details about their work at DebConf 14 today.
While Juliana's work has been widely publicised already, mainly due to the fact it is accessible to every individual DD, Andrew's work is also quite significant and creates many possibilities to advance awareness of free software.
The Java project that is not just about Java
Andrew's project is about recursively building Java dependencies from third party repositories such as the Maven Central Repository. It matches up well with the wonderful new maven-debian-helper tool in Debian and will help us to fill out /usr/share/maven-repo on every Debian system.
Firstly, this is not just about Java. On a practical level, some aspects of the project are useful for many other purposes. One of those is the aim of scanning a repository for non-free artifacts, making a Git mirror or clone containing a dfsg branch for generating repackaged upstream source and then testing to see if it still builds.
Then there is the principle of software freedom. The Maven Central repository now requires that people publish a sources JAR and license metadata with each binary artifact they upload. They do not, however, demand that the sources JAR be complete or that the binary can be built by somebody else using the published sources. The license data must be specified but it does not appear to be verified in the same way as packages inspected by Debian's legendary FTP masters.
Thanks to the transitive dependency magic of Maven, it is quite possible that many Java applications that are officially promoted as free software can't trace the source code of every dependency or build plugin.
Many organizations are starting to become more alarmed about the risk that they are dependent upon some rogue dependency. Maybe they will be hit with a lawsuit from a vendor stating that his plugin was only free for the first 3 months. Maybe some binary dependency JAR contains a nasty trojan for harvesting data about their corporate network.
People familiar with the principles of software freedom are in the perfect position to address these concerns and Andrew's work helps us build a cleaner alternative. It obviously can't rebuild every JAR for the very reason that some of them are not really free - however, it does give the opportunity to build a heat-map of trouble spots and also create a fast track to packaging for those hierarchies of JARs that are truly free.
Making WebRTC accessible to more people
People attending the session today or participating remotely are advised to set up their RTC / VoIP password at db.debian.org well in advance so the server will allow them to log in and try it during the session. It can take 30 minutes or so for the passwords to be replicated to the SIP proxy and TURN server.
Please also check my previous comments about what works and what doesn't and in particular, please be aware that Iceweasel / Firefox 24 on wheezy is not suitable unless you are on the same LAN as the person you are calling.
26 Aug 2014 4:33pm GMT
Just in case some of my free software friends would care and try understanding why I'm currently not attending my first DebConf since 2004...
Starting tomorrow 07:00am EST (so, 22:00 PST for Debconfers), I'll be running the "TDS" race of the Ultra-Trail du Mont-Blanc races.
Ultra-Trail du Mont-Blanc (UTMB) is one of the world-famous long distance mountain trail races. It takes place in Chamonix, just below the Mont-Blanc, France's and Europe's highest mountain. The race is indeed simple: "go around the Mont-Blanc in a big circle, 160km long, with 10,000 meters positive climb cumulated on the climb of about 10 high passes between 2000 and 2700 meters altitude".
"My" race is a shortened version of UTMB that does half of the full loop, from Courmayeur in Italy (just "the other side" of Mont-Blanc, from Chamonix) and goes back to Chamonix. It is "only" 120 kilometers long with 7200 meters of positive climb. Some of these are, however, known as more difficult than UTMB itself.
Many firsts for me in this race: first "over 100km", first "over 24 hours running". Still, I trained hard for this, completed a very tough race in early July (60km, 5000m climb) with a very good result, and I expect to make it well.
Top runners complete this in 17 hours... last arrivals are expected after 33 hours "running" (often fast walking, indeed). I plan to finish the race in 28 hours but, indeed, I have no idea. :-)
So, in case you're bored in a night hacklab, or just want to draw your attention away from IRC, or don't have any package to polish... or just want to have a thought for an old friend, you can try to use the following link and follow all this live: http://utmb.livetrail.net/coureur.php?rech=6384&lang=en
Race start : 7am EST, Wednesday Aug 27th. bubulle arrival: Thursday Aug. 28th, between 10am and 4pm (best projection is 11am).
And there will be cheese at pit stops....
26 Aug 2014 6:09am GMT
A few changes to vmdebootstrap will need to go into the next version (0.3), including an example customise script to set up the u-boot support. With the changes, the command would be:
sudo ./vmdebootstrap --owner `whoami` --verbose --size 2G --mirror http://mirror.bytemark.co.uk/debian --log beaglebone-black.log --log-level debug --arch armhf --foreign /usr/bin/qemu-arm-static --no-extlinux --no-kernel --package u-boot --package linux-image-armmp --distribution sid --enable-dhcp --configure-apt --serial-console-command '/sbin/getty -L ttyO0 115200 vt100' --customize examples/beagleboneblack-customise.sh --bootsize 50m --boottype vfat --image bbb.img
Some of those options are new but there are a few important elements:
- use of --arch and --foreign to provide the emulation needed to run the debootstrap second stage.
- drop extlinux and install u-boot as a package.
- linux-image-armmp kernel
- new command to configure an apt source
- serial-console-command as the BBB doesn't use the default /dev/ttyS0
- choice of sid to get the latest ARMMP and u-boot versions
- customize command - this is a script which does two things:
  - copies the dtbs into the boot partition
  - copies the u-boot files and creates a u-boot environment to use those files.
- use of a boot partition - note that it needs to be large enough to include the ARMMP kernel and a backup of the same files.
With this in place, a simple dd to an SD card and the BBB boots directly into Debian ARMMP.
The examples are now in my branch and include an initial cubieboard script which is unfinished.
The current image is available for download (222Mb).
I hope to upload the new vmdebootstrap soon - let me know if you do try the version in the branch.
26 Aug 2014 2:50am GMT
25 Aug 2014
Petter Reinholdtsen: Do you need an agreement with MPEG-LA to publish and broadcast H.264 video in Norway?
18.2. MPEG-4. MPEG-4 technology may be included with the software. MPEG LA, L.L.C. requires this notice:
This product is licensed under the MPEG-4 visual patent portfolio license for the personal and non-commercial use of a consumer for (i) encoding video in compliance with the MPEG-4 visual standard ("MPEG-4 video") and/or (ii) decoding MPEG-4 video that was encoded by a consumer engaged in a personal and non-commercial activity and/or was obtained from a video provider licensed by MPEG LA to provide MPEG-4 video. No license is granted or shall be implied for any other use. Additional information including that relating to promotional, internal and commercial uses and licensing may be obtained from MPEG LA, LLC. See http://www.mpegla.com. This product is licensed under the MPEG-4 systems patent portfolio license for encoding in compliance with the MPEG-4 systems standard, except that an additional license and payment of royalties are necessary for encoding in connection with (i) data stored or replicated in physical media which is paid for on a title by title basis and/or (ii) data which is paid for on a title by title basis and is transmitted to an end user for permanent storage and/or use, such additional license may be obtained from MPEG LA, LLC. See http://www.mpegla.com for additional details.
18.3. H.264/AVC. H.264/AVC technology may be included with the software. MPEG LA, L.L.C. requires this notice:
This product is licensed under the AVC patent portfolio license for the personal use of a consumer or other uses in which it does not receive remuneration to (i) encode video in compliance with the AVC standard ("AVC video") and/or (ii) decode AVC video that was encoded by a consumer engaged in a personal activity and/or was obtained from a video provider licensed to provide AVC video. No license is granted or shall be implied for any other use. Additional information may be obtained from MPEG LA, L.L.C. See http://www.mpegla.com.
Note the requirement that the videos created can only be used for personal or non-commercial purposes.
The Sorenson Media software has similar terms:
With respect to a license from Sorenson pertaining to MPEG-4 Video Decoders and/or Encoders: Any such product is licensed under the MPEG-4 visual patent portfolio license for the personal and non-commercial use of a consumer for (i) encoding video in compliance with the MPEG-4 visual standard ("MPEG-4 video") and/or (ii) decoding MPEG-4 video that was encoded by a consumer engaged in a personal and non-commercial activity and/or was obtained from a video provider licensed by MPEG LA to provide MPEG-4 video. No license is granted or shall be implied for any other use. Additional information including that relating to promotional, internal and commercial uses and licensing may be obtained from MPEG LA, LLC. See http://www.mpegla.com.
With respect to a license from Sorenson pertaining to MPEG-4 Consumer Recorded Data Encoder, MPEG-4 Systems Internet Data Encoder, MPEG-4 Mobile Data Encoder, and/or MPEG-4 Unique Use Encoder: Any such product is licensed under the MPEG-4 systems patent portfolio license for encoding in compliance with the MPEG-4 systems standard, except that an additional license and payment of royalties are necessary for encoding in connection with (i) data stored or replicated in physical media which is paid for on a title by title basis and/or (ii) data which is paid for on a title by title basis and is transmitted to an end user for permanent storage and/or use. Such additional license may be obtained from MPEG LA, LLC. See http://www.mpegla.com for additional details.
25 Aug 2014 8:10pm GMT
SigniTrend: Scalable Detection of Emerging Topics in Textual Streams by Hashed Significance Thresholds
Erich Schubert, Michael Weiler, Hans-Peter Kriegel
20th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
25 Aug 2014 11:30am GMT
To round up the discussion of the Debian Administration site yesterday I flipped the switch on the load-balancing. Rather than this:
https -> pound \
                \
http -------------> varnish --> apache
We now have the simpler route for all requests:
http  -> haproxy -> apache
https -> haproxy -> apache
This means we have one less HTTP-request for all incoming secure connections, and these days secure connections are preferred since a Strict-Transport-Security header is set.
In other news I've been juggling git repositories; I've set up an installation of GitBucket on my git-host. My personal git repository used to contain some private repositories and some mirrors.
Now it contains mirrors of most things on github, as well as many more private repositories.
The main reason for the switch was to get a prettier interface and bug-tracker support.
A side-benefit is that I can use "groups" to organize repositories, so for example:
- http://git.steve.org.uk/websites - Contains website source code.
- http://git.steve.org.uk/blogspam - Contains grouped repositories on a single theme.
Most of those are mirrors of the github repositories, but some are new. When signed in I see more sources, for example the source to http://steve.org.uk.
I've been pleased with the setup and performance, though I had to add some caching and some other magic at the nginx level to provide /robots.txt, etc, which are not otherwise present.
I'm not abandoning github, but I will no longer be using it for private repositories (I was gifted a free subscription a year or three ago), and nor will I post things there exclusively.
If a single canonical source location is required for a repository it will be one that I control, maintain, and host.
I don't expect I'll give people commit access on this mirror, but it is certainly possible. In the past I've certainly given people access to private repositories for collaboration, etc.
25 Aug 2014 8:16am GMT
Some people (including me :) are not native English speakers and do not use English in everyday conversation. So it's a bit tough for them to follow what you say if you speak at your usual speed. We want to listen to your presentation so that we can understand and discuss it (of course!), but sometimes machine-gun speaking prevents that.
Calm down, take a deep breath and do your presentation - then it'll be fantastic, and my cat will be pleased with it, as below (meow!).
Thank you for reading. See you at the cheese & wine party.
25 Aug 2014 4:08am GMT
24 Aug 2014
We are happy to announce that live video streams will be available for talks and discussion meetings in DebConf14. Recordings will be posted soon after the events. You can also interact with other local and remote attendees by joining the IRC channels which are listed at the streams page.
For people who want to view the streams outside a webbrowser, the page for each room lists direct links to the streams.
More information on the streams and the various possibilities offered is available at DebConf Videostreams.
The schedule of talks is available at DebConf 14 Schedule.
Thanks to our amazing video volunteers for making it possible. If you like the video coverage, please add a thank you note to VideoTeam Thanks.
24 Aug 2014 8:25pm GMT
Today is the first time I've taken an interstate train trip in something like 15 years. A few things about the trip were pleasantly surprising. Most of these will come as no surprise:
- Less time wasted in security theater at the station prior to departure.
- On-time departure
- More comfortable seats than a plane or bus.
- Permissive free wifi
Wifi was the biggest surprise. Not that it existed, since we're living in the future and wifi is expected everywhere. It's IPv4 only and stuck behind a NAT, which isn't a big surprise, but it is reasonably open. There isn't any port filtering of non-web TCP ports, and even non-TCP protocols are allowed out. Even my aiccu IPv6 tunnel worked fine from the train, although I did experience some weird behavior with it.
I haven't used aiccu much in quite a while, since I have a native IPv6 connection at home, but it can be convenient while travelling. I'm still trying to figure out what happened today, though. The first symptoms were that, although I could ping IPv6 hosts, I could not actually log in via IMAP or ssh. Tcpdump showed all the standard symptoms of a PMTU blackhole. Small packets flow fine, large ones are dropped. The interface MTU is set to 1280, which is the minimum MTU for IPv6 and any path on the internet is expected to handle packets of at least that size. Experimentation via ping6 reveals that the largest payload size I can successfully exchange with a peer is 820 bytes. Add 8 bytes for the ICMPv6 header for 828 bytes of payload, plus 40 bytes for the IPv6 header gives an 868 byte packet, which is well under what should be the MTU for this path.
I've worked around this problem with an ip6tables rule to rewrite the MSS on outgoing SYN packets to 760 bytes, which should leave 40 for the IPv6 header and 20 for any extension headers:
sudo ip6tables -t mangle -A OUTPUT -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 760
It is working well and will allow me to publish this from the train, which I'd otherwise have been unable to do. But... weird.
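For anyone who wants to redo the packet-size arithmetic, here is a small Python sketch. It assumes the fixed 40-byte IPv6 header and 8-byte ICMPv6 header used above, plus a 20-byte TCP header with no options and no IPv6 extension headers; the function names are made up for illustration.

```python
IPV6_HEADER = 40    # fixed IPv6 header, bytes
ICMPV6_HEADER = 8   # ICMPv6 echo header, bytes
TCP_HEADER = 20     # TCP header without options (assumption), bytes


def icmpv6_packet_size(ping_payload: int) -> int:
    """On-the-wire size of an ICMPv6 echo carrying ping_payload bytes."""
    return ping_payload + ICMPV6_HEADER + IPV6_HEADER


def max_tcp_mss(packet_limit: int) -> int:
    """Largest TCP segment payload that still fits in packet_limit bytes."""
    return packet_limit - IPV6_HEADER - TCP_HEADER


def tcp_packet_size(mss: int) -> int:
    """On-the-wire size of a full TCP segment for a given MSS."""
    return mss + TCP_HEADER + IPV6_HEADER


largest_working_packet = icmpv6_packet_size(820)   # biggest ping6 that got through
mss_ceiling = max_tcp_mss(largest_working_packet)  # upper bound for --set-mss
clamped_packet = tcp_packet_size(760)              # size with the rule in place

print(largest_working_packet, mss_ceiling, clamped_packet)
```

Under those assumptions a full-sized segment with the clamped MSS stays below the largest packet observed to get through, with some room to spare for TCP options.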
24 Aug 2014 8:19pm GMT
While I imagine Johannes Brahms was referring to music I think the sentiment applies to other endeavours just as well. The trap of believing an idea is worth something without an implementation occurs all too often; however, this is not such an unhappy tale.
Lars Wirzenius, Steve McIntyre and myself were chatting a few weeks ago about several of the ongoing Debian discussions. As is often the case these discussions had devolved into somewhat unproductive noise and yet amongst all this was a voice of reason in Russ Allbery.
Lars decided that we would take the opportunity of the upcoming Debconf 14 to say thank you to Russ for his work. It was decided that a plaque would be a nice gift and I volunteered to do the physical manufacture. Lars came up with the idea of a DEBCON scale similar to the DEFCON scale and got some text together with an initial design idea.
I took the initial design and as is often the case what is practically possible forced several changes. The prototype was a steep learning curve on using the Cambridge makespace laser cutter to create all the separate pieces.
The construction is pretty simple and consisted of three layers of transparent acrylic plastic. The base layer is a single piece of plastic with the correct outline. The next layer has the DEBCON title, the Debian swirl and level numbers. The top layer has the text engraved in its back surface giving the impression the text floats above the layer behind it.
For the prototype I attempted to glue the pieces together. This was a complete disaster and required discarding the entire piece and starting again with new materials.
For the second version I used four small nylon bolts to hold the sandwich of layers together which worked very well.
Yesterday at the Debconf 14 opening Steve McIntyre presented it to Russ and I think he was pleased, certainly he was surprised (photo from Aigars Mahinovs).
The design files are available from my design git repo, though why anyone would want to reproduce it I have no idea ;-)
24 Aug 2014 7:47pm GMT
He makes the point (quoting slide 16) that the Free Software community is winning a war that is becoming increasingly pointless: yes, users have 100% Free Software thin client at their fingertips [or are really a few steps from there]. But all their relevant computations happen elsewhere, on remote systems they do not control, in the Cloud.
That surrender of control over computing is a huge and important problem, and probably the largest challenge for everybody caring about freedom, free speech, or privacy today. Stefano rightfully points out that we must do something about it. The big question is: how can we, as a community, address it?
Towards a Free Service Definition?
I believe that we all feel a bit lost with this issue because we are trying to attack it with our current tools & weapons. However, they are largely irrelevant here: the Free Software Definition is about software, and software is even to be understood strictly in it, as software programs. Applying it to services, or to computing in general, doesn't lead anywhere. In order to increase the general awareness about this issue, we should define more precisely what levels of control can be provided, to understand what services are not providing to users, and to make an informed decision about waiving a particular level of control when choosing to use a particular service.
Benjamin Mako Hill pointed out yesterday during the post-talk chat that services are not black or white: there aren't impure and pure services. Instead, there's a gradation of possible levels of control for the computing we do. The Free Software Definition lists four freedoms - how many freedoms, or types of control, should there be in a Free Service Definition, or a Controlled-Computing Definition? Again, this is not only about software: the platform on which a particular piece of software is executed has a huge impact on the available level of control: running your own instance of WordPress, or using an instance on wordpress.com, provides very different control (even if, as Asheesh Laroia pointed out yesterday, WordPress does a pretty good job at providing export and import features to limit data lock-in).
The creation of such a definition is an iterative process. I actually just realized today that (according to Wikipedia) the very first occurrence of an attempt at a Free Software Definition was published in 1986 (GNU's bulletin Vol 1 No.1, page 8) - I thought it happened a couple of years earlier. Are there existing attempts at defining such freedoms or levels of control, and at benchmarking such criteria against existing services? Such criteria would not only include control over software modifications and (re)distribution, but also likely include mentions of interoperability and open standards, both to enable the user to move to a compatible service, and to avoid forcing the user to use a particular implementation of a service. A better understanding of network effects is also needed: how much and what type of service lock-in is acceptable on social networks in exchange for functionality?
I think that we should take inspiration from what was achieved during the last 30 years of Free Software. The tools that were produced are probably irrelevant to address this issue, but there's a lot to learn from the way they were designed. I really look forward to the day when we will have:
- a Free Software Definition equivalent for services
- Debian Free Software Guidelines-like tests/checklist to evaluate services
- an equivalent of The Cathedral and the Bazaar, explaining how one can build successful business models on top of open services
24 Aug 2014 3:39pm GMT
DebConf 14 has started earlier today with the first two talks in sunny portland, oregon.
this year's edition of DebConf didn't feature a preceding DebCamp, & the attempts to organize a proper pkg-perl sprint were not very successful.
nevertheless, two other members of the Debian Perl Group & me met here in PDX on wednesday for our informal unofficial pkg-perl µ-sprint, & as intended, we've used the last days to work on some pkg-perl QA stuff:
- upload packages which were waiting for Perl 5.20
- upload packages which didn't have the Perl Group in Maintainer
- update OpenTasks wiki page
- update subscription to Perl packages in Ubuntu/Launchpad
- start annual git repos cleanup
- pkg-perl-tools: improve scripts to integrate upstream git repo
- update alternative (build) dependencies after perl 5.20 upload
- update Module::Build (build) dependencies
as usual, having someone to poke besides you, & the opportunity to get a second pair of eyes quickly was very beneficial. - & of course, spending time with my nice team mates is always a pleasure for me!
24 Aug 2014 12:25am GMT
23 Aug 2014
Yesterday I received my official diploma for the degree of Licentiate of Philosophy. The degree lies between a Master's degree and a doctorate, and is not required; it consists of the coursework required for a doctorate, and a Licentiate Thesis, "in which the student demonstrates good conversance with the field of research and the capability of independently and critically applying scientific research methods" (official translation of the Government decree on university degrees 794/2004, Section 23 Paragraph 2).
The title and abstract of my Licentiate Thesis follow:
The extent of empirical evidence that could inform evidence-based design of programming languages. A systematic mapping study.
Jyväskylä: University of Jyväskylä, 2014, 243 p.
(Jyväskylä Licentiate Theses in Computing,
ISSN 1795-9713; 18)
ISBN 978-951-39-5790-2 (nid.)
ISBN 978-951-39-5791-9 (PDF)
Background: Programming language design is not usually informed by empirical studies. In other fields similar problems have inspired an evidence-based paradigm of practice. Central to it are secondary studies summarizing and consolidating the research literature. Aims: This systematic mapping study looks for empirical research that could inform evidence-based design of programming languages. Method: Manual and keyword-based searches were performed, as was a single round of snowballing. There were 2056 potentially relevant publications, of which 180 were selected for inclusion, because they reported empirical evidence on the efficacy of potential design decisions and were published on or before 2012. A thematic synthesis was created. Results: Included studies span four decades, but activity has been sparse until the last five years or so. The form of conditional statements and loops, as well as the choice between static and dynamic typing have all been studied empirically for efficacy in at least five studies each. Error proneness, programming comprehension, and human effort are the most common forms of efficacy studied. Experimenting with programmer participants is the most popular method. Conclusions: There clearly are language design decisions for which empirical evidence regarding efficacy exists; they may be of some use to language designers, and several of them may be ripe for systematic reviewing. There is concern that the lack of interest generated by studies in this topic area until the recent surge of activity may indicate serious issues in their research approach.
Keywords: programming languages, programming language design, evidence-based paradigm, efficacy, research methods, systematic mapping study, thematic synthesis
A Licentiate Thesis is assessed by two examiners, usually drawn from outside of the home university; they write (either jointly or separately) a substantiated statement about the thesis, in which they suggest a grade. The final grade is almost always the one suggested by the examiners. I was very fortunate to have such prominent scientists as Dr. Stefan Hanenberg and Prof. Stein Krogdahl as the examiners of my thesis. They recommended, and I received, the grade "very good" (4 on a scale of 1-5).
The thesis has been accepted for publication in our faculty's licentiate thesis series and will in due course appear in our university's electronic database (along with a very small number of printed copies). In the meantime, if anyone wants an electronic preprint, send me email at firstname.lastname@example.org.
As you can imagine, the last couple of months in the spring were very stressful for me, as I pressed on to submit this thesis. After submission, it took me nearly two months to recover (which certain people who emailed me on Planet Haskell business during that period certainly noticed). It represents the fruit of almost four years of work (way more than normally is taken to complete a Licentiate Thesis, but never mind that), as I designed this study in Fall 2010.
Recently, I have been writing in my blog a series of posts in which I have been trying to clear my head about certain foundational issues that irritated me during the writing of the thesis. The thesis contains some of that, but that part of it is not very strong, as my examiners put it, for various reasons. The posts have been a deliberately non-academic attempt to shape the thoughts into words, to see what they look like fixed into a tangible form. (If you go read them, be warned: many of them are deliberately provocative, and many of them are intended as tentative in fact if not in phrasing; the series also is very incomplete at this time.)
I closed my previous post, the latest post in that series, as follows:
In fact, the whole of 20th Century philosophy of science is a big pile of failed attempts to explain science; not one explanation is fully satisfactory. [...] Most scientists enjoy not pondering it, for it's a bit like being a cartoon character: so long as you don't look down, you can walk on air.
I wrote my Master's Thesis (PDF) in 2002. It was about the formal method called "B"; but I took a lot of time and pages to examine the history and content of formal logic. My supervisor was, understandably, exasperated, but I did receive the highest possible grade for it (which I never have fully accepted I deserved). The main reason for that digression: I looked down, and I just had to go poke the bridge I was standing on to make sure I was not, in fact, walking on air. In the many years since, I've taken a lot of time to study foundations, first of mathematics, and more recently of science. It is one reason it took me about eight years to come up with a doable doctoral project (and I am still amazed that my department kept employing me; but I suppose they like my teaching, as do I). The other reason was, it took me that long to realize how to study the design of programming languages without going where everyone has gone before.
Debian people, if any are still reading, may find it interesting that I found significant use for the dctrl-tools toolset I have been writing for Debian for about fifteen years: I stored my data collection as a big pile of dctrl-format files. I ended up making some changes to the existing tools (I should upload the new version soon, I suppose), and I wrote another toolset (unfortunately one that is not general purpose, like the dctrl-tools are) in the process.
For the Haskell people, I mainly have an apology for not attending to Planet Haskell duties in the summer; but I am back in business now. I also note, somewhat to my regret, that I found very few studies dealing with Haskell. I just checked; I mention Haskell several times in the background chapter, but it is not mentioned in the results chapter (because there were no studies worthy of special notice).
I am already working on extending this work into a doctoral thesis. I expect, and hope, to complete that one faster.
23 Aug 2014 5:44pm GMT
After a bit more than 9 years, I am replacing Serendipity, which has been hosting my blog, with a self-made static solution. This means that when you are reading this, my server no longer has to execute some rather large body of untyped code to produce the bytes sent to you. Instead, that happens once in a while on my laptop, and they are stored as static files on the server.
I hope to get a little performance boost from this, so that my site can more easily hold up to being mentioned on hackernews. I also do not want to worry about security issues in Serendipity - static files are not hacked.
The actual implementation of the blog is rather masochistic, as my web page runs on one of these weird obfuscated languages (XSLT). Previously, it consisted of XSLT stylesheets producing makefiles calling XSLT sheets. Now it is a bit more self-contained, with one XSLT stylesheet writing out all the various HTML and RSS files.
I managed to import all my old posts and comments thanks to this script by Michael Hamann (I had played around with this some months ago and just spent what seemed to be an hour to me to find this script again) and a small Haskell script. Old URLs are rewritten (using mod_rewrite) to the new paths, but feed readers might still be confused by this.
This opens the door to a long overdue re-design of my webpage. But not today...
23 Aug 2014 3:54pm GMT
I've mentored a number of students in 2013 and 2014 for Debian and Ganglia and most of the companies I've worked with have run internships and graduate programs from time to time. GSoC 2014 has just finished and with all the excitement, many students are already asking what they can do to prepare and be selected in 2015.
My own observation is that the more time the organization has to get to know the student, the more confident they can be selecting that student. Furthermore, the more time that the student has spent getting to know the free software community, the more easily they can complete GSoC.
Here I present a list of things that students can do to maximize their chance of selection and career opportunities at the same time. These tips are useful for people applying for GSoC itself and related programs such as GNOME's Outreach Program for Women or graduate placements in companies.
There is no guarantee that Google will run the program again in 2015 or any future year.
There is no guarantee that any organization or mentor (including myself) will be involved until the official list of organizations is published by Google.
Do not follow the advice of web sites that invite you to send pizza or anything else of value to prospective mentors.
Following the steps in this page doesn't guarantee selection. That said, people who do follow these steps are much more likely to be considered and interviewed than somebody who hasn't done any of the things in this list.
Understand what free software really is
You may hear terms like free software and open source software used interchangeably.
They don't mean exactly the same thing and many people use the term free software for the wrong things. Not all open source projects meet the definition of free software. Those that don't, usually as a result of deficiencies in their licenses, are fundamentally incompatible with the majority of software that does use genuinely free licenses.
Google Summer of Code is about both writing and publishing your code and it is also about community. It is fundamental that you know the basics of licensing and how to choose a free license that empowers the community to collaborate on your code well after GSoC has finished.
Please review the definition of free software early on and come back and review it from time to time. The GNU Project / Free Software Foundation have excellent resources to help you understand what a free software license is and how it works to maximize community collaboration.
Don't look for shortcuts
There is no shortcut to GSoC selection and there is no shortcut to GSoC completion.
The student stipend (USD $5,500 in 2014) is not paid to students unless they complete a minimum amount of valid code. This means that even if a student did find some shortcut to selection, it is unlikely they would be paid without completing meaningful work.
If you are the right candidate for GSoC, you will not need a shortcut anyway. Are you the sort of person who can't leave a coding problem until you really feel it is fixed, even if you keep going all night? Have you ever woken up in the night with a dream about writing code still in your head? Do you become irritated by tedious or repetitive tasks and often think of ways to write code to eliminate such tasks? Does your family get cross with you because you take your laptop to Christmas dinner or some other significant occasion and start coding? If some of these questions describe the way you think or feel, you are probably a natural fit for GSoC.
An opportunity money can't buy
The GSoC stipend will not make you rich. It is intended to make sure you have enough money to survive through the summer and focus on your project. Professional developers make this much money in a week in leading business centers like New York, London and Singapore. When you get to that stage in 3-5 years, you will not even be thinking about exactly how much you made during internships.
GSoC gives you an edge over other internships because it involves publicly promoting your work. Many companies still try to hide the potential of their best recruits for fear they will be poached or that they will be able to demand higher salaries. Everything you complete in GSoC is intended to be published and you get full credit for it. Imagine a young musician getting the opportunity to perform on the main stage at a rock festival. This is how the free software community works. It is a meritocracy and there is nobody to hold you back.
Having a portfolio of free software that you have created or collaborated on and a wide network of professional contacts that you develop before, during and after GSoC will continue to pay you back for years to come. While other graduates are being screened through group interviews and testing days run by employers, people with a track record in a free software project often find they go straight to the final interview round.
Register your domain name and make a permanent email address
Free software is all about community and collaboration. Register your own domain name as this will become a focal point for your work and for people to get to know you as you become part of the community.
This is sound advice for anybody working in IT, not just programmers. It gives the impression that you are confident and have a long term interest in a technology career.
Choosing the provider: as a minimum, you want a provider that offers DNS management, static web site hosting, email forwarding and XMPP services all linked to your domain. You do not need to choose the provider that is linked to your internet connection at home and that is often not the best choice anyway. The XMPP foundation maintains a list of providers known to support XMPP.
Create an email address within your domain name. The most basic domain hosting providers will let you forward the email address to a webmail or university email account of your choice. Configure your webmail to send replies using your personalized email address in the From header.
Update your ~/.gitconfig file to use your personalized email address in your Git commits.
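As a rough sketch, the same setting can be made from the command line instead of editing the file by hand; `git config --global` writes to ~/.gitconfig. The name and address below are placeholders (assumptions for illustration), so substitute your own.

```shell
# Placeholder identity (hypothetical); replace with your own name and
# the address you created within your domain.
git config --global user.name "Jane Student"
git config --global user.email "jane@example.org"

# Show the values Git will record in new commits.
git config --global --get user.name
git config --global --get user.email
```

Commits made before the change keep the old address; only new commits pick up the new one.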
Create a web site and blog
Start writing a blog. Host it using your domain name.
Some people blog every day, other people just blog once every two or three months.
Create links from your web site to your other profiles, such as a Github profile page. This helps reinforce the pages/profiles that are genuinely related to you and avoid confusion with the pages of other developers.
Many mentors are keen to see their students writing a weekly report on a blog during GSoC so starting a blog now gives you a head start. Mentors look at blogs during the selection process to try and gain insight into which topics a student is most suitable for.
Create a profile on Github
Github is one of the most widely used software development web sites. Github makes it quick and easy for you to publish your work and collaborate on the work of other people. Create an account today and get in the habit of forking other projects, improving them, committing your changes and pushing the work back into your Github account.
Github will quickly build a profile of your commits and this allows mentors to see and understand your interests and your strengths.
In your Github profile, add a link to your web site/blog and make sure the email address you are using for Git commits (in the ~/.gitconfig file) is based on your personal domain.
Start using PGP
Pretty Good Privacy (PGP) is the industry standard in protecting your identity online. All serious free software projects use PGP to sign tags in Git, to sign official emails and to sign official release files.
The most common way to start using PGP is with the GnuPG (GNU Privacy Guard) utility. It is installed by the package manager on most Linux systems.
When you create your own PGP key, use the email address involving your domain name. This is the most permanent and stable solution.
Print your key fingerprint using the gpg-key2ps command, which is in the signing-party package on most Linux systems. Keep copies of the fingerprint slips with you.
This is what my own PGP fingerprint slip looks like. You can also print the key fingerprint on a business card for a more professional look.
When using PGP, it is recommended that you sign any important messages you send, but you do not have to encrypt the messages you send, especially if some of the people you send messages to (like family and friends) do not yet have the PGP software to decrypt them.
Get your PGP key signed
Once you have a PGP key, you will need to find other developers to sign it. For people I mentor personally in GSoC, I'm keen to see that you try and find another Debian Developer in your area to sign your key as early as possible.
Free software events
Try and find all the free software events in your area in the months between now and the end of the next Google Summer of Code season. Aim to attend at least two of them before GSoC.
Look closely at the schedules and find out about the individual speakers, the companies and the free software projects that are participating. For events that span more than one day, find out about the dinners, pub nights and other social parts of the event.
Try and identify people who will attend the event who have been GSoC mentors or who intend to be. Contact them before the event: if you are keen to work on something in their domain, they may be able to make time to discuss it with you in person.
Take your PGP fingerprint slips. Even if you don't participate in a formal key-signing party at the event, you will still find some developers to sign your PGP key individually. You must take a photo ID document (such as your passport) for the other developer to check the name on your fingerprint but you do not give them a copy of the ID document.
Events come in all shapes and sizes. FOSDEM is an example of one of the bigger events in Europe, and linux.conf.au is a similarly large event in Australia. There are many, many more local events such as the Debian France mini-DebConf in Lyon, 2015. Many events are either free or free for students but please check carefully if there is a requirement to register before attending.
On your blog, discuss which events you are attending and which sessions interest you. Write a blog during or after the event too, including photos.
Quantcast generously hosted the Ganglia community meeting in San Francisco, October 2013. We had a wild time in their offices with mini-scooters, burgers, beers and the Ganglia book. That's me on the pink mini-scooter and Bernard Li, one of the other Ganglia GSoC 2014 admins is on the right.
GSoC is fundamentally about free software. Linux is to free software what a tree is to the forest. Using Linux every day on your personal computer dramatically increases your ability to interact with the free software community and increases the number of potential GSoC projects that you can participate in.
This is not to say that people using Mac OS or Windows are unwelcome. I have worked with some great developers who were not Linux users. Linux gives you an edge though and the best time to gain that edge is now, while you are a student and well before you apply for GSoC.
If you must run Windows for some applications used in your course, it will run just fine in a virtual machine using Virtual Box, a free software solution for desktop virtualization. Use Linux as the primary operating system.
Here are links to download ISO DVD (and CD) images for some of the main Linux distributions:
If you are nervous about getting started with Linux, install it on a spare PC or in a virtual machine before you install it on your main PC or laptop. Linux is much less demanding on the hardware than Windows so you can easily run it on a machine that is 5-10 years old. Having just 4GB of RAM and 20GB of hard disk is usually more than enough for a basic graphical desktop environment although having better hardware makes it faster.
Your experiences installing and running Linux, especially if it requires some special effort to make it work with some of your hardware, make interesting topics for your blog.
Decide which technologies you know best
In a GSoC program, you will typically do most of your work in just one of these languages.
From the outset, decide which language you will focus on and do everything you can to improve your competence with that language. For example, if you have already used Java in most of your course, plan on using Java in GSoC and make sure you read Effective Java (2nd Edition) by Joshua Bloch.
Decide which themes appeal to you
Find a topic that has long-term appeal for you. Maybe the topic relates to your course or maybe you already know what type of company you would like to work in.
Here is a list of some topics and some of the relevant software projects:
- System administration, servers and networking: consider projects involving monitoring, automation, packaging. Ganglia is a great community to get involved with and you will encounter the Ganglia software in many large companies and academic/research networks. Contributing to a Linux distribution like Debian or Fedora packaging is another great way to get into system administration.
- Desktop and user interface: consider projects involving window managers and desktop tools or adding to the user interface of just about any other software.
- Big data and data science: this can apply to just about any other theme. For example, data science techniques are frequently used now to improve system administration.
- Business and accounting: consider accounting, CRM and ERP software.
- Finance and trading: consider projects like R, market data software like OpenMAMA and connectivity software (Apache Camel)
- Real-time communication (RTC), VoIP, webcam and chat: look at the JSCommunicator or the Jitsi project
Before the GSoC application process begins, you should aim to learn as much as possible about the theme you prefer and also gain practical experience using the software relating to that theme. For example, if you are attracted to the business and accounting theme, install the PostBooks suite and get to know it. Maybe you know somebody who runs a small business: help them to upgrade to PostBooks and use it to prepare some reports.
Make some small project, less than two weeks' work, to demonstrate your skills. It is important to make something that somebody will use for a practical purpose; this will help you gain experience communicating with other users through Github.
For an example, see the servlet Juliana Louback created for fixing phone numbers in December 2013. It has since been used as part of the Lumicall web site and Juliana was selected for a GSoC 2014 project with Debian.
There is no better way to demonstrate to a prospective mentor that you are ready for GSoC than by completing and publishing some small project like this yourself. If you don't have any immediate project ideas, many developers will also be able to give you tips on small projects like this that you can attempt. Just come and ask us on one of the mailing lists.
Ideally, the project will be something that you would use anyway even if you do not end up participating in GSoC. Such projects are the most motivating and rewarding and usually end up becoming an example of your best work. To continue the example of somebody with a preference for business and accounting software, a small project you might create is a plugin or extension for PostBooks.
Getting to know prospective mentors
Many web sites provide useful information about the developers who contribute to free software projects. Some of these developers may be willing to be a GSoC mentor.
For example, look through some of the following:
- Planet / Blog aggregation sites: these sites all have links to the blogs of many developers. They are useful sources of information about events and also finding out who works on what.
- Developer profile pages. Many projects publish a page about each developer and the packages, modules or other components he/she is responsible for. Look through these lists for areas of mutual interest.
- Developer github profiles. Github makes it easy to see what projects a developer has contributed to. To see many of my own projects, browse through the history at my own Github profile
Getting on the mentor's shortlist
Once you have identified projects that are interesting to you and developers who work on those projects, it is important to get yourself on the developer's shortlist.
Basically, the shortlist is a list of all students who the developer believes can complete the project. If I feel that a student is unlikely to complete a project or if I don't have enough information to judge a student's probability of success, that student will not be on my shortlist.
If I don't have any student on my shortlist, then a project will not go ahead at all. If there are multiple students on the shortlist, then I will be looking more closely at each of them to try and work out who is the best match.
One way to get a developer's attention is to look at bug reports they have created. Github makes it easy to see complaints or bug reports they have made about their own projects or other projects they depend on. Another way to do this is to search through their code for strings like FIXME and TODO. Projects with standalone bug trackers like the Debian bug tracker also provide an easy way to search for bug reports that a specific person has created or commented on.
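The search for FIXME and TODO strings could be sketched as a small shell function. The name `find_markers` is made up for this illustration, and the `--exclude-dir` option assumes GNU grep (or another grep that supports it).

```shell
# List every FIXME or TODO marker below a directory, printing the
# file name and line number of each match.
# find_markers is a hypothetical helper name; the directory defaults
# to the current one when no argument is given.
find_markers() {
    grep -rnE --exclude-dir=.git 'FIXME|TODO' "${1:-.}"
}
```

Run it from inside a checkout as `find_markers`, or pass the path to a checkout as the first argument. Each output line has the form `path:line:text`, which makes it easy to quote the exact location when you email the developer.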
Once you find some relevant bug reports, email the developer. Ask if anybody else is working on those issues. Try and start with an issue that is particularly easy and where the solution is interesting for you. This will help you learn to compile and test the program before you try to fix any more complicated bugs. It may even be something you can work on as part of your academic program.
Find successful projects from the previous year
Contact organizations and ask them which GSoC projects were most successful. In many organizations, you can find the past students' project plans and their final reports published on the web. Read through the plans submitted by the students who were chosen. Then read through the final reports by the same students and see how they compare to the original plans.
Start building your project proposal now
Don't wait for the application period to begin. Start writing a project proposal now.
When writing a proposal, it is important to include several things:
- Think big: what is the goal at the end of the project? Does your work help the greater good in some way, such as increasing the market share of Linux on the desktop?
- Details: what are specific challenges? What tools will you use?
- Time management: what will you do each week? Are there weeks where you will not work on GSoC due to vacation or other events? These things are permitted but they must be in your plan if you know them in advance. If an accident or death in the family cut a week out of your GSoC project, which work would you skip and would your project still be useful without that? Having two weeks of flexible time in your plan makes it more resilient against interruptions.
- Communication: are you on mailing lists, IRC and XMPP chat? Will you make a weekly report on your blog?
- Users: who will benefit from your work?
- Testing: who will test and validate your work throughout the project? Ideally, this should involve more than just the mentor.
If your project plan is good enough, could you put it on Kickstarter or another crowdfunding site? This is a good test of whether or not a project is going to be supported by a GSoC mentor.
Learn about packaging and distributing software
Packaging is a vital part of the free software lifecycle. It is very easy to upload a project to GitHub but it takes more effort to have it become an official package in systems like Debian, Fedora and Ubuntu.
Packaging and the communities around Linux distributions help you reach out to users of your software and get valuable feedback and new contributors. This boosts the impact of your work.
To start with, you may want to help the maintainer of an existing package. Debian packaging teams are established communities that work collaboratively and welcome new contributors. The Debian Mentors initiative is another great starting place. In the Fedora world, the place to start may be one of the Special Interest Groups (SIGs).
Think from the mentor's perspective
After the application deadline, mentors have just two or three weeks to choose the students. This is not a lot of time to be certain whether a particular student is capable of completing a project. If the student has a published history of free software activity, the mentor feels a lot more confident about choosing the student.
Some mentors have more than one good student while other mentors receive no applications from capable students. In this situation, it is very common for mentors to send each other details of students who may be suitable. Once again, if a student has a good GitHub profile and a blog, it is much easier for mentors to match that student with another project.
Getting into the world of software engineering is much like joining any other profession or even taking up a new hobby or sporting activity. If you run, you probably have various types of shoes and a running watch and you may even spend a couple of nights at the track each week. If you enjoy playing a musical instrument, you probably have a collection of sheet music, accessories for your instrument and you may even aspire to build a recording studio in your garage (or you probably know somebody else who already did that).
The things listed on this page will not just help you walk the walk and talk the talk of a software developer, they will put you on a track to being one of the leaders. If you look over the profiles of other software developers on the Internet, you will find they are doing most of the things on this page already. Even if you are not selected for GSoC at all or decide not to apply, working through the steps on this page will help you clarify your own ideas about your career and help you make new friends in the software engineering community.
23 Aug 2014 11:37am GMT