21 Feb 2018

feedPlanet Ubuntu

Colin King: Linux Kernel Module Growth

The Linux kernel grows at an amazing pace, each kernel release adds more functionality, more drivers and hence more kernel modules. I recently wondered what the trend was for kernel module growth per release, so I performed module builds on kernels v2.6.24 through to v4.16-rc2 for x86-64 to get a better idea of growth rates:


..as one can see, the rate of growth is relatively linear with about 89 modules being added to each kernel release, which is not surprising as the size of the kernel is growing at a fairly linear rate too. It is interesting to see that the number of modules has easily more than tripled in the 10 years between v2.6.24 and v4.16-rc2, with a rate of about 470 new modules per year. At this rate, Linux will see the 10,000th module land in around the year 2025.

21 Feb 2018 4:46pm GMT

Ubuntu Insights: Canonical announces Ubuntu Core across Rigado’s IoT gateways

Rigado customers will benefit from open source, cost-effective, secure software in their commercial IoT deployments

London, UK - 21st February 2018 - Canonical, the company behind Ubuntu, today announced that Ubuntu Core will be deployed across Rigado's Edge Connectivity gateway solutions, further establishing Ubuntu Core as the premier operating system for edge computing devices. Rigado's enterprise-grade, easily configurable IoT gateways will offer Ubuntu Core's secure and open architecture for companies globally to deploy and manage their commercial IoT applications, such as asset tracking and connected guest experiences.

Rigado's IoT gateways can be used across a variety of vertical market use cases, and are particularly popular across retail and hospitality industries where commercial spaces require a secure and scalable infrastructure for edge computing. The gateways support a wide range of connected technologies, including smart lighting, asset tracking, sensors and monitoring, and will be available with Ubuntu Core by summer 2018.

In addition to integrating Ubuntu Core with their gateways, Rigado has also adopted Canonical's IoT app store, enabling themselves and their reseller partners to curate a customised suite of IoT applications for their customers. System integrators and IoT solution providers can create their own private app stores for industry-specific solutions.

Ben Corrado, CEO at Rigado commented: "Large-scale commercial IoT deployments are not one size fits all. They require infrastructure that is open, scalable, and secure, with software that can evolve to meet the changing needs of the customer. Ubuntu Core brings a powerful and open architecture to IoT gateways, ensuring they're an enabler of business success for customers, now and in the future."

Tom Canning, Vice President of Devices and IoT at Canonical added: "The combination of Ubuntu Core with Rigado's IoT Gateways creates a truly open architecture for large-scale edge connectivity and computing. Rigado allows companies to quickly and economically deploy and manage IoT applications in a way that's simple, secure and scalable - similar to what we've come to expect from mobile and web applications, and so Ubuntu Core with the snap ecosystem is the perfect platform to meet these expectations."

Rigado's Edge Connectivity solutions will support the Ubuntu Core and snap architecture, providing a platform for the development of sophisticated control, monitoring and tracking applications. Further supporting security, Rigado gateways include a secure boot function and encrypted file systems which protect the integrity of the code and data on the gateways, preventing unauthorized code from being loaded. And, each instance of Ubuntu Core on a Rigado gateway is fully supported by the Canonical security and update service with updates and software installations managed through Rigado.

Ubuntu Core is a minimalist, transactional version of Ubuntu designed for IoT devices, running a new breed of highly secure, remotely upgradeable Linux app packages, the aforementioned snaps. Rigado's integration of Ubuntu Core with their IoT gateways is the latest example of Ubuntu Core being trusted and deployed by leading IoT players, from chipset vendors to device makers and system integrators.

Snaps allow developers to quickly build and users to easily install software, making it ideal for fast IoT deployments plus the ability to have multiple applications running on a single gateway. Snaps enable Rigado users to purchase a device now that will be able to install new features and applications, without the need for future hardware upgrades.

Rigado's gateway with Ubuntu Core will be on display at Embedded World 2018 in Nuremberg on Canonical's stand (4-568) from 27th February - 1st March.

###

About Canonical
Canonical is the company behind Ubuntu, the leading OS for cloud operations. Most public cloud workloads use Ubuntu, as do most new smart gateways, switches, self-driving cars and advanced robots. Canonical provides enterprise support and services for commercial users of Ubuntu. Established in 2004, Canonical is a privately held company.

About Rigado

Rigado delivers edge connectivity and computing solutions for large-scale IoT deployments. Their products include IoT Edge Gateways with integrated management services, as well as certified low-energy wireless modules for end devices. Rigado's edge connectivity solutions power more than 250 global customers and 5 million connected devices.

21 Feb 2018 2:55pm GMT

Sebastian Dröge: How to write GStreamer Elements in Rust Part 2: A raw audio sine wave source

A bit later than anticipated, this is now part two of the blog post series about writing GStreamer elements in Rust. Part one can be found here, and I'll assume that everything written there is known already.

In this part, a raw audio sine wave source element is going to be written. It will be similar to the one Mathieu was writing in his blog post about writing such a GStreamer element in Python. Various details will be different though, but more about that later.

The final code can be found here.

Table of Contents

  1. Boilerplate
  2. Caps Negotiation
  3. Query Handling
  4. Buffer Creation
  5. (Pseudo) Live Mode
  6. Unlocking
  7. Seeking

Boilerplate

The first part here will be all the boilerplate required to set up the element. You can safely skip this if you remember all this from the previous blog post.

Our sine wave element is going to produce raw audio, with a number of channels and any possible sample rate with both 32 bit and 64 bit floating point samples. It will produce a simple sine wave with a configurable frequency, volume/mute and number of samples per audio buffer. In addition it will be possible to configure the element in (pseudo) live mode, meaning that it will only produce data in real-time according to the pipeline clock. And it will be possible to seek to any time/sample position on our source element. It will basically be a more simply version of the audiotestsrc element from gst-plugins-base.

So let's get started with all the boilerplate. This time our element will be based on the BaseSrc base class instead of BaseTransform.

use glib;
use gst;
use gst::prelude::*;
use gst_base::prelude::*;
use gst_audio;

use byte_slice_cast::*;

use gst_plugin::properties::*;
use gst_plugin::object::*;
use gst_plugin::element::*;
use gst_plugin::base_src::*;

use std::{i32, u32};
use std::sync::Mutex;
use std::ops::Rem;

use num_traits::float::Float;
use num_traits::cast::NumCast;

// Default values of properties
const DEFAULT_SAMPLES_PER_BUFFER: u32 = 1024;
const DEFAULT_FREQ: u32 = 440;
const DEFAULT_VOLUME: f64 = 0.8;
const DEFAULT_MUTE: bool = false;
const DEFAULT_IS_LIVE: bool = false;

// Property value storage
#[derive(Debug, Clone, Copy)]
struct Settings {
    samples_per_buffer: u32,
    freq: u32,
    volume: f64,
    mute: bool,
    is_live: bool,
}

impl Default for Settings {
    fn default() -> Self {
        Settings {
            samples_per_buffer: DEFAULT_SAMPLES_PER_BUFFER,
            freq: DEFAULT_FREQ,
            volume: DEFAULT_VOLUME,
            mute: DEFAULT_MUTE,
            is_live: DEFAULT_IS_LIVE,
        }
    }
}

// Metadata for the properties
static PROPERTIES: [Property; 5] = [
    Property::UInt(
        "samples-per-buffer",
        "Samples Per Buffer",
        "Number of samples per output buffer",
        (1, u32::MAX),
        DEFAULT_SAMPLES_PER_BUFFER,
        PropertyMutability::ReadWrite,
    ),
    Property::UInt(
        "freq",
        "Frequency",
        "Frequency",
        (1, u32::MAX),
        DEFAULT_FREQ,
        PropertyMutability::ReadWrite,
    ),
    Property::Double(
        "volume",
        "Volume",
        "Output volume",
        (0.0, 10.0),
        DEFAULT_VOLUME,
        PropertyMutability::ReadWrite,
    ),
    Property::Boolean(
        "mute",
        "Mute",
        "Mute",
        DEFAULT_MUTE,
        PropertyMutability::ReadWrite,
    ),
    Property::Boolean(
        "is-live",
        "Is Live",
        "(Pseudo) live output",
        DEFAULT_IS_LIVE,
        PropertyMutability::ReadWrite,
    ),
];

// Stream-specific state, i.e. audio format configuration
// and sample offset
struct State {
    info: Option,
    sample_offset: u64,
    sample_stop: Option,
    accumulator: f64,
}

impl Default for State {
    fn default() -> State {
        State {
            info: None,
            sample_offset: 0,
            sample_stop: None,
            accumulator: 0.0,
        }
    }
}

// Struct containing all the element data
struct SineSrc {
    cat: gst::DebugCategory,
    settings: Mutex,
    state: Mutex,
}

impl SineSrc {
    // Called when a new instance is to be created
    fn new(element: &BaseSrc) -> Box> {
        // Initialize live-ness and notify the base class that
        // we'd like to operate in Time format
        element.set_live(DEFAULT_IS_LIVE);
        element.set_format(gst::Format::Time);

        Box::new(Self {
            cat: gst::DebugCategory::new(
                "rssinesrc",
                gst::DebugColorFlags::empty(),
                "Rust Sine Wave Source",
            ),
            settings: Mutex::new(Default::default()),
            state: Mutex::new(Default::default()),
        })
    }

    // Called exactly once when registering the type. Used for
    // setting up metadata for all instances, e.g. the name and
    // classification and the pad templates with their caps.
    //
    // Actual instances can create pads based on those pad templates
    // with a subset of the caps given here. In case of basesrc,
    // a "src" and "sink" pad template are required here and the base class
    // will automatically instantiate pads for them.
    //
    // Our element here can output f32 and f64
    fn class_init(klass: &mut BaseSrcClass) {
        klass.set_metadata(
            "Sine Wave Source",
            "Source/Audio",
            "Creates a sine wave",
            "Sebastian Dröge ",
        );

        // On the src pad, we can produce F32/F64 with any sample rate
        // and any number of channels
        let caps = gst::Caps::new_simple(
            "audio/x-raw",
            &[
                (
                    "format",
                    &gst::List::new(&[
                        &gst_audio::AUDIO_FORMAT_F32.to_string(),
                        &gst_audio::AUDIO_FORMAT_F64.to_string(),
                    ]),
                ),
                ("layout", &"interleaved"),
                ("rate", &gst::IntRange::::new(1, i32::MAX)),
                ("channels", &gst::IntRange::::new(1, i32::MAX)),
            ],
        );
        // The src pad template must be named "src" for basesrc
        // and specific a pad that is always there
        let src_pad_template = gst::PadTemplate::new(
            "src",
            gst::PadDirection::Src,
            gst::PadPresence::Always,
            &caps,
        );
        klass.add_pad_template(src_pad_template);

        // Install all our properties
        klass.install_properties(&PROPERTIES);
    }
}

impl ObjectImpl for SineSrc {
    // Called whenever a value of a property is changed. It can be called
    // at any time from any thread.
    fn set_property(&self, obj: &glib::Object, id: u32, value: &glib::Value) {
        let prop = &PROPERTIES[id as usize];
        let element = obj.clone().downcast::().unwrap();

        match *prop {
            Property::UInt("samples-per-buffer", ..) => {
                let mut settings = self.settings.lock().unwrap();
                let samples_per_buffer = value.get().unwrap();
                gst_info!(
                    self.cat,
                    obj: &element,
                    "Changing samples-per-buffer from {} to {}",
                    settings.samples_per_buffer,
                    samples_per_buffer
                );
                settings.samples_per_buffer = samples_per_buffer;
                drop(settings);

                let _ =
                    element.post_message(&gst::Message::new_latency().src(Some(&element)).build());
            }
            Property::UInt("freq", ..) => {
                let mut settings = self.settings.lock().unwrap();
                let freq = value.get().unwrap();
                gst_info!(
                    self.cat,
                    obj: &element,
                    "Changing freq from {} to {}",
                    settings.freq,
                    freq
                );
                settings.freq = freq;
            }
            Property::Double("volume", ..) => {
                let mut settings = self.settings.lock().unwrap();
                let volume = value.get().unwrap();
                gst_info!(
                    self.cat,
                    obj: &element,
                    "Changing volume from {} to {}",
                    settings.volume,
                    volume
                );
                settings.volume = volume;
            }
            Property::Boolean("mute", ..) => {
                let mut settings = self.settings.lock().unwrap();
                let mute = value.get().unwrap();
                gst_info!(
                    self.cat,
                    obj: &element,
                    "Changing mute from {} to {}",
                    settings.mute,
                    mute
                );
                settings.mute = mute;
            }
            Property::Boolean("is-live", ..) => {
                let mut settings = self.settings.lock().unwrap();
                let is_live = value.get().unwrap();
                gst_info!(
                    self.cat,
                    obj: &element,
                    "Changing is-live from {} to {}",
                    settings.is_live,
                    is_live
                );
                settings.is_live = is_live;
            }
            _ => unimplemented!(),
        }
    }

    // Called whenever a value of a property is read. It can be called
    // at any time from any thread.
    fn get_property(&self, _obj: &glib::Object, id: u32) -> Result {
        let prop = &PROPERTIES[id as usize];

        match *prop {
            Property::UInt("samples-per-buffer", ..) => {
                let settings = self.settings.lock().unwrap();
                Ok(settings.samples_per_buffer.to_value())
            }
            Property::UInt("freq", ..) => {
                let settings = self.settings.lock().unwrap();
                Ok(settings.freq.to_value())
            }
            Property::Double("volume", ..) => {
                let settings = self.settings.lock().unwrap();
                Ok(settings.volume.to_value())
            }
            Property::Boolean("mute", ..) => {
                let settings = self.settings.lock().unwrap();
                Ok(settings.mute.to_value())
            }
            Property::Boolean("is-live", ..) => {
                let settings = self.settings.lock().unwrap();
                Ok(settings.is_live.to_value())
            }
            _ => unimplemented!(),
        }
    }
}

// Virtual methods of gst::Element. We override none
impl ElementImpl for SineSrc { }

impl BaseSrcImpl for SineSrc {
    // Called when starting, so we can initialize all stream-related state to its defaults
    fn start(&self, element: &BaseSrc) -> bool {
        // Reset state
        *self.state.lock().unwrap() = Default::default();

        gst_info!(self.cat, obj: element, "Started");

        true
    }

    // Called when shutting down the element so we can release all stream-related state
    fn stop(&self, element: &BaseSrc) -> bool {
        // Reset state
        *self.state.lock().unwrap() = Default::default();

        gst_info!(self.cat, obj: element, "Stopped");

        true
    }
}

struct SineSrcStatic;

// The basic trait for registering the type: This returns a name for the type and registers the
// instance and class initializations functions with the type system, thus hooking everything
// together.
impl ImplTypeStatic for SineSrcStatic {
    fn get_name(&self) -> &str {
        "SineSrc"
    }

    fn new(&self, element: &BaseSrc) -> Box> {
        SineSrc::new(element)
    }

    fn class_init(&self, klass: &mut BaseSrcClass) {
        SineSrc::class_init(klass);
    }
}

// Registers the type for our element, and then registers in GStreamer under
// the name "sinesrc" for being able to instantiate it via e.g.
// gst::ElementFactory::make().
pub fn register(plugin: &gst::Plugin) {
    let type_ = register_type(SineSrcStatic);
    gst::Element::register(plugin, "rssinesrc", 0, type_);
}

If any of this needs explanation, please see the previous blog post and the comments in the code. The explanation for all the structs fields and what they're good for will follow in the next sections.

With all of the above and a small addition to src/lib.rs this should compile now.

mod sinesrc;
[...]

fn plugin_init(plugin: &gst::Plugin) -> bool {
    [...]
    sinesrc::register(plugin);
    true
}

Also a couple of new crates have to be added to Cargo.toml and src/lib.rs, but you best check the code in the repository for details.

Caps Negotiation

The first part that we have to implement, just like last time, is caps negotiation. We already notified the base class about any caps that we can potentially handle via the caps in the pad template in class_init but there are still two more steps of behaviour left that we have to implement.

First of all, we need to get notified whenever the caps that our source is configured for are changing. This will happen once in the very beginning and then whenever the pipeline topology or state changes and new caps would be more optimal for the new situation. This notification happens via the BaseTransform::set_caps virtual method.

fn set_caps(&self, element: &BaseSrc, caps: &gst::CapsRef) -> bool {
        use std::f64::consts::PI;

        let info = match gst_audio::AudioInfo::from_caps(caps) {
            None => return false,
            Some(info) => info,
        };

        gst_debug!(self.cat, obj: element, "Configuring for caps {}", caps);

        element.set_blocksize(info.bpf() * (*self.settings.lock().unwrap()).samples_per_buffer);

        let settings = *self.settings.lock().unwrap();
        let mut state = self.state.lock().unwrap();

        // If we have no caps yet, any old sample_offset and sample_stop will be
        // in nanoseconds
        let old_rate = match state.info {
            Some(ref info) => info.rate() as u64,
            None => gst::SECOND_VAL,
        };

        // Update sample offset and accumulator based on the previous values and the
        // sample rate change, if any
        let old_sample_offset = state.sample_offset;
        let sample_offset = old_sample_offset
            .mul_div_floor(info.rate() as u64, old_rate)
            .unwrap();

        let old_sample_stop = state.sample_stop;
        let sample_stop =
            old_sample_stop.map(|v| v.mul_div_floor(info.rate() as u64, old_rate).unwrap());

        let accumulator =
            (sample_offset as f64).rem(2.0 * PI * (settings.freq as f64) / (info.rate() as f64));

        *state = State {
            info: Some(info),
            sample_offset: sample_offset,
            sample_stop: sample_stop,
            accumulator: accumulator,
        };

        drop(state);

        let _ = element.post_message(&gst::Message::new_latency().src(Some(element)).build());

        true
    }

In here we parse the caps into a AudioInfo and then store that in our internal state, while updating various fields. We tell the base class about the number of bytes each buffer is usually going to hold, and update our current sample position, the stop sample position (when a seek with stop position happens, we need to know when to stop) and our accumulator. This happens by scaling both positions by the old and new sample rate. If we don't have an old sample rate, we assume nanoseconds (this will make more sense once seeking is implemented). The scaling is done with the help of the muldiv crate, which implements scaling of integer types by a fraction with protection against overflows by doing up to 128 bit integer arithmetic for intermediate values.

The accumulator is the updated based on the current phase of the sine wave at the current sample position.

As a last step we post a new LATENCY message on the bus whenever the sample rate has changed. Our latency (in live mode) is going to be the duration of a single buffer, but more about that later.

BaseSrc is by default already selecting possible caps for us, if there are multiple options. However these defaults might not be (and often are not) ideal and we should override the default behaviour slightly. This is done in the BaseSrc::fixate virtual method.

fn fixate(&self, element: &BaseSrc, caps: gst::Caps) -> gst::Caps {
        // Fixate the caps. BaseSrc will do some fixation for us, but
        // as we allow any rate between 1 and MAX it would fixate to 1. 1Hz
        // is generally not a useful sample rate.
        //
        // We fixate to the closest integer value to 48kHz that is possible
        // here, and for good measure also decide that the closest value to 1
        // channel is good.
        let mut caps = gst::Caps::truncate(caps);
        {
            let caps = caps.make_mut();
            let s = caps.get_mut_structure(0).unwrap();
            s.fixate_field_nearest_int("rate", 48_000);
            s.fixate_field_nearest_int("channels", 1);
        }

        // Let BaseSrc fixate anything else for us. We could've alternatively have
        // called Caps::fixate() here
        element.parent_fixate(caps)
    }

Here we take the caps that are passed in, truncate them (i.e. remove all but the very first Structure) and then manually fixate the sample rate to the closest value to 48kHz. By default, caps fixation would result in the lowest possible sample rate but this is usually not desired.

For good measure, we also fixate the number of channels to the closest value to 1, but this would already be the default behaviour anyway. And then chain up to the parent class' implementation of fixate, which for now basically does the same as Caps::fixate(). After this, the caps are fixated, i.e. there is only a single Structure left and all fields have concrete values (no ranges or sets).

Query Handling

As our source element will work by generating a new audio buffer from a specific offset, and especially works in Time format, we want to notify downstream elements that we don't want to run in Pull mode, only in Push mode. In addition would prefer sequential reading. However we still allow seeking later. For a source that does not know about Time, e.g. a file source, the format would be configured as Bytes. Other values than Time and Bytes generally don't make any sense.

The main difference here is that otherwise the base class would ask us to produce data for arbitrary Byte offsets, and we would have to produce data for that. While possible in our case, it's a bit annoying and for other audio sources it's not easily possible at all.

Downstream elements will try to query this very information from us, so we now have to override the default query handling of BaseSrc and handle the SCHEDULING query differently. Later we will also handle other queries differently.

fn query(&self, element: &BaseSrc, query: &mut gst::QueryRef) -> bool {
        use gst::QueryView;

        match query.view_mut() {
            // We only work in Push mode. In Pull mode, create() could be called with
            // arbitrary offsets and we would have to produce for that specific offset
            QueryView::Scheduling(ref mut q) => {
                q.set(gst::SchedulingFlags::SEQUENTIAL, 1, -1, 0);
                q.add_scheduling_modes(&[gst::PadMode::Push]);
                return true;
            }
            _ => (),
        }
        BaseSrcBase::parent_query(element, query)
    }

To handle the SCHEDULING query specifically, we first have to match on a view (mutable because we want to modify the view) of the query check the type of the query. If it indeed is a scheduling query, we can set the SEQUENTIAL flag and specify that we handle only Push mode, then return true directly as we handled the query already.

In all other cases we fall back to the parent class' implementation of the query virtual method.

Buffer Creation

Now we have everything in place for a working element, apart from the virtual method to actually generate the raw audio buffers with the sine wave. From a high-level BaseSrc works by calling the create virtual method over and over again to let the subclass produce a buffer until it returns an error or signals the end of the stream.

Let's first talk about how to generate the sine wave samples themselves. As we want to operate on 32 bit and 64 bit floating point numbers, we implement a generic function for generating samples and storing them in a mutable byte slice. This is done with the help of the num_traits crate, which provides all kinds of useful traits for abstracting over numeric types. In our case we only need the Float and NumCast traits.

fn process(
        data: &mut [u8],
        accumulator_ref: &mut f64,
        freq: u32,
        rate: u32,
        channels: u32,
        vol: f64,
    ) {
        use std::f64::consts::PI;

        // Reinterpret our byte-slice as a slice containing elements of the type
        // we're interested in. GStreamer requires for raw audio that the alignment
        // of memory is correct, so this will never ever fail unless there is an
        // actual bug elsewhere.
        let data = data.as_mut_slice_of::().unwrap();

        // Convert all our parameters to the target type for calculations
        let vol: F = NumCast::from(vol).unwrap();
        let freq = freq as f64;
        let rate = rate as f64;
        let two_pi = 2.0 * PI;

        // We're carrying a accumulator with up to 2pi around instead of working
        // on the sample offset. High sample offsets cause too much inaccuracy when
        // converted to floating point numbers and then iterated over in 1-steps
        let mut accumulator = *accumulator_ref;
        let step = two_pi * freq / rate;

        for chunk in data.chunks_mut(channels as usize) {
            let value = vol * F::sin(NumCast::from(accumulator).unwrap());
            for sample in chunk {
                *sample = value;
            }

            accumulator += step;
            if accumulator >= two_pi {
                accumulator -= two_pi;
            }
        }

        *accumulator_ref = accumulator;
    }

This function takes the mutable byte slice from our buffer as argument, as well as the current value of the accumulator and the relevant settings for generating the sine wave.

As a first step, we "cast" the byte slice to one of the target type (f32 or f64) with the help of the byte_slice_cast crate. This ensures that alignment and sizes are all matching and returns a mutable slice of our target type if successful. In case of GStreamer, the buffer alignment is guaranteed to be big enough for our types here and we allocate the buffer of a correct size later.

Now we convert all the parameters to the types we will use later, and store them together with the current accumulator value in local variables. Then we iterate over the whole floating point number slice in chunks with all channels, and fill each channel with the current value of our sine wave.

The sine wave itself is calculated by val = volume * sin(2 * PI * frequency * (i + accumulator) / rate), but we actually calculate it by simply increasing the accumulator by 2 * PI * frequency / rate for every sample instead of doing the multiplication for each sample. We also make sure that the accumulator always stays between 0 and 2 * PI to prevent any inaccuracies from floating point numbers to affect our produced samples.

Now that this is done, we need to implement the BaseSrc::create virtual method for actually allocating the buffer, setting timestamps and other metadata and it and calling our above function.

fn create(
        &self,
        element: &BaseSrc,
        _offset: u64,
        _length: u32,
    ) -> Result {
        // Keep a local copy of the values of all our properties at this very moment. This
        // ensures that the mutex is never locked for long and the application wouldn't
        // have to block until this function returns when getting/setting property values
        let settings = *self.settings.lock().unwrap();

        // Get a locked reference to our state, i.e. the input and output AudioInfo
        let mut state = self.state.lock().unwrap();
        let info = match state.info {
            None => {
                gst_element_error!(element, gst::CoreError::Negotiation, ["Have no caps yet"]);
                return Err(gst::FlowReturn::NotNegotiated);
            }
            Some(ref info) => info.clone(),
        };

        // If a stop position is set (from a seek), only produce samples up to that
        // point but at most samples_per_buffer samples per buffer
        let n_samples = if let Some(sample_stop) = state.sample_stop {
            if sample_stop <= state.sample_offset {
                gst_log!(self.cat, obj: element, "At EOS");
                return Err(gst::FlowReturn::Eos);
            }

            sample_stop - state.sample_offset
        } else {
            settings.samples_per_buffer as u64
        };

        // Allocate a new buffer of the required size, update the metadata with the
        // current timestamp and duration and then fill it according to the current
        // caps
        let mut buffer =
            gst::Buffer::with_size((n_samples as usize) * (info.bpf() as usize)).unwrap();
        {
            let buffer = buffer.get_mut().unwrap();

            // Calculate the current timestamp (PTS) and the next one,
            // and calculate the duration from the difference instead of
            // simply the number of samples to prevent rounding errors
            let pts = state
                .sample_offset
                .mul_div_floor(gst::SECOND_VAL, info.rate() as u64)
                .unwrap()
                .into();
            let next_pts: gst::ClockTime = (state.sample_offset + n_samples)
                .mul_div_floor(gst::SECOND_VAL, info.rate() as u64)
                .unwrap()
                .into();
            buffer.set_pts(pts);
            buffer.set_duration(next_pts - pts);

            // Map the buffer writable and create the actual samples
            let mut map = buffer.map_writable().unwrap();
            let data = map.as_mut_slice();

            if info.format() == gst_audio::AUDIO_FORMAT_F32 {
                Self::process::(
                    data,
                    &mut state.accumulator,
                    settings.freq,
                    info.rate(),
                    info.channels(),
                    settings.volume,
                );
            } else {
                Self::process::(
                    data,
                    &mut state.accumulator,
                    settings.freq,
                    info.rate(),
                    info.channels(),
                    settings.volume,
                );
            }
        }
        state.sample_offset += n_samples;
        drop(state);

        gst_debug!(self.cat, obj: element, "Produced buffer {:?}", buffer);

        Ok(buffer)
    }

Just like last time, we start with creating a copy of our properties (settings) and keeping a mutex guard of the internal state around. If the internal state has no AudioInfo yet, we error out. This would mean that no caps were negotiated yet, which is something we can't handle and is not really possible in our case.

Next we calculate how many samples we have to generate. If a sample stop position was set by a seek event, we have to generate samples up to at most that point. Otherwise we create at most the number of samples per buffer that were set via the property. Then we allocate a buffer of the corresponding size, with the help of the bpf field of the AudioInfo, and then set its metadata and fill the samples.

The metadata that is set is the timestamp (PTS), and the duration. The duration is calculated from the difference of the following buffer's timestamp and the current buffer's. By this we ensure that rounding errors are not causing the next buffer's timestamp to have a different timestamp than the sum of the current's and its duration. While this would not be much of a problem in GStreamer (inaccurate and jitterish timestamps are handled just fine), we can prevent it here and do so.

Afterwards we call our previously defined function on the writably mapped buffer and fill it with the sample values.

With all this, the element should already work just fine in any GStreamer-based application, for example gst-launch-1.0. Don't forget to set the GST_PLUGIN_PATH environment variable correctly like last time. Before running this, make sure to turn down the volume of your speakers/headphones a bit.

export GST_PLUGIN_PATH=`pwd`/target/debug
gst-launch-1.0 rssinesrc freq=440 volume=0.9 ! audioconvert ! autoaudiosink

You should hear a 440Hz sine wave now.

(Pseudo) Live Mode

Many audio (and video) sources can actually only produce data in real-time and data is produced according to some clock. So far our source element can produce data as fast as downstream is consuming data, but we optionally can change that. We simulate a live source here now by waiting on the pipeline clock, but with a real live source you would only ever be able to have the data in real-time without any need to wait on a clock. And usually that data is produced according to a different clock than the pipeline clock, in which case translation between the two clocks is needed but we ignore this aspect for now. For details check the GStreamer documentation.

For working in live mode, we have to add a few different parts in various places. First of all, we implement waiting on the clock in the create function.

fn create(...
        [...]
        state.sample_offset += n_samples;
        drop(state);

        // If we're live, we are waiting until the time of the last sample in our buffer has
        // arrived. This is the very reason why we have to report that much latency.
        // A real live-source would of course only allow us to have the data available after
        // that latency, e.g. when capturing from a microphone, and no waiting from our side
        // would be necessary..
        //
        // Waiting happens based on the pipeline clock, which means that a real live source
        // with its own clock would require various translations between the two clocks.
        // This is out of scope for the tutorial though.
        if element.is_live() {
            let clock = match element.get_clock() {
                None => return Ok(buffer),
                Some(clock) => clock,
            };

            let segment = element
                .get_segment()
                .downcast::()
                .unwrap();
            let base_time = element.get_base_time();
            let running_time = segment.to_running_time(buffer.get_pts() + buffer.get_duration());

            // The last sample's clock time is the base time of the element plus the
            // running time of the last sample
            let wait_until = running_time + base_time;
            if wait_until.is_none() {
                return Ok(buffer);
            }

            let id = clock.new_single_shot_id(wait_until).unwrap();

            gst_log!(
                self.cat,
                obj: element,
                "Waiting until {}, now {}",
                wait_until,
                clock.get_time()
            );
            let (res, jitter) = id.wait();
            gst_log!(
                self.cat,
                obj: element,
                "Waited res {:?} jitter {}",
                res,
                jitter
            );
        }

        gst_debug!(self.cat, obj: element, "Produced buffer {:?}", buffer);

        Ok(buffer)
    }

To be able to wait on the clock, we first of all need to calculate the clock time until when we want to wait. In our case that will be the clock time right after the end of the last sample in the buffer we just produced. Simply because you can't capture a sample before it was produced.

We calculate the running time from the PTS and duration of the buffer with the help of the currently configured segment and then add the base time of the element on this to get the clock time as result. Please check the GStreamer documentation for details, but in short the running time of a pipeline is the time since the start of the pipeline (or the last reset of the running time) and the running time of a buffer can be calculated from its PTS and the segment, which provides the information to translate between the two. The base time is the clock time when the pipeline went to the Playing state, so just an offset.

Next we wait and then return the buffer as before.

Now we also have to tell the base class that we're running in live mode now. This is done by calling set_live(true) on the base class before changing the element state from Ready to Paused. For this we override the Element::change_state virtual method.

impl ElementImpl for SineSrc {
    fn change_state(
        &self,
        element: &BaseSrc,
        transition: gst::StateChange,
    ) -> gst::StateChangeReturn {
        // Configure live'ness once here just before starting the source
        match transition {
            gst::StateChange::ReadyToPaused => {
                element.set_live(self.settings.lock().unwrap().is_live);
            }
            _ => (),
        }

        element.parent_change_state(transition)
    }
}

And as a last step, we also need to notify downstream elements about our latency. Live elements always have to report their latency so that synchronization can work correctly. As the clock time of each buffer is equal to the time when it was created, all buffers would otherwise arrive late in the sinks (they would appear as if they should've been played already at the time when they were created). So all the sinks will have to compensate for the latency that it took from capturing to the sink, and they have to do that in a coordinated way (otherwise audio and video would be out of sync if both have different latencies). For this the pipeline is querying each sink for the latency on its own branch, and then configures a global latency on all sinks according to that.

This querying is done with the LATENCY query, which we will now also have to handle.

fn query(&self, element: &BaseSrc, query: &mut gst::QueryRef) -> bool {
        use gst::QueryView;

        match query.view_mut() {
            // We only work in Push mode. In Pull mode, create() could be called with
            // arbitrary offsets and we would have to produce for that specific offset
            QueryView::Scheduling(ref mut q) => {
                [...]
            }
            // In Live mode we will have a latency equal to the number of samples in each buffer.
            // We can't output samples before they were produced, and the last sample of a buffer
            // is produced that much after the beginning, leading to this latency calculation
            QueryView::Latency(ref mut q) => {
                let settings = *self.settings.lock().unwrap();
                let state = self.state.lock().unwrap();

                if let Some(ref info) = state.info {
                    let latency = gst::SECOND
                        .mul_div_floor(settings.samples_per_buffer as u64, info.rate() as u64)
                        .unwrap();
                    gst_debug!(self.cat, obj: element, "Returning latency {}", latency);
                    q.set(settings.is_live, latency, gst::CLOCK_TIME_NONE);
                    return true;
                } else {
                    return false;
                }
            }
            _ => (),
        }
        BaseSrcBase::parent_query(element, query)
    }

The latency that we report is the duration of a single audio buffer, because we're simulating a real live source here. A real live source won't be able to output the buffer before the last sample of it is captured, and the difference between when the first and last sample were captured is exactly the latency that we add here. Other elements further downstream that introduce further latency would then add their own latency on top of this.

Inside the latency query we also signal that we are indeed a live source, and additionally how much buffering we can do (in our case, infinite) until data would be lost. The last part is important if e.g. the video branch has a higher latency, causing the audio sink to have to wait some additional time (so that audio and video stay in sync), which would then require the whole audio branch to buffer some data. As we have an artificial live source, we can always generate data for the next time but a real live source would only have a limited buffer and if no data is read and forwarded once that runs full, data would get lost.

You can test this again with e.g. gst-launch-1.0 by setting the is-live property to true. It should write in the output now that the pipeline is live.

In Mathieu's blog post this was implemented without explicit waiting and the usage of the get_times virtual method, but as this is only really useful for pseudo live sources like this one I decided to explain how waiting on the clock can be achieved correctly and even more important how that relates to the next section.

Unlocking

With the addition of the live mode, the create function is now blocking and waiting on the clock for some time. This is suboptimal as for example a (flushing) seek would have to wait now until the clock waiting is done, or when shutting down the application would have to wait.

To prevent this, all waiting/blocking in GStreamer streaming threads should be interruptible/cancellable when requested. And for example the ClockID that we got from the clock for waiting can be cancelled by calling unschedule() on it. We only have to do it from the right place and keep it accessible. The right place is the BaseSrc::unlock virtual method.

struct ClockWait {
    clock_id: Option,
    flushing: bool,
}

struct SineSrc {
    cat: gst::DebugCategory,
    settings: Mutex,
    state: Mutex,
    clock_wait: Mutex,
}

[...]

    fn unlock(&self, element: &BaseSrc) -> bool {
        // This should unblock the create() function ASAP, so we
        // just unschedule the clock it here, if any.
        gst_debug!(self.cat, obj: element, "Unlocking");
        let mut clock_wait = self.clock_wait.lock().unwrap();
        if let Some(clock_id) = clock_wait.clock_id.take() {
            clock_id.unschedule();
        }
        clock_wait.flushing = true;

        true
    }

We store the clock ID in our struct, together with a boolean to signal whether we're supposed to flush already or not. And then inside unlock unschedule the clock ID and set this boolean flag to true.

Once everything is unlocked, we need to reset things again so that data flow can happen in the future. This is done in the unlock_stop virtual method.

fn unlock_stop(&self, element: &BaseSrc) -> bool {
        // This signals that unlocking is done, so we can reset
        // all values again.
        gst_debug!(self.cat, obj: element, "Unlock stop");
        let mut clock_wait = self.clock_wait.lock().unwrap();
        clock_wait.flushing = false;

        true
    }

To make sure that this struct is always initialized correctly, we also call unlock from stop, and unlock_stop from start.

Now as a last step, we need to actually make use of the new struct we added around the code where we wait for the clock.

// Store the clock ID in our struct unless we're flushing anyway.
            // This allows to asynchronously cancel the waiting from unlock()
            // so that we immediately stop waiting on e.g. shutdown.
            let mut clock_wait = self.clock_wait.lock().unwrap();
            if clock_wait.flushing {
                gst_debug!(self.cat, obj: element, "Flushing");
                return Err(gst::FlowReturn::Flushing);
            }

            let id = clock.new_single_shot_id(wait_until).unwrap();
            clock_wait.clock_id = Some(id.clone());
            drop(clock_wait);

            gst_log!(
                self.cat,
                obj: element,
                "Waiting until {}, now {}",
                wait_until,
                clock.get_time()
            );
            let (res, jitter) = id.wait();
            gst_log!(
                self.cat,
                obj: element,
                "Waited res {:?} jitter {}",
                res,
                jitter
            );
            self.clock_wait.lock().unwrap().clock_id.take();

            // If the clock ID was unscheduled, unlock() was called
            // and we should return Flushing immediately.
            if res == gst::ClockReturn::Unscheduled {
                gst_debug!(self.cat, obj: element, "Flushing");
                return Err(gst::FlowReturn::Flushing);
            }

The important part in this code is that we first have to check if we are already supposed to unlock, before even starting to wait. Otherwise we would start waiting without anybody ever being able to unlock. Then we need to store the clock id in the struct and make sure to drop the mutex guard so that the unlock function can take it again for unscheduling the clock ID. And once waiting is done, we need to remove the clock id from the struct again and in case of ClockReturn::Unscheduled we directly return FlowReturn::Flushing instead of the error.

Similarly when using other blocking APIs it is important that they are woken up in a similar way when unlock is called. Otherwise the application developer's and thus user experience will be far from ideal.

Seeking

As a last feature we implement seeking on our source element. In our case that only means that we have to update the sample_offset and sample_stop fields accordingly, other sources might have to do more work than that.

Seeking is implemented in the BaseSrc::do_seek virtual method, and signalling whether we can actually seek in the is_seekable virtual method.

fn is_seekable(&self, _element: &BaseSrc) -> bool {
        true
    }

    fn do_seek(&self, element: &BaseSrc, segment: &mut gst::Segment) -> bool {
        // Handle seeking here. For Time and Default (sample offset) seeks we can
        // do something and have to update our sample offset and accumulator accordingly.
        //
        // Also we should remember the stop time (so we can stop at that point), and if
        // reverse playback is requested. These values will all be used during buffer creation
        // and for calculating the timestamps, etc.

        if segment.get_rate() < 0.0 {
            gst_error!(self.cat, obj: element, "Reverse playback not supported");
            return false;
        }

        let settings = *self.settings.lock().unwrap();
        let mut state = self.state.lock().unwrap();

        // We store sample_offset and sample_stop in nanoseconds if we
        // don't know any sample rate yet. It will be converted correctly
        // once a sample rate is known.
        let rate = match state.info {
            None => gst::SECOND_VAL,
            Some(ref info) => info.rate() as u64,
        };

        if let Some(segment) = segment.downcast_ref::() {
            use std::f64::consts::PI;

            let sample_offset = segment
                .get_start()
                .unwrap()
                .mul_div_floor(rate, gst::SECOND_VAL)
                .unwrap();

            let sample_stop = segment
                .get_stop()
                .map(|v| v.mul_div_floor(rate, gst::SECOND_VAL).unwrap());

            let accumulator =
                (sample_offset as f64).rem(2.0 * PI * (settings.freq as f64) / (rate as f64));

            gst_debug!(
                self.cat,
                obj: element,
                "Seeked to {}-{:?} (accum: {}) for segment {:?}",
                sample_offset,
                sample_stop,
                accumulator,
                segment
            );

            *state = State {
                info: state.info.clone(),
                sample_offset: sample_offset,
                sample_stop: sample_stop,
                accumulator: accumulator,
            };

            true
        } else if let Some(segment) = segment.downcast_ref::() {
            use std::f64::consts::PI;

            if state.info.is_none() {
                gst_error!(
                    self.cat,
                    obj: element,
                    "Can only seek in Default format if sample rate is known"
                );
                return false;
            }

            let sample_offset = segment.get_start().unwrap();
            let sample_stop = segment.get_stop().0;

            let accumulator =
                (sample_offset as f64).rem(2.0 * PI * (settings.freq as f64) / (rate as f64));

            gst_debug!(
                self.cat,
                obj: element,
                "Seeked to {}-{:?} (accum: {}) for segment {:?}",
                sample_offset,
                sample_stop,
                accumulator,
                segment
            );

            *state = State {
                info: state.info.clone(),
                sample_offset: sample_offset,
                sample_stop: sample_stop,
                accumulator: accumulator,
            };

            true
        } else {
            gst_error!(
                self.cat,
                obj: element,
                "Can't seek in format {:?}",
                segment.get_format()
            );

            false
        }
    }

Currently no support for reverse playback is implemented here, that is left as an exercise for the reader. So as a first step we check if the segment has a negative rate, in which case we just fail and return false.

Afterwards we again take a copy of the settings, keep a mutable mutex guard of our state and then start handling the actual seek.

If no caps are known yet, i.e. the AudioInfo is None, we assume a rate of 1 billion. That is, we just store the time in nanoseconds for now and let the set_caps function take care of that (which we already implemented accordingly) once the sample rate is known.

Then, if a Time seek is performed, we convert the segment start and stop position from time to sample offsets and save them. And then update the accumulator in a similar way as in the set_caps function. If a seek is in Default format (i.e. sample offsets for raw audio), we just have to store the values and update the accumulator but only do so if the sample rate is known already. A sample offset seek does not make any sense until the sample rate is known, so we just fail here to prevent unexpected surprises later.

Try the following pipeline for testing seeking. You should be able to seek the current time drawn over the video, and with the left/right cursor key you can seek. Also this shows that we create a quite nice sine wave.

gst-launch-1.0 rssinesrc ! audioconvert ! monoscope ! timeoverlay ! navseek ! glimagesink

And with that all features are implemented in our sine wave raw audio source.

21 Feb 2018 2:05pm GMT

Simos Xenitellis: How to repartition a Hetzner VPS disk for ZFS on its own partition for LXD

In that post, we saw how to set up LXD on a Hetzner VPS (Hetzner Cloud). The particular VPS comes with a fixed size partition, and we had to put the LXD containers in a loopback device file, not on its own partition. The reason is that once the VPS is booted up, you cannot …

Continue reading

21 Feb 2018 1:37am GMT

20 Feb 2018

feedPlanet Ubuntu

Stuart Langridge: Collecting user data while protecting user privacy

Lots of companies want to collect data about their users. This is a good thing, generally; being data-driven is important, and it's jolly hard to know where best to focus your efforts if you don't know what your people are like. However, this sort of data collection also gives people a sense of disquiet; what are you going to do with that data about me? How do I get you to stop using it? What conclusions are you drawing from it? I've spoken about this sense of disquiet in the past, and you can watch (or read) that talk for a lot more detail about how and why people don't like it.

So, what can we do about it? As I said, being data-driven is a good thing, and you can't be data-driven if you haven't got any data to be driven by. How do we enable people to collect data about you without compromising your privacy?

Well, there are some ways. Before I dive into them, though, a couple of brief asides: there are some people who believe that you shouldn't be allowed to collect any data on your users whatsoever; that the mere act of wanting to do so is in itself a compromise of privacy. This is not addressed to those people. What I want is a way that both sides can get what they want: companies and projects can be data-driven, and users don't get their privacy compromised. If what you want is that companies are banned from collecting anything… this is not for you. Most people are basically OK with the idea of data collection, they just don't want to be victimised by it, now or in the future, and it's that property that we want to protect.

Similarly, if you're a company who wants to know everything about each individual one of your users so you can sell that data for money, or exploit it on a user-by-user basis, this isn't for you either. Stop doing that.

Aggregation

The key point here is that, if you're collecting data about a load of users, you're usually doing so in order to look at it in aggregate; to draw conclusions about the general trends and the general distribution of your user base. And it's possible to do that data collection in ways that maintain the aggregate properties of it while making it hard or impossible for the company to use it to target individual users. That's what we want here: some way that the company can still draw correct conclusions from all the data when collected together, while preventing them from targeting individuals or knowing what a specific person said.

In the 1960s, Warner and Greenberg put together the randomised response technique for social science interviews. Basically, the idea here is that if you want to ask people questions about sensitive topics - have they committed a crime? what are their sexual preferences? - then you need to be able to draw aggregate conclusions about what percentages of people have done various things, but any one individual's ballot shouldn't be a confession that can be used against them. The technique varies a lot in exactly how it's applied, but the basic concept is that for any question, there's a random chance that the answerer should lie in their response. If some people lie in one direction (saying that they did a thing, when they didn't), and the same proportion of people lie in the other direction (saying they didn't do the thing when they did), then if you've got enough answerers, all the lies pretty much cancel out. So your aggregate statistics are still pretty much accurate - you know that X percent of people did the thing - but any one individual person's response isn't incriminating, because they might have been lying. This gives us the privacy protection we need for people, while preserving the aggregate properties that allow the survey-analysers to draw accurate conclusions.

It's something like whether you'll find a ticket inspector on a train. Train companies realised a long time ago that you don't need to put a ticket inspector on every single train. Instead, you can put inspectors on enough trains that the chance of fare-dodgers being caught is high enough that they don't want to take the risk. This randomised response is similar; if you get a ballot from someone saying that they smoked marijuana, then you can't know whether they were one of those who were randomly selected to lie about their answer, and therefore that answer isn't incriminating, but the overall percentage of people who say they smoked will be roughly equal to the percentage of people who actually did.

A worked example

Let's imagine you're, say, an operating system vendor. You'd like to know what sorts of machines your users are installing on (Ubuntu are looking to do this as most other OSes already do), and so how much RAM those machines have would be a useful figure to know. (Lots of other stats would also be useful, of course, but we'll just look at one for now while we're explaining the process. And remember this all applies to any statistic you want to collect; it's not particular to OS vendors, or RAM. If you want to know how often your users open your app, or what country they're in, this process works too.)

So, we assume that the actual truth about how much RAM the users' computers have looks something like this graph. Remember, the company does not know this. They want to know it, but they currently don't.

So, how can they collect data to know this graph, without being able to tell how much RAM any one specific user has?

As described above, the way to do this is to randomise the responses. Let's say that we tell 20% of users to lie about their answer, one category up or down. So if you've really got 8GB of RAM, then there's an 80% chance you tell the truth, and a 20% chance you lie; 10% of users lie in a "downwards" direction, so they claim to have 4GB of RAM when they've actually got 8GB, and 10% of users lie in an "upwards" direction and claim to have 16GB. Obviously, we wouldn't actually have the users lie - the software that collects this info would randomly either produce the correct information or not with the above probabilities, and people wouldn't even know it was doing it; the deliberately incorrect data is only provided to the survey. (Your computer doesn't lie to you about how much RAM it's got, just the company.) What does that do to the graph data?

We show in this graph the users that gave accurate information in green, and inaccurate lies in red. And the graph looks pretty much the same! Any one given user's answers are unreliable and can't be trusted, but the overall shape of the graph is pretty similar to the actual truth. There are still peaks at the most popular points, and still troughs at the unpopular ones. Each bar in the graph is reasonably accurate (accuracy figures are shown below each bar, and they'll normally be around 90-95%, although because it's random it may fluctuate a little for you.) So our company can draw conclusions from this data, and they'll be generally correct. They'll have to take those conclusions with a small pinch of salt, because we've deliberately introduced inaccuracy into them, but the trends and the overall shape of the data will be good.

The key point here is that, although you can see in the graph which answers are truth and which are incorrect, the company can't. They don't get told whether an answer is truth or lies; they just get the information and no indication of how true it is. They'll know the percentage chance that an answer is untrue, but they won't know whether any one given answer is.

Can we be more inaccurate? Well, here's a graph to play with. You can adjust what percentage of users' computers lie about their survey results by dragging the slider, and see what that does to the data.

0% 100%

20% of submissions are deliberately incorrect

Even if you make every single user lie about their values, the graph shape isn't too bad. Lying tends to "flatten out" the graph; it makes tall peaks less tall, and short troughs more tall, and every single person lying probably flattens out things so much that conclusions you draw are probably now going to be wrong. But you can see from this that it ought to be possible to run the numbers and come up with a "lie" percentage which accurately balances the company's need for accurate information with the user's need to not provide accuracy.

It is of course critical to this whole procedure that the lies cancel out, which means that they need to be evenly distributed. If everyone just makes up random answers then obviously this doesn't work; answers have to start with the truth and then (maybe) lie in one direction or another.

This is a fairly simple description of this whole process of introducing noise into the data, and data scientists would be able to bring much more learning to bear on this. For example, how much does it affect accuracy if user information can lie by more than one "step" in every direction? Do we make it so instead of n% truth and 100-n% lies, we distribute the lies normally across the graph with the centrepoint being the truth? Is it possible to do this data collection without flattening out the graph to such an extent? And the state of the data art has moved on since the 1960s, too: Dwork wrote an influential 2006 paper on differential privacy which goes into this in more detail. Obviously we'll be collecting data on more than one number - someone looking for data on computers on which their OS is installed will want for example version info, network connectivity, lots of hardware stats, device vendor, and so on. And that's OK, because it's safe to collect this data now… so how do our accuracy figures change when there are lots of stats and not just one? There will be better statistical ways to quantify how inaccurate the results are than my simple single-bar percentage measure, and how to tweak the percentage-of-lying to give the best results for everyone. This whole topic seems like something that data scientists in various communities could really get their teeth into and provide great suggestions and help to companies who want to collect data in a responsible way.

Of course, this applies to any data you want to collect. Do you want analytics on how often your users open your app? What times of day they do that? Which OS version they're on? How long do they spend using it? All your data still works in aggregate, but the things you're collecting aren't so personally invasive, because you don't know if a user's records are lies. This needs careful thought - there has been plenty of research on deanonymising data and similar things, and the EFF's Panopticlick project shows how a combination of data can be cross-referenced and that needs protecting against too, but that's what data science is for; to tune the parameters used here so that individual privacy isn't compromised while aggregate properties are preserved.

If a company is collecting info about you and they're not actually interested in tying your submitted records to you (see previous point about how this doesn't apply to companies who do want to do this, who are a whole different problem), then this in theory isn't needed. They don't have to collect IP addresses or usernames and record them against each submission, and indeed if they don't want that information then they probably don't do that. But there's always a concern: what if they're really doing that and lying about it? Well, this is how we alleviate that problem. Even if a company actually are trying to collect personally-identifiable data and they're lying to us about doing that it doesn't matter, because we protect ourselves by - with a specific probability - lying back to them. And then everyone gets what they want. There's a certain sense of justice in that.

20 Feb 2018 11:58pm GMT

Dustin Kirkland: RFC: The New Ubuntu 18.04 LTS Server Installer

One of the many excellent suggestions from last year's HackerNews thread, Ask HN: What do you want to see in Ubuntu 17.10?, was to refresh the Ubuntu server's command line installer:



We're pleased to introduce this new installer, which will be the default Server installer for 18.04 LTS, and solicit your feedback.

Follow the instructions below, to download the current daily image, and install it into a KVM. Alternatively, you could write it to a flash drive and install a physical machine, or try it in your virtual machine of your choice (VMware, VirtualBox, etc.).

$ wget http://cdimage.ubuntu.com/ubuntu-server/daily-live/current/bionic-live-server-amd64.iso
$ qemu-img create -f raw target.img 10G
$ kvm -m 1024 -boot d -cdrom bionic-live-server-amd64.iso -hda target.img
...
$ kvm -m 1024 target.img

For those too busy to try it themselves at the moment, I've taken a series of screenshots below, for your review.














Finally, you can provide feedback, bugs, patches, and feature requests against the Subiquity project in Launchpad:



Cheers,
Dustin

20 Feb 2018 8:32pm GMT

Benjamin Mako Hill: Lookalikes

Hippy/mako lookalikes

Did I forget a period of my life when I grew a horseshoe mustache and dreadlocks, walked around topless, and illustrated this 2009 article in the Economist on the economic boon that hippy festivals represent to rural American communities?


Previous lookalikes are here.

20 Feb 2018 7:45pm GMT

Raphaël Hertzog: Time to Join Extended Long Term Support for Debian 7 Wheezy

Debian 7 Wheezy LTS period ends on May 31st and some companies asked Freexian if they could get security support past this date. Since about half of the current team of paid LTS contributors is willing to continue to provide security updates for Wheezy, I have started to work on making this possible.

I just initiated a discussion on debian-devel with multiple Debian teams to see whether it is possible to continue to use debian.org infrastructure to host the wheezy security updates that would be prepared in this extended LTS period.

From the sponsor side, this extended LTS will not work like the regular LTS. It is unrealistic to continue to support all packages and all architectures so only the packages/architectures requested by sponsors will be supported. The amount invoiced to each sponsor will be directly related to the package list that they ask us to support. We made an estimation (based on history) of how much it costs to support each package and we split that cost between all the sponsors that are requesting support for this package. That cost is re-evaluated quarterly and will likely increase over time as sponsors are stopping their support (when they finished to migrate all their machines for example).

This extended LTS will also have some restrictions in terms of packages that we can support. For instance, we will no longer support the linux kernel from wheezy, you will have to switch to the kernel used in jessie (or maybe we will maintain a backport ourselves in wheezy). It is also not yet clear whether we can support OpenJDK since upstream support of version 7 stops at the end of June. And switching to OpenJDK 8 is likely non-trivial. There are likely other unsupportable packages too.

Anyway, if your company needs wheezy security support past end of May, now is the time to worry about it. Please send us a mail with the list of source packages that you would like to see supported. The more companies get involved, the less it will cost to each of them. Our plans are to gather the required data from interested companies in the next few weeks and make a first estimation of the price they will have to pay for the first quarter by mid-march. Then they confirm that they are OK with the offer and we will emit invoices in April so that they can be paid before end of May.

Note however that we decided that it would not be possible to get extended wheezy support if you are not among the regular LTS sponsors (at bronze level at least). Extended LTS would not be possible without the regular LTS so if you need the former, you have to support the latter too.

No comment | Liked this article? Click here. | My blog is Flattr-enabled.

20 Feb 2018 4:57pm GMT

Ubuntu Insights: LXD Weekly Status #35

Introduction

This past week we've been focusing on a number of open pull requests, getting closer to merging improvements to our storage volume handling, unix char/block devices handling and the massive clustering branch that's been cooking for a while.

We're hoping to see most of those land at some point this coming week.

On the LXC side of things, the focus was on bugfixes and cleanups as well as preparing for the removal of the python3 and lua bindings from the main repository. We're also making good progress on distrobuilder and hope to start moving some of our images to using it as the build tool very soon.

On the snap front, we've now added automatic testing for Ubuntu 17.10, Fedora 27 and Centos 7, all of which are now passing for all tracks and channels.

We've also been doing some work on our CI infrastructure, automating a large portion of it to limit the amount of manual interactions we have with Jenkins.

Upcoming conferences and events

Ongoing projects

The list below is feature or refactoring work which will span several weeks/months and can't be tied directly to a single Github issue or pull request.

Upstream changes

The items listed below are highlights of the work which happened upstream over the past week and which will be included in the next release.

LXD

LXC

LXCFS

Distribution work

This section is used to track the work done in downstream Linux distributions to ship the latest LXC, LXD and LXCFS as well as work to get various software to work properly inside containers.

Ubuntu

Snap

20 Feb 2018 2:02pm GMT

Laura Czajkowski: FOSDEM 2018 Community DevRoom Recap: Simon Phipps & Rich Sands

It's been a few weeks now since FOSDEM and if you didn't have a chance to attend or watch the livestream of the FOSDEM 2018 Community DevRoom, Leslie my co-chair, and I are doing a round up summary on posts on each of the talks to bring you the video and the highlights of each presentation. You can read the preview post of Rich Sands and Simon Phipps pre FOSDEM blog post here.

You've Got Some Explaining to Do! So Use An FAQ!

https://video.fosdem.org/2018/K.4.601/community_explaining_faq.webm

I found this talk to be very informative kicking off with a brief history lesson - It was truly amazing hearing the history of Java Opensource release from Simon Phipps and Rich Sands. Releasing Java under the GPL would not have been possible without @richsands writing an FAQ, which did have some ramification such as "Chief Open Source Officers hated us"!

It's really worth finding the time to watch the video of the talk however some highlights from the talk stood out to us as:

You can follow Simon and Rich on twitter and follow up with them on the topic!

You can read the follow up posts Leslie has written on Brian Proffitt and Jermery Garcia and her first post from Deb Nicholson & Mike McQuaid.

20 Feb 2018 1:30pm GMT

Daniel Pocock: Hacking at EPFL Toastmasters, Lausanne, tonight

As mentioned in my earlier blog, I give a talk about Hacking at the Toastmasters club at EPFL tonight. Please feel free to join us and remember to turn off your mobile device or leave it at home, you never know when it might ring or become part of a demonstration.

20 Feb 2018 11:39am GMT

Simos Xenitellis: How to run TeamViewer in LXD

TeamViewer is a popular remote desktop tool. It is the typical tool to use, in order to help remotely your colleagues that keep using Windows. You can install TeamViewer on Linux using these instructions on installing TeamViewer on Linux. In this post though, you will install TeamViewer in a LXD (pronounced LexDee) container on your …

Continue reading

20 Feb 2018 12:15am GMT

19 Feb 2018

feedPlanet Ubuntu

Raphaël Hertzog: Freexian’s report about Debian Long Term Support, January 2017

A Debian LTS logoLike each month, here comes a report about the work of paid contributors to Debian LTS.

Individual reports

In January, about 160 work hours have been dispatched among 11 paid contributors. Their reports are available:

Evolution of the situation

The number of sponsored hours increased slightly at 187 hours per month. It would be nice if the slow growth could continue as the amount of work seems to be slowly growing too.

The security tracker currently lists 23 packages with a known CVE and the dla-needed.txt file 23 too. The number of open issues seems to be stable compared to last month which is a good sign.

Thanks to our sponsors

New sponsors are in bold.

No comment | Liked this article? Click here. | My blog is Flattr-enabled.

19 Feb 2018 5:18pm GMT

Matthew Helmke: Article on Opensource.com

An article I wrote has just been posted on Opensource.com: How the Grateful Dead were a precursor to Creative Commons licensing.

19 Feb 2018 12:17pm GMT

Daniel Holbach: I’m joining Weaveworks

Weaveworks

My sabbatical is over and today is my first day working at Weaveworks where I'm joining the Developer Experience team. I'm incredibly excited about this.

I got to know quite a few of my colleagues in the past weeks and they were without exception all incredibly likeable and smart people. The company believes in open source, is quite diverse and has an office in Berlin - also I'll get to work with Cezzaine, Jonathan and Steve again.

Right from the start the technology really impressed me. Weave Cloud solves key problems many organisations and companies face today: being able to deploy services seamlessly, securely and easily and making monitoring and snapshots obvious and simple-to-use have an immediate impact on what you can do and what you spend your time on.

Here's what it looks like in action:

If you're on Google Cloud Platform, you can even use it for free or you can check out the tutorials to play around with it without having to install anything.

It's going to be great to immerse myself and learn more about the underlying technologies and connect with the Cloud Native communities. It's a big landscape with lots of activity and overlap, strong roots in the open source world and the passion to make modern workloads a more manageable problem.

At Weaveworks, I'll be able to work on what I like best: talk to and work with devs, figure out what people need, look at docs and tools, connect people and make people's lives easier.

One thing makes this experience even sweeter: I'll get to reconnect with a lot of you folks! If you're working in the space and I haven't talked to you in a while, hit me up and let's catch up soon again!

Alright, I need to start packing and off to the office for today… 🚲

19 Feb 2018 7:26am GMT

Bryan Quigley: Stop changing the clocks

Florida, Tennessee, the EU and more are considering one timezone for the entire year - no more changing the clocks. Massachusetts had a group study the issue and recommend making the switch, but only if a majority of Northeast states decide to join them. I would like to see the NJ legislature vote to join them.

Interaction between countries would be helped by having one less factor that can impact collaboration. Below are two examples of ways this will help.

Meeting Times

Let's consider a meeting scheduled in EST with partipants from NJ, the EU, and Arizona.
NJ - normal disruption of changing times, but the clock time for the meeting stays the same.
Arizona - The clock time for the meeting changes twice a year.
EU - because they also change their clocks at different points throughout the year. Due to this, they have 4 clock time changes during each year.

This gets more complicated as we add partipants from more countries. UTC can help, but any location that has a time change has to be considered for both of it's timezones.

Global shift work or On-call

Generally, these are scheduled in UTC, but the shifts people actually work are in their local time. That can be disruptive in other ways, like finding child care.

In conclusion, while these may be minor compared to other concerns (like the potential health effects associated with change the clocks), the concerns of global collaboration should also be considered.

19 Feb 2018 12:00am GMT