11 Mar 2010

Planet OpenSolaris

Simon Phipps: OSI Logo

I gather that the Board of Directors of the Open Source Initiative met on Sunday to elect the board for their 2010-11 financial year. I am both honoured and delighted to discover that they have elected me as a Director, with effect from April 1st.

That's an auspicious date :-) One or two friends have asked why on earth anyone would want to commit time to OSI. They point out that the OSI has had a much lower profile over the past few years as the more notable founding members have moved on. It has been criticised for allowing too many licenses to be approved - including some of questionable merit. It's easy to find people ready to criticise - some apparently just for the sake of it - and hard to find much more than grudging respect.

But the OSI still plays a very important and relevant role in the world of software freedom. Some points:

There's no question that the "open source" concept and brand remain powerful forces for positive change. That green logo carries weight.

I think it's time for change, first at OSI and then more widely. OSI needs to move from a "supreme court" model to a member-based model. I'd like to see activities promoting open source around the world both encouraged and represented by OSI - education, policy development and perhaps organisational support for open source projects. And if there was any way at all to be a more uniting force - well, I can dream!

My goal as a Director will be to facilitate that change, a change that is already well under way following recent face-to-face discussions and the great work that Andrew Oliver and Danese Cooper have already put in. Expect to hear more on this subject as the year progresses.


11 Mar 2010 12:45pm GMT

Simon Phipps: webmink


11 Mar 2010 12:08pm GMT

Jim Grisanzio: A New MPL Coming Soon

It will be useful to observe the Mozilla community update their Mozilla Public License v1.1 (The Register, Mozilla Updating the MPL, Cnet News). I haven't been involved in any licensing lately, so I think I'll try to follow this process. I remember when we were plowing through our own licensing gyrations for OpenSolaris. GPL, BSD, MPL, Apache, Write-Your-Own. And a few others. Back and forth. My goodness, it was so long and painful for so many reasons. So, I'm interested in understanding Mozilla's processes and goals for this update and comparing that to what we did five years ago. For OpenSolaris, we settled on writing the Common Development and Distribution License, which is based on MPL v1.1.

Some CDDL information: FAQ on opensolaris.org, CDDL background, Redline diffs (pdf) between MPL v1.1 & CDDL v1.0, Free and Open Source Licensing at Sun (pdf).

11 Mar 2010 9:39am GMT

Clay Baenziger: How to use ISC DHCP for the OpenSolaris Automated Installer

DHCP, DHCP, DHCP everywhere!

Most everyone is used to using DHCP. It's used at coffee shops and wireless networks to acquaint traveling laptops with their DNS and router settings as well as, of course, to provide the machines an IP address too. However, corporate and enterprise use of DHCP is often reasonable too. One can use dynamic DNS updates to handle having a static reference for a machine traveling on various networks. When network migrations are necessary (e.g. your Fortune 500 company gets bought by another) and you need to move thousands of machines, it's much easier to simply tell your DHCP server to migrate the machines than to have to log on to each and every machine and change network settings.

How does DHCP apply to the OpenSolaris Automated Installer?

The OpenSolaris Auto Installer uses DHCP to allow administrators to perform hands-off installations. The Auto Installer client (the machine to be installed) receives its IP address, subnet mask, router, DNS server and boot image all through DHCP. The installadm(1M) tool, which one uses to configure an Auto Installer server, provides the commands for a Solaris DHCP server, but below are the steps for an ISC DHCP server, as is common in Linux shops and even in Solaris shops which are simply more comfortable with ISC's DHCP implementation.

Where to get ISC DHCP for Solaris?

Software from ISC is usually very stable and well maintained, and so easy to compile. However, Solaris seems to have drifted a bit from the expectations of ISC DHCP. In my testing, on build 132, I found that ISC DHCP 4.0.0 from Sun Freeware, ISC DHCP 4.1.0 from pkg.opensolaris.org/contrib and ISC DHCP 4.1.1 from ISC's download page would all fail to respond to DHCPDISCOVERs on the wire (while still reporting a DHCPOFFER in the logs, just to confuse things). I suspect a compilation issue I saw while compiling 4.1.1 (but I have no actual knowledge of why responses are not getting on the wire):

ld: warning: symbol `MD5_version' has differing sizes:
        (file ../dst/libdst.a(md5_dgst.o) value=0x4; file /lib/libcrypto.so value=0x76);
        ../dst/libdst.a(md5_dgst.o) definition taken

However, ISC's 4.2.0a1 worked flawlessly! You can get their 4.2.0 alpha from their download page and easily compile it with pfexec pkg install SUNWhea SUNWgcc; ./configure; make; make install.

Great, I have ISC DHCP compiled and installed, but how do I configure this thing?

Normally the issue is not installation and compilation but configuration. For the OpenSolaris Auto Installer there are a number of things to think about:

Networking primitives (IP address, subnet, router, DNS)

Without some vital information, even fully functional networks seem useless. The use of an IP address, the subnet mask for the network and a router to get off the network all fit this bill. For most intents and purposes DNS is also in this same boat -- though some administrators may not find DNS as necessary. Getting ISC DHCP to serve these pieces of information requires certain directives.

A basic network

In one's dhcpd.conf a basic network is very easy to define. The following defines the 192.168.0.0/24 network, serving out IP addresses from 192.168.0.2 to 192.168.0.100 with a router of 192.168.0.1:

subnet 192.168.0.0 netmask 255.255.255.0 {
  range 192.168.0.2 192.168.0.100;
  option routers 192.168.0.1;
}
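A small sanity check can catch typos in a subnet declaration before dhcpd ever sees it. The following is only a sketch using Python's standard ipaddress module; the function name check_subnet and the particular checks are illustrative assumptions, not part of ISC DHCP or the Auto Installer:

```python
import ipaddress


def check_subnet(network, netmask, range_start, range_end, router):
    """Return a list of problems found in a dhcpd.conf-style subnet declaration.

    Hypothetical helper: the arguments mirror the values one would type into
    the subnet, range and "option routers" lines.
    """
    net = ipaddress.ip_network(f"{network}/{netmask}")
    start = ipaddress.ip_address(range_start)
    end = ipaddress.ip_address(range_end)
    gateway = ipaddress.ip_address(router)

    problems = []
    # Every address mentioned must live inside the declared subnet.
    for label, addr in (("range start", start), ("range end", end), ("router", gateway)):
        if addr not in net:
            problems.append(f"{label} {addr} is outside {net}")
    # The dynamic range must run from low to high.
    if start > end:
        problems.append(f"range start {start} is above range end {end}")
    # Handing out the router's own address would cause conflicts.
    if start <= gateway <= end:
        problems.append(f"router {gateway} falls inside the dynamic range")
    return problems


# The declaration from this section should produce no complaints.
print(check_subnet("192.168.0.0", "255.255.255.0",
                   "192.168.0.2", "192.168.0.100", "192.168.0.1"))
```

An empty list means the values are at least self-consistent; it says nothing about whether dhcpd will accept the rest of the file.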

DNS

To add DNS information to one's dhcpd.conf is similarly easy. If one wants each subnet served to get different DNS info one may put the lines in the subnet block or else the directives can go at the beginning of the file and apply to all subnets served:

option domain-name "example.com";
option domain-name-servers 192.168.0.1;

Boot server and boot file

Now, for the Auto Installer specific pieces: to boot a machine, one needs a machine to download a boot file from and the name of such a boot file. These pieces of information will be given by the installadm create-service [...] command when setting up an Auto Installer service. For example:

Boot server IP (BootSrvA) : 192.168.0.1
Boot file      (BootFile) : install_test_ai_x86
GRUB Menu      (GrubMenu) : menu.lst.install_test_ai_x86

The boot server in ISC DHCP terms is the next-server directive and the boot file is the filename directive. To make it simple, just add these to your subnet group:

subnet 192.168.0.0 netmask 255.255.255.0 {
  range 192.168.0.2 192.168.0.100;
  option routers 192.168.0.1;
  filename "install_test_ai_x86";
  next-server 192.168.0.1;
}

If you're using a SPARC Auto Installer service, you can use the same directives for the BootFile object. What about the GrubMenu entry, you ask? Well, that's unnecessary and will be removed (see bug 7481) so do not worry about it; similarly, SPARC does not need a next-server (BootSrvA) directive.

What if you have both SPARC and X86 architectures?

If you have both SPARC and X86 machines in your Auto Installer environment then there are some simple ways to define classes to provide your SPARC machines their correct boot-file and X86 machines their boot-file and boot-server. This allows one to have a default service for each architecture on the network.

A PXE boot class?

If you want a specific way to separate out your SPARC and X86 clients, you can use ISC DHCP's class directive and apply all boot specific information there. To create a class for X86 hardware booting, you can use the following class definition:

class "PXEBoot" {
  option dhcp-class-identifier "PXEClient";
  filename "install_test_ai_x86";
  next-server 192.168.0.1;
}

SPARC class

To define a class to match SPARC clients, one uses ISC DHCP's class directive and applies all boot specific information there. Note, you do not need a next-server directive for SPARC:

class "SPARC" {
  match if ( substring (option vendor-class-identifier, 0, 5) = "SUNW." ) and not
           ( option vendor-class-identifier = "SUNW.i86pc" );
  filename "http://192.168.0.1:5555/cgi-bin/wanboot-cgi";
}

Now, SPARC clients will request a lease and get SPARC specific information, while X86 clients will request and get information specific to an X86.
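To make the SPARC match expression easier to reason about, here is the same test restated as a sketch in Python. The function name is hypothetical, and of the sample identifiers only "SUNW.i86pc" comes from the configuration above; the others are illustrative guesses at what clients might send:

```python
def is_sparc_client(vendor_class_identifier):
    """Mirror the dhcpd.conf expression used by the "SPARC" class:

        substring(option vendor-class-identifier, 0, 5) = "SUNW."
        and not (option vendor-class-identifier = "SUNW.i86pc")
    """
    vci = vendor_class_identifier or ""
    # substring(x, 0, 5) is the first five characters of x
    return vci[0:5] == "SUNW." and vci != "SUNW.i86pc"


# Illustrative identifiers only (assumed, apart from "SUNW.i86pc").
for vci in ("SUNW.Sun-Fire-T200", "SUNW.i86pc", "PXEClient:Arch:00000"):
    print(vci, "->", "SPARC class" if is_sparc_client(vci) else "not SPARC class")
```

The point of the second clause is that Solaris on X86 also announces itself with a "SUNW." prefix, so the prefix test alone would wrongly hand X86 clients the SPARC boot file.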

The entire dhcpd.conf looks like:

# option definitions common to all supported networks...
option domain-name "example.com";
option domain-name-servers 192.168.0.1;

default-lease-time 600;
max-lease-time 7200;

# If this DHCP server is the official DHCP server for the local
# network, the authoritative directive should be uncommented.
authoritative;

# Use this to send dhcp log messages to a different log file (you also
# have to hack syslog.conf to complete the redirection).
log-facility local7;

# This is an easy way to discriminate on SPARC clients
class "SPARC" {
  match if ( substring (option vendor-class-identifier, 0, 5) = "SUNW." ) and not
           ( option vendor-class-identifier = "SUNW.i86pc" );
  filename "http://192.168.0.1:5555/cgi-bin/wanboot-cgi";
}

# This is a class to discriminate on PXE booting X86 clients
class "PXEBoot" {
  option dhcp-class-identifier "PXEClient";
  filename "install_test_ai_x86";
  next-server 192.168.0.1;
}
        
# This is a very basic subnet declaration
subnet 192.168.0.0 netmask 255.255.255.0 {
  range 192.168.0.2 192.168.0.100;
  option routers 192.168.0.1;
}

11 Mar 2010 1:55am GMT

Clay Baenziger: Mercurial Learning

Why make folks do new things?

I realized that despite the importance for software development, many folks are largely unaware of the features and power their S(ource) C(ode) M(anagement) tool can provide them. In my group, we use Mercurial. It is an awesome SCM and perfect for our team (which is largely Python writing) as it is written in Python.

However, as with any tool under development, there are features which were not mature when we started using Mercurial, or otherwise, we have just never taken advantage of some features despite their obvious use.

For us, we have a release of OpenSolaris every six months roughly and to prevent stopping all non-release development, we will fork our source base into a release and development (non-release targeting) code base. The first time I did this, as the gatekeeper (what we call the team's build maintainer and release engineer) I simply created a second Mercurial repository -- this was for the OpenSolaris 0906 release. This repository mirrored our slim_source Mercurial repository. All was well. This approach illustrated three areas of uncertainty with developer understanding, however:

Further, this required the team to keep a dead code base around (the now dead slim_0906 repository for the similarly named release). If we want to know what changes made that release we must clone it and go through it, trying to compare it to our main-line code base slim_source.

This was awesome to know, for as a gatekeeper one becomes far more intimate with the SCM than most developers need to be. But it also questioned the power lurking underneath the SCM. For Mercurial can address all these issues easily:

The idea

Given these issues and the potential fixes, I figured I should try and help the developer community using our code base learn these new tools. Further, trying to automate as much as I can, using Mercurial named branches was a key thing to keeping all of our source code in one repository for later reference and analysis. Though, it meant going beyond established OpenSolaris territory of Mercurial use as to my knowledge no other team is using named branches on hg.opensolaris.org yet.

Issues

Named branches caused some issues!

First, the OpenSolaris developer environment uses community-developed extensions called Cadmium. These are to mimic some tools which Sun's previous TeamWare software provided as the Solaris SCM of old (such as our recommit extension). Other features are simply to establish standard operating procedure for development (like webrev's as can be seen at cr.opensolaris.org). These Cadmium extensions were developed assuming no-one would be working on branched code bases.

Luckily, since we rely on Cadmium's recommit, the IPS folks had talked about the issues of using branches before. Rich Lowe in particular even had a potential fix for hg recommit ready for code review! Putting this in place until it was accepted into the OpenSolaris SUNWonbld package simply required sending a heads-up to the slim_source developer community to install the updated Cadmium files. Update: a fix for bug 6922488 (cadmium recommit should be more relaxed about in-repository branching) has been integrated into build 134 and later (see the changeset log for build 134). To use the new code, do nothing more than pkg image-update to build 134 or newer and your pkg:/developer/build/onbld package will be updated!

However, as development continued, we found that there was an issue on our gate machine. The gate machine for the OS/Networking consolidation is different from hg.opensolaris.org so that the ON gatekeepers can configure their own Mercurial gate-hooks. On our gate machine, hg.opensolaris.org, we cannot configure our own gate-hooks, and this caused us issues as the gate machine will, of course, accept anyone making or merging a branch in the source code base.

Mercurial will, of course, let you happily close a branch (e.g. hg commit --close-branch slim_1003). However, we found folks could do interesting things like:

Case 1

hg clone ssh://anon@hg.opensolaris.org/hg/caiman/slim_source
hg update -C 
[do changes]
hg branch -f 
hg commit
hg push

Case 2 (simply merge the branches)

hg clone ssh://anon@hg.opensolaris.org/hg/caiman/slim_source
hg update -C 
[do changes]
hg commit
[do more changes]
hg commit
hg recommit (with an un-patched hg recommit)
[hg recommit would say to merge all heads -- each branch has its own head!]
hg merge -r 
hg commit
hg push

These two cases, though undesirable to our community, would happily be accepted by hg.opensolaris.org into our slim_source repository and require lots of coordination to revert their effects. So the answer was to write a Mercurial hook. But this hook would be a client-side hook.

As such, with the help of Dave Marker we were able to craft a user-side hook to check if all pushes were continuing on the same branch or if someone is jumping branches. This hook needs to be added to an hgrc file in your path. For a build machine, this can be /etc/hgrc or for a user you can use your ~/.hgrc.

First, download the tar ball with the necessary hook and Python file. Next, modify your hgrc as below. If you already have other lines in your [extensions] and [hooks] sections, simply append these lines to their correct sections:

[extensions]
hook = ~/slim_hooks

[hooks]
pretxncommit.0 = python:hook.branch_change_chk.branch_change_chk

What about export and import?

Well, unintentionally, developers were exposed to hg export and hg import. As some pushes made before the user-hook was available were unfortunately toxic, developers ended up having to learn to export their changes and import them into clean source pulls. Luckily this went very smoothly, but a copy of the instructions is reproduced below for interest's sake:

Instructions to move your changes to a non-toxic workspace:


  1. First, determine which changeset you wish to preserve:

    Use hg log to see what changes you have made to the workspace. For example, here I would want changeset 716:7c6e6a587d85:
    [0] clayb@xsplat:hg log|less
    changeset: 716:7c6e6a587d85
    branch: slim_1003
    tag: tip
    user: Clay Baenziger 
    date: Fri Feb 05 15:32:44 2010 -0700
    description:
    Test
    
    changeset: 715:a836473e08d3
    branch: slim_1003
    user: Matt Keenan 
    date: Fri Feb 05 14:52:13 2010 +0000
    description:
    14358 Installer graphic needs changes
    
  2. Next, export your change:

    [0] clayb@xsplat:hg export 7c6e6a587d85 > /tmp/mychange
    
  3. Now, pull a clean workspace via:

    [0] clayb@xsplat:hg clone \
    ssh://anon@hg.opensolaris.org/hg/caiman/slim_source
    
  4. Next, select the branch you want to push to:

    Confirm it's clean:


    [0] clayb@xsplat:cd slim_source
    [0] clayb@xsplat:hg log --template="True\n" -r 5f70c8ce60b9|| \
    echo "False"
    abort: unknown revision '5f70c8ce60b9'!
    False
    

    For slim_1003 (blocker bugs):


    [0] clayb@xsplat:hg update -C slim_1003
    6 files updated, 0 files merged, 0 files removed, 0 files unresolved
    

    For default (future development):


    [0] clayb@xsplat:hg update -C default
    0 files updated, 0 files merged, 0 files removed, 0 files unresolved
    

    Add your change to the clean repo:

    [0] clayb@xsplat:hg import /tmp/mychange
    applying /tmp/mychange
    
  5. Confirm only your change is outgoing (and in the right branch):

    [0] clayb@xsplat:hg outgoing
    comparing with ssh://anon@hg.opensolaris.org/hg/caiman/slim_source
    searching for changes
    changeset: 715:b3867e9369ed
    branch: slim_1003
    tag: tip
    parent: 712:ec8ad8a8fb9c
    user: Clay Baenziger 
    date: Fri Feb 05 15:32:44 2010 -0700
    description:
    Test
    

11 Mar 2010 1:48am GMT

Clay Baenziger: How to Edit an Automated Installer Manifest with NetBeans

Editing XML

Editing XML is something system administrators seem to really hate; understandably, XML can employ a soup of standards! (Click to see Ken Sall's awesome image-map of XML related standards.) However, XML is awesome in providing a structured format for configuration and data files. In the OpenSolaris Automated Installer, XML is used as the manifest to select the system for install and how to install the system once selected. The Automated Installer uses a schema to verify the XML is acceptable to the installer engine and to provide a vocabulary of acceptable tags. However, if using a traditional text editor the process to author a manifest is rather painful since this data is not easily available to the author.

Using NetBeans for XML authoring

NetBeans, being a modern, integrated, developer environment provides a comprehensive XML authoring environment. One can provide a reference to an XML schema and then get: auto-completion of XML tags, the ability to validate the XML file under development against the schema and, if documented, tool-tips for each tag. However, the one issue for someone authoring Automated Installer manifests is NetBeans only supports XML Schema type schemas and the Automated Installer uses a RelaxNG schema. The answer is to convert the schema to XML Schema and for that, enter the Trang multi-format schema converter which can convert RelaxNG (.rng) to XML Schema (.xsd). Or download my annotated version below.

How to use schemas with NetBeans

To use NetBeans' context sensitive code completion, one needs an AI schema in XSD format. An annotated schema can be downloaded: AI Schema, Criteria Schema.

Lastly, you simply need to add a reference for NetBeans to find your schema. This can be done by using touch(1) to create a new file in the same directory as your two .xsd schemas (end the file with .xml). Then open the file in NetBeans and paste the following line in NetBeans to create an Automated Installer manifest:

<ai_manifest name="default"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="./ai_manifest_annotated.xsd">
</ai_manifest>

Now, if you begin typing a tag (start a tag with <) between the <ai_manifest> tags, you should get a pop-up window with context acceptable tags and documentation on each one. Further, you can go to the Run menu and use the Validate XML option to use NetBeans for help on any typos in the file.

You can do the same for criteria manifests using the provided schema for them as well. However, you will not be able to use the <ai_embedded_manifest> or <sc_embedded_manifest> tags; instead, use the <ai_manifest_file> and <sc_manifest_file> tags to specify a file URI and keep your manifests in separate files for ease of manipulation.

Notice the context sensitive editor here. It shows that the <ai_http_proxy> tag has a url attribute and provides documentation on the <ai_http_proxy> tag. Lastly, check out the XML navigator provided on the right side showing the hierarchical breakdown of the document.
Screenshot of the NetBeans context sensitive editor.

What's it all look like in the end?

When you're done crafting your criteria and AI manifests, you should have two files; plus you will need a system configuration manifest based on the SMF group's DTD. It should look something like:

<ai_criteria_manifest
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="./criteria_schema_annotated.xsd">
    <ai_criteria name="MEM">
        <value>
        1024 2048
        </value>
    </ai_criteria>
    <ai_manifest_file URI="/tmp/ai_manifest.xml"/>
    <sc_manifest_file name="bogus" URI="/tmp/sc_manifest.xml"/>            
</ai_criteria_manifest>

They will all be linked in your criteria manifest using the <ai_manifest_file> and <sc_manifest_file> tags. Then to publish your manifest, simply run installadm add -m <path to your criteria manifest> -n <your AI service name>!
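Before publishing, it can be handy to confirm that the criteria manifest is well-formed and points at the files you expect. The sketch below uses only Python's standard xml.etree.ElementTree module against the example criteria manifest shown above; the summarize helper is a hypothetical name, and it checks well-formedness only, not validity against the schema:

```python
import xml.etree.ElementTree as ET

# The example criteria manifest from this post.
CRITERIA = """<ai_criteria_manifest
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="./criteria_schema_annotated.xsd">
    <ai_criteria name="MEM">
        <value>
        1024 2048
        </value>
    </ai_criteria>
    <ai_manifest_file URI="/tmp/ai_manifest.xml"/>
    <sc_manifest_file name="bogus" URI="/tmp/sc_manifest.xml"/>
</ai_criteria_manifest>"""


def summarize(criteria_xml):
    """Return (criteria, files) extracted from a criteria manifest string.

    criteria maps each criterion name to its whitespace-separated values;
    files maps each *_manifest_file tag to the URI it references.
    """
    root = ET.fromstring(criteria_xml)  # raises ParseError if not well-formed
    criteria = {
        c.get("name"): c.findtext("value", default="").split()
        for c in root.findall("ai_criteria")
    }
    files = {
        tag: element.get("URI")
        for tag in ("ai_manifest_file", "sc_manifest_file")
        for element in root.findall(tag)
    }
    return criteria, files


criteria, files = summarize(CRITERIA)
print(criteria)
print(files)
```

Validation against the schema itself is still best left to NetBeans' Validate XML option or to the installer.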

How to use Trang to get an XSD from the Automated Installer RelaxNG schemas

To create the above XSD schemas, I used an open source tool, Trang. Trang is capable of converting schemas from various schema languages. For example, it can deal with cross-converting:

To use Trang, one simply provides a schema's current type and the desired output type, and an input and output filename. For the AI Manifest Schema, one simply downloads Trang and runs, java -jar trang.jar -I RNG -O XSD /usr/share/auto_install/ai_manifest.rng ai_manifest.xsd.

Converting the AI Criteria Schema, however, tickles one feature of RelaxNG which Trang does not yet support: the externalRef directive, which simply nests one schema inside another. This is easy to overcome. It is possible to modify the schema to remove this requirement and easily generate an XSD; the differences needed to remove support for "embedded" AI and SC manifests for the current schema revision are:

--- /usr/share/auto_install/criteria_schema.rng
+++ modified_criteria_schema.rng
@@ -60,7 +60,6 @@
        </start>
 
        <define name="nm_ai_manifest_src">
-               <choice>
                        <!--
                            Either embed or point to an A/I manifest
                        -->
@@ -69,14 +68,9 @@
                                        <data type="anyURI"/>
                                </attribute>
                        </element>
-                       <element name="ai_embedded_manifest">
-                               <externalRef href="ai_manifest.rng"/>
-                       </element>
-               </choice>
        </define>
 
        <define name="nm_sc_manifest_src">
-               <choice>
                        <!--
                            Either embed or point to an SC manifest
                        -->
@@ -88,17 +82,6 @@
                                        <data type="anyURI"/>
                                </attribute>
                        </element>
-                       <element name="sc_embedded_manifest">
-                               <attribute name="name">
-                                       <text/>
-                               </attribute>
-                               <!--
-                                   Note: the embedded manifest being DTD
-                                   based can not be verified and as such
-                                   should be included inside comment tags
-                               -->
-                       </element>
-               </choice>
        </define>
 
        <!--

11 Mar 2010 1:24am GMT

Bryan Cantrill: Turning the corner

It's a little hard to believe that it's been only fifteen months since we shipped our first product. It's been a hell of a ride; there is nothing as exhilarating nor as exhausting as having a newly developed product that is both intricate and wildly popular. Especially in the domain of enterprise storage -- where perfection is not just the standard but (entirely reasonably) the expectation -- this makes for some seriously spiked punch.

For my own part, I have had my head down for the last six months as the Technical Lead for our latest software release, 2010.Q1, which is publicly available as of today. In my experience, I have found that in software (if not in life), one may only ever pick two of quality, features and schedule -- and for 2010.Q1, we very much picked quality and features. (As for schedule, let it be only said that this release was once known as "2009.Q4"...)

2010.Q1 Quality

You don't often see enterprise storage vendors touting quality improvements for a very simple reason: if the product was perfect when you sold it to me, why are you talking about how much you've improved it? So I'm going to break a little bit with established tradition and acknowledge that the product has not been perfect, though not without good reason. With our initial development of the product, we were pushing many new technologies very aggressively: not only did we seek to build enterprise-grade storage on commodity components (a deceptively daunting challenge in its own right), we were also building on entirely new elements like flash -- and then topped it all off with an ambitious, from-scratch management stack. What were we possibly thinking by making so many bets at once? We made these bets not out of recklessness, but rather because they were essential elements of our Big Bet: that customers were sick of paying monopoly rents for enterprise storage, and that we could deliver a quantum leap in price-performance. (And if nothing else, let it be said that we got that one very, very right -- seemingly too right, at times.) As for the specific technology bets, some have proven to be unblemished winners, while others have been more of a struggle. Sometimes the struggle was because the problem was hard, sometimes it was because the software was immature, and sometimes it was because a component that was assumed to have known failure modes had several (or many) unanticipated (or byzantine) failure modes. And in the worst cases, of course, it was all three...

I'm pleased to report that in 2010.Q1, we turned the corner on all fronts: in addition to just fixing a boatload of bugs in key areas like clustering and networking, we engaged in fundamental work like Dave's rearchitecture of remote replication, adapted to new device failure modes as with Greg's rearchitecture around resilience to HBA logic failure, and -- perhaps most importantly -- integrated critical firmware upgrades to each of the essential components of the I/O path (HBAs, SIM cards and disks). Also in 2010.Q1, we changed the way that we run the evaluation of the software, opening the door to many in our rapidly growing customer base. As a result, this release is already running on more customer production systems than any of its predecessors were at the time that they shipped -- and on many more eval and production machines within our own walls.

2010.Q1 Features

But as important as quality is to this release, it's not the full story: the release is also packed with major features like deduplication, iSER/SRP support, Kerberized NFS support and Fibre Channel support. Of these, the last is of particular interest to me because, in addition to my role as the Technical Lead for 2010.Q1, I was also responsible for the integration of FC support into the product. There was a lot of hard work here, but much of it was borne by John Forte and his COMSTAR team, who did a terrific job not only on the SCSI Target Management facility (STMF) but also on the base ALUA support necessary to allow proper FC operation in a cluster. As for my role, it was fun to cut the code to make all of this stuff work. Thanks to some great design work by Todd Patrick, along with some helpful feedback from field-facing colleagues like Ryan Matthews, I think we came up with a clean, functional interface. And working closely with both John and our test team, we have developed a rock-solid FC product. But of course (and as one might imagine), for me personally, the really gratifying bit was adding FC support to analytics. With just a pinch of DTrace and a bit of glue code, we now have visibility into FC operations by LUN, by project, by target, by initiator, by operation, by SCSI command, by size, by offset and by latency -- and by any combination thereof.

As I was developing FC analytics, I would use as my source of load a silly disk benchmark I wrote back in the day when Adam and I were evaluating SSDs. Here for example, is that benchmark running against a LUN that I named "thicktail-bench":


The initiator here is the machine "thicktail"; it's interesting to break down by initiator and see the paths by which thicktail is accessing the LUN:


(These names are human readable because I have added aliases for each of thicktail's two HBA ports. Had I not added those aliases, we would see WWNs here.) The above shows us that thicktail is accessing the LUN through both of its paths, which is what we would expect (but good to visually confirm). Let's see how it's accessing the LUN in terms of operations:


Nothing too surprising here -- this is the write phase of the benchmark and we have no log devices on this system, so we fully expect this. But let's break down by offset:


The first time I saw this, I was surprised. Not because of what it shows -- I wrote this benchmark, and I know what it does -- but rather because it was so eye-popping to really see its behavior for the first time. In particular, this captures an odd phase I added to this benchmark: it does random writes across an increasingly large range. I did this because we had discovered that some SSDs did fine when the writes were confined to a small logical region, but broke down -- badly -- when the writes were over a larger region. And no, I don't know why this was the case (presumably the firmware was in fragmented/wear-leveling/cache-busting hell); all I know is that we rejected any further exploration once the writes to the SSD were of a higher latency than that of my first hard drive: the IBM PC XT's 10 MB ST-412, which had roughly 95 ms writes! (We felt that expecting an SSD to have better write latency than a hard drive from the first Reagan Administration was tough but fair...)

What now?

As part of our ongoing maturity as a product, we have developed a new role here at Fishworks: starting in 2010.Q1, the Technical Lead for the release will, as the release ships, transition to become the full-time Support Lead for that release in the field. This means many things for the way we support the product, but for our customers, it means that if and when you do have an issue on 2010.Q1, you should know that the buck on your support call will ultimately stop with me. We are establishing an unprecedented level of engineering integration with our support teams, and we believe that it will show in the support experience. So welcome to 2010.Q1 -- and happy upgrading!

11 Mar 2010 1:09am GMT

Eric Schrock: Multiple pools in 2010.Q1

When the Sun Storage 7000 was first introduced, a key design decision was to allow only a single ZFS storage pool per host. This forces users to fully take advantage of the ZFS pool storage model, and prevents them from adopting ludicrous schemes such as "one pool per filesystem." While RAID-Z has non-trivial performance implications for IOPs-bound workloads, the hope was that by allowing logzilla and readzilla devices to be configured per-filesystem, users could adjust relative performance and implement different qualities of service on a single pool.

While this works for the majority of workloads, there are still some that benefit from mirrored performance even in the presence of cache and log devices. As the maximum size of Sun Storage 7000 systems increased, it became apparent that we needed a way to allow pools with different RAS and performance characteristics in the same system. With this in mind, we relaxed the "one pool per system" rule[1] with the 2010.Q1 release.


The storage configuration user experience is relatively unchanged. Instead of having a single pool (or two pools in a cluster), and being able to configure one or the other, you can simply click the '+' button and add pools as needed. When creating a pool, you can now specify a name for the pool. When importing a pool, you can either accept the existing name or give it a new one at the time you select the pool. Ownership of pools in a cluster is now managed exclusively through the Configuration -> Cluster screen, as with other shared resources.

When managing shares, there is a new dropdown menu at the top left of the navigation bar. This controls which shares are shown in the UI. In the CLI, the equivalent setting is the 'pool' property at the 'shares' node.

While this gives some flexibility in storage configuration, it also allows users to create poorly constructed storage topologies. The intent is to allow the user to create pools with different RAS and performance characteristics, not to create dozens of different pools with the same properties. If you attempt to do this, the UI will present a warning summarizing the drawbacks if you were to continue:

You can still commit the operation, but such configurations are discouraged. The exception is when configuring a second pool on one head in a cluster.

We hope this feature will allow users to continue to consolidate storage and expand use of the Sun Storage 7000 series in more complicated environments.


  1. Clever users figured out that this mechanism could be circumvented in a cluster to have two pools active on the same host in an active/passive configuration.

11 Mar 2010 12:58am GMT

10 Mar 2010

Jim Grisanzio: Moscow OpenSolaris User Group Meeting

I am looking forward to doing a quick video conference call next week with the guys in the Moscow OpenSolaris User Group. It will be at 2:30 in the morning for me, so I think I will make it a quick chat. Meeting details here. MOSUG info here. Special thanks to Vladimir Legeza for the invite. Hopefully, some day I will be able to get to Russia. Never been. Always wanted to go.

10 Mar 2010 4:53pm GMT

Simon Phipps: webmink


10 Mar 2010 12:07pm GMT

Clay Baenziger: Network Interactions of a Net Booted X86 AI Client

What all does an X86 do while net booting and installing?

I often get asked how the OpenSolaris Automated Installer works. The big question is how all the pieces tie together. To help answer these questions I have drafted a few UML sequence diagrams showing the boot process of an X86 type machine net booting and installing via the Automated Installer.

[Diagram link targets: PXE running DHCP; PXE running TFTP; GRUB; live-fs-root; manifest-locator script; auto-installer]

AI Boot Flow Overview

(You can click down the client timeline for relevant links to source code and Wikipedia descriptions of the protocols.)
UML sequence diagram showing AI boot process -- image map links to relevant code and protocol Wikipedia entries.

[Diagram link targets: bug with criteria selection; manifest_xml() in webserver.py; manifest_html() in webserver.py]

AI Manifest Selection Flow Detail

(Click the note on manifest.html for a link to the code implementing the AI webserver's manifest.html page, or the note on manifest criteria for Bug 9106, which documents some issues with criteria processing at this time. The AI webserver usually runs on http://<ai_server>:46501, http://<ai_server>:46502, etc. The rest of the image links to the AI webserver's code implementing manifest.xml.)
UML sequence diagram showing AI webserver communication process for downloading an AI manifest
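
As a rough illustration of where those webservers live, the sketch below builds candidate URLs for the manifest.html page. The helper name, the assumption that ports are numbered consecutively from 46501, and the URL path are all guesses made for this example; check your own AI server for the actual ports and paths.

```python
def manifest_page_urls(ai_server, first_port=46501, count=2, page="manifest.html"):
    """Return candidate AI webserver URLs, one per port.

    Hypothetical helper: assumes ports are consecutive starting at
    first_port and that the page is served from the server root.
    """
    return [f"http://{ai_server}:{first_port + i}/{page}" for i in range(count)]


if __name__ == "__main__":
    # "ai.example.com" is a placeholder host name, not a real AI server.
    for url in manifest_page_urls("ai.example.com"):
        print(url)
```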

10 Mar 2010 1:16am GMT

09 Mar 2010

Simon Phipps: OpenSolaris Logo

I am standing in the election for the OpenSolaris Governing Board one last time (this would be my third consecutive term if elected, so it has to be the last time). Each term has been quite different to the others, and I have no doubt this next year will be very different again for the OpenSolaris community.

Since I no longer work at Sun, I'd like to make clear what my "platform" is in this election in addition to my candidate statement.

I encourage you to go vote right now if you are eligible to do so, and most especially to ratify the new OpenSolaris Constitution, which I believe is essential if the new OGB is to be able to focus on anything other than bureaucracy.


09 Mar 2010 7:32pm GMT

James Dickens: One more ZFS video

Dynamic LUN expansion

http://blogs.sun.com/video/entry/zfs_dynamic_lun_expansion

09 Mar 2010 2:33pm GMT

James Dickens: Cool ZFS Dedup video

Check out this video by George Wilson of the ZFS team.

http://blogs.sun.com/video/entry/zfs_dedup

09 Mar 2010 2:31pm GMT

Simon Phipps: webmink

While the social contract behind copyright has merit (creating a protected 'space' where a copyright creation can be monetised in exchange for its dedication to the public domain), the digital age has driven a switch from a control-centric ('hub-and-spoke') society to an emerging peer-to-peer society. There are no 'fixes' we can do to copyright to make it work right; we need to start again and invent a copyright for the digital age.

The Digital Economy Bill is well intentioned, but there are no fixes available to make analogue copyright law work for a digital society and I fear the Bill will just make things worse, unleashing a "sorcerer's apprentice" effect of unintended consequences the way the US DMCA has done. The Bill has to be stopped, not patched.

(tags: Debill DigitalEconomyBill UK Policy Legislation ACTA DMCA Copyright Internet Privacy)
The history of MySQL AB
A great potted history from Dries Buytaert, timeline-style.
(tags: MySQL History FOSS OpenSource Database)

09 Mar 2010 12:08pm GMT

Jim Grisanzio: OpenSolaris Community Growth in Japan

The Japanese OpenSolaris community continues to grow. It's now the 3rd largest community in the OpenSolaris world following the Spanish and Indian communities, it's the 3rd most active, and Tokyo is the #1 city outside the United States for sending traffic to opensolaris.org. The community in Japan also continues to diversify, with general users mixing with kernel developers and globalization engineers. In fact, this diversity is driving the need to run concurrent sessions for beginners and advanced developers and users at community events.

There are multiple parts to the community in Japan:

There is a lot going on. I try to track what I can at this tag.

09 Mar 2010 7:54am GMT