19 Nov 2010

Robert Brewer: logging.statistics

Statistics about program operation are an invaluable monitoring and debugging tool. How many requests are being handled per second, how much of various resources are in use, how long we've been up. Unfortunately, the gathering and reporting of these critical values is usually ad-hoc. It would be nice if we had 1) a centralized place for gathering statistical performance data, 2) a system for extrapolating that data into more useful information, and 3) a method of serving that information to both human investigators and monitoring software. I've got a proposal. Let's examine each of those points in more detail.

Data Gathering

Just as Python's logging module provides a common importable for gathering and sending messages, statistics need a similar mechanism, and one that does not require each package which wishes to collect stats to import a third-party module. Therefore, we choose to re-use the logging module by adding a statistics object to it.

That logging.statistics object is a nested dict:

import logging
if not hasattr(logging, 'statistics'): logging.statistics = {}

It is not a custom class, because that would 1) require apps to import a third-party module in order to participate, 2) inhibit innovation in extrapolation approaches and in reporting tools, and 3) be slow. There are, however, some specifications regarding the structure of the dict.

   +----"SQLAlchemy": {
   |        "Inserts": 4389745,
   |        "Inserts per Second":
   |            lambda s: s["Inserts"] / (time() - s["Start"]),
   |  C +---"Table Statistics": {
   |  o |        "widgets": {-----------+
 N |  l |            "Rows": 1.3M,      | Record
 a |  l |            "Inserts": 400,    |
 m |  e |        },---------------------+
 e |  c |        "froobles": {
 s |  t |            "Rows": 7845,
 p |  i |            "Inserts": 0,
 a |  o |        },
 c |  n +---},
 e |        "Slow Queries":
   |            [{"Query": "SELECT * FROM widgets;",
   |              "Processing Time": 47.840923343,
   |              },
   |             ],

The logging.statistics dict has strictly 4 levels. The topmost level is nothing more than a set of names to introduce modularity. If SQLAlchemy wanted to participate, it might populate the item logging.statistics['SQLAlchemy'], whose value would be a second-layer dict we call a "namespace". Namespaces help multiple emitters to avoid collisions over key names, and make reports easier to read, to boot. The maintainers of SQLAlchemy should feel free to use more than one namespace if needed (such as 'SQLAlchemy ORM').

Each namespace, then, is a dict of named statistical values, such as 'Requests/sec' or 'Uptime'. You should choose names which will look good on a report: spaces and capitalization are just fine.

In addition to scalars, values in a namespace MAY be a (third-layer) dict, or a list, called a "collection". For example, the CherryPy StatsTool keeps track of what each worker thread is doing (or has most recently done) in a 'Worker Threads' collection, where each key is a thread ID; each value in the subdict MUST be a fourth dict (whew!) of statistical data about each thread. We call each subdict in the collection a "record". Similarly, the StatsTool also keeps a list of slow queries, where each record contains data about each slow query, in order.
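
As an illustrative sketch (the namespace, key names, and thread IDs here are invented for the example), a namespace with a collection of records might be built like this:

    import logging
    import time

    if not hasattr(logging, 'statistics'): logging.statistics = {}

    # 'My Server' is a hypothetical namespace; 'Worker Threads' is a
    # collection whose records are keyed by thread ID.
    logging.statistics['My Server'] = {
        'Start Time': time.time(),                            # plain scalar
        'Uptime': lambda s: time.time() - s['Start Time'],    # function entry
        'Worker Threads': {                                   # collection
            '140235': {'Requests': 182, 'Bytes Written': 20471},  # record
            '140236': {'Requests': 175, 'Bytes Written': 19082},  # record
        },
    }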

Values in a namespace or record may also be functions, which brings us to:

Extrapolation

def extrapolate_statistics(scope):
    """Return an extrapolated copy of the given scope."""
    c = {}
    for k, v in scope.items():
        if isinstance(v, dict):
            # Subdicts (collections and records) are extrapolated recursively.
            v = extrapolate_statistics(v)
        elif isinstance(v, (list, tuple)):
            # List-style collections: extrapolate each record in order.
            v = [extrapolate_statistics(record) for record in v]
        elif callable(v):
            # Dynamic entries are expanded by calling them, passing the
            # enclosing namespace (or record) as the only argument.
            v = v(scope)
        c[k] = v
    return c

The collection of statistical data needs to be fast, as close to unnoticeable as possible to the host program. That requires us to minimize I/O, for example, but in Python it also means we need to minimize function calls. So when you are designing your namespace and record values, try to insert the most basic scalar values you already have on hand.

When it comes time to report on the gathered data, however, we usually have much more freedom in what we can calculate. Therefore, whenever reporting tools fetch the contents of logging.statistics for reporting, they first call extrapolate_statistics (passing the whole statistics dict as the only argument). This makes a deep copy of the statistics dict, so that the reporting tool can iterate over it, and even change it, without harming the original. But it also expands any functions in the dict by calling them. For example, you might have a 'Current Time' entry in the namespace with the value "lambda scope: time.time()". The "scope" parameter is the current namespace dict (or record, if we're currently expanding one of those instead), allowing you access to existing static entries. If you're truly evil, you can even modify more than one entry at a time.

However, don't try to calculate an entry and then use its value in further extrapolations; the order in which the functions are called is not guaranteed. This can lead to a certain amount of duplicated work (or a redesign of your schema), but that's better than complicating the spec.
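
Here is a minimal sketch of extrapolation in action, assuming the extrapolate_statistics function above is in scope; the 'Demo' namespace and its keys are invented for the example:

    import logging
    import time

    if not hasattr(logging, 'statistics'): logging.statistics = {}
    logging.statistics['Demo'] = {
        'Start Time': time.time() - 60,   # pretend we started a minute ago
        'Requests': 42,
        'Current Time': lambda s: time.time(),
        'Requests/sec': lambda s: s['Requests'] / (time.time() - s['Start Time']),
    }

    # The original dict keeps its lambdas; the copy holds plain values.
    report = extrapolate_statistics(logging.statistics)
    print(report['Demo']['Requests/sec'])   # roughly 0.7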

After the whole thing has been extrapolated, it's time for:

Reporting

A reporting tool would grab the logging.statistics dict, extrapolate it all, and then transform it to (for example) HTML for easy viewing, or JSON for processing by Nagios etc. (and because JSON will be a popular output format, you should seriously consider using Python's time module for datetimes and arithmetic, not the datetime module). Each namespace might get its own header and attribute table, plus an extra table for each collection. This is NOT part of the statistics specification; other tools can format how they like.
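
A reporting tool along those lines might be sketched as follows; report_json is a hypothetical helper reusing extrapolate_statistics from above, and default=str is a defensive fallback for any value json cannot serialize directly:

    import json
    import logging

    def report_json():
        """Extrapolate the whole stats repository and render it as JSON."""
        stats = extrapolate_statistics(getattr(logging, 'statistics', {}))
        return json.dumps(stats, indent=4, default=str)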

Turning Collection Off

It is recommended each namespace have an "Enabled" item which, if False, stops collection (but not reporting) of statistical data. Applications SHOULD provide controls to pause and resume collection by setting these entries to False or True, if present.


    import logging
    import time

    # Initialize the repository
    if not hasattr(logging, 'statistics'): logging.statistics = {}

    # Initialize my namespace
    mystats = logging.statistics.setdefault('My Stuff', {})

    # Initialize my namespace's scalars and collections
    mystats.update({
        'Enabled': True,
        'Start Time': time.time(),
        'Important Events': 0,
        'Events/Second': lambda s: (
            (s['Important Events'] / (time.time() - s['Start Time']))),
        })

    for event in events:  # wherever the application processes its events
        # Collect stats
        if mystats.get('Enabled', False):
            mystats['Important Events'] += 1

Original post blogged on b2evolution.

19 Nov 2010 7:08am GMT

12 Nov 2010

Kevin Dangoor: Paver is now on GitHub, thanks to Almad

Paver, the project scripting tool for Python, has just moved to GitHub thanks to Almad. Almad has stepped forward and offered to properly bring Paver into the second decade of the 21st century (doesn't have the same ring to it as bringing something into the 21st century, does it? :)

Seriously, though, Paver reached the point where it was good enough for me and did what I wanted (and, apparently, what a good number of other people wanted as well). Almad has some thoughts on where the project should go next, and I'm looking forward to hearing more about them. Sign up for the Google group to see where Paver is going next.

12 Nov 2010 3:11am GMT

09 Nov 2010

Kevin Dangoor: Paver: project that works, has users, needs a leader

Paver is a Python project scripting tool that I initially created in 2007 to automate a whole bunch of tasks around projects that I was working on. It knows about setuptools and distutils, and it has some ideas on handling documentation with example code. It also has users who occasionally like to send in patches. The latest release has had more than 3700 downloads on PyPI.

Paver hasn't needed a lot of work, because it does what it says on the tin: helps you automate project tasks. Sure, there's always more that one could do. But, there isn't more that's required for it to be a useful tool, day-to-day.

Here's the point of my post: Paver is in danger of being abandoned. At this point, everything significant that I am doing is in JavaScript, not Python. The email and patch traffic is low, but it's still too much for someone who's not even actively using the tool anymore.

If you're a Paver user and either:

1. want to take the project in fanciful new directions or,

2. want to keep the project humming along with a new .x release every now and then

please let me know.

09 Nov 2010 7:44pm GMT

26 Aug 2010

Kevin Dangoor: MichiPUG’s 5th Birthday!

The Michigan Python Users Group turns 5 with the September meeting. Woohoo! Come and celebrate with food and lightning talks!

Thursday, September 2nd at 7pm at SRT Solutions.

RSVP today to save a spot! While you're at it, pitch in with a lightning talk of interest to Python users.

26 Aug 2010 6:25pm GMT

12 Mar 2010

Robert Brewer: Zen of CherryPy video

My PyCon 2010 talk video is up. Enjoy: The Zen of CherryPy

Original post blogged on b2evolution.

12 Mar 2010 2:29am GMT

02 Sep 2009

Kevin Dangoor: 4th anniversary MichiPUG meeting tomorrow!

I kicked off the Michigan Python Users Group (MichiPUG) in September 2005, so this month's meeting marks 4 years since the group began!

This month's meeting is going to be one of our "topic free" meetings. Despite the lack of a topic, we never have trouble finding Python things to discuss. If you're going to be around Ann Arbor Thursday evening and have a burning Python question, do stop in!

The meeting will be at 7pm at SRT Solutions in downtown Ann Arbor. Parking is free and easy next to City Hall a couple blocks north (on Ann St.).

This month, I am stepping aside as the de facto leader of the group. While I am still a Python fan and heavy user, my interests have branched out enough that I plan to devote my rather limited "user group time" elsewhere. Stay tuned for more on that soon. Mark Ramm will be taking over my duties as "the guy who sends the monthly 'what's our topic?' email message".

I'm almost certainly going to be arriving late to tomorrow's meeting, but I do hope to catch up with folks for drinks afterwards at the very least! See you there!

02 Sep 2009 5:38pm GMT

28 Aug 2009

Kevin Dangoor: Python packaging/install: what I want

Python packaging and deployment can be annoying. It's been nearly 4 years since I released the first TurboGears release as an early adopter of setuptools/easy_install. Since then, there's been the release of virtualenv, pip and zc.buildout. Somehow, it still seems like more trouble than it should be to get development and production environments set up.

On Bespin, I've been using a combination of virtualenv and pip (scripted with Paver) in development and production environments. But, I've found pip freeze to be nearly unusable.

My Ideal World

After monkeying with this stuff a fair bit over the past few years, I have an idea of what I'd really like to have but I don't think anyone's working on it. I'd love to hear contrasting opinions or learn about projects that I'm not aware of.

pip is close to being usable, except freeze doesn't work. zc.buildout is close to being usable, too. I think there's a "freeze"-like plugin for it, but I don't know how well it works. I don't like zc.buildout quite as much as virtualenv, and I see that people even use virtualenv+zc.buildout to keep site-packages from leaking in. I also find that it leaves tons of old packages around in every buildout, again with no way to manage them.

What I've found using both zc.buildout and pip is that they are slow and annoying, because they're constantly reinstalling things that I already have. The main reason for having a shared site-packages as I suggest above is not to save on disk space, but to save on time. In development, I want to be able to update to the latest versions of packages quickly, installing/building only the ones that have changed. How fast something runs changes how you use it, and I know that the scripts that I have for updating development and production environments reflect that.

So, I think the main thing that I'm looking for is a new tool to manage the packages that I have installed globally and within virtualenvs. Are there tools out there that are heading down this path at all?

Also, I understand the starting point that Tarek is taking with Distribute (splitting it up into logical pieces), but is there any roadmap for where it's going to go functionally from there? Or is the intention purely that tools like the one I'm angling for will be written against the newly refactored libraries? I do know about the uninstall PEP, and that's pleasing.

28 Aug 2009 6:48pm GMT

15 Aug 2009

Kevin Dangoor: One Python-based version control system to rule them all!

We've just released Bespin 0.4, the major new feature of which is the first bit of the collaboration feature. Bespin 0.4 includes a ton of other changes, including one that I'm going to focus on here: Subversion support, which Gordon P. Hemsley kicked off for us a few weeks back.

Bespin's initial version control support showed up in 0.2 with support for Mercurial. Knowing that we wanted to support multiple version control systems (VCS), I took an unorthodox approach from the beginning. Rather than providing the "hg" commands that people know and love, I created a separate set of "vcs" commands. Ultimately, we want to make it easy to grab a random open source project off the net and start hacking on it. Using the "vcs" commands, for the most common version control operations you won't even have to think about which VCS is used by a given project.

I can run "vcs clone" ("vcs checkout" also works) to check out Bespin (in a Mercurial repository), Paver (in a Subversion repo) and hopefully soon Narwhal (in a Git repo). Also new in Bespin 0.4: Bespin's command line has been tricked out to be able to have fancier interactions with commands, so you can enter all of the extra information that Bespin needs for checking out a repository right in the output area.

If you've used Subversion and one of the Distributed VCSes, you'll know that they have a different model. The DVCSes do almost everything in a local repository copy and only talk to a remote server for push/pull. That's actually true of Subversion as well, with one notable exception: commit. For Subversion, the "vcs commit" command will simply save your commit message for later. When you run "vcs push", that is when an actual "svn commit" operation is run.

What's neat about the "vcs" commands is that they operate the same from VCS to VCS. svn doesn't have a feature to "add all files that are unknown", whereas Mercurial does. "vcs add -a" operates the same on both systems.

If you're interested, you can also use these commands on the command line by installing the Über Version Controller (uvc) Python package. After doing so, you can head into a random Subversion or Mercurial working copy and type "uvc status" to see what's different. I will note that the command line tool has been, um, lightly tested since uvc is mostly used as a library for Bespin at this point.

One final note: Bespin will soon also support the native "svn" and "hg" commands, so that you can stick to the commands and options you're familiar with, or perform more complex operations that don't have equivalent "vcs" commands.

You can learn more about version control in Bespin from this section of the User Guide.

15 Aug 2009 2:35am GMT

20 Jul 2009

Uche and Chimezie Ogbuji: Ngozi Isabella (Chioma) Ogbuji

Initial 'portrait' shot of the new baby!

Ngozi(chukwu.nyere) means the blessing that God (or heaven/celestial, by my interpretation) gave. My name (Chimezie) is an Igbo invocation for "(May) God resolve". My interpretation substitutes the 'abstract' notion of the celestial for the word God. My name was an invocation by my father to resolve things, and after everything that has transpired, Ngozi is the gift or blessing that may bring about this resolution. My dad prefers to call her Chioma. It means 'good god.' Again, substituting the idea of the celestial (as a stark contrast to phenomena of the 'terrestrial') for the Islamic/Judeo-Christian notion of God/god, you have the idea of a benevolent circumstance or fate.

Kwenu.com has a nice volume of decent translations of Igbo names. In fact, it also has a nice overview of the concept of "Chi" in Igbo mythology. Ironically, this concept has a strong correlation with my understanding of how the celestial interacts with the terrestrial, the rule of natural law, and how it relates to myself and humans in general. The Chinese word "Chi" has a similar connotation:

the life-process or "flow" of energy that sustains living beings are found in many belief systems, especially in Asia.

Igbo mythology and ancient Indo-Chinese spiritual philosophy have much in common, so I guess in the end it is not so odd that I have come to find myself practicing Indo-Chinese spiritual philosophy as a way to understand the crazy world that I live in. Their themes appeal to me in the same long-lasting way that the meanings invoked by the names of my children appeal to me.

I have a Flickr album set of our (Roschella's and my) favorite pictures of her.

20 Jul 2009 8:00pm GMT

Uche and Chimezie Ogbuji: Magic sets, DLP, and other strange ways to implement semantic web expert systems

I just finished some changes to python-dlp, including a modification to FuXi that adds an implementation of the Magic Set Transformation (MST) method for RIF-like horn clauses. The most useful, immediate value this has for me is being able to (essentially) implement a DLP (description logic programming) entailment regime for a SPARQL query.
Consider the test case for the SymmetricProperty OWL test.

The base facts are:

first:Ghent first:path first:Antwerp .
first:path a owl:SymmetricProperty .

The goal we are trying to prove is:

first:Antwerp first:path first:Ghent

I.e., a user wants to query an RDF dataset that includes an RDF graph with the above statements, and the service is expected to implement an entailment regime for OWL-DL RDF such that the following query gives a positive answer:
ASK { first:Antwerp first:path first:Ghent }

The general pD* rule that would normally apply in helping answer this query is:

{?P a owl:SymmetricProperty. ?S ?P ?O} => {?O ?P ?S}.

Re-written in a familiar (prolog-like) RIF-BLD syntax:

Forall ?P ?O ?S ( ?P(?O ?S) :- And( owl:SymmetricProperty(?P) ?P(?S ?O) ) )

In order to maintain consistency, a rule-based engine that used this clause to implement the definition of a symmetric property would need to fire it for *every* triple in the fact base (in order to properly calculate the Herbrand base), because of the 2nd triple pattern in the body / antecedent / left-hand-side of the rule: ?S ?P ?O

However, the DLP approach that converted tree-based OWL axioms into colloquial horn clauses would allow us to use (instead) a domain-specific rule:

Forall ?Y ?X ( first:path(?Y ?X) :- first:path(?X ?Y) )

This rule is domain-specific in the sense that it only applies to instances of the first:path predicate rather than to every predicate. As a result of this transformation, the procedural evaluation of the rule for symmetry has been reduced from the worst case to only the fraction of the RDF dataset concerning first:path statements.
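
To make the difference concrete, here is a rough Python sketch using rdflib; the namespace URI is invented, and this only illustrates the reduction in scanning, not FuXi's actual machinery:

    from rdflib import Graph, Namespace
    from rdflib.namespace import OWL, RDF

    FIRST = Namespace("http://example.org/first#")  # hypothetical URI

    g = Graph()
    g.add((FIRST.Ghent, FIRST.path, FIRST.Antwerp))
    g.add((FIRST.path, RDF.type, OWL.SymmetricProperty))

    def apply_generic_pd_rule(graph):
        """The general pD* rule: must scan *every* triple in the graph."""
        symmetric = set(graph.subjects(RDF.type, OWL.SymmetricProperty))
        for s, p, o in list(graph):
            if p in symmetric:
                graph.add((o, p, s))

    def apply_domain_specific_rule(graph):
        """The DLP rule: only touches first:path statements."""
        for s, o in list(graph.subject_objects(FIRST.path)):
            graph.add((o, FIRST.path, s))

    apply_domain_specific_rule(g)
    assert (FIRST.Antwerp, FIRST.path, FIRST.Ghent) in g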

So, a knowledge base that could exhaustively evaluate rules in a bottom-up fashion (via 'forward chaining') prior to bringing up the SPARQL service could answer that question against the (smaller) entailed RDF graph.

However, with the MST implementation, if the query is known a priori, the ruleset can be modified into a version that further restricts the amount of redundant work done during the inference process. For example, even if the SPARQL service is known to never have to answer that query, the colloquial rule above would still be needed by a naive implementation and would apply to every statement that used the first:path predicate.


[Figure: Graphviz rendering of the PML proof tree]

The diagram above is a Graphviz rendering of a Proof Markup Language (PML) proof tree generated by taking the colloquial rule, modifying it using the MST algorithm, evaluating the base facts against the ruleset, and adding an RDF statement that 'triggers' the backward-chaining process. FuXi includes a nice set of utilities for generating proof tree visualizations.

Essentially, performing a bottom-up (forward-chaining) evaluation of the transformed rules and the facts simulates a backward-chained proof.

Below are the 3 rules that replace the original domain-specific rule:

:path_magic(?LOC1 ?LOC2) :- And( :path_magic(?X ?LOC2)
    :path_bf(?X ?LOC1) :path_magic(?X) )
:path_magic(?X) :- :path_magic(?X ?LOC)
:path(?X ?LOC1) :- And( :path_magic(?X ?LOC1) :path_magic(?X)
    :path(?X ?LOC2) :path_magic(?LOC2 ?LOC1) :path(?LOC2 ?LOC1) )

And finally, the trigger for the proof is the following RDF statement:

first:Antwerp first:path_magic first:Ghent

The first two rules pass through information about the sub-goals of the query and essentially block the final rule from taking effect until the trigger is added to the fact graph. It is easy to see that the 3rd rule will no longer apply to every RDF statement with a first:path predicate, but rather only to statements of that kind where the subject and/or object terms are part of a query. So, for a SPARQL service where we do not expect to answer queries that rely on supporting symmetric properties in the first:path predicate, no calculations will be performed and no unnecessary RDF statements will be entailed.
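
Continuing the rdflib sketch from above (reusing g and FIRST), the guarding behavior of the magic predicates can be caricatured in Python like this; it is a toy illustration of the idea, not the actual MST rules:

    magic_seeds = set()  # query bindings: stand-ins for first:path_magic facts

    def ask_path(graph, subj, obj):
        """ASK { subj first:path obj }, deriving symmetric facts on demand."""
        magic_seeds.add(subj)  # the trigger statement seeds the magic set
        # The guarded rule fires only for seeded terms, not for every
        # first:path statement in the fact base.
        for s, o in list(graph.subject_objects(FIRST.path)):
            if o in magic_seeds:
                graph.add((o, FIRST.path, s))
        return (subj, FIRST.path, obj) in graph

    assert ask_path(g, FIRST.Antwerp, FIRST.Ghent)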

I hope to write a bit more about some of the benefits of a Python toolkit for building Semantic Web expert systems. I touched a bit on these in my InfixOWL write-up and presentation, but haven't really put the whole picture together.

-- Chimezie

20 Jul 2009 8:00pm GMT

Uche and Chimezie Ogbuji: Using URIs to denote non information artifacts was - Re: Review of new HTTPbis text for 303 See Other from Jonathan Rees on 2009-07-07 (www-tag@w3.org from July 2009)

Okay, I finally get the idea of using dereferenceable identifiers for non-information artifact 'entities'.

20 Jul 2009 7:00pm GMT

Uche and Chimezie Ogbuji: Recipe 576795: Partitioning a sequence

Distinct partitions of a sequence.

20 Jul 2009 7:00pm GMT

Uche and Chimezie Ogbuji: Recent scenes from the ISS - The Big Picture - Boston.com

"..Collected here are a handful of photographs of Sarychev Peak Volcano, and more, taken by astronauts aboard the ISS over the past few months."

20 Jul 2009 7:00pm GMT

Uche and Chimezie Ogbuji: Proust, ooky cable guys, and everything in between

via youtube.com

Hilarious story, "The Cable Guy Took a Dump in My Bathroom, Or, Why I Hate My Parents," by a friend and fellow writer on thenervousbreakdown.com. Kimberly is also an amateur filmmaker, and if you like her story, check out her film, "Why we wax".

20 Jul 2009 7:00pm GMT

Uche and Chimezie Ogbuji: IBM Press room - 2009-06-30 IBM Research and European Union Provide Software Developers with Performance Gains and Faster Time-To-Market - United States

"..the world's first open source machine learning compiler. The compiler intelligently optimizes applications, translating directly into shorter software development times and bigger performance gains."

20 Jul 2009 7:00pm GMT

Uche and Chimezie Ogbuji: Chapter 18: Authentication and Authorization — Pylons Book v1.0 documentation

AuthKit documentation

20 Jul 2009 7:00pm GMT