31 Jul 2010
Planet Python
PyCon: Pycon India - CFP closes today
Today is the last date for the Call for Proposals for Pycon India 2010. We have received an awesome response from the Python community in India so far, with 61 talk proposals submitted so far.
We will be accepting proposals till midnight today. If you haven't submitted a talk yet, rush to the talk submission page and submit at the earliest.
31 Jul 2010 3:56am GMT
PyCon: Pycon India - CFP closes today
Today is the last date for the Call for Proposals for Pycon India 2010. We have received an awesome response from the Python community in India so far, with 61 talk proposals submitted so far.
We will be accepting proposals till midnight today. If you haven't submitted a talk yet, rush to the talk submission page and submit at the earliest.
31 Jul 2010 3:56am GMT
30 Jul 2010
Planet Python
Montreal Python User Group: Pylons presentation is online
Alexandre Bourget's energetic presentation on Pylons is at last online: Pylons, Web development done right.
Your browser doesn't support HTML5. Please use the download link.
If you use Safari and want to use a libre format, install the Xiph QuickTime Component at http://www.xiph.org/quicktime
30 Jul 2010 6:23pm GMT
Montreal Python User Group: Pylons presentation is online
Alexandre Bourget's energetic presentation on Pylons is at last online: Pylons, Web development done right.
Your browser doesn't support HTML5. Please use the download link.
If you use Safari and want to use a libre format, install the Xiph QuickTime Component at http://www.xiph.org/quicktime
30 Jul 2010 6:23pm GMT
Jesse Noller: Miscellanea – Python Sprints, Nasuni, etc.
I've obviously been quiet here on my personal blog - as everyone who reads regularly knows I'm neck-deep in a pretty exciting startup call Nasuni as well as doing other projects, like the PSF Sponsored sprints thing. That combined with twitter means my time for other additional long-form content is minimal. So here's a small roundup of interesting things:
Nasuni
Yup, still running Python and Django! We're actually pretty proud to be a sponsor for DjangoCon 2010 coming up in September - I'll be attending, so I hope to see all the familiar Django faces I know, and meet some new ones.
I've been blogging semi-regularly for the Nasuni blog itself - my posts are focused on product-things more than anything else. Here's a small list of posts which I've done:
- The Road to Release - Feature Previews - this is actually my latest one, and the first in a series where I'll be showing off some of the new features we're adding in the latest release.
- Looking at OpenStack, a Rackspace and NASA initiative - For those of you who don't know, Rackspace and NASA announced OpenStack - the awesome part? It's all python - I had the swift component (which powers Rackspace's cloudfiles system) of OpenStack running pretty quickly. I'd recommend snagging the code from launchpad and taking a look. Swift (the storage component) uses eventlet - and Nova (the compute part) uses Tornado and Twisted.
- Storage Switzerland Test Drives the Filer - This is a response to an article written about the product - I actually used it to preview some of the work going into the next release of the Filer.
- Thanks to Django - This piece goes into some detail about our use of Django, it's one of our ways of saying thanks. I still need to rework it so we can send it over for the Django Success Stories page.
- Thanks to the Supporting Cast - This is an earlier thank you post - but to the other people who have helped out a ton, including Greg Newman, Lincoln Loop, and Revsys.
- The Donut Solution - This was a fun one, mainly to show that yes - we're listening hard to customer feedback, and we're improving/iterating quickly. Also, I get to show off UI improvements.
- Finally - The Nasuni Blog team - this is the rosetta stone for the authors of the blog, describing who we are. I didn't write this piece, but it's good reading to figure out who is who.
If you're interested in Nasuni - or cloud storage in general - I'd encourage you to sign up for the RSS feed. We're trying to keep the information useful outside of "just us" (despite my urge and predilection to churning out completely product-related posts) - and if you ever have feedback, drop us a line.
PSF Sponsored Sprints
The project continues on - we've funded two sprints so far, and have several more coming down the pike. We're always in need of volunteers to help us do things like the manuals and site maintenance/content authoring. Here's some highlights:
- The call for applications is open - The call for applications is open - and now I suspect we won't be closing it. Originally, I thought we'd have to do things in waves of apply-approve. As time has progressed, I no longer think this is the case.
- Montreal Python Packaging sprint wrap up - the wrap up report for our first sprint!
- Europython core sprint report - another wrap up report for the core sprint we provided funds to.
- Just added the locations page - we now have people/companies offering up space for sprinters! Check it out!
- Finally - Sprints at PyOhio - PyOhio is going on this weekend, if you're in the area you should really go check it out! Catherine has gone above and beyond with the entire "become a contributor" effort going on.
Please! If you're thinking about holding a sprint - send us an application! Heck, even if we're not sponsoring it, we'll help promote you via the blog, and the sprint calendar we have up. A little fact? The sprints we've funded so far, and that are on deck for funding are all outside of the US, which is both awesome, and surprising!
PSF Board
Some of you probably know that I'm currently on the board of directors for the PSF - things progress well here, but I mainly wanted to call out the excellent blog Doug Hellmann has been authoring for PSF news. You should really be watching that because yes - we do do things, and hopefully over the next year, we'll be doing more awesome things.
I've actually got a bigger post in the works for what I think the ultimate mission of the PSF is/should be as well as "how do you get money from us" as well. Must find the time!
30 Jul 2010 5:34pm GMT
Jesse Noller: Miscellanea – Python Sprints, Nasuni, etc.
I've obviously been quiet here on my personal blog - as everyone who reads regularly knows I'm neck-deep in a pretty exciting startup call Nasuni as well as doing other projects, like the PSF Sponsored sprints thing. That combined with twitter means my time for other additional long-form content is minimal. So here's a small roundup of interesting things:
Nasuni
Yup, still running Python and Django! We're actually pretty proud to be a sponsor for DjangoCon 2010 coming up in September - I'll be attending, so I hope to see all the familiar Django faces I know, and meet some new ones.
I've been blogging semi-regularly for the Nasuni blog itself - my posts are focused on product-things more than anything else. Here's a small list of posts which I've done:
- The Road to Release - Feature Previews - this is actually my latest one, and the first in a series where I'll be showing off some of the new features we're adding in the latest release.
- Looking at OpenStack, a Rackspace and NASA initiative - For those of you who don't know, Rackspace and NASA announced OpenStack - the awesome part? It's all python - I had the swift component (which powers Rackspace's cloudfiles system) of OpenStack running pretty quickly. I'd recommend snagging the code from launchpad and taking a look. Swift (the storage component) uses eventlet - and Nova (the compute part) uses Tornado and Twisted.
- Storage Switzerland Test Drives the Filer - This is a response to an article written about the product - I actually used it to preview some of the work going into the next release of the Filer.
- Thanks to Django - This piece goes into some detail about our use of Django, it's one of our ways of saying thanks. I still need to rework it so we can send it over for the Django Success Stories page.
- Thanks to the Supporting Cast - This is an earlier thank you post - but to the other people who have helped out a ton, including Greg Newman, Lincoln Loop, and Revsys.
- The Donut Solution - This was a fun one, mainly to show that yes - we're listening hard to customer feedback, and we're improving/iterating quickly. Also, I get to show off UI improvements.
- Finally - The Nasuni Blog team - this is the rosetta stone for the authors of the blog, describing who we are. I didn't write this piece, but it's good reading to figure out who is who.
If you're interested in Nasuni - or cloud storage in general - I'd encourage you to sign up for the RSS feed. We're trying to keep the information useful outside of "just us" (despite my urge and predilection to churning out completely product-related posts) - and if you ever have feedback, drop us a line.
PSF Sponsored Sprints
The project continues on - we've funded two sprints so far, and have several more coming down the pike. We're always in need of volunteers to help us do things like the manuals and site maintenance/content authoring. Here's some highlights:
- The call for applications is open - The call for applications is open - and now I suspect we won't be closing it. Originally, I thought we'd have to do things in waves of apply-approve. As time has progressed, I no longer think this is the case.
- Montreal Python Packaging sprint wrap up - the wrap up report for our first sprint!
- Europython core sprint report - another wrap up report for the core sprint we provided funds to.
- Just added the locations page - we now have people/companies offering up space for sprinters! Check it out!
- Finally - Sprints at PyOhio - PyOhio is going on this weekend, if you're in the area you should really go check it out! Catherine has gone above and beyond with the entire "become a contributor" effort going on.
Please! If you're thinking about holding a sprint - send us an application! Heck, even if we're not sponsoring it, we'll help promote you via the blog, and the sprint calendar we have up. A little fact? The sprints we've funded so far, and that are on deck for funding are all outside of the US, which is both awesome, and surprising!
PSF Board
Some of you probably know that I'm currently on the board of directors for the PSF - things progress well here, but I mainly wanted to call out the excellent blog Doug Hellmann has been authoring for PSF news. You should really be watching that because yes - we do do things, and hopefully over the next year, we'll be doing more awesome things.
I've actually got a bigger post in the works for what I think the ultimate mission of the PSF is/should be as well as "how do you get money from us" as well. Must find the time!
30 Jul 2010 5:34pm GMT
Lightning Fast Shop: Released 0.5.0 beta 4
Today we released LFS 0.5.0 beta 4
What's new?
- Bugfix pages: caching page_view
- Bugfix cart: display correct stock amount within growl message
- Bugfix product_inline: display property title within error message
- Bugfix order_received_mail.html: display the correct selected values of a configurable product
- Bugfix cart: calculation of maximum delivery date
- Bugfix redirect: save redirect url for variants
- Bugfix lfs.page.views: added missing import of Http404
- Bugfix: restrict adding to cart if the product is not deliverable. Issue #37
- Added french translations (Jacques Seite)
- Added get_properties method to OrderItem
- Added optional cached parameter to cart/utils/get_cart_price and cart/utils/get_cart_costs
- Removed javascript which dynamically sets the height of the slots.
- Changed properties management: display name instead of title within left portlet
- Improved lfs.portlet: caching
Further Information
You can find more information and help on following places:
- Official page
- Documentation on PyPI
- Demo
- Releases on PyPI
- Source code on bitbucket.org
- Google Group
- lfsproject on Twitter
- IRC
30 Jul 2010 2:43pm GMT
Lightning Fast Shop: Released 0.5.0 beta 4
Today we released LFS 0.5.0 beta 4
What's new?
- Bugfix pages: caching page_view
- Bugfix cart: display correct stock amount within growl message
- Bugfix product_inline: display property title within error message
- Bugfix order_received_mail.html: display the correct selected values of a configurable product
- Bugfix cart: calculation of maximum delivery date
- Bugfix redirect: save redirect url for variants
- Bugfix lfs.page.views: added missing import of Http404
- Bugfix: restrict adding to cart if the product is not deliverable. Issue #37
- Added french translations (Jacques Seite)
- Added get_properties method to OrderItem
- Added optional cached parameter to cart/utils/get_cart_price and cart/utils/get_cart_costs
- Removed javascript which dynamically sets the height of the slots.
- Changed properties management: display name instead of title within left portlet
- Improved lfs.portlet: caching
Further Information
You can find more information and help on following places:
- Official page
- Documentation on PyPI
- Demo
- Releases on PyPI
- Source code on bitbucket.org
- Google Group
- lfsproject on Twitter
- IRC
30 Jul 2010 2:43pm GMT
Salman Haq: A Simple Experiment with Hookbox
Hookbox is a new Python-based comet server with support for Websockets. It's main features include:
- Support for named channels on which data is published/subscribed.
- Server-side Websocket and WSGI support via the eventlet concurrent networking library.
- Client-side Websocket support via js.io.
- Fall-back to a custom comet protocol when websocket support is not available in the browser.
- A RESTful api to interact with the channels.
- Easy integration with web frameworks like Quixote and Django, again via web hooks.
- A built-in admin interface to view the state of active channels and users.
It has been authored by some of the original people behind Orbited and represents a significant simplification in the process of writing websocket-enabled web apps primarily because it includes its own message broker.
Experiment: Graphic EQ
To better understand Hookbox, I decided to write a web-based graphic EQ which updates itself based on data produced on the server (strictly speaking, a graphic EQ is a control instrument, but my example presents it as a measurement instrument so perhaps calling it a web-based spectrum analyzer would be more appropriate). Checkout the video below to see it in action:
Hookbox Graphic EQ Demo from Salman Haq on Vimeo.
The top-half of the clip shows the equalizer bands dynamically updating in Firefox (v3.6.8) while the bottom-half shows the Firebug pane displaying the live network calls. If you look carefully, you will see that in this particular experiment all the network requests used the CSP protocol and not the websocket protocol. This is expected because websocket support is only available in version 4 or later of Firefox.
The video capture was a bit sluggish so it's worth mentioning that it actually ran much smoother than it appears in the video. This is due to the fact that graphics acceleration is improperly configured on my quad-core box. (Side note: Each time I upgrade to the latest Ubuntu release, my video and printer settings need to be tweaked significantly to recover optimal performance. Since I upgraded to Lucid Lynx, I have not been able get the video performance I would like).
Trials
The experiments were repeated using three data publishing rates: 10 Hz, 100 Hz, and 1000 Hz. I should note that the producer.py process used time.sleep() which is not very precise on systems running a non-realtime kernel which typically lack high-precision timers. At the 10 and 100 Hz rates, the browser was able to keep up with the deluge of data and rendered the bands quite smoothly. At the 1KHz rate, all of my system's cores reached maximum loading and Firefox rendered the bands rather slowly. The memory consumption did not jump between these trials and the network bandwidth increased slightly but never crossed 30 kbps in the steady-state.
These observations suggest that the experiments were CPU-limited rather than memory or bandwidth limited as the data publishing rate was increased. This is fitting with my earlier remark that graphics acceleration on the test system is mis-configured, which means that the display rendering was done by the CPUs rather than the onboard video card and hence the high cpu consumption.
I failed to repeat these experiments with Google Chrome due to this bug which I consequently filed on github. But given that V8 is a faster javascript engine, I would expect Chrome to perform better.
Code
The source code of the experiment is available in my fork of the hookbox project on github (I expect it will be pulled into Michael Carter's main branch soon). It consists of fewer than 100 lines of Python, fewer than 30 lines of Javascript, and took about a couple of hours to get working satisfactorily. That's not bad considering that it was an introductory play-date with Hookbox.
The equalizer widget consists of seven sliders from the jQuery-UI library, lined up vertically alongside each other. To update the sliders, a call back is attached to the hookbox subscription object's "onPublish" event which looks like this:
// see index.html for the full code.
subscription.onPublish = function(frame) {
// adjust each slider from the data values in the frame's payload.
// the payload is expected to be an array of seven integers (one for
// each slider).
$("#eq > span").each(function (index) {
$(this).slider("value", frame.payload[index]);
});
};
The data production and publishing loop is also very simple. Seven random integer samples are produced, and sent via a HTTP POST to the Hookbox server, which then publishes it to all subscribers of that channel:
# see producer.py for the full code.
while True:
# generate seven random data points
# and publish them via the HTTP/REST api at 10 Hz.
# pop = range(0,100) (defined earlier)
# 'values' is a dictionary with some other fields required by hookbox (also defined earlier).
values["payload"] = random.sample(pop, 7)
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
# publish the samples!
resp = urllib2.urlopen(req)
time.sleep(0.1)
The common and most important thing between these two snippets is the "payload" object. In this example, it is an array of seven integers, ranging from 0 to 99 that is produced by the aptly named producer.py which also publishes it via the hookbox server and is received by the javascript running in the browser. The rest of the code in my example is quite self-explanatory and is a good way to learn Hookbox (and even Quixote).
Conclusion
While still in its infancy, Hookbox looks like a very promising project. From a developer's perspective, it allows you to get something up and running very quickly. Its simplicity derives from the web hook api which also concerns me a bit. Using HTTP for inter-process communication, as in my example producer.py communicates with the hookbox server to publish data, doesn't seem like the best choice for low-latency high-throughput applications. Perhaps Hookbox might introduce a web socket interface for inter-process communication so that the data can travel from the producer to the consumer using only one protocol? Overall, I'm quite impressed with Hookbox and really hope that it gains momentum. And lastly, don't forget to read Michael Carter's introduction to the project on cometdaily.com!
30 Jul 2010 2:03pm GMT
Salman Haq: A Simple Experiment with Hookbox
Hookbox is a new Python-based comet server with support for Websockets. It's main features include:
- Support for named channels on which data is published/subscribed.
- Server-side Websocket and WSGI support via the eventlet concurrent networking library.
- Client-side Websocket support via js.io.
- Fall-back to a custom comet protocol when websocket support is not available in the browser.
- A RESTful api to interact with the channels.
- Easy integration with web frameworks like Quixote and Django, again via web hooks.
- A built-in admin interface to view the state of active channels and users.
It has been authored by some of the original people behind Orbited and represents a significant simplification in the process of writing websocket-enabled web apps primarily because it includes its own message broker.
Experiment: Graphic EQ
To better understand Hookbox, I decided to write a web-based graphic EQ which updates itself based on data produced on the server (strictly speaking, a graphic EQ is a control instrument, but my example presents it as a measurement instrument so perhaps calling it a web-based spectrum analyzer would be more appropriate). Checkout the video below to see it in action:
Hookbox Graphic EQ Demo from Salman Haq on Vimeo.
The top-half of the clip shows the equalizer bands dynamically updating in Firefox (v3.6.8) while the bottom-half shows the Firebug pane displaying the live network calls. If you look carefully, you will see that in this particular experiment all the network requests used the CSP protocol and not the websocket protocol. This is expected because websocket support is only available in version 4 or later of Firefox.
The video capture was a bit sluggish so it's worth mentioning that it actually ran much smoother than it appears in the video. This is due to the fact that graphics acceleration is improperly configured on my quad-core box. (Side note: Each time I upgrade to the latest Ubuntu release, my video and printer settings need to be tweaked significantly to recover optimal performance. Since I upgraded to Lucid Lynx, I have not been able get the video performance I would like).
Trials
The experiments were repeated using three data publishing rates: 10 Hz, 100 Hz, and 1000 Hz. I should note that the producer.py process used time.sleep() which is not very precise on systems running a non-realtime kernel which typically lack high-precision timers. At the 10 and 100 Hz rates, the browser was able to keep up with the deluge of data and rendered the bands quite smoothly. At the 1KHz rate, all of my system's cores reached maximum loading and Firefox rendered the bands rather slowly. The memory consumption did not jump between these trials and the network bandwidth increased slightly but never crossed 30 kbps in the steady-state.
These observations suggest that the experiments were CPU-limited rather than memory or bandwidth limited as the data publishing rate was increased. This is fitting with my earlier remark that graphics acceleration on the test system is mis-configured, which means that the display rendering was done by the CPUs rather than the onboard video card and hence the high cpu consumption.
I failed to repeat these experiments with Google Chrome due to this bug which I consequently filed on github. But given that V8 is a faster javascript engine, I would expect Chrome to perform better.
Code
The source code of the experiment is available in my fork of the hookbox project on github (I expect it will be pulled into Michael Carter's main branch soon). It consists of fewer than 100 lines of Python, fewer than 30 lines of Javascript, and took about a couple of hours to get working satisfactorily. That's not bad considering that it was an introductory play-date with Hookbox.
The equalizer widget consists of seven sliders from the jQuery-UI library, lined up vertically alongside each other. To update the sliders, a call back is attached to the hookbox subscription object's "onPublish" event which looks like this:
// see index.html for the full code.
subscription.onPublish = function(frame) {
// adjust each slider from the data values in the frame's payload.
// the payload is expected to be an array of seven integers (one for
// each slider).
$("#eq > span").each(function (index) {
$(this).slider("value", frame.payload[index]);
});
};
The data production and publishing loop is also very simple. Seven random integer samples are produced, and sent via a HTTP POST to the Hookbox server, which then publishes it to all subscribers of that channel:
# see producer.py for the full code.
while True:
# generate seven random data points
# and publish them via the HTTP/REST api at 10 Hz.
# pop = range(0,100) (defined earlier)
# 'values' is a dictionary with some other fields required by hookbox (also defined earlier).
values["payload"] = random.sample(pop, 7)
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
# publish the samples!
resp = urllib2.urlopen(req)
time.sleep(0.1)
The common and most important thing between these two snippets is the "payload" object. In this example, it is an array of seven integers, ranging from 0 to 99 that is produced by the aptly named producer.py which also publishes it via the hookbox server and is received by the javascript running in the browser. The rest of the code in my example is quite self-explanatory and is a good way to learn Hookbox (and even Quixote).
Conclusion
While still in its infancy, Hookbox looks like a very promising project. From a developer's perspective, it allows you to get something up and running very quickly. Its simplicity derives from the web hook api which also concerns me a bit. Using HTTP for inter-process communication, as in my example producer.py communicates with the hookbox server to publish data, doesn't seem like the best choice for low-latency high-throughput applications. Perhaps Hookbox might introduce a web socket interface for inter-process communication so that the data can travel from the producer to the consumer using only one protocol? Overall, I'm quite impressed with Hookbox and really hope that it gains momentum. And lastly, don't forget to read Michael Carter's introduction to the project on cometdaily.com!
30 Jul 2010 2:03pm GMT
Michael Foord: A plugin system for unittest(2)
One of the big problems with unittest, which I have been maintaining for well over a year now, is how hard it is to extend. For a long time I've been saying that my "big task" with unittest was to implement a plugin system. ... [838 words]
30 Jul 2010 12:15pm GMT
Michael Foord: A plugin system for unittest(2)
One of the big problems with unittest, which I have been maintaining for well over a year now, is how hard it is to extend. For a long time I've been saying that my "big task" with unittest was to implement a plugin system. ... [838 words]
30 Jul 2010 12:15pm GMT
Reinout van Rees: Software releases 6: skeletons for easy best practices
This is the last post in my software releases series (at least, the last as I originally planned the series).
In the last 5 articles, you've seen a couple of places where there's some irritating boilerplate. What's supposed to go into your setup.py? How did you start off with a buildout.cfg again? Oh, don't forget the CHANGES.txt and a basic README.txt. The solution for that is something I normally call a skeleton. I mentioned skeletons earlier in a Dutch python user group talk on practical project automation.
A skeleton is a basic project structure with a README, a couple of directories, some python files to start off with and whatever else you want to put in there. You give it some parameters (project name, website address, whatever) and it'll fill in the blanks with those parameters.
Django has a manage.py startapp, but there's a generic python tool called PasteScript. PasteScript's documentation almost exclusively talks about running wsgi applications, but it actually can create project skeletons for you. The best way to get started is to look at ZopeSkel, which has a large collection of skeletons. Download it and see what it can do.
There are two core advantages to using skeletons either for yourself or within your community or within your company:
- Boilerplate reduction. Basic files get created and set up for you. Laziness works to your advantage. "Just start the project from a skeleton" gives you something reasonably solid and neat. The alternative is to just start off with a python file and a sub directory and "remember" to add the readme and so later.
- Best practice. Pour your knowledge into a skeleton. Figure out the way you want to set up your projects once and reap the fruits many times over. You're probably copy-pasting apache config files from one project to the other: why not keep the latest and greatest config file in the skeleton? For me, this is the number one advantage: you've got a place to collect your project setup best practice. And having that place to put it means that more best practice gets attracted!
Want to start your own skeleton? What I'd recommend is to start with a ZopeSkel download and to look at their code and how to set it all up. Then start your own. I worked for Zest software and started "zestskel" there. I worked for The Health Agency afterwards and started "thaskel" there. Now I work at Nelen en Schuurmans so I've started "nensskel" here :-)
Some examples of what I get when I create a new django site with nensskel (after giving it a project name):
- Basic setup.py with project name filled in. Long description is read from the readme and changelog. Some common dependencies are pre-filled.
- buildout.cfg which sets up django, creates apache config files, contains package-versioning best practice setup, contains some commonly used tools, etc.
- Readme, changelog, todo file. Of course with the project name in there.
- Basic apache config file. In here there's also the configuration that's needed for django's static files. And some caching setup. And the wsgi configuration. And the setup needed for django-compressor.
- A directory with the actual django app (empty models.py, views.py and so). A bit like how django's manage.py startapp does it, but now with our own defaults.
- Our own defaults? Yes, for instance the boilerplate needed in settings.py for our django-staticfiles css/js setup. You definitively don't want to type that by hand.
- Pre-created PROJECTNAME/templates/PROJECTNAME, PROJECTNAME/media/PROJECTNAME and PROJECTNAME/fixtures/ directories.
- Ready-made test setup just the way we like it.
So: make it easy to do the right thing. Let laziness work in your favour. Start a skeleton today!
30 Jul 2010 8:24am GMT
Reinout van Rees: Software releases 6: skeletons for easy best practices
This is the last post in my software releases series (at least, the last as I originally planned the series).
In the last 5 articles, you've seen a couple of places where there's some irritating boilerplate. What's supposed to go into your setup.py? How did you start off with a buildout.cfg again? Oh, don't forget the CHANGES.txt and a basic README.txt. The solution for that is something I normally call a skeleton. I mentioned skeletons earlier in a Dutch python user group talk on practical project automation.
A skeleton is a basic project structure with a README, a couple of directories, some python files to start off with and whatever else you want to put in there. You give it some parameters (project name, website address, whatever) and it'll fill in the blanks with those parameters.
Django has a manage.py startapp, but there's a generic python tool called PasteScript. PasteScript's documentation almost exclusively talks about running wsgi applications, but it actually can create project skeletons for you. The best way to get started is to look at ZopeSkel, which has a large collection of skeletons. Download it and see what it can do.
There are two core advantages to using skeletons either for yourself or within your community or within your company:
- Boilerplate reduction. Basic files get created and set up for you. Laziness works to your advantage. "Just start the project from a skeleton" gives you something reasonably solid and neat. The alternative is to just start off with a python file and a sub directory and "remember" to add the readme and so later.
- Best practice. Pour your knowledge into a skeleton. Figure out the way you want to set up your projects once and reap the fruits many times over. You're probably copy-pasting apache config files from one project to the other: why not keep the latest and greatest config file in the skeleton? For me, this is the number one advantage: you've got a place to collect your project setup best practice. And having that place to put it means that more best practice gets attracted!
Want to start your own skeleton? What I'd recommend is to start with a ZopeSkel download and to look at their code and how to set it all up. Then start your own. I worked for Zest software and started "zestskel" there. I worked for The Health Agency afterwards and started "thaskel" there. Now I work at Nelen en Schuurmans so I've started "nensskel" here :-)
Some examples of what I get when I create a new django site with nensskel (after giving it a project name):
- Basic setup.py with project name filled in. Long description is read from the readme and changelog. Some common dependencies are pre-filled.
- buildout.cfg which sets up django, creates apache config files, contains package-versioning best practice setup, contains some commonly used tools, etc.
- Readme, changelog, todo file. Of course with the project name in there.
- Basic apache config file. In here there's also the configuration that's needed for django's static files. And some caching setup. And the wsgi configuration. And the setup needed for django-compressor.
- A directory with the actual django app (empty models.py, views.py and so). A bit like how django's manage.py startapp does it, but now with our own defaults.
- Our own defaults? Yes, for instance the boilerplate needed in settings.py for our django-staticfiles css/js setup. You definitively don't want to type that by hand.
- Pre-created PROJECTNAME/templates/PROJECTNAME, PROJECTNAME/media/PROJECTNAME and PROJECTNAME/fixtures/ directories.
- Ready-made test setup just the way we like it.
So: make it easy to do the right thing. Let laziness work in your favour. Start a skeleton today!
30 Jul 2010 8:24am GMT
Richard Jones: More Cheese Shop goodness
Georg has created a plugin for Sphinx that handles all that dynamic JSON Cheese Shop stuff, and it's called sphinxcontrib-cheeseshop. It's pretty darned cool :-)
Also I made a couple of other cosmetic changes to the Cheese Shop which you may have already noticed but should be apparent from this screenshot fragment:

30 Jul 2010 6:35am GMT
Richard Jones: More Cheese Shop goodness
Georg has created a plugin for Sphinx that handles all that dynamic JSON Cheese Shop stuff, and it's called sphinxcontrib-cheeseshop. It's pretty darned cool :-)
Also I made a couple of other cosmetic changes to the Cheese Shop which you may have already noticed but should be apparent from this screenshot fragment:

30 Jul 2010 6:35am GMT
V.S. Babu: Quick script to maintain a diary
I like to keep my daily notes in a folder in the filesystem with filenames yyyymmdd.otl, using VIM Outliner. Here is a small DOS script to make a file for a day if it doesn't exist and then open it. Name it as diary.cmd and keep in your path.
30 Jul 2010 5:37am GMT
V.S. Babu: Quick script to maintain a diary
I like to keep my daily notes in a folder in the filesystem with filenames yyyymmdd.otl, using VIM Outliner. Here is a small DOS script to make a file for a day if it doesn't exist and then open it. Name it as diary.cmd and keep in your path.
30 Jul 2010 5:37am GMT
Francisco Souza: High level acceptance testing in your PHP applications using Python, Lettuce and Selenium
Usually, in PHP applications, I don't run acceptance tests. Lately, I have been thinking about how can I change this. I am a little experienced with others acceptance tests frameworks, for Python (Lettuce) and Ruby (Cucumber), but there is nothing like Lettuce and/or Cucumber in PHP world. So, what can I do? I can just use Lettuce or Cucumber to test any application, including PHP or Java application. In this blog post, I will show how to use behaviour-driven development (BDD) behind Lettuce to write acceptance tests for PHP applications.
Before start to coding, let's build up our environment, with Python, Lettuce and Selenium. If you use Linux, Mac OS, FreeBSD or any other Unix-like operating system, Python is already installed in your system. So, let's install Lettuce and Selenium.
Installing Lettuce
To install Lettuce from source code, you can clone the repository from github and install it using setuptools:
$ git clone git://github.com/gabrielfalcao/lettuce.git
$ cd lettuce
$ [sudo] python setup.py install
If you use pip or easy_install, you can also install Lettuce with the following commands:
$ [sudo] easy_install lettuce
$ [sudo] pip install lettuce
Installing Selenium
The code used in this blog post was written for Selenium 2.0, wich is still alfa. So, to install Selenium, you need to checkout the SVN repository and run the setup.py install command:
$ svn checkout http://selenium.googlecode.com/svn/trunk/ selenium
$ cd selenium
$ [sudo] python setup.py install
With both Selenium and Lettuce installed, let's code.
Let's code! :)
The example will be very simple: a feature called "Login" with two scenarios: "Successful login" and "Unsuccessful login". See the feature "code" above, and understand it:
Feature: Login
In order to have unlimited powers
As a system anonymous user
I want to login in the system
Scenario Outline: Successful and unsuccessful login
Given that there is a user registered in the system with the username "admin" and the password "123"
When I navigate to the login page
And fill the username field with <login>
And fill the password field with <password>
And click the "Log in" button
Then I see the message <message>
Examples:
| login | password | message |
| admin | 123 | Welcome to admin |
| admin | 1234 | Error. Verify your username and password |
Put this feature file (login.feature) inside a directory called feature in the root of your application (or any other place).
Prepare the terrain and define the steps
According the Lettuce documentation, the second round is to define the steps described in the feature file. Before we define our steps, let's write a terrain.py Python module, which prepares the terrain to run our acceptance tests. The only things we need to do is setup a Selenium webdriver instance to test the content and the behaviour of our login page. Here is the terrain.py code:
from lettuce import before, after, world
from selenium import get_driver, FIREFOX
@before.all
def setup_browser():
world.browser = get_driver(FIREFOX)
world.root_url = 'http://localhost/php_app_test/'
@after.all
def teardown_browser(total):
world.browser.quit()
The terrain.py module should be in the features directory. Now we can define the steps from the scenario in a Python module called login_steps, and put this module inside a directory called step_definitions, in the features directory. Here is the directory tree:

And in this Python module, we define the steps:
# -*- coding: utf-8 -*-
from lettuce import *
@step(u'Given that there is a user registered in the system with the username "admin" and the password "123"')
def do_nothing(step):
'''There is nothing that Lettuce can do to guarantee this sentence'''
pass
@step(u'When I navigate to the login page')
def when_i_navigate_to_the_login_page(step):
login_url = world.root_url + 'login.html'
world.browser.get(login_url)
@step(u'And I fill the username field with (.*)')
def and_i_fill_the_username_field_with_login(step, login):
username_field = world.browser.find_element_by_xpath('//input[@name="username"]')
username_field.send_keys(login)
@step(u'And I fill the password field with (.*)')
def and_i_fill_the_password_field_with_password(step, password):
password_field = world.browser.find_element_by_xpath('//input[@name="password"]')
password_field.send_keys(password)
@step(u'And I click the "(.*)" button')
def and_i_click_the_group1_button(step, button_value):
button = world.browser.find_element_by_xpath('//input[@value="%s"]' %(button_value))
button.click()
@step(u'Then I see the message (.*)')
def then_i_see_the_message_message(step, message):
assert message in world.browser.get_page_source()
If you don't know Python, take a look at some free books (like Dive into Python and How to think like a computer scientist: Learning with Python) and learn it :)
Understanding the step definitions
Let's see more about the step definitions above. In the first step ("Given that there is a user registered in the system with the username "admin" and the password "123""), there is nothing to do: we cannot check and guarantee that there is a user registered in the system (unless we have access to the database, for example).
On the second step ("When I navigate to the login page"), we use the browser provided by Selenium WebDriver to go to the login page, in the URL http://localhost/php_app_test/login.html. You can see how to use the Selenium WebDriver in Selenium WebDriver docs.
The third step ("And fill the username field with <login>") is where we get the username text field by it's name and fill it with the login obtained from this step's sentence. We use the xpath syntax to find the field by it's name, and on the fourth step ("And fill the password field with <password>"), we get the password field by it's name and fill it with the password obtained from the fourth step sentence. The last browser interation is in the sixth step ("And click the "Log in" button"), when we get the "Log in" button and click it.
The last step ("Then I see the message <message>") asserts that the properly message appears on the screen when the user try to login with the right and wrong data.
The PHP code
Now, let's write the PHP code needed to make this specification work. I will not connect to a database and execute a complete login, we will just build a HTML form in the login.html page and make a simple PHP code to show the properly message in each case. The HTML code is here:
<html>
<head>
<meta http-equiv="Content-type" content="text/html; charset=utf-8"/>
<title>Login page</title>
</head>
<body id="">
<form action="login.php" method="post" accept-charset="utf-8">
<label for="id_username">Username: </label><input type="text" name="username" value="" id="id_username"/><br />
<label for="id_password">Password: </label><input type="password" name="password" value="" id="id_password"/>
<input type="submit" name="submit" value="Log in" id=""/>
</form>
</body>
</html>
And the stupid simple PHP code is here:
<?php
if ($_POST['username'] == 'admin' && $_POST['password'] == '123') {
echo 'Welcome to admin';
} else {
echo 'Error. Verify your username and password';
}
That was a really simple example that shows how to write high level tests to PHP applications, with Selenium WebDriver and Lettuce :) You can checkout the application code at Github: http://github.com/franciscosouza/php_app_test.
30 Jul 2010 1:08am GMT
Francisco Souza: High level acceptance testing in your PHP applications using Python, Lettuce and Selenium
Usually, in PHP applications, I don't run acceptance tests. Lately, I have been thinking about how can I change this. I am a little experienced with others acceptance tests frameworks, for Python (Lettuce) and Ruby (Cucumber), but there is nothing like Lettuce and/or Cucumber in PHP world. So, what can I do? I can just use Lettuce or Cucumber to test any application, including PHP or Java application. In this blog post, I will show how to use behaviour-driven development (BDD) behind Lettuce to write acceptance tests for PHP applications.
Before start to coding, let's build up our environment, with Python, Lettuce and Selenium. If you use Linux, Mac OS, FreeBSD or any other Unix-like operating system, Python is already installed in your system. So, let's install Lettuce and Selenium.
Installing Lettuce
To install Lettuce from source code, you can clone the repository from github and install it using setuptools:
$ git clone git://github.com/gabrielfalcao/lettuce.git
$ cd lettuce
$ [sudo] python setup.py install
If you use pip or easy_install, you can also install Lettuce with the following commands:
$ [sudo] easy_install lettuce
$ [sudo] pip install lettuce
Installing Selenium
The code used in this blog post was written for Selenium 2.0, wich is still alfa. So, to install Selenium, you need to checkout the SVN repository and run the setup.py install command:
$ svn checkout http://selenium.googlecode.com/svn/trunk/ selenium
$ cd selenium
$ [sudo] python setup.py install
With both Selenium and Lettuce installed, let's code.
Let's code! :)
The example will be very simple: a feature called "Login" with two scenarios: "Successful login" and "Unsuccessful login". See the feature "code" above, and understand it:
Feature: Login
In order to have unlimited powers
As a system anonymous user
I want to login in the system
Scenario Outline: Successful and unsuccessful login
Given that there is a user registered in the system with the username "admin" and the password "123"
When I navigate to the login page
And fill the username field with <login>
And fill the password field with <password>
And click the "Log in" button
Then I see the message <message>
Examples:
| login | password | message |
| admin | 123 | Welcome to admin |
| admin | 1234 | Error. Verify your username and password |
Put this feature file (login.feature) inside a directory called feature in the root of your application (or any other place).
Prepare the terrain and define the steps
According the Lettuce documentation, the second round is to define the steps described in the feature file. Before we define our steps, let's write a terrain.py Python module, which prepares the terrain to run our acceptance tests. The only things we need to do is setup a Selenium webdriver instance to test the content and the behaviour of our login page. Here is the terrain.py code:
from lettuce import before, after, world
from selenium import get_driver, FIREFOX
@before.all
def setup_browser():
world.browser = get_driver(FIREFOX)
world.root_url = 'http://localhost/php_app_test/'
@after.all
def teardown_browser(total):
world.browser.quit()
The terrain.py module should be in the features directory. Now we can define the steps from the scenario in a Python module called login_steps, and put this module inside a directory called step_definitions, in the features directory. Here is the directory tree:

And in this Python module, we define the steps:
# -*- coding: utf-8 -*-
from lettuce import *
@step(u'Given that there is a user registered in the system with the username "admin" and the password "123"')
def do_nothing(step):
'''There is nothing that Lettuce can do to guarantee this sentence'''
pass
@step(u'When I navigate to the login page')
def when_i_navigate_to_the_login_page(step):
login_url = world.root_url + 'login.html'
world.browser.get(login_url)
@step(u'And I fill the username field with (.*)')
def and_i_fill_the_username_field_with_login(step, login):
username_field = world.browser.find_element_by_xpath('//input[@name="username"]')
username_field.send_keys(login)
@step(u'And I fill the password field with (.*)')
def and_i_fill_the_password_field_with_password(step, password):
password_field = world.browser.find_element_by_xpath('//input[@name="password"]')
password_field.send_keys(password)
@step(u'And I click the "(.*)" button')
def and_i_click_the_group1_button(step, button_value):
button = world.browser.find_element_by_xpath('//input[@value="%s"]' %(button_value))
button.click()
@step(u'Then I see the message (.*)')
def then_i_see_the_message_message(step, message):
assert message in world.browser.get_page_source()
If you don't know Python, take a look at some free books (like Dive into Python and How to think like a computer scientist: Learning with Python) and learn it :)
Understanding the step definitions
Let's see more about the step definitions above. In the first step ("Given that there is a user registered in the system with the username "admin" and the password "123""), there is nothing to do: we cannot check and guarantee that there is a user registered in the system (unless we have access to the database, for example).
On the second step ("When I navigate to the login page"), we use the browser provided by Selenium WebDriver to go to the login page, in the URL http://localhost/php_app_test/login.html. You can see how to use the Selenium WebDriver in Selenium WebDriver docs.
The third step ("And fill the username field with <login>") is where we get the username text field by it's name and fill it with the login obtained from this step's sentence. We use the xpath syntax to find the field by it's name, and on the fourth step ("And fill the password field with <password>"), we get the password field by it's name and fill it with the password obtained from the fourth step sentence. The last browser interation is in the sixth step ("And click the "Log in" button"), when we get the "Log in" button and click it.
The last step ("Then I see the message <message>") asserts that the properly message appears on the screen when the user try to login with the right and wrong data.
The PHP code
Now, let's write the PHP code needed to make this specification work. I will not connect to a database and execute a complete login, we will just build a HTML form in the login.html page and make a simple PHP code to show the properly message in each case. The HTML code is here:
<html>
<head>
<meta http-equiv="Content-type" content="text/html; charset=utf-8"/>
<title>Login page</title>
</head>
<body id="">
<form action="login.php" method="post" accept-charset="utf-8">
<label for="id_username">Username: </label><input type="text" name="username" value="" id="id_username"/><br />
<label for="id_password">Password: </label><input type="password" name="password" value="" id="id_password"/>
<input type="submit" name="submit" value="Log in" id=""/>
</form>
</body>
</html>
And the stupid simple PHP code is here:
<?php
if ($_POST['username'] == 'admin' && $_POST['password'] == '123') {
echo 'Welcome to admin';
} else {
echo 'Error. Verify your username and password';
}
That was a really simple example that shows how to write high level tests to PHP applications, with Selenium WebDriver and Lettuce :) You can checkout the application code at Github: http://github.com/franciscosouza/php_app_test.
30 Jul 2010 1:08am GMT
Francisco Souza: Seleniumless Django applications: using the test client
Some people waste a lot of time testing Django applications only with Selenium. No, I don't think that you should not use Selenium to test Django applications, I just wanna show that nobody needs to use Selenium for all webtests in Django.
Selenium drives a browser, so you can have true test results on navigation, but Selenium is not needed to test a page title or a response status code, for example. Selenium tests are powerful, but slow, so you can preserve them to use when you really need, and use more simple and fast tools to more simple and fast tests.
Django has a powerful tool for web tests: the test client. Using this module, you can make requests and test the response (its content, status code, URL and everything inside a response).
Imagine that you have a Django application that manage all books of the world, and you want to test if the title of the /books page is "All the books of the world", you can use Selenium, but IMHO, you should not use Selenium.
Suppose that there is the following Lettuce feature file:
Feature: List books
In order to know which books are in the system
I want to see the list of books
Scenario: Listing all books in the world
When I go to the /books URL
Then the page title should be "All the books of the world"
If we use Selenium to define the scenario "Listing all books in the world", we should have a terrain.py file with the following content:
from lettuce import before, after, world
from selenium import get_driver, FIREFOX
@before.all
def setup_browser():
world.browser = get_driver(FIREFOX)
@after.all
def teardown_browser(total):
world.browser.quit()
And define the steps definition with the following code:
# -*- coding: utf-8 -*-
from lettuce import step, world
from lettuce.django import django_url
from nose.tools import assert_equals
@step(u'When I go to the /books URL')
def when_i_go_to_the_books_url(step):
world.browser.get(django_url('books'))
@step(u'Then the page title should be "(.*)"')
def then_the_page_title_should_be_group1(step, page_title):
assert_equals(world.browser.get_title(), page_title)
There is nothing difficult here, the steps definition code is very simple and the test itself too, but how many time Lettuce spends running this test? Let's check:
$ time ./manage.py harvest
Django's builtin server is running at 0.0.0.0:8000
Setting up a test database...OK
...
1 feature (1 passed)
1 scenario (1 passed)
2 steps (2 passed)
real 0m7.172s
user 0m0.740s
sys 0m0.270s
About 7 seconds to run a really really really simple and small test. We can earn time testing Django applications using the Django test client. Let's rewrite the test above using Django test client, we just need to change the terrain.py:
from lettuce import before, world
from django.test import client
@before.all
def setup_browser():
world.browser = client.Client()
And the step definitions:
# -*- coding: utf-8 -*-
from lettuce import step, world
from nose.tools import assert_equals
from lxml import html
@step(u'When I go to the /books URL')
def when_i_go_to_the_books_url(step):
world.response = world.browser.get('/books/')
@step(u'Then the page title should be "(.*)"')
def then_the_page_title_should_be_group1(step, page_title):
dom = html.fromstring(world.response.content)
title = dom.xpath('//head/title')
title = title.pop()
assert_equals(title.text, page_title)
The only notice here is that is we have a little much code to test the title of the page. How many time Lettuce spends testing it?
$ time ./manage.py harvest
Django's builtin server is running at 0.0.0.0:8000
Setting up a test database...OK
...
1 feature (1 passed)
1 scenario (1 passed)
2 steps (2 passed)
real 0m3.288s
user 0m0.700s
sys 0m0.130s
About 3 seconds :) We didn't need to open Firefox for visit the page and get it's title, we just use the Django test client that knows how to deal with Django requests, and lxml to deal with the response content. Think about how many seconds you can earn just stopping to open a browser for the simplest tests of the world =P
There is a lot of other stuffs that you can do using the Django Test Client, and I plan to write more about it, but now, just check the official docs and have fun.
P.S.: Another great tool for Django testing is Django sane testing.
Happy testing :)
30 Jul 2010 1:07am GMT
Francisco Souza: Seleniumless Django applications: using the test client
Some people waste a lot of time testing Django applications only with Selenium. No, I don't think that you should not use Selenium to test Django applications, I just wanna show that nobody needs to use Selenium for all webtests in Django.
Selenium drives a browser, so you can have true test results on navigation, but Selenium is not needed to test a page title or a response status code, for example. Selenium tests are powerful, but slow, so you can preserve them to use when you really need, and use more simple and fast tools to more simple and fast tests.
Django has a powerful tool for web tests: the test client. Using this module, you can make requests and test the response (its content, status code, URL and everything inside a response).
Imagine that you have a Django application that manage all books of the world, and you want to test if the title of the /books page is "All the books of the world", you can use Selenium, but IMHO, you should not use Selenium.
Suppose that there is the following Lettuce feature file:
Feature: List books
In order to know which books are in the system
I want to see the list of books
Scenario: Listing all books in the world
When I go to the /books URL
Then the page title should be "All the books of the world"
If we use Selenium to define the scenario "Listing all books in the world", we should have a terrain.py file with the following content:
from lettuce import before, after, world
from selenium import get_driver, FIREFOX
@before.all
def setup_browser():
world.browser = get_driver(FIREFOX)
@after.all
def teardown_browser(total):
world.browser.quit()
And define the steps definition with the following code:
# -*- coding: utf-8 -*-
from lettuce import step, world
from lettuce.django import django_url
from nose.tools import assert_equals
@step(u'When I go to the /books URL')
def when_i_go_to_the_books_url(step):
world.browser.get(django_url('books'))
@step(u'Then the page title should be "(.*)"')
def then_the_page_title_should_be_group1(step, page_title):
assert_equals(world.browser.get_title(), page_title)
There is nothing difficult here, the steps definition code is very simple and the test itself too, but how many time Lettuce spends running this test? Let's check:
$ time ./manage.py harvest
Django's builtin server is running at 0.0.0.0:8000
Setting up a test database...OK
...
1 feature (1 passed)
1 scenario (1 passed)
2 steps (2 passed)
real 0m7.172s
user 0m0.740s
sys 0m0.270s
About 7 seconds to run a really really really simple and small test. We can earn time testing Django applications using the Django test client. Let's rewrite the test above using Django test client, we just need to change the terrain.py:
from lettuce import before, world
from django.test import client
@before.all
def setup_browser():
world.browser = client.Client()
And the step definitions:
# -*- coding: utf-8 -*-
from lettuce import step, world
from nose.tools import assert_equals
from lxml import html
@step(u'When I go to the /books URL')
def when_i_go_to_the_books_url(step):
world.response = world.browser.get('/books/')
@step(u'Then the page title should be "(.*)"')
def then_the_page_title_should_be_group1(step, page_title):
dom = html.fromstring(world.response.content)
title = dom.xpath('//head/title')
title = title.pop()
assert_equals(title.text, page_title)
The only notice here is that is we have a little much code to test the title of the page. How many time Lettuce spends testing it?
$ time ./manage.py harvest
Django's builtin server is running at 0.0.0.0:8000
Setting up a test database...OK
...
1 feature (1 passed)
1 scenario (1 passed)
2 steps (2 passed)
real 0m3.288s
user 0m0.700s
sys 0m0.130s
About 3 seconds :) We didn't need to open Firefox for visit the page and get it's title, we just use the Django test client that knows how to deal with Django requests, and lxml to deal with the response content. Think about how many seconds you can earn just stopping to open a browser for the simplest tests of the world =P
There is a lot of other stuffs that you can do using the Django Test Client, and I plan to write more about it, but now, just check the official docs and have fun.
P.S.: Another great tool for Django testing is Django sane testing.
Happy testing :)
30 Jul 2010 1:07am GMT
29 Jul 2010
Planet Python
Jared Forsyth: CodeTalker by example: test!
This post looks better on my blog
This is the last in a four-part series, in which I demonstrate how to build a CSS parser using CodeTalker:
To get the code for this:
git clone git://github.com/jabapyth/css.git
Test!
No code is complete (and some would say it's broken) without a good amount of testing, and CodeTalker provides a simple way to test your rules individually, to ensure the best coverage.
the tests for our CSS grammar are contained in tests/grammar.py. To run them (ensure you have py.test installed, then) py.test tests/grammar.py. Or just ./setup.py test to run all of them.
The codetalker/testing.py currently only has one function, but it will setup valitation tests for rule parsing.
Here it is in action:
import css.grammar as grammar from codetalker import testing parse_rule = testing.parse_rule(__name__, grammar.grammar) parse_rule(grammar.class_, ( '.one', '.green', '.GReEn', '.div', ), ( 'one', )) parse_rule(grammar.simple_selector, ( # should pass 'div', 'div#some', 'div#one.green', 'div.frown', 'ul.cheese:first-child', 'li.one.two.three', 'a#b.c.d:last-child', ), ( # should fail 'one', 'div# and', 'div. one', ))
When you do that, parse_rule sets up a verification test for each string.
To get your very own factory function, call parse_rule = testing.parse_rule(__name__, the_grammar) (it needs your module's __name__ in order to setup the functions in your global namespace, where py.test can recognize and run them.
Then for each rule you want to test, call parse_rule(the_rule, passing_strings, failing_strings.
This method of incremental testing is really a boon while you are initially creating your grammar - this way, you don't have to complete the grammar before you test parts of it out.
Now, in addition to just grammar testing, you should have some testing of your translation as well, but the nature of those tests depends completely on your specific project.
29 Jul 2010 11:40pm GMT
Jared Forsyth: CodeTalker by example: test!
This post looks better on my blog
This is the last in a four-part series, in which I demonstrate how to build a CSS parser using CodeTalker:
To get the code for this:
git clone git://github.com/jabapyth/css.git
Test!
No code is complete (and some would say it's broken) without a good amount of testing, and CodeTalker provides a simple way to test your rules individually, to ensure the best coverage.
the tests for our CSS grammar are contained in tests/grammar.py. To run them (ensure you have py.test installed, then) py.test tests/grammar.py. Or just ./setup.py test to run all of them.
The codetalker/testing.py currently only has one function, but it will setup valitation tests for rule parsing.
Here it is in action:
import css.grammar as grammar from codetalker import testing parse_rule = testing.parse_rule(__name__, grammar.grammar) parse_rule(grammar.class_, ( '.one', '.green', '.GReEn', '.div', ), ( 'one', )) parse_rule(grammar.simple_selector, ( # should pass 'div', 'div#some', 'div#one.green', 'div.frown', 'ul.cheese:first-child', 'li.one.two.three', 'a#b.c.d:last-child', ), ( # should fail 'one', 'div# and', 'div. one', ))
When you do that, parse_rule sets up a verification test for each string.
To get your very own factory function, call parse_rule = testing.parse_rule(__name__, the_grammar) (it needs your module's __name__ in order to setup the functions in your global namespace, where py.test can recognize and run them.
Then for each rule you want to test, call parse_rule(the_rule, passing_strings, failing_strings.
This method of incremental testing is really a boon while you are initially creating your grammar - this way, you don't have to complete the grammar before you test parts of it out.
Now, in addition to just grammar testing, you should have some testing of your translation as well, but the nature of those tests depends completely on your specific project.
29 Jul 2010 11:40pm GMT
Shannon -jj Behrens: Software Engineering: Premature Featurization
"Premature Featurization" is my term for the tendency to implement a plethora of features before they've even been requested by real users just in case a user might want them.
It's related to creeping featurism. Like premature optimization, it leads to difficult-to-maintain code bloat.
Product managers and managers in general are especially prone to premature featurization. When experienced developers themselves fall prey to premature featurization, they may be suffering from second system effect.
To some degree, the opposite of premature featurization is creeping elegance, which is where an existing piece of code is polished excessively at the expense of other factors such as the schedule or real world features.
It should be pointed out that agile software development aims to prevent premature featurization in that it forces coders to code only those features that the stake holders have requested as they request them. However, it tends to be an enabler for premature featurization in that it allows stake holders to successfully request an unlimited number of features since the programmer doesn't have to design a complete, minimalistic, cohesive design ahead of time.
Whereas time management is occasionally used by project managers to organize a developer's time as he attempts to implement a premature list of features, the goal of task management is to constrain the list of features requested to the minimal set possible.
29 Jul 2010 11:15pm GMT
Shannon -jj Behrens: Software Engineering: Premature Featurization
"Premature Featurization" is my term for the tendency to implement a plethora of features before they've even been requested by real users just in case a user might want them.
It's related to creeping featurism. Like premature optimization, it leads to difficult-to-maintain code bloat.
Product managers and managers in general are especially prone to premature featurization. When experienced developers themselves fall prey to premature featurization, they may be suffering from second system effect.
To some degree, the opposite of premature featurization is creeping elegance, which is where an existing piece of code is polished excessively at the expense of other factors such as the schedule or real world features.
It should be pointed out that agile software development aims to prevent premature featurization in that it forces coders to code only those features that the stake holders have requested as they request them. However, it tends to be an enabler for premature featurization in that it allows stake holders to successfully request an unlimited number of features since the programmer doesn't have to design a complete, minimalistic, cohesive design ahead of time.
Whereas time management is occasionally used by project managers to organize a developer's time as he attempts to implement a premature list of features, the goal of task management is to constrain the list of features requested to the minimal set possible.
29 Jul 2010 11:15pm GMT
Jared Forsyth: CodeTalker by example: translate
This post looks better on my blog
This is the third in a four-part series, in which I demonstrate how to build a CSS parser using CodeTalker:
To get the code for this:
git clone git://github.com/jabapyth/css.git
Translation
[for the impatient, here's the final code ].
Like everything else in CodeTalker, translation is mean to be straightforward; you have an abstract syntax tree, which is cool, but what you really want is...in this case, an object which represents a stylesheet. So you have a StyleSheet class and a RuleSet class, but how do you go from AST to that?
In general, you'll create one translation rule for each grammar rule. Here's the code for our stylesheet rule to refresh your memory.
def style_sheet(rule): rule | ([charset], star(import_), star(section)) rule.astAttrs = { 'charset': charset, 'imports': [import_], 'sections': [section], }
And here's what it takes to translate it:
@t.translates(ast.StyleSheet) def stylesheet(node): ss = StyleSheet() if node.charset is not None: ss.charset = t.translate(node.charset) ss.imports = list((imp.source, imp.media) for imp in node.imports) ss.rules = [] for section in node.sections: if isinstance(section, ast.Ruleset): # right now I'm not dealing # with other kinds of sections (media, page, font_face) rule = t.translate(section) if rule is not None: ss.rules.append(rule) return ss
Create the object, populate it with the AST node, and then return it. And register the function w/ the translator object.
29 Jul 2010 7:40pm GMT
Jared Forsyth: CodeTalker by example: translate
This post looks better on my blog
This is the third in a four-part series, in which I demonstrate how to build a CSS parser using CodeTalker:
To get the code for this:
git clone git://github.com/jabapyth/css.git
Translation
[for the impatient, here's the final code ].
Like everything else in CodeTalker, translation is mean to be straightforward; you have an abstract syntax tree, which is cool, but what you really want is...in this case, an object which represents a stylesheet. So you have a StyleSheet class and a RuleSet class, but how do you go from AST to that?
In general, you'll create one translation rule for each grammar rule. Here's the code for our stylesheet rule to refresh your memory.
def style_sheet(rule): rule | ([charset], star(import_), star(section)) rule.astAttrs = { 'charset': charset, 'imports': [import_], 'sections': [section], }
And here's what it takes to translate it:
@t.translates(ast.StyleSheet) def stylesheet(node): ss = StyleSheet() if node.charset is not None: ss.charset = t.translate(node.charset) ss.imports = list((imp.source, imp.media) for imp in node.imports) ss.rules = [] for section in node.sections: if isinstance(section, ast.Ruleset): # right now I'm not dealing # with other kinds of sections (media, page, font_face) rule = t.translate(section) if rule is not None: ss.rules.append(rule) return ss
Create the object, populate it with the AST node, and then return it. And register the function w/ the translator object.
29 Jul 2010 7:40pm GMT
Randell Benavidez: How to find Django version
There are cases when you would want to know the version of Django that is being used in your Python installation. To do that, go to your Python console and execute the following commands:
>>> import django >>> django.VERSION
In my machine, it showed the following:
(1, 2, 1, 'final', 0)
Related posts:
- Install SQLite Database Browser on Fedora
- Using GAE Testbed with GAEUnit: Testing that email was sent
- Install IDLE on Fedora 11
- Convert CHM to PDF in (Fedora) Linux
- Install setuptools in Fedora 12
29 Jul 2010 5:12pm GMT
Randell Benavidez: How to find Django version
There are cases when you would want to know the version of Django that is being used in your Python installation. To do that, go to your Python console and execute the following commands:
>>> import django >>> django.VERSION
In my machine, it showed the following:
(1, 2, 1, 'final', 0)
Related posts:
- Install SQLite Database Browser on Fedora
- Using GAE Testbed with GAEUnit: Testing that email was sent
- Install IDLE on Fedora 11
- Convert CHM to PDF in (Fedora) Linux
- Install setuptools in Fedora 12
29 Jul 2010 5:12pm GMT
Bryce Verdier: Project Euler: Problem 5
It's that time again. ;)
Question five reads:
2520 is the smallest number that can be divided by each of the numbers from 1 to 10 without any remainder. What is the smallest positive number that is evenly divisible by all of the numbers from 1 to 20?
And here are my solutions (which I hope are correct this time).
Haskell:
-
module Main where
-
-
import Data.List
-
-
then Nothing
-
else Just(x,x - 1)) 20)
-
-
-- solved this by the help on this URL:
-
-- (http)basildoncoder.com/blog/2008/06/10/project-euler-problem-5/
-
-- by increaing the loop from 2 to 2520, problem solves in seconds
As a quick note, yes I know that I could do this much quicker and cleaner with a list comprehension. I decided to use unfoldr because I wanted the experience of working with it. If it wasn't for this little desire my answer would have looked a lot more like my Python answer.
Python:
-
#!/usr/bin/python
-
-
def div_11_to_20(divided):
-
return all([not bool(divided % x) for x in xrange(11,20+1)])
-
-
if __name__ == "__main__":
-
count = 2520
-
while div_11_to_20(count) == False:
-
count += 2520
-
-
print "%s" % count
And finally my Perl solution:
-
#!/usr/bin/perl
-
-
sub divide_11_to_20
-
{
-
my ( $divided ) = @_;
-
-
foreach (11..20)
-
{
-
}
-
-
}
-
-
my $main_count = 2520;
-
while ( !divide_11_to_20($main_count) )
-
{
-
$main_count += 2520;
-
}
-
Run Times:
Haskell: 0m1.302s
Python: 0m0.769s
Perl: 0m0.223s
Observations:
It doesn't surprise me that the Perl solution ends up being the fastest on these run times. I say this because per iteration the Perl solutions has less calculations. Both the Haskell and Python solutions perform 10 divisions per iteration, where as the Perl solution only performs 10 divisions when the correct number is is being divided. Its one of those things where the difference is very small, but will become larger as the number of iterations increases.
29 Jul 2010 5:06pm GMT
Bryce Verdier: Project Euler: Problem 5
It's that time again. ;)
Question five reads:
2520 is the smallest number that can be divided by each of the numbers from 1 to 10 without any remainder. What is the smallest positive number that is evenly divisible by all of the numbers from 1 to 20?
And here are my solutions (which I hope are correct this time).
Haskell:
-
module Main where
-
-
import Data.List
-
-
then Nothing
-
else Just(x,x - 1)) 20)
-
-
-- solved this by the help on this URL:
-
-- (http)basildoncoder.com/blog/2008/06/10/project-euler-problem-5/
-
-- by increaing the loop from 2 to 2520, problem solves in seconds
As a quick note, yes I know that I could do this much quicker and cleaner with a list comprehension. I decided to use unfoldr because I wanted the experience of working with it. If it wasn't for this little desire my answer would have looked a lot more like my Python answer.
Python:
-
#!/usr/bin/python
-
-
def div_11_to_20(divided):
-
return all([not bool(divided % x) for x in xrange(11,20+1)])
-
-
if __name__ == "__main__":
-
count = 2520
-
while div_11_to_20(count) == False:
-
count += 2520
-
-
print "%s" % count
And finally my Perl solution:
-
#!/usr/bin/perl
-
-
sub divide_11_to_20
-
{
-
my ( $divided ) = @_;
-
-
foreach (11..20)
-
{
-
}
-
-
}
-
-
my $main_count = 2520;
-
while ( !divide_11_to_20($main_count) )
-
{
-
$main_count += 2520;
-
}
-
Run Times:
Haskell: 0m1.302s
Python: 0m0.769s
Perl: 0m0.223s
Observations:
It doesn't surprise me that the Perl solution ends up being the fastest on these run times. I say this because per iteration the Perl solutions has less calculations. Both the Haskell and Python solutions perform 10 divisions per iteration, where as the Perl solution only performs 10 divisions when the correct number is is being divided. Its one of those things where the difference is very small, but will become larger as the number of iterations increases.
29 Jul 2010 5:06pm GMT
24 Jul 2010
Planet Python/SoC - 2009 edition
Aaron Meurer: crazy function
In Part 1 of this series of blog posts, I gave what I believed to be the prerequisites to understanding the mathematics behind the Risch Algorithm (aside from a basic understanding of derivatives and integrals from calculus). In this post, I will elaborate on what is meant by "elementary function," a term that is thrown around a lot when talking about Risch integration.
The usual definition of elementary function given in calculus is any function that is a constant, a polynomial, an exponential (,
), a logarithm (
,
), one of the standard trig functions or their inverses (sin, cos, tan, arcsin, arccos, arctan, etc.), and any combination of these functions via addition, subtraction, multiplication, division, taking powers, and composition. Thus, even a function as crazy as
is elementary, by this definition.
But for the rigorous definition of an elementary function, we must take into consideration what field we are working over. Before I get into that, I need some definitions. Suppose that is the field we are working over. You can imagine that
, the field of rational functions in x with rational number coefficients. As with the previous post, imagine
as a function, for example,
. Let
be a differential extension of
. We have not defined this, but it basically means that our derivation
works the same in
as it does in
. You can imagine here that
.
We say that is a primitive over
if
. In other words, the derivative of
is does not contain
, only elements of
. Obviously, by the definition of a derivation (see the last post in the series), any element of
is a primitive over
, because the derivative of any element of a field is again an element of that field (you can see this by the definition of a derivation, also given in the last post). But also if
for some
, then
is a primitive over
, because
.
We say that is a hyperexponential over
if
. Written another way,
for some
. We know from calculus that the functions that satisfy differential equations of the type
are exactly the exponential functions, i.e.,
.
The last class of functions that needs to be considered is algebraic functions. I will not go into depth on algebraic functions, because my work this summer is only on integrating purely transcendental functions. Therefore, the only concern we shall have with algebraic functions in relation to the integration algorithm is to make sure that whatever function we are integrating is not algebraic, because the transcendental algorithms will not be valid if they are. Hopefully in a future post I will be able to discuss the Risch Structure Theorems, which give necessary and sufficient conditions for determing if a Liouvillian function (see next paragraph) is algebraic.
Now, we say that a function is Liouvillian over
if
is algebraic, a primitive, or a hyperexponential over
. For
to be a Liouvillian monomial over
, we have the additional condition that
. This just means that we cannot consider something like
over
as a Liouvillian monomial. Otherwise (I believe) we could run into undecidability problems.
We call a logarithm over
if
for some
, i.e.,
. We call
an exponential over
if
(or
) for some
, i.e.,
. Note the difference between an exponential monomial and a hyperexponential monomial.
We can finally give the rigorous definition of an elementary extension. is an elementary extension of
if there are
such that
and
is elementary over
for all
. An elementary function is any element of an elementary extension of
with the derivation
. A function
has an elementary integral over
if there exists an elementary extension
of
and
such that
, i.e.,
.
Usually, we start with , the field of rational functions in x with rational number coefficients. We then build up an elementary extension one function at a time, with each function either being a logarithm or exponential of what we have already built up, or algebraic over it. As I noted above, we will ignore algebraic functions here. We generally start with
because it is computable (important problems such as the zero equivalence problem or the problem of determining certain field isomorphisms are decidable), but the above definition lets us start with any subfield of
.
Now you may be wondering: we've covered algebraic functions, exponentials and logarithms, and obviously rational functions are elements of , but what about trigonometric functions? Well, from a theoretical stand point, we can make our lives easier by noticing that all the common trigonometric functions can be represented as exponentials and logarithms over
. For example,
. You can see here that all the common trig functions can be represented as complex exponentials or logarithms like this. However, from an algorithmic standpoint, we don't want do convert all trig expressions into complex exponentials and logarithms in order to integrate them. For one thing, our final result will be in terms of complex exponentials and logarithms, not the original functions we started with, and converting them back may or may not be an easy thing to do. Also, aside from the fact that we have different functions than we were expecting, we also will end up with an answer containing
, even if our original integrand did not.
Fortunately, the integrating tangents directly is a solved problem, just like integrating algebraic, exponential, or logarithmic functions is solved. We can't integrate functions like or
directly as monomials like we can with
or
, because the derivatives of sin and cos are not polynomials in their respective selves with coefficients in
. However, we can use a trick or two to integrate them. One way is to rewrite
and proceed to integrate it as a tangent. Another alternative is to write
. This function is algebraic over
, but if we do not already have
in our differential extension, it is transcendental, and we can rewrite it as
(this is used in Bronstein's text, so I believe what I just said is correct, though I haven't verified it with the structure theorems just yet). These both work using the relevant identities for sin too. Of course, there is still the problem of rewriting the final integrand back in terms of sin or cos. Otherwise, you will get something like
instead of
for
. Bronstein doesn't elaborate on this too much in his book, so it is something that I will have to figure out on my own.
The second option I gave above leads nicely into the main point I wanted to make here about elementary functions. Notice that everywhere in the definitions above, things depend on the field we are working in. Therefore, cannot be an elementary extension over
, but it can be over
. Also, the error function, defined as
cannot be an elementary extension over
, but it can over
. In fact this is how we can integrate in terms of some special functions, including the error function: by manually adding
(or whatever) to our differential extension. Therefore, the usual definition of an elementary anti-derivaitve and the above Risch Algorithm definition of an elementary integral coincide only when the extension consists only of elementary functions of the form of the usual definition (note that above, our final fields are
and
, respectively).
Originally, I was also going to talk about Liouville's Theorem in this blog post, but I think it has already gotten long enough (read "I'm getting tired"), so I'll put that off until next time.
24 Jul 2010 3:32am GMT
17 Jul 2010
Planet Python/SoC - 2009 edition
Kurt Smith: freemalloc
Fwrap sucessfully wraps upwards of 97% of the more than 1500 routines in netlib's LAPACK. Oh yeah. Granted this is pretty much all F77 code, but it works.
The remaining 3% are related to a trivial feature, namely dimension(0:N) style array declarations that will be supported very soon.
Much work has gone into the front end recently. The commandline options are taking their final shape and I'll be getting some improved documentation in the README tomorrow. Feel free to take it out for a spin.
To try out fwrap from the mercurial repo, see http://fwrap.sourceforge.net/ for instructions and dependencies (numpy, cython & a "modern-enough" Fortran 90 compiler). Let me know any problems you may have.
Enjoy!
17 Jul 2010 5:28am GMT
Aaron Meurer: asmeurer
After last week's breakthrough, work this week has been very slow. I started working on the Parametric Risch Differential Equation Problem, which is almost identical to the Risch Differential Equation Problem in how it is solved, except there are a few extra steps. Unfortunately, because it is so similar, Bronstein breezes through the description. This is fine for the parts that are the same, but he is a little unclear on how some of the new parts fit in. Also, his pseudocode has a line more or less saying
if r1 = … = rn = 0 then
N = -1
else
N = max(deg(r1), …, deg(rn))
for i from 0 to N
for j from 1 to m
Mij = coefficient(rj, t^i)
where M is a matrix. It is not very clear what this is supposed to mean in the case where N = -1. Obviously, you can't have a a matrix with negative dimensions. Clearly, this means that this particular function doesn't apply somehow in this case, but I am not really even sure where it fits in to the whole algorithm at this point in reading. After reading a few more pages in, it gives a few hints here and there on how it is to be used, but never is it explicitly shown, in pseudocode or otherwise. So for now, I think my best bet is to read ahead and get a fuller understanding of the complete function before I try implementing anything (this is what I had been doing before, but I caught up to myself).
Also, on an unrelated note, I just found out today that I passed my Google Summer of Code midterm evaluation. This means that I will receive half of my stipend for the program (the other half comes after passing the final evaluation at the end of the summer), and that I can continue working on my project in the program.
EDIT:
Later in the text, it runs through an example and says "… , hence M and A are 0 by 0 matrices." So obviously, that is what was meant.
17 Jul 2010 4:38am GMT
12 Jul 2010
Planet Python/SoC - 2009 edition
Kurt Smith: freemalloc
It's been a while. You're wondering what's been happening with fwrap.
Thanks to a sprint at SciPy 2010, fwrap is quickly gaining parity with f2py. Fwrap doesn't yet have interface files working (and it won't for the first release); neither does it support 'common' blocks. I've been focusing on F77-style function & subroutine functionality.
The main Fortran features currently supported:
- All intrinsic scalar datatypes, including with literal integer kind parameters
- Arrays of intrinsic types & literal kind params.
- All dimension declarations (assumed shape, assumed size, explicit shape)
- Runtime array dimension checking
- Automatic type discovery
Fwrap has a pretty good testsuite in place, both unittests & integration tests. They're run frequently to ensure no regressions are introduced.
Kyle Mandli helped improve fwrap's commandline interface during the fwrap sprint at SciPy 2010. Those changesets have yet to be merged with the main repo, but when they are it will be a *big* improvement, and move fwrap much closer to the 0.1 release. He also added logging functionality to fwrap and fparser - another big improvement.
I've been throwing some fairly big F77 (clawpack, lapack) and F90 (a UW-Madison inertial confinement simulation; thanks, Matt!) codebases at fparser to thresh out glaring bugs, and thus far I've been pleased. No major problems, although I'd very much like to figure out how to test the AST generated by fparser for correctness. Not sure how to do that yet.
Anyway, just a quick update. Stay tuned!
12 Jul 2010 5:22pm GMT
Aaron Meurer: asmeurer
So for the first time this summer, I missed my blogging deadline. I have been on vacation for the past few weeks, and have spent a good bit of the last week in the car, driving home. But that's not my excuse. I was on vacation the week before, when I wrote up my lengthy blog post on the Risch Algorithm. My excuse is that I wanted to finish up my integrate_hyperexponential() function before I posted, so I could write about it. Well, I finished it on Thursday (today is Sunday, the post was due Friday), but I ran into unexpected bugs (imagine that) that has postponed it actually working until now. I also ended up doing API changes 3 different times (they are basically incrementally all one change, from supporting only one extension to properly supporting multiple extensions. Look for long commits in my recent commit history in my branch if you are interested).
So here is the function. It integrates exponential functions. You still have to manually create the differential extension, as before. Here are some examples. You can try them in my integration2 branch (I have rebased over Mateusz's latest polys9update. The latest branch is always integrationn, where n is the largest integer available).
Hover over the code and click on the left-most, "view source" icon (a paper icon with < > over it) to view without breaks. Opens in a new window.
In [1]: from sympy.integrals.risch import *
In [2]: var('t1, t')
Out[2]: (t₁, t)
In [3]: r = exp(2*tan(x))*tan(x) + tan(x) + exp(tan(x))
In [4]: r
Out[4]:
2⋅tan(x) tan(x)
ℯ ⋅tan(x) + tan(x) + ℯ
In [5]: rd = r.diff(x)
In [6]: rd
Out[6]:
⎛ 2 ⎞ 2⋅tan(x) 2 ⎛ 2 ⎞ 2⋅tan(x) ⎛ 2 ⎞ tan(x)
1 + ⎝2 + 2⋅tan (x)⎠⋅ℯ ⋅tan(x) + tan (x) + ⎝1 + tan (x)⎠⋅ℯ + ⎝1 + tan (x)⎠⋅ℯ
In [7]: a, d = map(lambda i: Poly(i, t), rd.subs(tan(x), t1).subs(exp(t1), t).as_numer_denom()) # Manually create the extension
In [8]: a
Out[8]: Poly((1 + 2*t1 + t1**2 + 2*t1**3)*t**2 + (1 + t1**2)*t + 1 + t1**2, t, domain='ZZ[t1]')
In [9]: d
Out[9]: Poly(1, t, domain='ZZ')
In [10]: integrate_hyperexponential(a, d, [Poly(1, x), Poly(1 + t1**2, t1), Poly((1 + t1**2)*t, t)], [x, t1, t], [lambda x: exp(tan(x)), tan])
Out[10]:
⎛ ⌠ ⎞
⎜ 2⋅tan(x) ⎮ ⎛ 2 ⎞ tan(x) ⎟
⎜ℯ ⋅tan(x) + ⎮ ⎝1 + tan (x)⎠ dx + ℯ , True⎟
⎝ ⌡ ⎠
We have to manually build up the differential extension ([7]). The first element is , which is already there. Next, we add
, and finally
. The third argument of
integrate_hyperexponential() is what gives these variables their identities: their derivatives. The fourth argument is the list of the extension symbols, and the last argument is a list of the functions for which the symbols stand for, in reverse order (because we have to back substitute in the solution in reverse order).
The unevaluated Integral in the solution is due to the recursive nature of the Risch algorithm. Eventually, an outer function in the algorithm will recursively integrate until it reaches the ground field, . It will also do the proper preparsing automatically as well. The second element of the solution,
True, indicates that the integral is elementary, and thus the given solution is the complete integral of the original integrand, which we can see ().
Another example:
In [1]: from sympy.integrals.risch import *
In [2]: var('t')
Out[2]: (t,)
In [3]: rd = exp(-x**2)
In [4]: rd
Out[4]:
2
-x
ℯ
In [5]: a, d = map(lambda i: Poly(i, t), rd.subs(exp(x**2), t).as_numer_denom())
In [6]: a
Out[6]: Poly(1, t, domain='ZZ')
In [7]: d
Out[7]: Poly(t, t, domain='ZZ')
In [8]: integrate_hyperexponential(a, d, [Poly(1, x), Poly(2*x*t, t)], [x, t], [lambda x: exp(x**2)])
Out[8]: (0, False)
Here the second argument of the solution is False, which indicates that the algorithm has proven that the integral of is not elementary! The first argument 0 indicates that actually it is the integral of
that is not elementary, i.e., the Risch algorithm will reduce an integrand into an integrated function part and non-elementary part. For example:
In [1]: from sympy.integrals.risch import *
In [2]: var('t1, t')
Out[2]: (t₁, t)
In [3]: rd = exp(x)/tan(x) + exp(x)/(1 + exp(x))
In [4]: rd
Out[4]:
x x
ℯ ℯ
────── + ──────
x tan(x)
1 + ℯ
In [5]: a, d = map(lambda i: Poly(i, t), rd.subs(exp(x), t).subs(tan(x), t1).as_numer_denom())
In [6]: a
Out[6]: Poly(t**2 + (1 + t1)*t, t, domain='ZZ[t1]')
In [7]: d
Out[7]: Poly(t1*t + t1, t, domain='ZZ[t1]')
In [8]: integrate_hyperexponential(a, d, [Poly(1, x), Poly(1 + t1**2, t1), Poly(t, t)], [x, t1, t], [exp, tan])
Out[8]:
⎛ ⎛ x⎞ ⎞
⎝log⎝1 + ℯ ⎠, False⎠
This indicates that the integral of is not elementary. That is one advantage that the new algorithm will have over the present one. Currently, the present algorithm just returns an unevaluated Integral for the above
rd, but the new one will be able to return . It will be able to do this even if rd were rewritten as
(notice that this is exactly what
.as_numer_denom() is doing anyway in [5], as you can see in [6] and [7]). Furthermore, it will have actually proven that the remaining is non-elementary. I plan on having some kind of marker in the pretty printed unevaluated
Integral to indicate this. Suggestions on what this should be are welcome.
Finally, the full algorithm appears to be faster (probably asymptotically faster) than the current implementation:
In [1]: from sympy.integrals.risch import *
In [2]: var('t1, t')
Out[2]: (t₁, t)
In [3]: rd = exp(x)*x**4
In [4]: a, d = map(lambda i: Poly(i, t), rd.subs(exp(x), t).as_numer_denom())
In [5]: integrate_hyperexponential(a, d, [Poly(1, x), Poly(t, t)], [x, t], [lambda x: exp(x)])
Out[5]:
⎛ x 4 x x 3 x 2 x ⎞
⎝24⋅ℯ + x ⋅ℯ - 24⋅x⋅ℯ - 4⋅x ⋅ℯ + 12⋅x ⋅ℯ , True⎠
In [6]: %timeit integrate_hyperexponential(a, d, [Poly(1, x), Poly(t, t)], [x, t], [exp])
10 loops, best of 3: 28 ms per loop
In [7]: integrate(rd, x)
Out[7]:
x 4 x x 3 x 2 x
24⋅ℯ + x ⋅ℯ - 24⋅x⋅ℯ - 4⋅x ⋅ℯ + 12⋅x ⋅ℯ
In [8]: %timeit integrate(rd, x)
1 loops, best of 3: 218 ms per loop
Of course, keep in mind that I am timing what will be an internal function against a full function. But if you increase the exponent on x, you find that there is no way the addition of preparsing time (which shouldn't be affected by such a change) will cause it to become as slow as the current integrate(). Like I said, I am pretty sure that it is asymptotic. For example:
In [1]: from sympy.integrals.risch import *
In [2]: var('t1, t')
Out[2]: (t₁, t)
In [3]: rd = exp(x)*x**10
In [4]: a, d = map(lambda i: Poly(i, t), rd.subs(exp(x), t).as_numer_denom())
In [5]: integrate_hyperexponential(a, d, [Poly(1, x), Poly(t, t)], [x, t], [lambda x: exp(x)])
Out[5]:
⎛ x 10 x x 3 x 5 x 7 x 9 x 8 x 6 x 4 x 2 x ⎞
⎝3628800⋅ℯ + x ⋅ℯ - 3628800⋅x⋅ℯ - 604800⋅x ⋅ℯ - 30240⋅x ⋅ℯ - 720⋅x ⋅ℯ - 10⋅x ⋅ℯ + 90⋅x ⋅ℯ + 5040⋅x ⋅ℯ + 151200⋅x ⋅ℯ + 1814400⋅x ⋅ℯ , True⎠
In [6]: %timeit integrate_hyperexponential(a, d, [Poly(1, x), Poly(t, t)], [x, t], [exp])
10 loops, best of 3: 42 ms per loop
In [7]: integrate(rd, x)
Out[7]:
x 10 x x 3 x 5 x 7 x 9 x 8 x 6 x 4 x 2 x
3628800⋅ℯ + x ⋅ℯ - 3628800⋅x⋅ℯ - 604800⋅x ⋅ℯ - 30240⋅x ⋅ℯ - 720⋅x ⋅ℯ - 10⋅x ⋅ℯ + 90⋅x ⋅ℯ + 5040⋅x ⋅ℯ + 151200⋅x ⋅ℯ + 1814400⋅x ⋅ℯ
In [8]: %timeit integrate(rd, x)
1 loops, best of 3: 2.78 s per loop
There is one thing I should mention. I haven't implemented all the cases in rischDE(), which is the subproblem for exponential functions (more on this in a future "The Risch Algorithm" post). So some integrals will fail with a NotImplementedError, indicating that there is a function that I still need to implement to solve the integral:
In [1]: from sympy.integrals.risch import *
In [2]: var('t1, t')
Out[2]: (t₁, t)
In [3]: rd = (exp(x) - x*exp(2*x)*tan(x))/tan(x)
In [4]: a, d = map(lambda i: Poly(i, t), rd.subs(exp(x), t).subs(tan(x), t1).as_numer_denom())
In [5]: a
Out[5]: Poly(-t1*x*t**2 + t, t, domain='ZZ[x,t1]')
In [6]: d
Out[6]: Poly(t1, t, domain='ZZ[t1]')
In [7]: integrate_hyperexponential(a, d, [Poly(1, x), Poly(1 + t1**2, t1), Poly(t, t)], [x, t1, t], [exp, tan])
---------------------------------------------------------------------------
...
NotImplementedError: The ability to solve the parametric logarithmic derivative problem is required to solve this RDE
So feel free to give this a try and let me know what you think. You will have to do the preparsing as I have done above, which means that you also have to be careful that any extension that you make is not the derivative or logarithmic derivative of an element of the field you have already built up. You also cannot use algebraic functions, as I mentioned before, including things like (functions like these are called the logarithmic derivatives of k(t)-radicals, which I will also discuss in a future "The Risch Algorithm" post). If you just use simple extensions like
t1 = tan(x);t=exp(x) like I have above, you won't need to worry about this. Each derivative Poly should be in the variable that it is the derivative of (e.g., start with Poly(1, x), then add Poly(1 + t1**2, t1), Poly(t2*(1 + t1**2), t2), etc.). Everything else should be a Poly in t, the last element of the extension. And in cause you didn't get it, the last extension must be an exponential function.
Also, I didn't have to do it in any of the above examples, but the first and second arguments to integrate_hyperexponential() must be canceled (a, d = a.cancel(d, include=True) will do this for you), or you will get a wrong result! I spent a good day of debugging until I figured this out. The existence of other bugs didn't help.
12 Jul 2010 6:22am GMT
30 Jun 2010
Planet Python/SoC - 2009 edition
Aaron Meurer: asmeurer
My work this week isn't very interesting, even insomuch as my work any week is interesting, so this week I have elected to start a series of blog posts about the Risch Algorithm in general. I will start out with the basics in this post.
Anyone who has taken Calculus knows a handful of heuristics to calculate integrals. u-substitution, partial fractions, integration by parts, trigonometric substitution, and table integration are a few of the more popular ones. These are general enough to work for most integrals that are encountered in problems in Physics, Engineering, and so on, as well as most of those generated by solving differential equations from the same fields. But these fall short in a couple of ways. First off, they are just heuristics. If they fail, it does not mean that no integral exists. This means that they are useless for proving that certain functions, such as do not have integrals, no matter how hard you try to find them. Second, they work for only relatively simple functions. For example, suppose you have a rational function in
and
. An example would be
. We are not interested in integrating this function, but rather in finding it back given its derivative,
. The only method I named above that would come even close to being applicable to this integrand is partial fractions. This requires multivariate partial fraction decomposition (with respect to
and
), and gives
, which brings us no closer to a solution.
The reason that I started with a function and then computed its derivative was to show how easy it is to come up with a very complicated function that has an elementary anti-derivative. Therefore, we see that the methods from calculus are not the ones to use if we want an integration algorithm that is complete. The Risch Integration Algorithm is based on a completely different approach. At its core lies Liouville's Theorem, which gives us the form of any elementary anti-derivative. (I wish to point out at this point that heuristics like this are still useful in a computer algebra system such as SymPy as fast preprocessors to the full integration algorithm).
The Risch Algorithm works by doing polynomial manipulations on the integrand, which is entirely deterministic (non-heuristic), and gives us the power of all the theorems of algebra, allowing us to actually prove that anti-derivatives cannot exist when they don't. To start off, we have to look at derivations. As I said, everything with the Risch Algorithm is looked at algebraically (as opposed to analytically). The first thing to look at is the derivative itself. We define a derivation as any function on a ring
that satisfies two properties:
1. (Sum Rule),
2. (Product Rule)
for any . Furthermore, define the set of constant elements as
. From just these two rules, you can prove all the rules from calculus such as the power rule and the quotient rule. Defining things algebraically lets us avoid analytic problems, such as discontinuities and the need to prove convergence all the time. Another problem from analysis is the multivalue nature of certain functions, namely the natural logarithm. We get around this by defining
as the unique function satisfying
, for
. From this definition we can prove the famous logarithmic identities
and
for logarithmic derivatives, again using only the two rules for a derivation given above. For example,
.
The above definition for the natural logarithm gives the first insight into how the integration algorithm works. We define transcendental functions in terms of their derivatives. So if , then
. We can define all of the trigonometric functions in terms of
and
if we use
, but we can also avoid this. For example, if
, then
because
.
We say that is a monomial over the field
with respect to a derivation
if it satisfies
1. is transcendental over
,
2. .
The first condition is necessary because the we are only going to deal with the trancenental version of the Risch Algorithm (the algebraic case is solved too, but the solution method is quite different, and I am not implementing it this summer). The second condition just says that the derivative of t is a polynomial in t and a rational function in x. The functions I mentioned above all satisfy these properties for . Theorems in analysis show that
,
, and
are all transcendental over
. This is actually the only use of analysis that we make in the integration algorithm. Also, we see that if
,
, and
, then
,
, and
, which are all polynomials in their respective
and rational functions in
. In the algorithm,
is actually a tower of monomial extensions of
, so
is a monomial over
. This allows us to work with functions like
. We can't make
directly because
is not a polynomial in
(it also contains
) . But if we let
be such that
, i.e.,
, then we can let
be such that
, i.e.,
. Remember that the
are all "functions" of x, but there is no need to write
as long as we remember that
, i.e.,
. This is another advantage of using algebraic over analytic methods; it allows us to reduce an integral down to a rational function in the "symbols"
and
. By convention, we make the first extension
such that
, i.e.,
. I will just call it
here instead of
, to avoid confusion.
This is the preparsing that I alluded to in an earlier post that I have not implemented yet. The reason that I haven't implemented it yet is not just because I haven't gotten around to it. We have to be careful when we build up the extension that each element is indeed transcendental over the already built-up field . For example, although it appears transcendental, the function
is really algebraic because it equals
. There are additional requirements, such that each extension is not the derivative of logarithmic derivative of an element of
(see also the example I gave in the previous post). This is the part that I was talking about in my previous post that is not written out as much as the other algorithms in Bronstein's book. So this is algorithmically solved, just like the rest of the Algorithm, but it is non-trivial and may end up being the hardest part of the algorithm for me to implement, just because it will probably require the most figuring out on my part.
So we can see that we can convert a transcendental integral, such as the one above, into a rational function in x and monomial extensions . For example, the above integrand would become
. We then perform certain polynomial manipulations on this integrand, using the fact that
and
. For the transcendental case of the Risch Algorithm, this is similar to the rational function integration that I outlined in this post, and has Liouville's Theorem at its core. This is where I will start off next time.
30 Jun 2010 3:43am GMT
26 Jun 2010
Planet Python/SoC - 2009 edition
Aaron Meurer: asmeurer
I've spend most of this week sitting in a car, so while I have been able to do some work, I haven't had much time to write up a blog post. So, to comply with Ondrej's rule, here is a quick update.
I have been working my way through Bronstein's book. I finished the outer algorithmic layer of the implantation. Basically, the algorithm does polynomials manipulation on the integrand. It first reduces the integrand into smaller integrals, until it gets to an integral where a subproblem must be solved to solve it. The subproblem that must be solved differs depending on the type of the integral. The first one that comes up in Bronstein's text is the Risch Differential Equation, which arises from the integration of exponential functions. (I will explain all of these thing in more detail in a future blog post). At this point, the algorithms begin to recursively depend on each other, requiring me to implement more and more algorithms at a time in order for each to work. To make things worse, a very fundamental set of algorithms are only described in the text, not given in pseudo-code, so I have had to figure those things out. These are algorithms to determine if a differential extension is a derivative or logarithmic derivative of elements that have already been extended. Again, I will explain better in a future post, but the idea is that you replace elements in an integrand with dummy variables, but each element has to be transcendental over the previous elements. So if you have , and you set
and
(
and
), then you cannot make
because
. The ability to determine if an element is a derivative or a logarithmic derivative of an element of the already build differential extension is important not only for building up the extension for the integrand (basically the preparsing), but also for solving some of the cases of the subproblems such as the Risch Differential Equation problem.
So I am still figuring out some of the details on that one. The description in the book is pretty good (this is probably the best written math textbook I have ever seen), but I still have had to figure out some of the mathematical details on paper (which is something I enjoy anyway, but it can be more stressful). Hopefully by the next time I can have some code that is working enough to actually demonstrate solving some complex integrals, (with manual preparsing), and even more excitingly, prove that some non-elementary integrals, such as the classic , are indeed so. And I also hope to have some more explanations on how the Risch algorithm works in future posts.
26 Jun 2010 3:16am GMT
22 Jun 2010
Planet Python/SoC - 2009 edition
Skipper Seabold: Statsmodels: GSoC Week 4 Update
I spent the last week finishing up the paper that I submitted to accompany my talk at the SciPy conference. I am really looking forward to going to Austin and hearing all the great talks (plus I hear the beer is cheap and the food and music are good, which doesn't hurt). In addition to finishing up the paper, I have started to clean up our time series code.
So far this has included finishing the augmented Dickey-Fuller (ADF) test for unit roots. The big time sink here is that the ADF test-statistic has a non-standard distribution in most cases. The ADF test statistic is obtained by running the following regression
One approach to testing for a unit root means testing the t-stat on the coefficient on the lagged level of y. The actual distribution for this statistic, however, is not Student's t. Many software packages use the tables in Fuller (1976, updated to 1996 version or not) in order to get the critical values for the test statistic depending on the sample size. They use linear interpolation for sample sizes not included in the table. The p-values for the obtained test statistic are usually obtained using MacKinnon's (1994) study that estimated regression surfaces of these distributions via Monte Carlo simulation.
While we do use MacKinnon's approximate p-values from the 1994 paper, MacKinnon wrote a note updating this paper in early 2010, which gives new regression surface results for obtaining the critical values. We use these new results for the critical values. Therefore, when using our ADF test, it is advised that if the p-value is close to the reject/accept region then the critical values should be used in place of the p-value to make the ultimate decision.
We can illlustrate the use of ADF. Note that this version is only in my branch and that it is still in the sandbox, even though it has now been tested, because the API and returned results may change. We will demonstrate on a series that we can easily guess is non-stationary, real GDP.
In [2]: from scikits.statsmodels.sandbox.tsa.stattools import adfuller
In [3]: data = sm.datasets.macrodata.load()
In [4]: realgdp = data.data['realgdp']
In [5]: adf = adfuller(realgdp, maxlag=4, autolag=None, regression="ct")
In [6]: adf
Out[6]:
(-1.8566384063254346,
0.67682917510440099,
4,
198,
{'1%': -4.0052351400496136,
'10%': -3.1402115863254525,
'5%': -3.4329000694218998})
The return values are the test statistic, its p-value (the null-hypothesis here is that the series does contain a unit root), the number of lags of the differences used, the number of observations for the regression, and a dictionary containing the critical values at the respective confidence levels. The regression option controls the type of regression (ie., whether to include a constant or a linear or quadratic time trend), and the autolag option has three options for choosing the lag length to help correct for serial correlation in the regression. There are 'AIC', 'BIC', and 't-stat'. The former two choose the lag length that maximizes the infofrmation criterion, the latter chooses the lag length based on the significance of the lag. This starts with maxlag and works its way down. The docstring has more detailed information.
Beyond this, I have been working on an autocorrelation function (acf), a partial autocorrelation function (pacf), and Q-Statistics (Box-Ljung test). Next up for this week is finishing my VAR class with identification schemes. After this, I will work to integrate post-estimation tests into our results classes, most likely using some sort of mix-in classes and attach test containers to the results objects for test results. Then it's off to the SciPy conference. There I will hopefully be participating in the stats sprint, helping out with the docs marathon and discussing what we need for the future of statistics and Python.
MacKinnon, J.G. 1994. "Approximate asymptotic distribution functions for
unit-root and cointegration tests. Journal of Business and Economic
Statistics 12, 167-76.
MacKinnon, J.G. 2010. "Critical Values for Cointegration Tests."
Queen's University, Dept of Economics, Working Papers. Available at
http://ideas.repec.org/p/qed/wpaper/1227.html
22 Jun 2010 11:42am GMT
16 Jun 2010
Planet Python/SoC - 2009 edition
Aaron Meurer: asmeurer
Every once in a while, seemingly really simple Python code does something completely unexpected for me. Look at the following snippet of Python code. This is run straight from the 2.6.5 interpreter, with no other commands executed. Do you notice anything strange?
$python Python 2.6.5 (r265:79359, Mar 24 2010, 01:32:55) [GCC 4.0.1 (Apple Inc. build 5493)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>>; l = lambda i: a[i] >>> l <function at="" 0x39e7f0=""> >>> H = [(1, 2), (3, 4)] >>> [l(0) + l(1) for a in H] [3, 7]
Did you spot it? Here is a hint. Running a different but similar session:
$python Python 2.6.5 (r265:79359, Mar 24 2010, 01:32:55) [GCC 4.0.1 (Apple Inc. build 5493)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> l = lambda i: a[i] >>> l <function at="" 0x39e7f0=""> >>> l(0) Traceback (most recent call last): File "", line 1, in File "", line 1, in NameError: global name 'a' is not defined
Do you see it now? I defined the lambda function l in terms of a without defining first defining a! And furthermore, it just works when a is defined. This is actually independent of the fact that we are working in a list comprehension, as this continuation of the previous session shows:
>>> a = [3, 4, 5] >>> l(0) 3
But I want to expand on the list comprehension example, because there even more bizzare things going on here. Restarting a new session again:
$python Python 2.6.5 (r265:79359, Mar 24 2010, 01:32:55) [GCC 4.0.1 (Apple Inc. build 5493)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> l = lambda i: a[i] >>> H = [(1, 2), (3, 4)] >>> [l(0) + l(1) for a in H] [3, 7] >>> (l(0) + l(1) for a in H) <generator object="" at="" 0x3a4350=""> >>> list((l(0) + l(1) for a in H)) [7, 7]
So, if you are astute and have been using Python for long enough, you should be able to catch what is going on here. If you don't know, here is a hint (continuation of previous session):
>>> a (3, 4)
So, as you may know, in Python 2.6 and earlier, list comprehension index variable "leek" into the local namespace. The strange thing here is that although the list comprehension would reset it, the generator version does not. Well, normally, it does do this:
>>> x = 1 >>> [x for x in range(10)] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] >>> x 9 >>> del x >>> list((x for x in range(10))) [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] >>> x Traceback (most recent call last): File "", line 1, in NameError: name 'x' is not defined >>> x = 1 >>> list((x for x in range(10))) [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] >>> x 1
So the above bit has something to do with the way the lambda function was defined with the a. By the way, here is what happens with the generator comprehension (is that what these are called?) if a is not defined:
>>> del a >>> list((l(0) + l(1) for a in H)) Traceback (most recent call last): File "", line 1, in File "", line 1, in File "", line 1, in NameError: global name 'a' is not defined
This is how I discovered this. I had defined a lambda function using an variable that was then passed to a list comprehension that used this variable as the index without realizing it. But then I tried converting this into a generator comprehension to see if it would be faster, and got the above error.
Finally, since the "feature" of leaking list comprehension loop variables into the local namespace is going away in Python 3, I expected things to behave at least a little differently in Python 3. I tried the above in a Python 3.1.2 interpreter and got the following:
$python3 Python 3.1.2 (r312:79147, Mar 23 2010, 22:02:05) [GCC 4.2.1 (Apple Inc. build 5646) (dot 1)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> l = lambda i: a[i] >>> l <function at="" 0x100585a68=""> >>> H = [(1, 2), (3, 4)] >>> [l(0) + l(1) for a in H] Traceback (most recent call last): File "", line 1, in File "", line 1, in File "", line 1, in NameError: global name 'a' is not defined >>> list((l(0) + l(1) for a in H)) Traceback (most recent call last): File "", line 1, in File "", line 1, in File "", line 1, in NameError: global name 'a' is not defined
So in Python 3, both the list comprehension and the generator comprehensions act the same, which is not too surprising. I guess I should recode that piece of code to make it future proof, although this doesn't seem easy at the moment, and it may require converting a one-liner into a six-liner. If you are interested, the piece of code is here.
So can anyone provide any insight into what is going on with that lambda function? Running it with the -3 switch to python2.6 didn't give any warnings related to it.
Update: As I noted in a comment, I figured out how to make this future-proof. I need to convert it from
def residue_reduce_derivation(H, D, x, t, z):
lambdafunc = lambda i: i*derivation(a[1], D, x, t).as_basic().subs(z, i)/ \
a[1].as_basic().subs(z, i)
return S(sum([RootSum(a[0].as_poly(z), lambdafunc) for a in H]))
to
def residue_reduce_derivation(H, D, x, t, z):
return S(sum((RootSum(a[0].as_poly(z), lambda i: i*derivation(a[1], D, x, t).as_basic().subs(z, i)/ \
a[1].as_basic().subs(z, i)) for a in H)))
Thanks to all the commenters for the explanations.
Also, you may have noticed that I discovered that if you use [code] instead of <code>, you get these nicer code blocks that actually respect indentation! Now I just need to figure out how to make them syntax highlight Python code.
Update 2: [code='py'] colors it! Sweet!
Update 3: I just discovered that SymPy has a Lambda() object that handles this better. In particular, it pretty prints the code, and is what is already being used for RootSum() in the rational function integrator, at least in Mateusz's polys9.
>>> integrate(1/(x**5 + 1), x) log(1 + x)/5 + RootSum(625*_t**4 + 125*_t**3 + 25*_t**2 + 5*_t + 1, Lambda(_t, _t*log(x + 5*_t)))
Still, this has been a very good learning experience.
16 Jun 2010 4:49am GMT
15 Jun 2010
Planet Python/SoC - 2009 edition
Aaron Meurer: asmeurer
These seem to be all the rave these days, so I figured, why not jump on the bandwagon:
Aaron-Meurer:doc aaronmeurer20100615153531(integration$)$git weekreport
Aaron Meurer (20):
Fix some bugs in Poly
Make Poly(sin(x)/x*t, t, domain='EX').clear_denoms() work
Fix integrate to work correctly with heurisch.py
Use more efficient gcdexdiophantine() algorithm
Add support for taking the derivation over the coefficient domain in risch.py
Add (but do not yet use) splitfactor_sqf() in risch.py
Add polynomial_reduce() to risch.py
Add tests for algorithms in risch.py in a new test_risch.py file
Only allow coercion to larger domains
Allow coercion from ZZ(a) to ZZ(a, b)
Fix doctest in new heurisch.py file
Add residue_reduce()
Formatting fixes in docstrings in sympy/polys/algebratools.py
Add includePRS option to resultant functions
Add permute method to DMP
Add a test for the includePRS option of resultant()
Have residue_reduce() make S_i monic
Rewrite polynomial_reduce() non-recursively
Add integrate_hypertangent_polynomial()
Add integrate_nonlinear_no_specials()
15 Jun 2010 9:45pm GMT
14 Jun 2010
Planet Python/SoC - 2009 edition
Skipper Seabold: Statsmodels: GSoC Week 3 Update
[Edit: Formatting should be fixed now. I will not be reformatting old posts though, so that they don't get reposted at Planet SciPy]
Last week was spent mainly ensuring that I pass my comps and remain a PhD student. This week was much more productive for coding. For now, all changes are in my branch and have not been merged to trunk, but I will describe the two big changes.
The first concerns the datasets package. This one is not all that exciting, but suffice it to say that the datasets are now streamlined and use the Bunch pattern to load the data. Thanks, Gaël, for pointing this out. I also rewrote a bit of David's datasets proposal from scikits-learn to reflect the current design of our datasets and thoughts. You can see it here (soon to be on the docs page). We are making an effort to ensure that our datasets are going to be similar to those of scikits-learn.
The second change was an improvement of the fitting of maximum likelihood models and the start of a GenericLikelihoodModel class. Maximum likelihood based models (mainly discrete choice models in the main code base right now) can now be fit using any of the unconstrained solvers from scipy.optimize (Nelder-Mead, BFGS, CG, Newton-CG, Powell) plus Newton-Raphson. To take a simple example to see how it works, we can fit a Probit model.
In [2]: data = sm.datasets.spector.load()
In [3]: data.exog = sm.add_constant(data.exog)
In [4]: res_newton = sm.Probit(data.endog, data.exog).fit(method="newton")
Optimization terminated successfully.
Current function value: 12.818804
Iterations 6
In [5]: res_nm = sm.Probit(data.endog, data.exog).fit(method="nm", maxiter=500)
Optimization terminated successfully.
Current function value: 12.818804
Iterations: 439
Function evaluations: 735
In [6]: res_bfgs = sm.Probit(data.endog, data.exog).fit(method="bfgs")
Optimization terminated successfully.
Current function value: 12.818804
Iterations: 15
Function evaluations: 21
Gradient evaluations: 21
In [7]: res_cg = sm.Probit(data.endog, data.exog).fit(method="cg", maxiter=250)
Optimization terminated successfully.
Current function value: 12.818804
Iterations: 188
Function evaluations: 428
Gradient evaluations: 428
In [8]: res_ncg = sm.Probit(data.endog, data.exog).fit(method="ncg", avextol=1e-8)
Optimization terminated successfully.
Current function value: 12.818804
Iterations: 12
Function evaluations: 14
Gradient evaluations: 12
Hessian evaluations: 12
In [9]: res_powell = sm.Probit(data.endog, data.exog).fit(method="powell", ftol=1e-8)
Optimization terminated successfully.
Current function value: 12.818804
Iterations: 12
Function evaluations: 568
All of the options for the solvers are available and are documented in the fit method. As you can see, some of the default values need to be changed to ensure (accurate) convergence. The Results objects that are returned have two new attributes.
In [10]: res_powell.mle_retvals
Out[10]:
{'converged': True,
'direc': array([[ 7.06629660e-02, -3.07499922e-03, 5.38418734e-01,
-4.19910465e-01],
[ 0.00000000e+00, 1.00000000e+00, 0.00000000e+00,
0.00000000e+00],
[ 1.49194876e+00, -6.64992809e-02, -6.96792443e-03,
-3.22306873e+00],
[ -5.36227277e-02, 1.18544093e-01, -8.75205765e-02,
-2.42149981e+00]]),
'fcalls': 568,
'fopt': 12.818804069990534,
'iterations': 12,
'warnflag': 0}
In [11]: res_powell.mle_settings
Out[11]:
{'callback': None,
'disp': 1,
'fargs': (),
'ftol': 1e-08,
'full_output': 1,
'maxfun': None,
'maxiter': 35,
'optimizer': 'powell',
'retall': 0,
'start_direc': None,
'start_params': [0, 0, 0, 0],
'xtol': 0.0001}
The dict mle_retvals contains all of the values that are returned from the solver if the full_output keyword is True. The dict mle_settings contains all of the arguments passed to the solver, including the defaults so that these can be checked after the fit. Again, all settings and returned values are documented in the fit method and in the results class, respectively.
Lastly, I started a GenericLikelihoodModel class. This is currently unfinished, though the basic idea is laid out. Take again the Probit example above using Lee Spector's educational program data. And assume we didn't have the Probit model from statsmodels. We could use the new GenericLikelihoodModel class. There are two ways (probably more) to proceed. For those comfortable with object oriented programming and inheritance in Python, we could subclass the GenericLikelihoodModel, defining our log-likelihood method.
from scipy import stats
import numpy as np
class MyModel(LLM):
def loglike(self, params):
"""
Probit log-likelihood
"""
q = 2*self.endog - 1
X = self.exog
return np.add.reduce(stats.norm.logcdf(q*np.dot(X,params)))
Now this model could be fit, using any of the methods that only require an objective function, i.e., Nelder-Mead or Powell.
In [44]: res = mod.fit(method="nm", maxiter=500)
Optimization terminated successfully.
Current function value: 12.818804
Iterations: 439
Function evaluations: 735
In [45]: res_nm.params
Out[45]: array([ 1.62580058, 0.05172931, 1.42632242, -7.45229725])
In [46]: res.params
Out[46]: array([ 1.62580058, 0.05172931, 1.42632242, -7.45229725])
The main drawback right now is that all statistics that rely on the covariance of the parameters, etc. will use numeric gradients and Hessians, which can lessen that accuracy of those statistics. This can be overcome by providing score and hessian methods as loglike was provided above. Of course, for more complicated likelihood functions this can soon become cumbersome. We are working towards more accurate numerical differentiation and discussing options for automatic or symbolic differentiation.
The main advantage as opposed to just writing your likelihood and passing it to a solver is that you have all of the (growing number of) statistics and tests available to statsmodels right in the generic model.
I would also like to accommodate those who are less familiar with OOP and inheritance in Python. I haven't quite worked out the final design for how this would go yet. Right now, you could do the following, though I don't think it quite meets the less complicated goal.
In [5]: import scikits.statsmodels as sm
In [6]: from scipy import stats
In [7]: import numpy as np
In [8]:
In [9]: data = sm.datasets.spector.load()
In [10]: data.exog = sm.add_constant(data.exog)
In [11]:
In [12]: def probitloglike(params, endog, exog):
....: """
....: Log likelihood for the probit
....: """
....: q = 2*endog - 1
....: X = exog
....: return np.add.reduce(stats.norm.logcdf(q*np.dot(X,params)))
....:
In [13]: mod = LLM(data.endog, data.exog, loglike=probitloglike)
In [14]: res = mod.fit(method="nm", fargs=(data.endog,data.exog), maxiter=500)
Optimization terminated successfully.
Current function value: 12.818804
Iterations: 439
Function evaluations: 735
In [15]: res.params
Out[15]: array([ 1.62580058, 0.05172931, 1.42632242, -7.45229725])
There are still a few design issues and bugs that need to be worked out with the last example, but the basic idea is there. That's all for now.
14 Jun 2010 12:54pm GMT
11 Jun 2010
Planet Python/SoC - 2009 edition
Aaron Meurer: gcd
So for this week's blog post I will try to explain how the general algorithm for integrating rational functions works. Recall that a rational function is the quotient of two polynomials. We know that using common denominators, we can convert the sum of any number of rational functions into a single quotient, . Also, using polynomial division we can rewrite any rational function as the sum of a polynomial and the quotient of two polynomials such that the degree of the numerator is less than the degree of the denominator (
, with
deg(g)" /> deg(g)" title="deg(r) deg(g)" class="latex" />). Furthermore, we know that the representation of a rational function is not unique. For example,
is the same as
except at the point
, and
is the same as
everywhere. But by using Euclid's algorithm for finding the GCD of polynomials on the numerator and the denominator, along with polynomial division on each, we can cancel all common factors to get a representation that is unique (assuming we expand all factors into one polynomial). Finally, using polynomial division with remainder, we can rewrite any rational function
as
, where
,
,
,
, and
are all polynomials, and the degree of
is less than the degree of
.
We know from calculus that the integral of any rational function consists of three parts: the polynomial part, the rational part, and the logarithmic part (consider arctangents as complex logarithms). The polynomial part is just the integral of above. The rational part is another rational function, and the logarithmic part, which is a sum of logarithms of the form
, where
is an algebraic constant and
is a polynomial (note that if
is a rational function, we can split it into two logarithms of polynomials using the log identities).
To find the rational part, we first need to know about square-free factorizations. An important result in algebra is that any polynomial with rational coefficients can be factored uniquely into irreducible polynomials with rational coefficients, up to multiplication of a non-zero constant and reordering of factors, similar to how any integer can be factored uniquely into primes up to multiplication of 1 and -1 and reordering of factors (technically, it is with coefficients from a unique factorization domain, for which the rationals is a special case, and up to multiplication of a unit, which for rationals is every non-zero constant). A polynomial is square-free if this unique factorization does not have any polynomials with powers greater than 1. Another theorem from algebra tells us that irreducible polynomials over the rationals do not have any repeated roots, and so given this, it is not hard to see that a polynomial being square-free is equivalent to it not having repeated roots.
A square-free factorization of a polynomial is a list of polynomials, , where each
is square-free (in other words,
is the product of all the factors of degree 1,
is the product of all the factors of degree 2, and so on). There is a relatively simple algorithm to compute the square-free factorization of a polynomial, which is based on the fact that
reduces the power of each irreducible factor by 1. For example:

(Sorry for the picture. WordPress code blocks do not work)
It is not too hard to prove this using the product rule on the factorization of P. So you can see that by computing , you can obtain
. Then, by recursively computing
,
,
, … and taking the quotient each time as above, we can find the square-free factors of P.
OK, so we know from partial fraction decompositions we learned in calculus that if we have a rational function of the form , where
is square-free, the integral will be a rational function if
1" /> 1" title="n > 1" class="latex" /> and a logarithm if
. We can use the partial fraction decomposition that is easy to find once we have the square-free factorization of the denominator to rewrite the remaining rational function as a sum of terms of the form
, where
is square-free. Because
is square-free,
, so the Extended Euclidean Algorithm gives us
and
such that
(recall that
is the gcd of
and
if and only if there exist
and
relatively prime to
such that
. This holds true for integers as well as polynomials). Thus we can find
and
such that
. Multiplying through by
,
, which is equal to
. You may notice that the term in the parenthesis is just the derivative of
, so we get
. This is called Hermite Reduction. We can recursively reduce the integral on the right hand side until the
. Note that there are more efficient ways of doing this that do not actually require us to compute the partial fraction decomposition, and there is also a linear version due to Mack (this one is quadratic), and an even more efficient algorithm called the Horowitz-Ostrogradsky Algorithm, that doesn't even require a square-free decomposition.
So when we have finished the Hermite Reduction, we are left with integrating rational functions with purely square-free denominators. We know from calculus that these will have logarithmic integrals, so this is the logarithmic part.
First, we need to look at resultants and PRSs. The resultant of two polynomials is defined as differences of the roots of the two polynomials, i.e., , where
and
are monic polynomials split into linear factors. Clearly, the resultant of two polynomials is 0 if and only if the two polynomials share a root. It is an important result that the resultant of two polynomials can be computed from only their coefficients by taking the determinant of the Sylvester Matrix of the two polynomials. However, it is more efficiently calculated using a polynomial remainder sequence (PRS) (sorry, there doesn't seem to be a Wikipedia article), which in addition to giving the resultant of A and B, also gives a sequence of polynomials with some useful properties that I will discuss below. A polynomial remainder sequence is a generalization of the Euclidian algorithm where in each step, the remainder
is multiplied by a constant
. The Fundamental PRS Theorem shows how to compute specific
such that the resultant can be calculated from the polynomials in the sequence.
Then, if we have , left over from the Hermite Reduction (so
square-free), let
, where
is a new variable, and
be the distinct roots of R. Let
. Then it turns out that the logarithmic part of the integral is just
. This is called the Rothstein-Trager Algorithm.
However, this requires finding the prime factorization of the resultant, which can be avoided if a more efficient algorithm called the Lazard-Rioboo-Trager Algorithm is used. I will talk a little bit about it. It works by using subresultant polynomial reminder sequences.
It turns out that the above will appear in the PRS of
and
. Furthermore, we can use the PRS to immediately find the resultant
, which as we saw, is all we need to compute the logarithmic part.
So that's rational integration. I hope I haven't bored you too much, and that this made at least a little sense. I also hope that it was all correct. Note that this entire algorithm has already been implemented in SymPy, so if you plug a rational function in to integrate(), you should get back a solution. However, I describe it here because the transcendental case of the Risch Algorithm is just a generalization of rational function integration.
As for work updates, I found that the Poly version of the heursitic Risch algorithm was considerably slower than the original version, due to inefficiencies in the way the polynomials are currently represented in SymPy. So I have put that aside, and I have started implementing algorithms from the full algorithm. There's not much to say on that front. It's tedious work. I copy the algorithm from Bronstein's book, then try make sure that it is correct based on the few examples given and from the mathematical background given, and when I'm satisfied, I move on to the next one. Follow my integration branch if you are interested.
In my next post, I'll try to define some terms, like "elementary function," and introduce a little differential algebra, so you can understand a little bit of the nature of the general integration algorithm.
11 Jun 2010 7:39pm GMT
06 Jun 2010
Planet Python/SoC - 2009 edition
Dale Peterson: Animation of the motion of the bicycle
06 Jun 2010 5:41am GMT
Dale Peterson: New VPS
My VPS host just upgraded their servers to Ubuntu 10.04, so I just spent today saving all my old stuff, wiping my VPS, and reinstalling everything. I was using nginx as my webserver previously, but now I'm using lighttpd, and it was pretty easy to setup. I'm still ironing out the bugs, but I was able to get most everything up in running in several hours, so I hope to have the rest dialed in very soon.
06 Jun 2010 1:49am GMT
04 Jun 2010
Planet Python/SoC - 2009 edition
Aaron Meurer: PuDB
So Christian Muise unwittingly just reminded me on IRC that I forgot to mention the main method that I used to learn how the heurisch function works in my last blog post. I usually only use a debugger when I have a really hard bug I need to figure out, when the print statements aren't enough. The reason for this is that the debugger that I had been using, winpdb, is, well, a pain to use. There are so many little bugs, at least in Mac OS X, that it is almost not worth while to use it unless I need to. For example, restarting a script from the debugger doesn't work. If I pass a point that I wanted to see, I have to completely close the winpdb window and restart it from the command line, which takes about half a minute. Also, winpdb uses it's own variant of pdb, which seems to cause more problems than it creates (like bugging me about sympy importing pdb somewhere every time I start debugging.)
But I really wanted to be able to step through the heurisch code to see exactly how it works, because many of the implementation details, such as gathering the components of and expression, will be similar if not exactly the same in the full algorithm. So I started my quest for a better debugger. For me, the ideal debugger is the C debugger in XCode. That debugger has saved me in most of my programming assignments in C. But it is only for C based languages (C, Objective-C, probably C++, …), not Python. So I did a Google search, and it turns out that there is a list of Python debuggers here. So I went through them, and I didn't have to go far. The very first one, pudb, turns out to be awesome!
You can watch this screencast to get a better idea of the features, or even better install it and check them out. The debugger runs in the console, not in some half hacked GUI (half hacked is what any non-Cocoa GUI looks like in Mac OS X). The only down side to this is that you have to use the keyboard to do everything, but it ends up not being too bad. And you can press '?' at any time to see the possible commands.
To install it, just do easy_install pudb. To run it, just create a script of what you want to debug, and do python -m pudb.run my-script.py and it just works! I have a line that says alias pudb='python -m pudb.run' in my .profile, which makes it even easier to run. If you want to set a break point in the code, you can either navigate there from within pudb by pressing 'm', or you add a line that says from pudb import set_trace; set_trace() to the code (if you add the line to your code, you don't even need to create a script. Just execute the code in IPython and when it hits that line, it will load the debugger.
Some cool features:
- IPython console. Just press '!' to go to a console, where you can manipulate variables from the executed namespace, and you can choose an IPython console.
- Very easy to navigate. You just need to know the keys 's', 'n', and 't'.
- View the code from elsewhere than what is being run. Pressing 'm' lets you view all imported modules. You can easily view points on the stack by choosing them.
- If an exception is thrown, it catches it! This may sound obvious for a debugger, but it is one of things that didn't work very well in winpdb. You can view the traceback of the exception, and choose to restart without having to close and reopen the debugger. Actually, it asks you if you want to restart every time the script finishes too, which is also a great improvement over winpdb.
This is what it looks like. Click for a bigger picture:
Some annoyances (in case Andreas Kloeckner reads this):
- The default display for variables is type, which is completely useless. I have to manually go through and change each to str so I can see what the variable is. Is there a way to change this default?
- It asks me every time if I want to use IPython. I always want to use IPython.
- This is might be a Mac OS X Terminal bug, but when I execute a statement that takes a while to run, it doesn't redraw the pudb window until it finishes. This means that stepping through a program "flashes" black from what is above pudb in the window, and if I run a statement that takes forever, I loose the ability to see where it is unless I keyboard interrupt. Fortunately, it catches keyboard interrupts, so I can still see the traceback.
- There is no way to resize the variables window, or to scroll sideways in it. If I want to see what a long variable expression is, I have to go to the IPython console and type it there.
Some of these might be fixable and I just don't know it yet. But even with them, this is still an order of magnitude improvement over winpdb. Now I can actually use the debugger all the time in my coding, instead of just when I have a really tough bug and no other choice.
UPDATE:
The first two were trivial to fix in a fork of the repository (isn't open source awesome?). So if those are bothering you too, check out my branches at http://github.com/asmeurer/PuDB. Maybe if I have some time I will make them global options using environment variables or something and see if Andreas wants to merge them back into the main repo.
As for the second one, I realized that it might be a good thing, because you can see anything that is printed. Still, I would prefer seeing both, if possible (and the black flashes are annoying).
Update 2:
You can resize the side view by pushing +/-, though, there doesn't seem to be a way to, say, make the variables view bigger and the breakpoints view smaller.
04 Jun 2010 8:59pm GMT
Aaron Meurer: asmeurer
So I started writing up a blog post on how rational function integration works, but Ondrej wants a blog post every week by the end of I don't think I would do it justice by rushing to finish it now (read: I'm to lazy to do it). So instead, I'll just give a short post (if that's possible for me) on what I have been doing this week.
I finished up writing doctests for the polynomials module for now (see issue 1949), so now this week I started looking at the integrator. In particular, I went through each of the 40 issues with the Integration label and added them to a test file that I can monitor throughout the summer to see my progress. It is the test_failing_integrals.py file in my Integration branch, where all my work will be going for the foreseeable future. So if you want to follow my work, follow that branch. Here are some observations from those issues:
- integrate() can't handle almost all algebraic integrals (functions with square roots, etc.). It can handle the derivative of arcsin and arcsinh because of special code in heurisch.py, but that's about it. Before I can do any work on the Algebraic Risch Algorithm, I will need to implement the transcendental algorithm, so I think my temporary solution for this may be to add pattern matching heuristics for some of the more common algebraic integrals (anyone know a good integral table?).
- I figured out why integrate hangs forever with some integrals, such as the one in issue 1441. Here is, in a nutshell, how the Heuristic Risch algorithm works: Take the integrand and split it into components. For example, the components of x*cos(x)*sin(x)**2 are [x, cos(x), sin(x)]. Replace each of these components with a dummy variable, so if x = x0, cos(x) = x1, and sin(x) = x2, then the integrand is x0*x1*x2**2. Also, compute the derivative of each component in terms of the dummy variables. So the derivatives of [x0, x1, x2] are [1, -x2, 2*x1*x2]. Then, using these, perform some magic to create some rational functions out of the component dummy variables. Then, create a candidate integral with a bunch of unknowns [A1, A2, …], which will be rational numbers, and a multinomial of the An's and the xn's that should equal 0 if the candidate integral is correct. Then, because the xn's are not 0, and there is also some algebraic independence, you have the the An coefficients of each term must equal 0. So you get a system of linear equations in the An's. You then solve these equations, and plug the values of the An's into the candidate integral to give you the solution, or, if the system is inconsistent, then if cannot find a solution, possibly because there is no elementary one.
Well, that over simplifies a lot of things, but the point I want to make is that the integral from issue 1441 creates a system of ~600 linear equations in ~450 variables, and solving that equation is what causes the integration to hang. Also, as Mateusz, my mentor and the one who wrote the current integration implementation, pointed out, quite a bit of time is spent in the heurisch algorithm doing expansion on large Basic polynomials. When I say Basic polynomials, I mean that they are SymPy expressions, instead of Poly instances. Using Poly should speed things up quite a bit, so my next move will be to convert heurisch() into using Poly wherever applicable.
- There were a few bugs in the rational integration, which I fixed in my branch. The problem was in rational integrals with symbolic coefficients. Because the new polys are able to create polynomials using any expression as a generator, not just symbols, things like Poly(sin(y)*x, x) creates Poly(sin(y)*x, x, domain='ZZ[sin(y)]'). But using the polynomial ring or fraction field creates problems with some things like division, whereas we really only want the domain to be EX (expression domain) in this case. So this was not too difficult to fix, and you can see the fix in my integration branch.
- Some integrals will require some good implementation of special functions such as the hypergeometric function to work. Sometimes, you don't want to know what the non-elementary integral looks like, but you just want to calculate a definite integral. The solution here is to use Meijer-G functions, which are on the list of things to possibly do at the end of the summer if I have time.
- Another bug that I plan on fixing (I haven't done it yet, but I know how to do it and it will be trivial), is this (issue 1888):
In [18]: print integrate(f(x).diff(x)**2, x)
2*D(f(x), x)*f(x)/3 - 2*x*D(f(x), x, x)*f(x)/3 + x*D(f(x), x)**2/3
The problem is in the step where it computes the derivative of the components, it tries to compute the derivative of f(x).diff(x) in terms of a dummy variable, but it reduces to 0 because diff(x2, x) == 0. Thus, it treats f(x).diff(x) like something that has a 0 third derivative, i.e., x**2.
Well that's it. I knew I couldn't make a short blog post :). If you want to help, I have three branches that need review (1, 2, 3), and except for the last one, my work is based on top of the other two, so none of my integration work can be pushed in until those two reviewed positively.
04 Jun 2010 6:45pm GMT













