02 Sep 2014

Planet Python

Graham Dumpleton: Python module search path and mod_wsgi.

When you run the Python interpreter on the command line as an interactive console, if a module to be imported resides in the same directory as you ran the 'python' executable, then it will be found no problems. How can this be though, as we haven't done anything to add the current working directory into 'sys.path', nor has the Python interpreter itself done so? >>> import os, sys>>> os.getcwd()

02 Sep 2014 8:37am GMT

Graham Dumpleton: Debugging with pdb when using mod_wsgi.

In the early days of mod_wsgi I made a decision to impose a restriction on the use of stdin and stdout by Python WSGI web applications. My reasoning around this was that if you want to make a WSGI application portable to any WSGI deployment mechanism, then you should not be attempting to use stdin/stdout. This includes either reading or writing to these file objects, or even performing a check on

02 Sep 2014 8:18am GMT

Graham Dumpleton: What is the current version of mod_wsgi?

If you pick up any Linux distribution, you will most likely come to the conclusion that the newest version of mod_wsgi available is 3.3 or 3.4. Check when those versions were released and you will find: mod_wsgi version 3.3 - released 25th July 2010 mod_wsgi version 3.4 - released 22nd August 2012 Problem is that people look at that and seeing that there are only infrequent releases and nothing

02 Sep 2014 3:20am GMT

Graham Dumpleton: Using Python virtual environments with mod_wsgi.

You should be using Python virtual environments and if you don't know why you should, maybe you should find out. That said, the use of Python virtual environments was the next topic that came up in my hallway track discussions at DjangoCon US 2014. The pain point here is in part actually of my own creation. This is because although there are better ways of using Python virtual environments with

02 Sep 2014 1:50am GMT

01 Sep 2014

PyPy Development: Python Software Foundation Matching Donations this Month

We're extremely excited to announce that for the month of September, any amount
you donate to PyPy will be matched (up to $10,000) by the Python Software
Foundation.

This includes any of our ongoing fundraisers: NumPyPy, STM, Python3, or our
general fundraising.

Here are some of the things your previous donations have helped accomplish:

You can see a preview of what's coming in our next 2.4 release in the draft
release notes.

Thank you to all the individuals and companies which have donated so far.

So please, donate today: http://pypy.org/

(Please be aware that the donation progress bars are not live updating, so
don't be afraid if your donation doesn't show up immediately).

01 Sep 2014 7:49pm GMT

Carl Trachte: PDF - Removing Pages and Inserting Nested Bookmarks

I blogged before about PyPDF2 and some initial work I had done in response to a request to get a report from Microsoft SQL Server Reporting Services into PDF format. Since then I've had better luck with PyPDF2 using it with Python 3.4. Seldom do I need to make any adjustments to either the PDF file or my Python code to get things to work.

Presented below is the code that is working for me now. The basic gist of it is to strip the blank pages (conveniently SSRS dumps the report with a blank page every other page) from the SSRS PDF dump and reinsert the bookmarks in the right places in a new final document. The report I'm doing is about 30 pages, so having bookmarks is pretty critical for presentation and usability.

The approach I took was to get the bookmarks out of the PDF object model and into a nested dictionary that I could understand and work with easily. To keep the bookmarks in the right order for presentation I used collections.OrderedDict instead of just a regular Python dictionary structure. The code should work for any depth level of nested parent-child PDF bookmarks. My report only goes three or four levels deep, but things can get fairly complex even at that level.
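As a rough illustration of the nested structure described above, here is a minimal sketch that does not touch PyPDF2 at all. The bookmark titles, the page numbers, and the `walk` helper are hypothetical and exist only for this example; the dictionary keys mirror the 'name' and 'children' keys used in the script below.

```python
from collections import OrderedDict

# Hypothetical bookmark tree keyed by page index in the final document.
bookmarks = OrderedDict()
bookmarks[0] = {'name': 'Summary', 'children': OrderedDict()}
bookmarks[1] = {'name': 'Method A', 'children': OrderedDict()}
bookmarks[1]['children'][2] = {'name': 'Method A detail',
                               'children': OrderedDict()}


def walk(tree, depth=0):
    """Yield (depth, page, name) for every bookmark, parents before children."""
    for page, entry in tree.items():
        yield depth, page, entry['name']
        # Recurse into the children one level deeper.
        for item in walk(entry['children'], depth + 1):
            yield item


flattened = list(walk(bookmarks))
for depth, page, name in flattened:
    print('    ' * depth + '{0} (page {1})'.format(name, page))
```

Because each level is an OrderedDict, the walk visits bookmarks in insertion order, which is what keeps them in presentation order when they are written back out.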

There are a couple artifacts of the actual report I'm doing - the name "comparisonreader" refers to the subject of the report, a comparison of accounting methods' results. I've tried to sanitize the code where appropriate, but missed a thing or two.

It may be a bit overwrought (too much code), but it gets the job done. Thanks for having a look.

#!C:\python34\python

"""
Strip out blank pages and keep bookmarks for
SQL Server SSRS dump of model comparison report (pdf).
"""


import PyPDF2 as pdf
import math
from collections import OrderedDict

INPUTFILE = 'SSRSdump.pdf'
OUTPUTFILE = 'Finalreport.pdf'

OBJECTKEY = '/A'
LISTKEY = '/D'


# Adobe PDF document element keys.
FULLPAGE = '/Fit'
PAGE = '/Page'
PAGES = '/Pages'
ROOT = '/Root'
KIDS = '/Kids'
TITLE = '/Title'


# Python/PDF library types.
NODE = pdf.generic.Destination
CHILD = list


ADDPAGE = 'Adding page {0:d} from SSRS dump to page {1:d} of new document . . .'

# dictionary keys
NAME = 'name'
CHILDREN = 'children'


INDENT = 4 * ' '

ADDEDBOOKMARK = 'Added bookmark {0:s} to parent bookmark {1:s} at depthlevel {2:d}.'

TOPLEVEL = 'TOPLEVEL'

def getpages(comparisonreader):
    """
    From a PDF reader object, gets the
    page numbers of the odd numbered pages
    in the old document (SSRS dump) and
    the corresponding page in the final
    document.

    Returns a generator of two tuples.
    """
    # get number of pages then get odd numbered pages
    # (even numbered indices)
    numpages = comparisonreader.getNumPages()
    return ((x, int(x/2)) for x in range(numpages) if x % 2 == 0)


def fixbookmark(bookmark):
    """
    bookmark is a PyPDF2 bookmark object.

    Side effect function that changes bookmark
    page display mode to full page.
    """
    # getObject yields a dictionary
    props = bookmark.getObject()[OBJECTKEY][LISTKEY][1] = pdf.generic.NameObject(FULLPAGE)
    return 0


def matchpage(page, pages):
    """
    Find index of page match.

    page is a PyPDF2 page object.
    pages is the list (PyPDF2 array) of page objects.
    Returns integer page index in new (smaller) doc.
    """
    originalpageidx = pages.index(page)
    return math.floor((originalpageidx + 1)/2)


def pagedict(bookmark, pages):
    """
    Creates page dictionary for PyPDF2 bookmark object.

    bookmark is a PDF object (dictionary).
    pages is a list of PDF page objects (dictionary).
    Returns two tuple of a dictionary and
    integer page number.
    """
    page = matchpage(bookmark[PAGE].getObject(), pages)
    title = bookmark[TITLE]
    # One bookmark per page per level.
    lookupdict = OrderedDict()
    lookupdict.update({page:{NAME:title,
                             CHILDREN:OrderedDict()}})
    return lookupdict, page


def recursivepopulater(bookmark, pages):
    """
    Fills in child nodes of bookmarks
    recursively and returns dictionary.
    """
    dictx = OrderedDict()
    for pagex in bookmark:
        if type(pagex) is NODE:
            # get page info and update dictionary with it
            lookupdict, page = pagedict(pagex, pages)
            dictx.update(lookupdict)
        # test the current item (not the enclosing list) for nested children
        elif type(pagex) is CHILD:
            newdict = OrderedDict()
            newdict.update(recursivepopulater(pagex, pages))
            dictx[page][CHILDREN].update(newdict)
    return dictx


def makenewbookmarks(pages, bookmarks):
    """
    Main function to generate bookmark dictionary:

    {page number: {name:<name>,
                   children:[<more bookmarks>]},
     and so on.

    Returns dictionary.
    """
    dictx = OrderedDict()
    # top level bookmarks
    # it's going to go bookmark, list, bookmark, list, etc.
    for bookmark in bookmarks:
        if type(bookmark) is NODE:
            # get page info and update dictionary with it
            lookupdict, page = pagedict(bookmark, pages)
            dictx.update(lookupdict)
        elif type(bookmark) is CHILD:
            dictx[page][CHILDREN] = recursivepopulater(bookmark, pages)
    return dictx


def printbookmarkaddition(name, parentname, depthlevel):
    """
    Print notification of bookmark addition.

    Indentation based on integer depthlevel.
    name is the string name of the bookmark.
    parentname is the string name of the parent
    bookmark.

    Side effect function.
    """
    args = name, parentname, depthlevel
    indent = depthlevel * INDENT
    print(indent + ADDEDBOOKMARK.format(*args))


def dealwithbookmarks(comparisonreader, output, bookmarkdict, depthlevel, levelparent=None, parentname=None):
    """
    Fix bookmarks so that they are properly
    placed in the new document with the blank
    pages removed. Recursive side effect function.

    comparisonreader is the PDF reader object
    for the original document.

    output is the PDF writer object for the
    final document.

    bookmarkdict is a dictionary of bookmarks.

    depthlevel is the depth inside the nested
    dictionary-list structure (0 is the top).

    levelparent is the parent bookmark.

    parentname is the name of the parent bookmark.
    """
    depthlevel += 1
    for pagekeylevel in bookmarkdict:
        namelevel = bookmarkdict[pagekeylevel][NAME]
        levelparentii = output.addBookmark(namelevel, pagekeylevel, levelparent)
        if depthlevel == 0:
            parentname = TOPLEVEL
        printbookmarkaddition(namelevel, parentname, depthlevel)
        fixbookmark(levelparentii)
        # dictionary
        secondlevel = bookmarkdict[pagekeylevel][CHILDREN]
        argsx = comparisonreader, output, secondlevel, depthlevel, levelparentii, namelevel
        # Recursive call.
        dealwithbookmarks(*argsx)


def cullpages():
    """
    Fix SSRS PDF dump by removing blank
    pages.
    """
    ssrsdump = open(INPUTFILE, 'rb')
    finalreport = open(OUTPUTFILE, 'wb')
    comparisonreader = pdf.PdfFileReader(ssrsdump)
    pageindices = getpages(comparisonreader)
    output = pdf.PdfFileWriter()
    # add pages from SSRS dump to new pdf doc
    for (old, new) in pageindices:
        print(ADDPAGE.format(old, new))
        pagex = comparisonreader.getPage(old)
        output.addPage(pagex)

    # Attempt to add bookmarks from original doc
    # getOutlines yields a list of nested dictionaries and lists:
    #     outermost list - starts with parent bookmark (dictionary)
    #         inner list - starts with child bookmark (dictionary)
    #             and so on
    # The SSRS dump and this list have bookmarks in correct order.
    bookmarks = comparisonreader.getOutlines()
    # Get page numbers using this methodology (indirect object references)
    # http://stackoverflow.com/questions/1918420/split-a-pdf-based-on-outline
    # list of IndirectObject's of pages in order
    pages = [pagen.getObject() for pagen in
             comparisonreader.trailer[ROOT].getObject()[PAGES].getObject()[KIDS]]
    # Bookmarks.
    # Top level is list of bookmarks.
    # List goes parent bookmark (Destination object)
    #     child bookmarks (list)
    #         and so on.
    bookmarkdict = makenewbookmarks(pages, bookmarks)
    # Initial level of -1 allows increment to 0 at start.
    dealwithbookmarks(comparisonreader, output, bookmarkdict, -1)

    print('\n\nWriting final report . . .')
    output.write(finalreport)
    finalreport.close()
    ssrsdump.close()
    print('\n\nFinished.\n\n')


if __name__ == '__main__':
    cullpages()

01 Sep 2014 4:59pm GMT

Graham Dumpleton: Setting LANG and LC_ALL when using mod_wsgi.

So I am at DjangoCon US 2014 and one of the first pain points for using mod_wsgi that came up in discussion at DjangoCon US was the lang and locale settings. These settings influence what the default encoding is for Python when implicitly converting Unicode to byte strings. In other words, they dictate what is going on at the Unicode/bytes boundary. Now this should not really be an issue with

01 Sep 2014 4:42pm GMT

Graham Dumpleton: Reporting on the DjangoCon US 2014 hallway track.

I have only been in Portland for a few hours for DjangoCon, and despite some lack of sleep, I already feel that being here is recharging my enthusiasm for working on Open Source, something that has still been sagging a bit lately. I don't wish to return to that dark abyss I was in, so definitely what I need. Now lots of people write up reports on conferences including live noting them, but I

01 Sep 2014 3:34pm GMT

Python Software Foundation: Matching Donations to PyPy in September!

We're thrilled to announce that we will be matching donations made to the PyPy project for the month of September. For every dollar donated this month, the PSF will also give a dollar, up to a $10,000 total contribution. Head to http://pypy.org/ and view the donation options on the right side of the page, including general funding or a donation targeted to their STM, Py3k, or NumPy efforts.

We've previously given a $10,000 donation to PyPy, and more recently seeded the STM efforts with $5,000. The PyPy project works with the Software Freedom Conservancy to manage fund raising efforts and the usage of the funds, and they'll be the ones notifying us of how you all made your donations. At the end of the month, we'll do our part and chip in to making PyPy even better.

The matching period runs today through the end of September.

01 Sep 2014 1:33pm GMT

Leonardo Giordani: Python 3 OOP Part 5 - Metaclasses

Previous post: Python 3 OOP Part 4 - Polymorphism

The Type Brothers

The first step into the most intimate secrets of Python objects comes from two components we already met in the first post: class and object. These two things are the fundamental elements of the Python OOP system, so it is worth spending some time to understand how they work and relate to each other.

First of all recall that in Python everything is an object, that is everything inherits from object. Thus, object seems to be the deepest thing you can find digging into Python variables. Let's check this

``` python
>>> a = 5
>>> type(a)
<class 'int'>
>>> a.__class__
<class 'int'>
>>> a.__class__.__bases__
(<class 'object'>,)
>>> object.__bases__
()
```

The variable a is an instance of the int class, and this latter inherits from object, which inherits from nothing. This demonstrates that object is at the top of the class hierarchy. However, as you can see, both int and object are called classes (<class 'int'>, <class 'object'>). Indeed, while a is an instance of the int class, int itself is an instance of another class, a class that is instanced to build classes

``` python
>>> type(a)
<class 'int'>
>>> type(int)
<class 'type'>
>>> type(float)
<class 'type'>
>>> type(dict)
<class 'type'>
```

Since in Python everything is an object, everything is the instance of a class, even classes. Well, type is the class that is instanced to get classes. So remember this: object is the base of every object, type is the class of every type. Sounds puzzling? It is not your fault, don't worry. However, just to strike you with the finishing move, this is what Python is built on

``` python
>>> type(object)
<class 'type'>
>>> type.__bases__
(<class 'object'>,)
```

If you are not about to faint at this point, chances are that you are Guido van Rossum or one of his friends down at the Python core development team (in this case let me thank you for your beautiful creation). You may get a cup of tea, if you need it.

Jokes apart, at the very base of Python type system there are two things, object and type, which are inseparable. The previous code shows that object is an instance of type, and type inherits from object. Take your time to understand this subtle concept, as it is very important for the upcoming discussion about metaclasses.
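The two relations just described can be checked directly with the built-in isinstance() and issubclass() functions. This is only a small sketch; the variable names are made up for the example.

```python
# object is an instance of type ...
object_is_a_type = isinstance(object, type)

# ... and type is a subclass of object.
type_inherits_object = issubclass(type, object)

# Since type inherits from object, type is itself an object too.
type_is_an_object = isinstance(type, object)

# The method resolution order of type shows the inheritance explicitly.
type_mro = type.__mro__

print(object_is_a_type, type_inherits_object, type_is_an_object)
print(type_mro)
```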

When you think you grasped the type/object matter read this and start thinking again

``` python
>>> type(type)
<class 'type'>
```

The Metaclasses Take Python

You are now familiar with Python classes. You know that a class is used to create an instance, and that the structure of this latter is ruled by the source class and all its parent classes (until you reach object).

Since classes are objects too, you know that a class itself is an instance of a (super)class, and this class is type. That is, as already stated, type is the class that is used to build classes.

So for example you know that a class may be instanced, i.e. it can be called and by calling it you obtain another object that is linked with the class. What prepares the class for being called? What gives the class all its methods? In Python the class in charge of performing such tasks is called metaclass, and type is the default metaclass of all classes.
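One way to see type at work as the default metaclass is to call it directly with three arguments (name, base classes, namespace), which builds a class much like a class statement does. The sketch below uses hypothetical names (Point, DynamicPoint, describe) chosen only for illustration.

```python
# A regular class statement ...
class Point:
    dimensions = 2

    def describe(self):
        return 'point in {0}D'.format(self.dimensions)


# ... and a roughly equivalent explicit call to the default metaclass.
def describe(self):
    return 'point in {0}D'.format(self.dimensions)


DynamicPoint = type('DynamicPoint', (object,),
                    {'dimensions': 2, 'describe': describe})

# Both classes are instances of type and behave the same way.
print(type(Point) is type, type(DynamicPoint) is type)
print(DynamicPoint().describe())
```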

The point of exposing this structure of Python objects is that you may change the way classes are built. As you know, type is an object, so it can be subclassed just like any other class. Once you get a subclass of type you need to instruct your class to use it as the metaclass instead of type, and you can do this by passing it as the metaclass keyword argument in the class definition.

``` python
>>> class MyType(type):
...     pass
...
>>> class MySpecialClass(metaclass=MyType):
...     pass
...
>>> msp = MySpecialClass()
>>> type(msp)
<class '__main__.MySpecialClass'>
>>> type(MySpecialClass)
<class '__main__.MyType'>
>>> type(MyType)
<class 'type'>
```

Metaclasses 2: Singleton Day

Metaclasses are a very advanced topic in Python, but they have many practical uses. For example, by means of a custom metaclass you may log any time a class is instanced, which can be important for applications that shall keep a low memory usage or have to monitor it.
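The logging use case mentioned above could be sketched as follows. The metaclass overrides __call__(), which runs whenever one of its classes is called to create an instance. The names InstanceLogger, Connection and created are hypothetical, and a real application would probably use the logging module rather than print().

```python
class InstanceLogger(type):
    """Hypothetical metaclass that records every instantiation."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        cls.created = 0  # per-class counter of instances built so far

    def __call__(cls, *args, **kwargs):
        # Let type build and initialize the instance as usual.
        instance = super().__call__(*args, **kwargs)
        cls.created += 1
        print('{0} instance #{1} created'.format(cls.__name__, cls.created))
        return instance


class Connection(metaclass=InstanceLogger):
    def __init__(self, host):
        self.host = host


first = Connection('localhost')
second = Connection('example.org')
```

The class body of Connection stays free of any logging code; the bookkeeping lives entirely in the metaclass.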

I am going to show here a very simple example of metaclass, the Singleton. Singleton is a well known design pattern, and many descriptions of it may be found on the Internet. It has also been heavily criticized, mostly because of its bad behaviour when subclassed, but here I do not want to introduce it for its technological value, but for its simplicity (so please do not question the choice, it is just an example).

Singleton has one purpose: to return the same instance every time it is instanced, like a sort of object-oriented global variable. So we need to build a class that does not work like standard classes, which return a new instance every time they are called.

"Build a class"? This is a task for metaclasses. The following implementation comes from Python 3 Patterns, Recipes and Idioms.

``` python
class Singleton(type):
    instance = None

    def __call__(cls, *args, **kw):
        if not cls.instance:
            cls.instance = super(Singleton, cls).__call__(*args, **kw)
        return cls.instance
```

We are defining a new type, which inherits from type to provide all bells and whistles of Python classes. We override the __call__ method, that is a special method invoked when we call the class, i.e. when we instance it. The new method wraps the original method of type by calling it only when the instance attribute is not set, i.e. the first time the class is instanced, otherwise it just returns the recorded instance. As you can see this is a very basic cache class, the only trick is that it is applied to the creation of instances.

To test the new type we need to define a new class that uses it as its metaclass

``` python
>>> class ASingleton(metaclass=Singleton):
...     pass
...
>>> a = ASingleton()
>>> b = ASingleton()
>>> a is b
True
>>> hex(id(a))
'0xb68030ec'
>>> hex(id(b))
'0xb68030ec'
```

By using the is operator we test that the two objects are the very same structure in memory, that is their ids are the same, as explicitly shown. What actually happens is that when you issue a = ASingleton() the ASingleton class runs its __call__() method, which is taken from the Singleton type behind the class. That method recognizes that no instance has been created (looking up instance on ASingleton falls back to the None defined on Singleton) and acts just like any standard class does, recording the new instance as an attribute of ASingleton. When you issue b = ASingleton() the very same things happen, but since ASingleton.instance is now different from None its value (the previous instance) is directly returned.
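To make it visible where the cached instance ends up, the following sketch repeats the Singleton metaclass from above and inspects the class namespace before and after the first call. The variable names before and after are only for this example.

```python
class Singleton(type):
    instance = None

    def __call__(cls, *args, **kw):
        if not cls.instance:
            cls.instance = super(Singleton, cls).__call__(*args, **kw)
        return cls.instance


class ASingleton(metaclass=Singleton):
    pass


# Before the first call the class has no 'instance' attribute of its own;
# the lookup falls back to the default defined on the metaclass.
before = 'instance' in vars(ASingleton)
a = ASingleton()
# The assignment in __call__ stores the object on the class itself.
after = 'instance' in vars(ASingleton)

print(before, after)
print(Singleton.instance is None, ASingleton.instance is a)
```

The assignment cls.instance = ... writes to the class being called, so each class using this metaclass gets its own cached instance while the default on the metaclass stays None.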

Metaclasses are a very powerful programming tool and leveraging them you can achieve very complex behaviours with a small effort. Their use is a must every time you are actually metaprogramming, that is you are writing code that has to drive the way your code works. Good examples are creational patterns (injecting custom class attributes depending on some configuration), testing, debugging, and performance monitoring.
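As a small sketch of the "injecting custom class attributes depending on some configuration" idea, a metaclass can override __new__() and add entries to the class namespace before the class is built. CONFIG, Configured and Service are hypothetical names used only for this illustration.

```python
CONFIG = {'debug': True, 'backend': 'sqlite'}  # hypothetical configuration


class Configured(type):
    def __new__(mcs, name, bases, namespace):
        # Copy configuration entries into the class body, without
        # overriding anything the class defines explicitly.
        for key, value in CONFIG.items():
            namespace.setdefault(key, value)
        return super().__new__(mcs, name, bases, namespace)


class Service(metaclass=Configured):
    backend = 'postgres'  # explicit value wins over the configuration


print(Service.debug, Service.backend)
```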

Coming to Instance

Before introducing you to a very smart use of metaclasses by talking about Abstract Base Classes (read: to save some topics for the next part of this series), I want to dive into the object creation procedure in Python, that is what happens when you instance a class. In the first post this procedure was described only partially, by looking at the __init__() method.

In the first post I recalled the object-oriented concept of constructor, which is a special method of the class that is automatically called when the instance is created. The class may also define a destructor, which is called when the object is destroyed. In languages without a garbage collection mechanism such as C++ the destructor shall be carefully designed. In Python the destructor may be defined through the __del__() method, but it is hardly used.

The constructor mechanism in Python is on the contrary very important, and it is implemented by two methods, instead of just one: __new__() and __init__(). The tasks of the two methods are very clear and distinct: __new__() shall perform actions needed when creating a new instance while __init__ deals with object initialization.

Since in Python you do not need to declare attributes due to its dynamic nature, __new__() is rarely defined by programmers, who may rely on __init__ to perform the majority of the usual tasks. Typical uses of __new__() are very similar to those listed in the previous section, since it allows to trigger some code whenever your class is instanced.

The standard way to override __new__() is

``` python
class MyClass():

    def __new__(cls, *args, **kwds):
        # object.__new__() accepts only the class, so the extra
        # arguments are not forwarded when the parent is object
        obj = super().__new__(cls)
        # [put your code here]
        return obj

```

just like you usually do with __init__(). When your class inherits from object you do not need to call the parent method (object.__init__()), because it is empty, but you do need to call the parent __new__(), because that is the method that actually creates the instance.

Remember that __new__() is not forced to return an instance of the class in which it is defined, although you should have very good reasons to break this behaviour. In any case, __init__() will be called only if __new__() returns an instance of the class it received. Please also note that __new__(), unlike __init__(), accepts the class as its first parameter. The name is not important in Python, and you could also call it self, but it is worth using cls to remember that it is not an instance.
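The two rules above (__new__() runs before __init__(), and __init__() is skipped when __new__() returns something that is not an instance of the class) can be observed with a short sketch. The class names Tracked and Impostor and the calls list are hypothetical, introduced only for this illustration.

```python
calls = []


class Tracked:
    def __new__(cls, *args, **kwds):
        calls.append("Tracked.__new__")
        # object.__new__() takes only the class
        return super().__new__(cls)

    def __init__(self, value):
        calls.append("Tracked.__init__")
        self.value = value


class Impostor:
    def __new__(cls, *args, **kwds):
        calls.append("Impostor.__new__")
        # Deliberately return an object that is NOT an instance of Impostor
        return 42

    def __init__(self):
        # Never reached, because __new__() did not return an Impostor
        calls.append("Impostor.__init__")


t = Tracked(5)
i = Impostor()
```

After running it, calls contains the two Tracked entries in creation order followed by Impostor.__new__ only, and i is simply the integer returned by __new__().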

Movie Trivia

Section titles come from the following movies: The Blues Brothers (1980), The Muppets Take Manhattan (1984), Terminator 2: Judgment Day (1991), Coming to America (1988).

Sources

You will find a lot of documentation in this Reddit post. Most of the information contained in this series comes from those sources.

Feedback

Feel free to use the blog Google+ page to comment on the post. The GitHub issues page is the best place to submit corrections.

01 Sep 2014 1:00pm GMT

Machinalis: Decision tree classifier

Introduction

This post presents a simple but still fully functional Python implementation of a decision tree classifier. It is not intended as a tutorial on machine learning, classification or even decision trees: there are a lot of resources on the web already. The main idea is to provide a Python example implementation for those who are familiar or comfortable with this language.

There are several decision tree algorithms that have been developed over time, each one improving or optimizing something over the predecessor. In this post, the implementation presented corresponds to the first well-known algorithm on the subject: the Iterative Dichotomiser 3 (ID3), developed in 1986 by Ross Quinlan.

For those familiar with the scikit-learn library, its documentation includes a specific section devoted to decision trees. This API provides a production-ready, fully parametric implementation of an optimized version of the CART algorithm.

Implementation

The code is here: https://gist.github.com/cmdelatorre/fd9ee43167f5cc1da130

Basically, a tree is represented using Python dicts. A couple of very simple classes that extend dict were created to distinguish between tree and leaf nodes. Also, a namedtuple was defined to match each training sample with its corresponding class. The necessary information_gain and entropy functions were created; their implementations are really simple thanks to Python's standard collections lib.
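To give an idea of how short these helpers can be, here is a minimal sketch of entropy and information gain built on collections. It is not the code from the gist: the namedtuple name TrainingSample, the function signatures and the toy data are assumptions made for this example; only the field names sample and klass follow the tree-creation code.

```python
from collections import Counter, namedtuple
from math import log2

# Hypothetical container: `sample` maps feature names to values,
# `klass` is the known class of the sample.
TrainingSample = namedtuple("TrainingSample", ["sample", "klass"])


def entropy(klasses):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(klasses)
    counts = Counter(klasses)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(samples, feature):
    """Entropy reduction obtained by splitting `samples` on `feature`."""
    base = entropy([s.klass for s in samples])
    # Group the class labels by the value each sample has for the feature
    by_value = {}
    for s in samples:
        by_value.setdefault(s.sample[feature], []).append(s.klass)
    remainder = sum(len(ks) / len(samples) * entropy(ks)
                    for ks in by_value.values())
    return base - remainder


samples = [
    TrainingSample({"outlook": "sunny", "windy": False}, "play"),
    TrainingSample({"outlook": "sunny", "windy": True}, "play"),
    TrainingSample({"outlook": "rainy", "windy": False}, "stay"),
    TrainingSample({"outlook": "rainy", "windy": True}, "stay"),
]

gain_outlook = information_gain(samples, "outlook")
gain_windy = information_gain(samples, "windy")
```

In this toy data set "outlook" separates the two classes perfectly while "windy" tells nothing, so a feature selector based on information gain would pick "outlook" first.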

Finally, the main piece of code is the tree-creation method: this is where all the magic happens.

def create_decision_tree(self, training_samples, predicting_features):
    """Recursively create a decision tree and return the parent node."""

    if not predicting_features:
        # No more predicting features
        default_klass = self.get_most_common_class(training_samples)
        root_node = DecisionTreeLeaf(default_klass)
    else:
        klasses = [sample.klass for sample in training_samples]
        if len(set(klasses)) == 1:
            target_klass = training_samples[0].klass
            root_node = DecisionTreeLeaf(target_klass)
        else:
            best_feature = self.select_best_feature(training_samples,
                                                    predicting_features,
                                                    klasses)
            # Create the node to return and create the sub-tree.
            root_node = DecisionTreeNode(best_feature)
            best_feature_values = {s.sample[best_feature]
                                   for s in training_samples}
            for value in best_feature_values:
                samples = [s for s in training_samples
                           if s.sample[best_feature] == value]
                # Recursively, create a child node.
                child = self.create_decision_tree(samples,
                                                  predicting_features)
                root_node[value] = child
    return root_node

Motivated by the already mentioned scikit-learn library, the algorithm is developed within a class with the following methods:

  • fit(training_samples, known_labels) : creates the decision tree using the training data.
  • predict(samples) : given a fitted model, predicts the label of a new set of data. It returns the learned label for each sample in the given array.
  • score(samples, known_labels) : predicts the labels for the given data samples and compares them with the truth provided in known_labels. Returns a score between 0 (no matches) and 1 (perfect match).

Other than that, the code is pretty much self-explanatory. Using the standard Python module collections, the auxiliary methods (select_best_feature, information_gain, entropy) are very concise. The tree is easily implemented using dict:

  • Each node is either a leaf or a branch: if it is a leaf, it represents a class; if it is a branch, it represents a feature.
  • Each branch has as many children as the feature it represents has values in the training samples.

Then, to classify a given vector X = [f0, ..., fn], starting with the root of the generated tree:

  1. Take the root node (usually a branch, unless X has only one feature, which is not really useful).
  2. This node has a related feature, fi, so we check the value of X for that feature: v = X[fi]
  3. If node[v] is a leaf, then we assign the leaf's related class to X.
  4. If v is not a key in node, then we can't assign a class with the existing tree and we assign a default class (the most probable one).
  5. If node[v] is another branch, we repeat this procedure using the new node as root.
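The steps above can be sketched as a small loop. The two dict subclasses below are simplified stand-ins, not the classes from the gist: their constructors, the feature and klass attributes, the classify function and the hand-built tree are all assumptions made for this example.

```python
class DecisionTreeLeaf(dict):
    """Leaf node: stores the class assigned to samples that reach it."""

    def __init__(self, klass):
        super().__init__()
        self.klass = klass


class DecisionTreeNode(dict):
    """Branch node: stores a feature; keys are feature values, values are children."""

    def __init__(self, feature):
        super().__init__()
        self.feature = feature


def classify(root, x, default_klass):
    """Walk the tree from `root` following the feature values of `x`."""
    node = root
    while not isinstance(node, DecisionTreeLeaf):
        v = x[node.feature]
        if v not in node:
            # Value never seen during training: fall back to the default class
            return default_klass
        node = node[v]
    return node.klass


# A tiny hand-built tree: split on "outlook", then on "windy" when it rains
windy = DecisionTreeNode("windy")
windy[True] = DecisionTreeLeaf("stay")
windy[False] = DecisionTreeLeaf("play")
tree = DecisionTreeNode("outlook")
tree["sunny"] = DecisionTreeLeaf("play")
tree["rainy"] = windy

sunny_result = classify(tree, {"outlook": "sunny", "windy": True}, "stay")
rainy_result = classify(tree, {"outlook": "rainy", "windy": True}, "play")
unseen_result = classify(tree, {"outlook": "foggy", "windy": False}, "stay")
```

Checking for the missing key before descending covers step 4, and the loop ends as soon as a leaf is reached, which covers steps 3 and 5.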

I won't dig further into the details, as this is not supposed to be a tutorial or course on decision trees. Some minimal previous knowledge should be enough to understand the code. In any case, don't hesitate to post your questions or comments.

To stay up to date on Machine Learning, Data Processing and Complex Web Development, follow us on @machinalis.

01 Sep 2014 12:29pm GMT

Ian Ozsvald: Slides for High Performance Python tutorial at EuroSciPy2014 + Book signing!

Yesterday I taught an excerpt of my 2 day High Performance Python tutorial as a 1.5 hour hands-on lesson at EuroSciPy 2014 in Cambridge with 70 students:

We covered profiling (down to line-by-line CPU & memory usage), Cython (pure-py and OpenMP with numpy), Pythran, PyPy and Numba. This is an abridged set of slides from my 2 day tutorial; take a look at those details for the upcoming courses (including an intro to data science) we're running in October.

I'll add the video here once it is released; the slides are below.

I also got to do a book signing for our High Performance Python book (co-authored with Micha Gorelick); O'Reilly sent us 20 galley copies to give away. The finished printed book will be available via O'Reilly and Amazon in the next few weeks.

Book signing at EuroSciPy 2014

If you want to hear about our future courses then join our low-volume training announce list. I have a short (no-signup) survey about training needs for Pythonistas in data science; please fill that in to help me figure out what we should be teaching.

I also have a further survey on how companies are using (or not using!) data science. I'll be using the results of this when I keynote at PyConIreland in October, so your input will be very useful.

Here are the slides (License: CC By NonCommercial); there's also source on GitHub:


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight; sign up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

01 Sep 2014 9:11am GMT

Carl Trachte: Internet Explorer 9 Save Dialog - SendKeys Last Resort

At work we use Internet Explorer 9 on Windows 7 Enterprise. SharePoint is the favored software for filesharing inside organizational groups. Our mine planning office is in the States; the mine operation whose data I work with is in a remote, poorly connected location of the world.

Recently SharePoint was updated to a new version at the mine. The SharePoint server configuration there no longer allows Windows Explorer view or mapping of the site to a Windows drive letter. I've put in a trouble ticket to regain this functionality, but that may take a while, if it's possible at all. Without it, it is difficult to automate file retrieval or get more than one file at a time.

In the meantime I've been able to get the text based files over using win32com automation in Python to run Internet Explorer and grab the innerHTML object. innerHTML is essentially the text of the files with tags around it. I rip out the tags, write the text to a file on my hard drive and I'm good to go.
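The post does not show how the tags are ripped out, so here is one possible sketch of that step. It uses html.parser from the Python 3 standard library (the scripts in this post run on Python 2.7, where the module has a different name); TextExtractor, strip_tags and the sample string are hypothetical names and data chosen for the example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text found between tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags(inner_html):
    """Return the text content of an innerHTML string, without tags."""
    parser = TextExtractor()
    parser.feed(inner_html)
    parser.close()
    return "".join(parser.chunks)


# Stand-in for the innerHTML string grabbed from the browser
inner_html = "<pre>line one\nline two</pre>"
text = strip_tags(inner_html)
```

The resulting string can then be written to a local file with the usual open() and write() calls.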

Binary files proved to be more difficult to download. Shown below is a screenshot of the Internet Explorer 9 dialog box that goes by the generic name Notification Bar:

I googled and could not find anywhere how this thing fit into the Internet Explorer 9 Document object hierarchy. Then I came upon this colorful exchange between Microsoft Certified MVPs from 2012 that made things a little more clear.
It turns out you can't access the Notification Bar programmatically per se. What you can do is activate the specific Internet Explorer window and tab you're interested in, then send keystrokes to get where you want to, click, and download your file.
I'm not a web programmer nor am I a dedicated Windows programmer (I'm actually a geologist). IEC is a small module that wraps some useful functionality - in my case identifying and clicking on the link on the SharePoint page by its text identifier:
# CPython 2.7
# Internet Explorer module.
import IEC as iec
import time
ie = iec.IEController()
ie.Navigate('<URL of SharePoint page>')
# Give the page time to load (7 seconds).
time.sleep(7)
# I want to download file 11.msr.
ie.ClickLink('11')
# Give 5 seconds for the Notification Bar to show up.
time.sleep(5)
I'm fortunate in that our mine planning vendor, MineSight, ships Python 2.7 and associated win32com packages along with their software (their APIs are written for Python). If you don't have win32com and friends installed, you will need them for this solution.
At this point I've just got to deal with that pesky Internet Explorer 9 Notification Bar. As it turns out, SendKeys makes it doable (although neither elegant nor robust :-( ):
# Activate the SharePoint page.
from win32com.client import Dispatch as dispx
shell = dispx('WScript.Shell')
shell.AppActivate('<name of IE9 tab>')
# Little pause.
time.sleep(0.5)
# Keyboard combination for the Notification Bar selection
# is ALT-N or '%n'
shell.SendKeys('%n', True)
# The Notification Bar goes to "Open" by default.
# You need to tab over to the "Save" button.
shell.SendKeys('{TAB}')
# Another little pause.
time.sleep(0.1)
# Space bar clicks on this control.
shell.SendKeys(' ', True)
The key combinations for accessing the Notification Bar are in Microsoft's documentation here.
One link showing use of SendKeys is a German site (mostly English text) here.
And that's pretty much it. There's another dialog that pops up in Internet Explorer 9 after the file is downloaded. I've been able to blow that off so far and it hasn't gotten in the way as I move to the next download. I give these files (about 300 kb) 15 seconds to download over a slow connection. I may have to adjust that.
This solution is an abomination by any coding/architecture/durability standard. Still, it's the abomination that is getting the job done for the time being.
Thanks for stopping by.

01 Sep 2014 3:36am GMT

Carl Trachte: Internet Explorer 9 Save Dialog - SendKeys Last Resort

At work we use Internet Explorer 9 on Windows 7 Enterprise. SharePoint is the favored software for filesharing inside organizational groups. Our mine planning office is in the States; the mine operation whose data I work is in a remote, poorly connected location of the world.

Recently Sharepoint was updated to a new version at the mine. The SharePoint server configuration there no longer allows Windows Explorer view or mapping of the site to a Windows drive letter. I've put in a trouble ticket to regain this functionality, but that may take a while if it's possible. Without it it is difficult to automate file retrieval or get more than one file at a time.

In the meantime I've been able to get the text based files over using win32com automation in Python to run Internet Explorer and grab the innerHTML object. innerHTML is essentially the text of the files with tags around it. I rip out the tags, write the text to a file on my harddrive and I'm good to go.

Binary files proved to be more difficult to download. Shown below is a screenshot of the Internet Explorer 9 dialog box that goes by the generic name Notification Bar:

I googled and could nowhere find how this thing fit into the Internet Explorer 9 Document object hierarchy. Then I came upon this colorful exchange between Microsoft Certified MVP's from 2012 that made things a little more clear.
It turns out you can't access the Notification Bar programatically per se. What you can do is activate the specific Internet Explorer window and tab you're interested in, then send keystrokes to get where you want to, click, and download your file.
I'm not a web programmer nor am I a dedicated Windows programmer (I'm actually a geologist). IEC is a small module that wraps some useful functionality - in my case identifying and clicking on the link on the SharePoint page by it's text identifier:
# C Python 2.7
# Internet Explorer module.
import IEC as iec
import time
# Placeholder - put the URL of the SharePoint page here.
SHAREPOINT_URL = '<URL of SharePoint page>'
ie = iec.IEController()
ie.Navigate(SHAREPOINT_URL)
# Give the page time to load (7 seconds).
time.sleep(7)
# I want to download file 11.msr.
# ClickLink finds the link by its text.
ie.ClickLink('11')
# Give 5 seconds for the Notification Bar to show up.
time.sleep(5)
I'm fortunate in that our mine planning vendor, MineSight, ships Python 2.7 and associated win32com packages along with their software (their APIs are written for Python). If you don't have win32com and friends installed, they are necessary for this solution.
At this point I've just got to deal with that pesky Internet Explorer 9 Notification Bar. As it turns out, SendKeys makes it doable (although neither elegant nor robust :-( ):
# Activate the SharePoint page.
import time
from win32com.client import Dispatch as dispx
# Placeholder - put the title of the IE9 tab here.
IE_TAB_TITLE = '<name of IE9 tab>'
shell = dispx('WScript.Shell')
shell.AppActivate(IE_TAB_TITLE)
# Little pause.
time.sleep(0.5)
# Keyboard combination for the Notification Bar selection
# is ALT-N or '%n'
shell.SendKeys('%n', True)
# The Notification Bar goes to "Open" by default.
# You need to tab over to the "Save" button.
shell.SendKeys('{TAB}')
# Another little pause.
time.sleep(0.1)
# Space bar clicks on this control.
shell.SendKeys(' ', True)
The key combinations for accessing the Notification Bar are in Microsoft's documentation here.
One link showing use of SendKeys is a German site (mostly English text) here.
And that's pretty much it. There's another dialog that pops up in Internet Explorer 9 after the file is downloaded. I've been able to blow that off so far and it hasn't gotten in the way as I move to the next download. I give these files (about 300 kb) 15 seconds to download over a slow connection. I may have to adjust that.
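Instead of a fixed 15 second wait, one option is to poll the download folder until the file shows up and stops growing. The following is only a sketch of that alternative, not what the script above does: wait_for_download and its parameters are hypothetical names, and it assumes you know the path where Internet Explorer saves the file and that the file appears under its final name while downloading.

```python
import os
import time


def wait_for_download(path, timeout=60.0, poll=0.5, settle=2):
    """Wait until path exists and its size stops changing.

    Returns True once the size has been identical (and non-zero) for
    `settle` consecutive polls, or False if `timeout` seconds pass first.
    """
    deadline = time.time() + timeout
    last_size = -1
    stable = 0
    while time.time() < deadline:
        if os.path.exists(path):
            size = os.path.getsize(path)
            if size == last_size and size > 0:
                # Same size as on the previous poll: count it.
                stable += 1
                if stable >= settle:
                    return True
            else:
                # File is new or still growing: start counting again.
                stable = 0
            last_size = size
        time.sleep(poll)
    return False
```

A call such as wait_for_download(r'C:\Users\me\Downloads\11.msr', timeout=30) (the path is made up) would return as soon as the file has settled, so a fast download does not have to sit out the full wait and a slow one gets reported instead of silently truncated.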
This solution is an abomination by any coding/architecture/durability standard. Still, it's the abomination that is getting the job done for the time being.
Thanks for stopping by.

01 Sep 2014 3:36am GMT

31 Aug 2014

feedPlanet Python

Varun Nischal: code4Py | Style Context Differences

As per the recently created page, the following diff command output, representing context differences, needed to be styled:
[vagrant@localhost python]$ diff -c A B
*** A 2014-08-20 20:13:30.315009258 +0000
--- B 2014-08-20 20:13:39.021009349 +0000
***************
*** 1,6 ****
--- 1,9 ----
+ typeset -i sum=0
+ while read num do printf "%d " ${num}
+ sum=sum+${num} done …

31 Aug 2014 6:29pm GMT

Europython: EuroPython 2014 Feedback Form

EuroPython 2014 was a great event and we'd like to learn from you how to make EuroPython 2015 even better. If you attended EuroPython 2014, please take a few moments and fill out our feedback form:

EuroPython 2014 Feedback Form

We will leave the feedback form online for another two weeks and then use the information as basis for the work on EuroPython 2015 and also post a summary of the multiple choice questions (not the comments to protect your privacy) on our website. Many thanks in advance.

Helping with EuroPython 2015

If you would like to help with EuroPython 2015, we invite you to join the EuroPython Society. Membership is free. Just go to our application page and enter your details.

In the coming months, we will start the discussions about the new work group model we've announced at the conference.

Enjoy,
-
EuroPython Society

31 Aug 2014 10:48am GMT

EuroPython Society: EuroPython 2014 Feedback Form

EuroPython 2014 was a great event and we'd like to learn from you how to make EuroPython 2015 even better. If you attended EuroPython 2014, please take a few moments and fill out our feedback form:

EuroPython 2014 Feedback Form

We will leave the feedback form online for another two weeks and then use the information as basis for the work on EuroPython 2015 and also post a summary of the multiple choice questions (not the comments to protect your privacy) on our website. Many thanks in advance.

Helping with EuroPython 2015

If you would like to help with EuroPython 2015, we invite you to join the EuroPython Society. Membership is free. Just go to our application page and enter your details.

In the coming months, we will start the discussions about the new work group model we've announced at the conference.

Enjoy,
-
EuroPython Society

31 Aug 2014 10:46am GMT

11 Oct 2013

feedPython Software Foundation | GSoC'11 Students

Yeswanth Swami: How I kicked off GSoC

Zero to hero

What Prompted me??

I started my third year thinking I should do something that would set me apart from the rest, and one of my professors suggested that I apply for GSoC. I don't know why, but I took the suggestion rather seriously, thanks to the bet I had with one of my friends (who is about to complete his MBBS) that whoever earns first will buy the other a pair of Ray-Ban shades. Well, that's it. I was determined. I started my research early, probably at the start of February (I knew I wanted to buy my friend his shades and, in the process, buy mine too).

What experiences I had before??

I started looking at previous years' GSoC projects (having had little experience with open source) and started learning how to contribute. I was also fascinated by the amount of knowledge one could gain just by googling and browsing web pages. I discovered very soon what an immensely great tool email is: through it I could chat with anyone in the open source world, ask seemingly stupid questions, and always expect a gentle reply with an answer. Well, that held me spellbound, and I knew I wanted to contribute to open source.

How did I begin??

Around the middle of March, I discovered that my passion for Python as a programming language had increased after understanding how easy it is as a language. Added to that, my popularity among my classmates increased when I started evangelizing Python (thanks to my seniors for introducing it; I guess I did a decent job of popularizing the language). I started contributing to the PSF (Python Software Foundation), beginning with a simple bug fix to the documentation. Slowly my activity on IRC increased, and I started liking one of the projects that a community member had proposed.

A twist in the story??

There I was, still a noob, not knowing how to convince my probable mentor that I could complete the project, given direction. At this juncture, a fellow student (from some university in France) mailed this particular mentor to say that he was interested in the project. Do remember, I was on the mailing list and followed what happened on it. So I was furious to learn that I had competition (having put in so much effort), and I was not willing to give up my project (knowing that this was the one project I actually understood and had started researching a little, too). The other projects required me to have some domain knowledge. I went back to my teachers, seniors, friends and Google and started asking the question, "How would I solve the problem the mentor posted?" I framed a couple of answers, very noobish ones, but at least I could reply to the email thread with my understanding of the problem and how I would solve it, and also ask the various questions I had in mind. Well, the mentor replied immediately, to my surprise, with comments as well as answers to the questions I had posed. Again, my nemesis/competitor replied (he had good knowledge of the problem domain). I knew it was not going to be easy. Hence, I went back again through all my sources, deepened my understanding of the problem, and posted again. I guess there were about 20 mails in the thread before we (all three of us) decided we should catch up on IRC and discuss more.

The conclusion:

Well, on IRC, most of the senior members of the community were present, and they suggested extending the scope of the project (since two students were interested in one project and showed immense passion). Unsurprisingly, over multiple meetings, the project scope was expanded, both students were given equally important but independent tasks, and both got the opportunity to say they are Google Summer of Code students. Thank goodness we decided to build the project from scratch, which gave us more than enough work on our plates.

Movie titles:

1) In the open source world, there is no competition, it is only "COLLABORATION".

2) Why give up, when you can win??

3) From Zero to Hero!!

4) A prodigy in the making

p.s. I still owe my friend his shades. *sshole, I am still waiting for him to visit me so that I can buy him his shades and buy mine too. Also, I know it's been two years since the story happened, but it is never too late to share, don't you agree??


11 Oct 2013 5:39am GMT