20 Oct 2014

Planet Python

Mike Driscoll: PyWin32 – How to Bring a Window to Front

I recently saw someone asking how to bring a window to the front in Windows and I realized I had had some old unreleased code that might help someone with this task. A long time ago, Tim Golden (and possibly some other fellows on the PyWin32 mailing list) showed me how to make windows come to the front on Windows XP. If you'd like to follow along, you will need to download and install your own copy of PyWin32.

We will need to choose something to bring to the front. I like to use Notepad for testing as I know it will be on every Windows desktop in existence. Open up Notepad and then put some other application's window in front of it.

Now we're ready to look at some code:

import win32gui
 
def windowEnumerationHandler(hwnd, top_windows):
    # Callback for EnumWindows: record each window's handle and title text.
    top_windows.append((hwnd, win32gui.GetWindowText(hwnd)))
 
if __name__ == "__main__":
    top_windows = []
    # Enumerate every top-level window, filling top_windows via the callback.
    win32gui.EnumWindows(windowEnumerationHandler, top_windows)
    for i in top_windows:
        if "notepad" in i[1].lower():
            print i
            # 5 == SW_SHOW: make sure the window is visible, then bring it forward.
            win32gui.ShowWindow(i[0], 5)
            win32gui.SetForegroundWindow(i[0])
            break

We only need PyWin32's win32gui module for this little script. We write a little function that takes a window handle and a Python list. Then we call win32gui's EnumWindows method, which takes a callback and an extra argument that is a Python object. According to the documentation, the EnumWindows method "Enumerates all top-level windows on the screen by passing the handle to each window, in turn, to an application-defined callback function". So we pass it our method and it enumerates the windows, passing a handle of each window plus our Python list to our function. It works kind of like a messed up decorator.

Once that's done, your top_windows list will be full of lots of items, most of which you didn't even know were running. You can print that out and inspect your results if you like. It's really quite interesting. But for our purposes, we will skip that and just loop over the list, looking for the word "Notepad". Once we find it, we use win32gui's ShowWindow and SetForegroundWindow methods to bring the application to the foreground.

Note that you really need to look for a unique string so that you bring up the right window. What would happen if you had multiple Notepad instances running with different files open? With the current code, you would bring forward the first Notepad instance it found, which might not be what you want.
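
If you want to be stricter, you can match against the whole window title instead of a substring. Here's a minimal sketch along those lines; the "todo.txt - Notepad" title is just an assumed example of the "<filename> - Notepad" pattern, not something from the original code:

import win32gui
 
def find_window(title):
    """Return the handle of the first top-level window whose title matches exactly."""
    matches = []
    def handler(hwnd, matches):
        if win32gui.GetWindowText(hwnd) == title:
            matches.append(hwnd)
    win32gui.EnumWindows(handler, matches)
    return matches[0] if matches else None
 
hwnd = find_window("todo.txt - Notepad")
if hwnd:
    win32gui.ShowWindow(hwnd, 5)    # 5 == SW_SHOW
    win32gui.SetForegroundWindow(hwnd)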

You may be wondering why anyone would even want to go to the trouble of doing this in the first place. In my case, I once had a project where I had to bring a certain window to the foreground and then automate it using SendKeys. It was an ugly piece of brittle code that I wouldn't wish on anyone. Fortunately, there are better tools for that sort of thing nowadays, such as pywinauto, but you still might find this code helpful for something esoteric that is thrown your way. Have fun!

20 Oct 2014 10:15pm GMT

Carl Trachte: subprocess.Popen() or Abusing a Home-grown Windows Executable

Each month I redo 3D block model interpolations for a series of open pits at a distant mine. Those of you who follow my twitter feed often see me tweet, "The 3D geologic block model interpolation chuggeth . . ." What's going on is that I've got all the processing power maxed out dealing with millions of model blocks and thousands of data points. The machine heats up and, with the fan going, sounds like a DC-9 warming up before flight.

All that said, running everything roughly in parallel is more efficient time-wise than running it sequentially. An hour of chugging is better than four. The way I've been doing this is with the Python (2.7) subprocess module's Popen method, running my five interpolated values in parallel. Our Python programmer Lori originally wrote this to run in sequence for a different set of problems. I bastardized it for my own purposes.

The subprocess part of the code is relatively straightforward. Function startprocess() in my code covers that.

What makes this problem a little more challenging:

1) it's a vendor supplied executable we're dealing with . . . without an API or source . . . that's interactive (you can't feed it the config file path; it asks for it). This results in a number of time.sleep() and <process>.stdin.write() calls that can be brittle.

2) getting the processes started, as I just mentioned, is easy. Finding out when to stop, or kill them, requires knowledge of the app and how it generates output. I've gone for an ugly, but effective check of report file contents.

3) while waiting for the processes to finish their work, I need to know things are working and what's going on. I've accomplished this by reporting the data files' sizes in MB.

4) the executable isn't designed for a centralized code base (typically all scripts are kept in a folder for the specific project or pit), so it only allows file paths of about 100 characters. I've omitted this from my sanitized version of the code, but it made things even messier than they are below. Also, I don't know if all Windows programs do this, but the paths need to be inside quotes - the path kept breaking on the colon (:) when not quoted.

Basically, this is a fairly ugly problem and a script that requires babysitting while it runs. That's OK; it beats the alternative (running it sequentially while watching each run). I've tried to adhere to DRY (don't repeat yourself) as much as possible, but I suspect this could be improved upon.

The reason why I blog it is that I suspect there are other people out there who have to do the same sort of thing with their data. It doesn't have to be a mining problem. It can be anything that requires intensive computation across voluminous data with an executable not designed with a Python API.
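
Stripped of the project-specific lookup tables and messages, the overall pattern is roughly the sketch below. Everything in it (worker.exe, the .cfg/.rpt/.out.txt names, the END REPORT marker) is made up for illustration; the real names live in the full script further down.

import subprocess
import time
 
CONFIGS = ['a.cfg', 'b.cfg']        # stand-ins for the per-assay driver files
ENDTEXT = 'END REPORT'              # marker the program writes when it is done
 
def start(config):
    """Launch one interactive run, answering its config-file prompt on stdin."""
    out = open(config + '.out.txt', 'w')
    proc = subprocess.Popen('worker.exe', stdin=subprocess.PIPE, stdout=out)
    time.sleep(2)                   # crude - give the program time to prompt
    proc.stdin.write('"{0}"\n'.format(config))   # quoted so the path survives intact
    return proc, out
 
def finished(config):
    """True once the run's report file exists and contains the end marker."""
    try:
        with open(config + '.rpt') as f:
            return ENDTEXT in f.read()
    except IOError:
        return False
 
runs = [start(cfg) for cfg in CONFIGS]
while not all(finished(cfg) for cfg in CONFIGS):
    time.sleep(5)                   # poll, don't busy-wait
for proc, out in runs:
    proc.kill()                     # the program never exits on its own
    out.close()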

Notes:

1) I've omitted the file multirunparameters.py that's in an import statement. It has a bunch of paths and names that are relevant to my project, but not to the reader's programming needs.

2) Python 2.7 is listed at the top of the file as "mpython." This is the Python that our mine planning vendor ships that ties into their quite capable Python API. The executable I call with subprocess.Popen() is a Windows executable provided by a consultant independent of the mine planning vendor. It just makes sense to package this interpolation inside the mine planning vendor's multirun (~ batch file) framework as part of an overall working of the 3D geologic block model. The script exits as soon as this part of the batch is complete. I've inserted a 10-second pause at the end just to allow a quick look before the window disappears.

#!C:/MineSight/x64/mpython

"""
Interpolate grades with <consultant> program
from text files.
"""


import argparse
import subprocess as subx
import os
import collections as colx

import time
from datetime import datetime as dt


# Lookup file of constants, pit names, assay names, paths, etc.
import multirunparameters as paramsx


parser = argparse.ArgumentParser()
# 4 letter argument like 'kwat'
# Feed in at command line.
parser.add_argument('pit', help='four letter, lower case pit abbreviation (kwat)', type=str)
args = parser.parse_args()
PIT = args.pit


pitdir = paramsx.PATHS[PIT]
pathx = paramsx.BASEPATH.format(pitdir)
controlfilepathx = paramsx.CONTROLFILEPATH.format(pitdir)


timestart = dt.now()
print(timestart)


PROGRAM = 'C:/MSPROJECTS/EOMReconciliation/2014/Multirun/AllPits/consultantprogram.exe'

ENDTEXT = 'END <consultant> REPORT'

# These names are the only real difference between pits.
# Double quote is for subprocess.Popen object's stdin.write method
# - Windows path breaks on colon without quotes.
ASSAY1DRIVER = 'KDriverASSAY1{:s}CBT.csv"'.format(PIT)
ASSAY2DRIVER = 'KDriverASSAY2{:s}CBT.csv"'.format(PIT)
ASSAY3DRIVER = 'KDriverASSAY3_{:s}CBT.csv"'.format(PIT)
ASSAY4DRIVER = 'KDriverASSAY4_{:s}CBT.csv"'.format(PIT)
ASSAY5DRIVER = 'KDriverASSAY5_{:s}CBT.csv"'.format(PIT)


RETCHAR = '\n'

ASSAY1 = 'ASSAY1'
ASSAY2 = 'ASSAY2'
ASSAY3 = 'ASSAY3'
ASSAY4 = 'ASSAY4'
ASSAY5 = 'ASSAY5'


NAME = 'name'
DRFILE = 'driver file'
OUTPUT = 'output'
DATFILE = 'data file'
RPTFILE = 'report file'


# data, report files
ASSAY1K = 'ASSAY1K.csv'
ASSAY1RPT = 'ASSAY1.RPT'

ASSAY2K = 'ASSAY2K.csv'
ASSAY2RPT = 'ASSAY2.RPT'

ASSAY3K = 'ASSAY3K.csv'
ASSAY3RPT = 'ASSAY3.RPT'

ASSAY4K = 'ASSAY4K.csv'
ASSAY4RPT = 'ASSAY4.RPT'

ASSAY5K = 'ASSAY5K.csv'
ASSAY5RPT = 'ASSAY5.RPT'


OUTPUTFMT = '{:s}output.txt'

ASSAYS = {1:{NAME:ASSAY1,
             DRFILE:controlfilepathx + ASSAY1DRIVER,
             OUTPUT:pathx + OUTPUTFMT.format(ASSAY1),
             DATFILE:pathx + ASSAY1K,
             RPTFILE:pathx + ASSAY1RPT},
          2:{NAME:ASSAY2,
             DRFILE:controlfilepathx + ASSAY2DRIVER,
             OUTPUT:pathx + OUTPUTFMT.format(ASSAY2),
             DATFILE:pathx + ASSAY2K,
             RPTFILE:pathx + ASSAY2RPT},
          3:{NAME:ASSAY3,
             DRFILE:controlfilepathx + ASSAY3DRIVER,
             OUTPUT:pathx + OUTPUTFMT.format(ASSAY3),
             DATFILE:pathx + ASSAY3K,
             RPTFILE:pathx + ASSAY3RPT},
          4:{NAME:ASSAY4,
             DRFILE:controlfilepathx + ASSAY4DRIVER,
             OUTPUT:pathx + OUTPUTFMT.format(ASSAY4),
             DATFILE:pathx + ASSAY4K,
             RPTFILE:pathx + ASSAY4RPT},
          5:{NAME:ASSAY5,
             DRFILE:controlfilepathx + ASSAY5DRIVER,
             OUTPUT:pathx + OUTPUTFMT.format(ASSAY5),
             DATFILE:pathx + ASSAY5K,
             RPTFILE:pathx + ASSAY5RPT}}


DELFILE = 'delete file'
INTERP = 'interp'
SLEEP = 'sleep'
MSGDRIVER = 'message driver'
MSGRETCHAR = 'message return character'
FINISHED1 = 'finished one assay'
FINISHEDALL = 'finished all interpolations'
TIMEELAPSED = 'time elapsed'
FILEEXISTS = 'report file exists'
DATSIZE = 'data file size'
DONE = 'number interpolations finished'
DATFILEEXIST = 'data file not yet there'
SIZECHANGE = 'report file changed size'


# for converting to megabyte file size from os.stat()
BITSHIFT = 20

# sleeptime - 5 seconds
SLEEPTIME = 5

FINISHED = 'finished'
RPTFILECHSIZE = """

Report file for {:s}
changed size; killing process . . .

"""

MESGS = {DELFILE:'\n\nDeleting {} . . .\n\n',
         INTERP:'\n\nInterpolating {:s} . . .\n\n',
         SLEEP:'\nSleeping 2 seconds . . .\n\n',
         MSGDRIVER:'\n\nWriting driver file name to stdin . . .\n\n',
         MSGRETCHAR:'\n\nWriting retchar to stdin for {:s} . . .\n\n',
         FINISHED1:'\n\nFinished {:s}\n\n',
         FINISHEDALL:'\n\nFinished interpolation.\n\n',
         TIMEELAPSED:'\n\n{:d} elapsed seconds\n\n',
         FILEEXISTS:'\n\nReport file for {:s} exists . . .\n\n',
         DATSIZE:'\n\nData file size for {:s} is now {:d}MB . . .\n\n',
         DONE:'\n\n{:d} out of {:d} assays are finished . . .\n\n',
         DATFILEEXIST:"\n\n{:s} doesn't exist yet . . .\n\n",
         SIZECHANGE:RPTFILECHSIZE}


def cleanslate():
    """
    Delete all output files prior to interpolation
    so that their existence can be tracked.
    """
    for key in ASSAYS:
        files = (ASSAYS[key][DATFILE],
                 ASSAYS[key][RPTFILE],
                 ASSAYS[key][OUTPUT])
        for filex in files:
            print(MESGS[DELFILE].format(filex))
            if os.path.exists(filex) and os.path.isfile(filex):
                os.remove(filex)
    return 0


def startprocess(assay):
    """
    Start <consultant program> run for given interpolation.

    Return subprocess.Popen object,
    file object (output file).
    """
    print(MESGS[INTERP].format(ASSAYS[assay][NAME]))
    # XXX - I hate time.sleep - hack
    # XXX - try to re-route standard output so that
    #       it's not all jumbled together.
    print(MESGS[SLEEP])
    time.sleep(2)
    # output file for stdout
    f = open(ASSAYS[assay][OUTPUT], 'w')
    procx = subx.Popen('{0}'.format(PROGRAM), stdin=subx.PIPE, stdout=f)
    print(MESGS[SLEEP])
    time.sleep(2)
    # XXX - problem, starting up Excel CBT 22JUN2014
    # Ah - this is what happens when the <software usb licence>
    # key is not attached :-(
    print(MESGS[MSGDRIVER])
    print('\ndriver file = {:s}\n'.format(ASSAYS[assay][DRFILE]))
    procx.stdin.write(ASSAYS[assay][DRFILE])
    print(MESGS[SLEEP])
    time.sleep(2)
    # XXX - this is so jacked up -
    #       no idea what is happening when
    print(MESGS[MSGRETCHAR].format(ASSAYS[assay][NAME]))
    procx.stdin.write(RETCHAR)
    print(MESGS[SLEEP])
    time.sleep(2)
    print(MESGS[MSGRETCHAR].format(ASSAYS[assay][NAME]))
    procx.stdin.write(RETCHAR)
    print(MESGS[SLEEP])
    time.sleep(2)
    return procx, f


def crosslookup(assay):
    """
    From assay string, get numeric
    key for ASSAYS dictionary.

    Returns integer.
    """
    for key in ASSAYS:
        if assay == ASSAYS[key][NAME]:
            return key
    return 0


def checkprocess(assay, assaydict):
    """
    Check to see if assay
    interpolation is finished.

    assay is the item in question
    (ASSAY1, ASSAY2, etc.).

    assaydict is the operating dictionary
    for the assay in question.

    Returns True if finished.
    """
    # Report file indicates process finished.
    assaykey = crosslookup(assay)
    rptfile = ASSAYS[assaykey][RPTFILE]
    datfile = ASSAYS[assaykey][DATFILE]
    if os.path.exists(datfile) and os.path.isfile(datfile):
        # Report size of file in MB.
        datfilesize = os.stat(datfile).st_size >> BITSHIFT
        print(MESGS[DATSIZE].format(assay, datfilesize))
    else:
        # Doesn't exist yet.
        print(MESGS[DATFILEEXIST].format(datfile))
    if os.path.exists(rptfile) and os.path.isfile(rptfile):
        # XXX - not the most efficient way,
        #       but this checking the file appears
        #       to work best.
        f = open(rptfile, 'r')
        txt = f.read()
        f.close()
        # XXX - hack - gah.
        if txt.find(ENDTEXT) > -1:
            # looking for change in reportfile size
            # or big report file
            print(MESGS[SIZECHANGE].format(assay))
            print(MESGS[SLEEP])
            time.sleep(2)
            return True
    return False


PROCX = 'process'
OUTPUTFILE = 'output file'


# Keeps track of files and progress of <consultant program>.
opdict = colx.OrderedDict()


# get rid of preexisting files
cleanslate()


# start all five roughly in parallel
# ASSAYS keys are numbers
for key in ASSAYS:
    # opdict - ordered with assay names as keys
    namex = ASSAYS[key][NAME]
    opdict[namex] = {}
    assaydict = opdict[namex]
    assaydict[PROCX], assaydict[OUTPUTFILE] = startprocess(key)
    # Initialize active status of process.
    assaydict[FINISHED] = False


# For count.
numassays = len(ASSAYS)
# Loop until all finished.
while True:
    # Cycle until done then break.
    # Sleep SLEEPTIME seconds at a time and check between.
    time.sleep(SLEEPTIME)
    # Count.
    i = 0
    for key in opdict:
        assaydict = opdict[key]
        if not assaydict[FINISHED]:
            status = checkprocess(key, assaydict)
            if status:
                # kill process when report file changes
                opdict[key][PROCX].kill()
                assaydict[FINISHED] = True
                i += 1
        else:
            i += 1
    print(MESGS[DONE].format(i, numassays))
    # all done
    if i == numassays:
        break


print('\n\nFinished interpolation.\n\n')
timeend = dt.now()
elapsed = timeend - timestart


print(MESGS[TIMEELAPSED].format(elapsed.seconds))
print('\n\n{:d} elapsed minutes\n\n'.format(elapsed.seconds/60))


# Allow quick look at screen.
time.sleep(10)



20 Oct 2014 9:11pm GMT

Tryton News: New Tryton release 3.4

We are proud to announce the 3.4 release of Tryton.

In addition to the usual improvements of existing features for users and developers, this release has seen a lot of work done on the accounting part.

Of course, migration from previous series is fully supported with the obvious exception of the ldap_connection module which was removed.

Major changes in graphical user interface

  • The search for relation records has been reworked to take advantage of auto-completion. The search box of the pop-up window is filled with the text entered in the widget.

  • The search/open button of the Many2One widget is now inside the entry box, and the create button has been removed in favor of auto-completion actions or the pop-up button. This change harmonizes the size of all widgets inside a form.

    many2one button inside
  • A new image widget is available on list/tree view.

    widget image tree
  • The client can now perform a pre-validation before executing a button action. The validation is based on a domain, so the offending fields can be highlighted and focused instead of an error message popping up.

  • The selection labels are now available in addition to the internal values in the export data (CSV) functionality.

  • The export data window is now predefined with the fields of the current view. This gives a fast way to export what you see.

  • A predefined export can now be replaced directly with a new selection of fields. This eases the process of creating such predefined exports.

  • It is now possible to re-order the list of the exported fields using drag and drop.

  • The range operator of the search box is now inclusive on both endpoints. This is less astonishing behavior for users, even if the previous inclusive-exclusive behavior had some practical advantages.

  • The client now loads plug-ins defined in the user's local directory (~/.config/tryton/x.y/plugins).

Major changes on the server side

  • A new mixin, MatchMixin, has been introduced. It implements a common Tryton pattern: finding records that match certain values.
  • Another mixin, UnionMixin, has also been introduced. It allows defining a ModelSQL which is the UNION of several other ModelSQLs.
  • Currently, Tryton doesn't update a record defined in an XML file if that record has been modified outside the XML. It is now possible to find those records and force an update to get them synchronised with the XML.
  • A Python descriptor has been added to the Selection field. It allows defining an attribute on a Model which will contain the selection label of the record. It is planned to update all the reports to use this descriptor instead of hard-coded values.
  • A new configuration file format has been introduced for the server. It is easily extendable for use by modules. For example, the ldap_authentication module uses it to replace the removed ldap_connection module.
  • It is now possible to provide a logging configuration file to set up the server logging. This file uses the Python logging configuration format.
  • The context defined on relation fields is now used to instantiate the target.
  • The SQL clause for a domain on a field can now be customized using a domain_<field> method. In some cases this allows a more efficient SQL query. The method is designed to support joins.
  • Access rights have been reworked to be checked only on RPC calls. With this design, Tryton follows the principle of checking input at the border of the application. So it is no longer necessary to switch to the root user when calling methods that require specific access rights, as long as the call does not come from RPC.

Modules

Account

  • A new wizard to help reconcile all accounts has been added. It loops over each account and party and proposes lines to reconcile if it can find some. This really speeds up the reconciliation task.

    reconcile wizard
  • There is also another new wizard to ease the creation of cancellation moves. The wizard also automatically reconciles the line with the cancelled sibling.

  • A new option, Party Required, has been added on accounts. This option makes the party required for move lines of such accounts and forbids it for others.

Account Invoice

  • It is now possible to configure which tax rounding method to use. Two methods are implemented: per document and per line. The default stays per document.

Account Payment

  • It is now possible to change a succeeded payment to failed.

Account Payment SEPA

  • The Business to Business scheme is supported for direct debit.
  • The mandate now receives a default unique identification using a configured sequence.
  • The module now supports the bank-to-customer debit/credit notification message (CAMT.054).
  • A report to print a standard mandate form has been added.

Account Statement

  • It is now possible to order the statement lines and to give them a number. With these features, it is easier to reproduce the layout of a bank statement.
  • A report for statements has been added. It can be used, for example, when using statements for check deposits.
  • A validation method can be defined on the statement journal. The available methods are: Balance, Amount and Number of Lines. This helps to use statements for different purposes, such as bank statements or check deposits.

Account Stock Continental/Anglo-Saxon

  • The method is now defined on the fiscal year instead of being globally activated on module installation.

Country

  • It is now possible to store zip codes per country. A script is provided to load zip codes from GeoNames.

LDAP Authentication

  • The module ldap_connection has been replaced by an entry in the configuration file of trytond.

Party

  • The new zip codes from the country module are used to auto-complete the zip and city fields on addresses.

Purchase

  • The Confirmed state has been split into Confirmed and Processing, just like the Sale workflow.

Sale Supply Drop Shipment

  • The management of exceptions on drop shipments is propagated from the sale to the purchase.

New modules

  • The Account Payment Clearing module allows generating a clearing account move, from the receivable/payable account to a clearing account, when a payment succeeds. The clearing account will later be reconciled by the statement.

Proteus

Proteus is a library to access Tryton like a client.

  • It is now possible to run reports. This is useful for testing them.
  • A new duplicate method has been added, which is similar to the copy menu entry of the client.

20 Oct 2014 6:00pm GMT

Calvin Spealman: The Problem with Coders' Technology Focus

Coders focus on code. Coders focus on toolchains and development practices. Coders focus on commits and line counts. Coders focus on code, but we don't focus as well on people.


We need to take a step back and remember why we write code, or possibly re-evaluate why we write code. Many of us might be doing it for the wrong reasons. Maybe you don't think there can be a wrong reason, and I'm not entirely sure. What I am certain of is that some reasons to code lend themselves to certain attitudes and weights about the code and other motivations might mandate that you take yourself more or less seriously.


We're taking the wrong motivations seriously and we're not giving enough attention and weight to the reasons for code that we should.


The most valid and important reason we can code is not what hackers think it is. A good hack isn't good for its own sake. No programming language or tool is inherently better than another. The technical merits of the approach or of the individual are not the most important factors to consider.


Our impact on people is the only thing that truly matters.


Twitter isn't great because they developed amazing distributed services internally to support the load requirements of their service, but because they connect millions of voices across the globe.


RSS isn't great because it encapsulates content in an easily parseable format for client software to consume, but because it connects writers to the readers who care most about their thoughts and feelings and ideas.


The amazing rendering tools built in-house by the likes of Disney aren't amazing because of their attention to physical based light simulations and the effort required to coordinate the massive render farms churning out frames for new big budget films, but for their ability to tell wonderful stories that touch people.


The next time you find yourself on a forum chastising someone for writing their website in PHP, pause and ask yourself why that was the more important question to ask them than "Does this fulfill something important to you or your users?"


When you are reviewing code and want to stop a merge because you disagree with a technical approach, take a step back and ask yourself if the changes have a positive impact on the people your product serves.


Every time you find yourself valuing the technical contributions of team mates and community members, make sure those contributions translate into enriching and fulfilling the lives of that community and your workplace, before the technical needs.


Nothing that is important can be so without being important for people first.

20 Oct 2014 3:30pm GMT

Mike Driscoll: PyDev of the Week: Facundo Batista

This week we have Facundo Batista (@facundobatista) joining us.

facundobatista

He is a Python Core developer from Argentina. If you happen to speak Spanish, then you might enjoy his blog. Let's spend some time getting to know Facundo!

Can you tell us a little about yourself (hobbies, education, etc):

I'm a specialist in the Python programming language. With an experience
in it of more than 8 years, I'm Core Developer of the language, and
member by merit of the Python Software Foundation. Also, received the
2009 Community Service Award for organizing PyCon Argentina and the
Argentinian Python community as well as contributions to the standard
library and work in translating the Python documentation.

I gave talks in the main Python conferences in Argentina and other
countries (United States and Europe). In general, I have a strong
experience in distributed collaborative experience, being involved in
FLOSS development, working with people around the globe, for more than
10 years.

Worked as Telecommunication Engineer in Movistar and Ericsson, and as
Python expert in Cyclelogic (Developer in Chief) and Canonical
(Technical Leader, current position).

Also love playing tennis, have a one year kid that is a wonderful little
person, and enjoy taking photos.

Why did you start using Python?

I needed to process some logs server-side, when I was working in
Movistar ~14 years ago.

Servers were running SunOS (!). I knew C and other languages not really
suited to do that task. I learned and used Perl for some months, until I
found Python and fell in love.

What other programming languages do you know and which is your favorite?

I have experience and worked with (although I won't be able to use them
nowadays without some re-learning) COBOL, Clipper, Basic, C, C++, Java
and Perl.

My favourite of course is Python ;)

What projects are you working on now?

I'm actively working on three projects:

Which Python libraries are your favorite (core or 3rd party)?

I really love the itertools core lib. And of course the decimal one,
that I wrote ;).

Regarding external libs, I'm a fan of Twisted, and these days I use a
lot BeautifulSoup.

Is there anything else you'd like to say?

Thanks for the interview!

20 Oct 2014 12:30pm GMT

Peter Bengtsson: django-html-validator

In action
A couple of weeks ago we had accidentally broken our production server (for a particular report) because of broken HTML. It was an unclosed tag which rendered everything after that tag to just plain white. Our comprehensive test suite failed to notice it because it didn't look at details like that. And when it was tested manually we simply missed the conditional situation when it was caused. Neither good excuses. So it got me thinking how can we incorporate HTML (html5 in particular) validation into our test suite.

So I wrote a little gist and used it a bit on a couple of projects and was quite pleased with the results. But I thought this might be something worthwhile to keep around for future projects or for other people who can't just copy-n-paste a gist.

With that in mind I put together a little package with a README and a setup.py and now you can use it too.

There are however some caveats. Especially if you intend to run it as part of your test suite.

Caveat number 1

You can't flood htmlvalidator.nu. Well, you can I guess. It would be really evil of you and kittens will die. If you have a test suite that does things like response = self.client.get(reverse('myapp:myview')) and there are many tests you might be causing an obscene amount of HTTP traffic to them. Which brings us on to...

Caveat number 2

The htmlvalidator.nu site is written in Java and it's open source. You can basically download their validator and point django-html-validator to it locally. Basically the way it works is java -jar vnu.jar myfile.html. However, it's slow. Like really slow. It takes about 2 seconds to run just one modest HTML file. So, you need to be patient.
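
For instance, a test helper that shells out to a local copy of vnu.jar might look roughly like this. This is a sketch of the general idea, not django-html-validator's actual API; the vnu.jar path and the assumption that the validator reports its messages on stderr are mine:

import os
import subprocess
import tempfile
 
def validate_html(html, vnu_jar='vnu.jar'):
    """Run the Nu validator on a string of HTML; return its messages ('' means no complaints)."""
    handle, filepath = tempfile.mkstemp(suffix='.html')
    os.close(handle)
    try:
        with open(filepath, 'w') as f:
            f.write(html)
        proc = subprocess.Popen(['java', '-jar', vnu_jar, filepath],
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, err = proc.communicate()
        return err
    finally:
        os.remove(filepath)
 
# e.g. in a Django test: self.assertEqual(validate_html(response.content), '')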

20 Oct 2014 4:48am GMT

Vasudev Ram: Published my first presentation on SpeakerDeck - using Python

By Vasudev Ram


SpeakerDeck is an online presentation service roughly like SlideShare. SpeakerDeck seems to have been created by Github Inc.

I just published my first presentation on SpeakerDeck. It is a quickstart tutorial for the vi editor. Note: vi, not vim. I had written the tutorial some years ago, when vim was not so widely used, and vi was the most common text editor on Unix systems.

About the tutorial:

I first wrote this vi quickstart tutorial for some friends at a company where I worked. They were Windows and network system administrators without prior Unix experience, and had been tasked with managing some Unix servers that the company had bought for client work. Since I had a Unix background, they asked me to create a quick tutorial on vi for them, which I did.

Later on, after learning the basics of vi from it, and spending some days using vi to edit Unix configuration files, write small shell scripts, etc., they told me that they had found the tutorial useful in getting up to speed on vi quickly.

So, some time later, I thought of publishing it, and sent an article proposal to Linux For You magazine (an Indian print magazine about Linux and open source software). The proposal was accepted and the article was published.

About generating the tutorial as PDF and uploading it to SpeakerDeck:

The original vi quickstart tutorial was in text format. Last year I wrote XMLtoPDFBook (as an application of xtopdf, my Python toolkit for PDF creation), which allows the user to create simple PDF e-books from XML files. So I converted the vi tutorial to XML format (*) and used it to test XMLtoPDFBook. I therefore had the tutorial available in PDF format.

(*) All you have to do for that - i.e. to convert a text file to the XML format supported by XMLtoPDFBook - is to insert each chapter's text as a <chapter> element in the XML file. Then give the XML file as the input to XMLtoPDFBook, and you're done.
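
As a rough sketch of that conversion (mine, not from the post - the root element name and the blank-line chapter separator are assumptions):

from xml.sax.saxutils import escape

with open('vi_tutorial.txt') as src:
    chapters = src.read().split('\n\n\n')  # hypothetical chapter separator
with open('vi_tutorial.xml', 'w') as out:
    out.write('<book>\n')
    for text in chapters:
        out.write('<chapter>\n%s\n</chapter>\n' % escape(text))
    out.write('</book>\n')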

SpeakerDeck requires that presentations be uploaded in PDF format. It then converts them to slides. So I thought it would be a good test of SpeakerDeck and/or xtopdf, to upload this PDF generated by xtopdf to SpeakerDeck, and see how the result turned out. I did that today. Then I viewed the resulting SpeakerDeck presentation. It was good to see that the conversion turned out well, AFAICT. All pages seem to have got converted correctly into slides.

The presentation can be viewed here:

A vi quickstart tutorial

If you prefer plain text to presentations, you can read the vi quickstart tutorial here.

- Vasudev Ram - Dancing Bison Enterprises. Click here to sign up for email notifications about new products and services from Vasudev Ram. Contact Page

20 Oct 2014 1:33am GMT

19 Oct 2014

feedPlanet Python

Rob Galanakis: PracticalMayaPython: RuntimeError: Internal C++ object (PySide.QtGui.QStatusBar) already deleted.

TLDR: If you get that error for the code on page 163, see the fix at https://github.com/rgalanakis/practicalmayapython/pull/2/files

In August, reader Ric Williams noted:

I'm running Maya 2015 with Windows 7 64-bit. On page 163 when we open the GUI using a Shelf button, the GUI status bar does not work, (it works when called from outside Maya). This RuntimeError appears: RuntimeError: Internal C++ object (PySide.QtGui.QStatusBar) already deleted.

I no longer have Maya installed so I couldn't debug it, but reader ragingViking (sorry, don't know your real name!) contributed a fix to the book's GitHub repository. You can see the fix here: https://github.com/rgalanakis/practicalmayapython/pull/2/files
And you can see the issue which has more explanation here: https://github.com/rgalanakis/practicalmayapython/issues/1

Thanks again to Ric and ragingViking. I did my best to test the code with various versions of Maya but definitely missed some things (especially those which required manual testing). If you find any other problems, please don't hesitate to send me an email!

19 Oct 2014 5:30pm GMT

Calvin Spealman: Farewell to XMLHttpRequest; Long live window.fetch!

The days of XMLHttpRequest and its weird choices for capitalization are numbered. The WHATWG has had a spec ready for some time now to replace it: window.fetch. Just look at how easy it is.



Just look at how much nicer that all is with soon-to-be native APIs. But you don't have to wait, because there is a polyfill available. Start using the Fetch Polyfill in your projects today.

19 Oct 2014 2:04pm GMT

BangPypers: October 2014 Meetup report

The October Bangpypers meetup happened at the InMobi office near Bellandur. Sanketsaurav facilitated the workshop. Krace helped the meetup as a volunteer.

The workshop focused on building REST services using TastyPie. It started at 10.25 and went on till 1.45. Sanket explained all the concepts required to build a REST service in depth, with examples.

There were about 35 participants for the workshop.

Find the content of the workshop here

Special thanks to Iliyas Shirol for being a great help in organizing the event at InMobi.

We also have a mailing list where discussions about Python happen.

19 Oct 2014 10:57am GMT

Python Diary: Building a CARDIAC Assembler in Python

In my last article I gave a fully working example of a CARDIAC CPU simulator in Python. This article will continue down the path of creating a build toolchain for the Cardiac. Due to how the Cardiac accepts data from the outside world, namely in the form of a deck of cards or plain numbers, developing an assembler was a bit more difficult than developing a traditional bytecode assembler. At least when I build a binary assembler, I assemble the bytecodes in-memory. This ensures that all memory pointers are correctly pointing to the proper memory addresses, and also eliminates errors due to how information is stored. This is similar to how the MS-DOS DEBUG.EXE worked to build .COM files. It was simple and yet very effective. This tends to be the model I normally follow, as it makes debugging the virtual machine very easy. This is how I built up my simple-cpu project, and now I am in the process of enabling dynamic memory relocation to better enable the loading of binary files. I also recently added a full memory map interface to it, so now memory mapped I/O is possible, and I plan on using that to allow my binaries to directly draw onto an SDL surface. Anyways, onward to the Cardiac Assembler, as that is the real reason you're here today!

As of this post, you can view all the source code for this post and the last one on my Python Experiments Bitbucket repo.

The assembler is going to use just the standard Python library modules, so no fancy GNU Bison or YACC here. First, let's begin by defining what our language will look like: its overall syntax. Here is deck1.txt turned into an assembly program:

# This deck1.txt converted over to ASM format.
# Lines which start with a hash are comments.

# Here we declare our variables in memory for use.
var count = 0
var i = 9

# This is needed to add the special Cardiac bootstrap code.
bootstrap

# Program begins here.
CLA
STO $count
label loop
CLA $i
TAC $endloop
OUT $count
CLA $count
ADD
STO $count
CLA $i
SUB
STO $i
JMP $loop
label endloop
HRS

# End of program.
end

To make developing programs for the CARDIAC easier, the assembler is going to introduce the concept of variable storage and labels. The first few lines here that begin with var are basically setting aside memory addresses to store the variables in question. The bootstrap statement is more of a macro, as it just writes out a basic bootstrap loader for use on the CARDIAC. There is also an alternative bootloader, which uses a second-stage boot program to load in your code. This has the advantage of being able to load in very large programs without needing a very large deck. I'd recommend playing around with both options when assembling your programs. The bootloader is custom-built code I wrote; here is an assembly listing of it:

INP 2
JMP 0

INP 89
INP 1
INP 90
CLA 89
INP 91
ADD
INP 92
STO 89
INP 93
CLA 98
INP 94
SUB
INP 95
STO 98
INP 96
TAC 3
INP 97
JMP 89
INP 98
INP 13
INP 2
JMP 89

The bootloader is passed the size of the deck to load and loops over each card to load in your program. This means that your input program no longer needs lots of address markings. It can also be used to load other data into a specific memory address. I am planning on updating the bootloader code to also act like a subroutine, so you could call it to load additional data into memory. Okay, so onto the actual assembler now.

from cmd import Cmd
from cStringIO import StringIO
import shlex, sys

class AsmError(Exception):
    pass

class Assembler(Cmd):
    """
    I am sure this could be done better with say GNU Bison or Yacc,
    but that's more complicated than needed for a simple assembler.
    """
    op_map = {
        'inp': 0,
        'cla': 1,
        'add': 2,
        'tac': 3,
        'sft': 4,
        'out': 5,
        'sto': 6,
        'sub': 7,
        'jmp': 8,
        'hrs': 9,
    }
    padding = '00'
    def configure(self):
        self.start = self.addr = None
        self.pc = 0 #: Allows us to keep track of the program pointer.
        self.var_map = {} #: This is used to keep track of variables.
        self.labels = {} #: Stores the label names, and where they point to.
        self.buffer = StringIO() #: This is our buffer where we will store the CARDIAC deck
        self.size = 0
    def emptyline(self):
        """ This is requried due to how the Python Cmd module works... """
        pass
    def unknown_command(self, line):
        self.stdout.write('*** Unknown syntax: %s\n'%line)
    @property
    def ptr(self):
        """ This will always give the proper pointer in the deck. """
        if self.addr:
            return self.addr
        return self.pc
    def write_cards(self, *card_list):
        """ Helper fuction to make life easier. """
        for card in card_list:
            self.buffer.write('%s\n' % card)
        self.pc += len(card_list) #: Increment the program pointer.
    def write_addr(self):
        """ This method will only write out the address if we're in that mode. """
        if self.addr:
            self.write_cards('0%s' % self.addr)
            self.addr += 1
        self.size += 1

The first thing to notice here is that the assembler is a subclass of Cmd, which is a command-line parser. I chose a command-line parser because it makes things easier and can also be used interactively if I need to debug or later want to enable that feature. The first thing we declare in this class is a dictionary called op_map, which maps our op codes to actual byte codes. This makes it efficient to write the output and also to add new opcodes in the future. The first method here, configure(), could well have been an override of __init__(), but I chose to keep it as a separate method that I can call manually. The next 2 methods are specific to the command parser, and tell it how to handle empty lines and invalid commands. The ptr property added here makes the proper location in the deck transparent to the code at runtime. This is required due to how the Cardiac bootstraps code. You cannot just load in RAW code; you need to bootstrap your deck of cards, and the deck itself needs to keep track of which memory locations each of its operations are being placed in. Since I offer 2 modes of assembly here, and also support RAW assembly, there are different ways the exact memory pointer is obtained for runtime variables and jumps. The next 2 methods are for writing specific card types, one being a normal list of cards and the other an address card. The address cards are only used if you assemble using bootstrap, the original Cardiac bootstrap program.

Since this is a command processor Python subclass, next we are going to put in some commands that our source code can use to perform specific tasks:

    def do_exit(self, args):
        return True
    def do_bootstrap(self, args):
        """ Places some basic bootstrap code in. """
        self.addr = 10
        self.start = self.addr #: Updates all required address variables.
        self.write_cards('002', '800')
        self.pc = self.start
    def do_bootloader(self, args):
        """ Places a Cardiac compatible bootloader in. """
        self.write_cards('002', '800') 
        addr = 89 #: This is the start address of the bootloader code.
        for card in ('001', '189', '200', '689', '198', '700', '698', '301', '889', 'SIZE'):
            self.write_cards('0%s' % addr, card)
            addr+=1
        self.write_cards('002', '889')
        self.pc = 1
    def do_var(self, args):
        """ Creates a named variable reference in memory, a simple pointer. """
        s = shlex.split(args)
        if len(s) != 3 or s[1] != '=':
            raise AsmError('Incorrect format of the "var" statement.')
        if s[0] in self.var_map:
            raise AsmError('Variable has been declared twice!')
        value = int(s[2])
        self.var_map[s[0]] = '000'+str(value)
    def do_label(self, args):
        """ Creates a named label reference in memory, a simple pointer. """
        if args == '':
            raise AsmError('Incorrect format of the "label" statement.')
        if args in self.labels:
            raise AsmError('Label has been declared twice!')
        ptr = '00'+str(self.ptr)
        self.labels[args] = ptr[-2:]

Okay, there's a fair amount to explain with this code. Firstly, the first two methods here, bootstrap and bootloader, are obviously Cardiac-specific, and it's doubtful that a homemade assembler will even need such things. The bootstrap() method sets up some variables required for that mode of assembly, namely the address variables used to keep track of which memory location we are at. We also write 2 basic cards, the bootstrap that Cardiac needs to load your program on a real or simulated Cardiac. The next method, bootloader, is a bit more complex: it writes the same initial bootstrap required by all Cardiac programs, but also writes out a second-stage bootloader program which is responsible for loading in your program. Once I get to the guide about creating a compiler, the compiler will automatically choose which mode of assembly to use based on the program being compiled.

The next two very important methods, do_var and do_label, are responsible for controlling the variable and label system of the assembler. They make it easy to set aside a memory address for use as a variable, and to place a label within your code so that you may JMP to it. If these didn't exist, you would be forced to keep track of memory addresses yourself, and if you update your program, you will also need to update the memory addresses... So, variables and labels keep track of all that for you, and the memory addresses are generated automatically at assembly time. It should be fairly easy to understand what both of these methods are doing: they store the variable or label name in a hash along with the needed metadata.

    def default(self, line):
        """ This method is the actual lexical parser for the assembly. """
        if line.startswith('#'):
            return
        s = shlex.split(line)
        op, arg = s[0].lower(), '00'
        if len(s) == 2:
            if s[1].startswith('$'):
                arg = '*%s' % s[1][1:]
            else:
                arg = self.padding+s[1]
                arg = arg[-2:]
        if op in self.op_map:
            self.write_addr()
            self.write_cards('%s%s' % (self.op_map[op], arg))
        else:
            self.unknown_command(line)
    def do_end(self, args):
        """ Finalizes your code. """
        for var in self.var_map:
            ptr = self.padding+str(self.ptr)
            self.write_addr()
            self.write_cards(self.var_map[var][-3:])
            self.labels[var] = ptr[-2:]
        if self.start:
            self.write_cards('002', '8%s' % self.start)
        self.buffer.seek(0)
        buf = StringIO()
        for card in self.buffer.readlines():
            if card[1] == '*': #: We have a label.
                card = '%s%s\n' % (card[0], self.labels[card[2:-1]])
            elif card[:4] == 'SIZE':
                card = '000'+str(self.size-1)
                card = '%s\n' % card[-3:]
            buf.write(card)
        buf.seek(0)
        print ''.join(buf.readlines())
        return True

Now onto the bread and butter of the assembler: these two methods do most of the heavy lifting during the assembly process. First let's talk about default(). This method is called by the command-line parser when it cannot find the command the end-user attempted to call. Makes sense? So, here is where we place the code to check whether any of those lines are assembly instructions and process them accordingly. First we filter out any comments: any line that starts with a hash is a comment, simple as that. Next, we use the shlex module to parse the actual line. This module splits all the tokens into separate list elements, and it even handles quotation marks; take this line for example: echo "Example echo command" will result in only 2 elements, the first being echo and the second being the quoted string (see the quick demonstration below). In the next line we set some needed defaults for Cardiac cards, so if there is a missing argument it will still work fine on the Cardiac. It is here that we also check the arguments to see if we are using any variables or labels and mark them accordingly in the buffer for the second pass.
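
Here is that shlex behaviour in a quick interactive check (standard library only, not part of the assembler itself):

>>> import shlex
>>> shlex.split('echo "Example echo command"')
['echo', 'Example echo command']
>>> shlex.split('STO $count')
['STO', '$count']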

The last method here, do_end, runs when you place end at the very end of your code. This method is essentially the last pass, and it generates the final assembled code. It gathers together all the variables and generates a table of pointers to them. Then we write out the required cards to run the program the Cardiac just loaded, if we are using the original bootstrap. The bootloader doesn't use this; the bootloader manages that at runtime. Then we seek the buffer to the beginning and begin our last pass through the assembled data. We check for any memory pointer references and fill in the holes as needed in the output code. If using the bootloader, it needs to be passed the size of the program code to be loaded, and this is where we fill that in so that the bootloader can locate the variable. Then finally, we display the assembled Cardiac code to standard output.
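
The article doesn't include a driver for the class, but a minimal sketch could look something like this (the input filename and the use of Cmd's onecmd() dispatch are my assumptions, not part of the original code):

if __name__ == '__main__':
    asm = Assembler()
    asm.configure()
    with open('deck1.asm') as source:  # hypothetical input file
        for line in source:
            line = line.strip()
            if line:
                asm.onecmd(line)  # dispatches to the do_* handlers or default()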

You can find all the source for both the simulator and this assembler on my Python Experiments BitBucket page, along with an example deck that counts to 10 in both assembly and Cardiac-ready format. The code assembled by this assembler should run without error on either a real Cardiac or a simulated one. You can find a simulator made in JavaScript by a Drexel University professor by following the links through this blog page here: Developing Upwards: CARDIAC: The Cardboard Computer.

Hopefully this guide/article was helpful and inspiring to you, and I hope to have the final part of this guide/toolchain ready for consumption soon, the guide on how to build a Compiler for the Cardiac computer.

19 Oct 2014 10:44am GMT

18 Oct 2014

feedPlanet Python

Salim Fadhley: My newest project: Getting code onto Pyboards the easy way

My latest Python project is a way of making Micro Python "pyboards" easier to use. Pyboards are small microcomputers which can execute code written in MicroPython, a Python 3-compatible language. It has the same grammar as CPython 3 and includes a small subset of the standard library. They are an amazing invention because they open […]

18 Oct 2014 11:04pm GMT

17 Oct 2014

feedPlanet Python

David Szotten: Strace to the rescue

Recently I was debugging a strange error when using the eventlet support in the new coverage 4.0 alpha. The issue manifested itself as a network connection problem: Name or service not known. This was confusing, since the host it was trying to connect to was localhost. How can it fail to resolve localhost?! Switching off the eventlet tracing, the problem went away.

After banging my head against this for a few days, I finally remembered a tool I rarely think to pull out: strace.

There's an excellent blog post showing the basics of strace by Chad Fowler, The Magic of Strace. After tracing my test process, I could easily search the output for my error message:

11045 write(2, "2014-10-15 09:16:48,348 [ERROR] py2neo.packages.httpstream.http: !!! NetworkAddressError: Name or service not known", 127) = 127

and a few lines above lay the solution to my mystery:

11045 open("/etc/hosts", O_RDONLY|O_CLOEXEC) = -1 EMFILE (Too many open files)

It turns out the eventlet tracer was causing my code to leak file descriptors (a problem I'm still investigating), eventually hitting my relatively low ulimit. After bumping the limit in /etc/security/limits.conf, the problem disappeared!
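
As an aside of mine (not from the original post), the stdlib resource module shows the per-process limit on open file descriptors - the ulimit that was being exhausted here:

import resource

# soft is the limit a process hits first; hard is the ceiling it can be raised to.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print soft, hard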

I must remember to reach for strace sooner when trying to debug odd system behaviours.

17 Oct 2014 11:00pm GMT

PyCharm: Second PyCharm 4 EAP: IPython notebook, Attach to process and more

Having announced the first Early Access Preview build of PyCharm 4 almost a month ago, today we're eager to let you know that the second PyCharm 4 EAP build 139.113 is ready for your evaluation. Please download it for your platform from our EAP page.

Just as always, this EAP build can be used for 30 days starting from its release date and it does not require any license.

The most exciting announcement of this fresh preview and the whole upcoming release of PyCharm 4 is that the IPython notebook functionality is now fully supported in PyCharm!
It has been one of the top-voted feature requests in PyCharm's public tracker for quite a long time, and now we're proud to introduce this brand new integration to you.

Note that the IPython Notebook integration is available in both PyCharm Community Edition and PyCharm Professional Edition.

[screenshot]

Now with PyCharm you can perform all the usual IPython notebook actions with *.ipynb files. Basically everything that you got used to with the ordinary IPython notebook is now supported inside PyCharm: PyCharm recognizes different types of cells and evaluates them independently. You can delete cells or edit previously evaluated ones. Also you can output matplotlib plots or images:

[screenshot]

When editing code inside cells, PyCharm provides its well-known intelligent code completion as if it were an ordinary Python file. You can also get quick documentation and perform all the other usual actions that can be done in PyCharm.
So with this integration we have great news - now you can get the best of both PyCharm and IPython Notebook by using them together!
Please give it a try, and give us your feedback prior to the final release of PyCharm 4.

Stay tuned for future blog posts with detailed descriptions of this great feature!

Introducing a new feature - Attach to process

Another great feature of the second PyCharm 4 preview build is that PyCharm's debugger can now attach to a process!

Note: the "attach to process" functionality is available in both PyCharm Community Edition and PyCharm Professional Edition

With PyCharm 4 you can now connect the debugger to any running Python process and debug in attached mode. All you need to do is go to Tools | Attach to Process.
PyCharm will show you the list of running Python processes on the system. Just select the one you want to connect to and click OK:

[screenshot]

From this point you can use the debugger as usual - setting breakpoints, stepping into/over, pausing and resuming the process, evaluating variables and expressions, and changing the runtime context:

[screenshot]

Currently we support attaching to a process only on Windows and Linux. Hopefully we'll add support for Mac OS with the next EAP.
Also please note that on most Linux machines, attaching to a process is disabled by default. In order to enable it at the system level, please do

echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope

If you want it permanently, please edit /etc/sysctl.d/10-ptrace.conf (works for Ubuntu) and change the line:

kernel.yama.ptrace_scope = 1

to read:

kernel.yama.ptrace_scope = 0

Check ptrace configuration for your Linux distribution accordingly.

Better package management

The other good news is the improved package management subsystem. It got smarter and now recognizes unmet package requirements better. It also has a better UI - showing progress on package installation and a Choose Packages to Install dialog:

[screenshot]

In case of errors, PyCharm now has better reports that include suggested solutions:

[screenshot]

Another good thing is that the JSON support now comes bundled with PyCharm 4 in both Community Edition and Professional Edition. That means JSON is now supported on a platform level and has separate code style and appearance settings as well as its own code inspections, etc.:

[screenshot]

And finally, one more useful feature that comes from the IntelliJ platform. The Code Style settings now offer a new option: Detect and use existing file indents for editing (enabled by default):

[screenshot]

This new option lets PyCharm detect certain Code Style settings (such as Use Tab character and Indent size) in the currently edited file on the fly. It means that even if the file's code style differs from your current settings, that style will still be preserved.
Now you don't need to worry about losing the formatting that is specific to certain files in your project.

That's not all, as this build has many other improvements and bug fixes - for example, improved Django 1.7 code insight. So we urge you to check the fixed issues and compare this build to the previous one!

Please give PyCharm 4 EAP a try before its official release, and please report any bugs and feature requests to our issue tracker.

Develop with Pleasure!
-PyCharm team

17 Oct 2014 5:21pm GMT

Andrew Dalke: MACCS key 44

The MACCS 166 keys are one of the mainstay fingerprints of cheminformatics, especially regarding molecular similarity. It's rather odd, really, since they were developed for substructure screening and not similarity. I suppose that Jaccard would agree that any relatively diverse feature vector can likely be used to measure similarity, whether it be Alpine biomes or chemical structures.

Here's a bit of dirty laundry that you'll not read in the literature. There are a lot of MACCS implementations, and they don't agree fully with each other. The differences are likely small, but as far as I can tell, no one has really investigated if it's a problem, or noted that it might be a problem.

I'll structure the explanation around key #44. What is definition for key 44?

To start, there is no publication describing the MACCS 166 public keys. All of the citations for it either say a variation of "MDL did it" or cite the 2002 paper which reoptimized the keys for similarity ([PDF]). Thing is, just about everyone uses the "unoptimized" definitions, so this is, technically, the wrong citation. (Why do people use it? Tradition, perhaps, or because it feels better to have a real citation rather than a nebulous one.)

Instead, the definitions appear to have come from ISIS/Base, and have been passed around from person to person through informal means. I haven't used the MDL software and can't verify the source myself. There's a relatively recent whitepaper from Accelrys titled "The Keys to Understanding MDL Keyset Technology" which says they are defined in the file "eksfil.dat". A Google search finds 8 results for "eksfil.dat". All are tied to that white paper. The PDF has creation and modification dates of 31 August 2011, and Archive.org first saw that URL on 11 October 2011.

It's easy to see that the reoptimization fingerprint is not the same as the 166 keys that everyone uses. You'll find that many places say that key 44 is defined as "OTHER". Table 5 of the reoptimization paper has an entry for '"other" atom type', but there's nothing which assigns it to key 44. You can't even try to infer some sort of implicit ordering because the previous entry in table 5 is "isotope", which is key 1 in the MACCS 166 keys, and two entries later is "halogen", which is key 134.

If you cite Durant, Leland, Henry, and Nourse (2002) as your reference to the MACCS 166 bit public keys then you are doing your readers a disservice. Those define different fingerprints than you used. Just go ahead and cite "MACCS keys. MDL Information Systems" and if the reviewer complains that it's a bad citation, point them to this essay and ask them for the correct one. Then tell me what they said. If Accelrys complains then they need to suggest the correct citation and put it in their white paper. Even better would be a formal publication and a validation suite. (I can dream, can't I?)

In practice, many people use the MACCS keys as interpreted by the implementers of some piece of software. I used "interpreted by" because "implemented by" is too strong. There are ambiguities in the definition, mistakes in the implementations, and differences in chemical interpretation, compounded by a lack of any sort of comprehensive validation suite.

Let's take key 44, "OTHER". Remember how the definition comes from an internal MDL data file? What does "OTHER" mean? RDKit defines it as '?' in MACCSkeys.py to indicate that it has no definition for that key. That line has a commit date of 2006-05-06. RDKit's lack of a definition is notable because Open Babel, CDK, a user-contributed implementation for ChemAxon and many others reuse the RDKit SMARTS definitions. All of them omit key 44, as the quick check below shows.
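
Here's a minimal sketch of that check (mine, not from the essay), using RDKit's own MACCSkeys helper with aspirin as a convenient example molecule:

from rdkit import Chem
from rdkit.Chem import MACCSkeys

mol = Chem.MolFromSmiles('CC(=O)Oc1ccccc1C(=O)O')  # aspirin
fp = MACCSkeys.GenMACCSKeys(mol)  # 167-bit vector; bit 0 is unused padding
print fp[44]  # always 0, since RDKit has no pattern for key 44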

Others have implemented key 44. TJ O'Donnell, in "Design and Use of Relational Databases in Chemistry" (2009) defines it as the SMARTS [!#6!#7!#8!#15!#16!#9!#17!#35]. MayaChemTools defines it in code as an atom with element number in "1|6|7|8|9|14|15|16|17|35|53". (See _IsOtherAtom.)

These are the ones where I have access to the source and could investigate without much effort.

Both the whitepaper and the reoptimization paper define what "other" means, and the whitepaper does so specifically in the context of the MACCS 166 keys. It says:

"Other" atoms include any atoms other than H, C, N, O, Si, P, S, F, Cl, Br, and I, and is abbreviated "Z".

This appears definite and final. Going back to the three different implementation genealogies, RDKit and its many spinoffs don't have a definition, so by definition they aren't correct. O'Donnell's is close, but the SMARTS pattern omits hydrogen, silicon, and iodine. And MayaChemTools gets it exactly correct.

Good job, Manish Sud!
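
For illustration only (my own construction, not MDL's data file): a SMARTS built from the whitepaper's element list, checked with RDKit against a bare gold atom:

from rdkit import Chem

# "other" = anything besides H, C, N, O, Si, P, S, F, Cl, Br and I
other = Chem.MolFromSmarts('[!#1;!#6;!#7;!#8;!#14;!#15;!#16;!#9;!#17;!#35;!#53]')
print Chem.MolFromSmiles('[Au]').HasSubstructMatch(other)  # True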

Are these MACCS variations really a problem?

No. Not really. Well, maybe. It depends on who you are.

When used for similarity, a dead bit just makes things more similar because there are fewer ways to distinguish between molecules. In this case too, key 44 is rare. Only a handful of molecules contain "other" atoms (like the gold in auranofin) so when characterizing a database it's likely fine.

You don't need to trust my own gut feeling. You can read the RDKit documentation and see "The MACCS keys were critically evaluated and compared to other MACCS implementations in Q3 2008. In cases where the public keys are fully defined, things looked pretty good."

Okay, so you're hesitant about the keys which aren't "fully defined"? No need to despair. Roger Sayle ported the RDKit patterns (and without key 44) over to ChemAxon, and reported:

This work is heavily based upon the previous implementation by Miklos Vargyas, and the SMARTS definitions developed and refined by Greg Landrum and Andrew Dalke. This implementation achieves ~65% on the standard Briem and Lessel benchmark, i.e. almost identical to the expected value for MACCS keys reported in the literature by MDL and others.

(NB: All I did was proofread the RDKit SMARTS and find a few places that needed fixing.)

The MACCS 166 keys are a blunt tool, designed for substructure search and repurposed for similarity mostly because they were already present and easy to generate. 2D similarity search is another blunt tool. That's not to say they are horrible or worthless! A rock is a blunt tool for making an ax, but we used stone axes quite effectively throughout the Neolithic.

Just don't treat the MACCS 166 keys as a good luck charm, or as some sort of arcane relic passed down by the ancients. There are limitations in the definition and limitations in the implementation. Different tools will give different answers, and if you don't understand your tools they may turn on you.

And when you write a paper, be honest to your readers. If you are using the RDKit implementation of the MACCS keys, or a derived version in another toolkit (and assuming the definitions haven't changed since I wrote this essay), point out that you are only using 164 of those 166 bits.

Homework assignment

For a warmup exercise, what is the other unimplemented bit in the RDKit MACCS definition?

For your homework assignment, use two different programs to compute the MACCS keys for a large data set and see 1) how many bits are different? (eg, sum of the Manhattan distance between the fingerprints for each record, or come up with a better measure), 2) how many times does the nearest neighbor change?, and 3) (bonus points) characterize how often those differences are because of differences in how to interpret a key and how often it's because of different toolkit aromaticity/chemistry perception methods.
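Here's one way you might start on part 1, assuming RDKit for the first toolkit; maccs_from_other_toolkit and my_smiles_list are placeholders you would supply yourself (Open Babel, CDK, whatever you have access to):

from rdkit import Chem
from rdkit.Chem import MACCSkeys

def maccs_bits_rdkit(smiles):
    """Return MACCS keys 1..166 as a list of 0/1 ints, using RDKit."""
    fp = MACCSkeys.GenMACCSKeys(Chem.MolFromSmiles(smiles))
    return [int(fp.GetBit(i)) for i in range(1, 167)]

def manhattan(bits_a, bits_b):
    """Count the key positions where two implementations disagree."""
    return sum(a != b for a, b in zip(bits_a, bits_b))

# total_disagreement = sum(
#     manhattan(maccs_bits_rdkit(smi), maccs_from_other_toolkit(smi))
#     for smi in my_smiles_list)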

I expect a paper in a journal by the end of next year. :).

(Then again, for all I know this is one of those negative results papers that's so hard to publish. "9 different MACCS key implementations produce identical MACCS keys!" doesn't sound exciting, does it?)

17 Oct 2014 12:00pm GMT


16 Oct 2014

feedPlanet Python

Omaha Python Users Group: October 15 Meeting Notes

Here are links to some of the topics discussed at the meeting. Thanks for everyone's participation.

http://exercism.io/

http://pandas.pydata.org/

http://ipython.org/

http://pbpython.com/simple-data-analysis.html and http://www.gregreda.com/2013/10/26/intro-to-pandas-data-structures/ for some examples.

Django Debug Toolbar

16 Oct 2014 11:15pm GMT


10 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: King Willams Town Station

Yesterday morning I had to go to the station in KWT to pick up the bus tickets we had reserved for the Christmas holidays in Capetown. The station itself has had no train service since December for cost reasons - but Translux and co., the long-distance bus companies, have their offices there.






© benste CC NC SA

10 Nov 2011 10:57am GMT

09 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein

Nobody is worried about that sort of thing - you just drive straight through by car, and in the city - near Gnobie - "nah, it only gets dangerous once the fire brigade is there" - 30 minutes later, on the way back, the fire brigade was there.




© benste CC NC SA

09 Nov 2011 8:25pm GMT

08 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Brai Party

Brai = barbecue evening or similar.

The would-be technicians patching up their SpeakOn / jack plug splitters...

The ladies, the "mamas" of the settlement, at the official opening speech

Even though fewer people came than expected: loud music and lots of people ...

And of course a fire with real wood for the barbecue.

© benste CC NC SA

08 Nov 2011 2:30pm GMT

07 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Lumanyano Primary

One of our missions was bringing Katja's Linux Server back to her room. While doing that we saw her new decoration.

Björn, Simphiwe carried the PC to Katja's school


© benste CC NC SA

07 Nov 2011 2:00pm GMT

06 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Nelisa Haircut

Today I went with Björn to Needs Camp to visit Katja's guest family for a special party. First of all we visited some friends of Nelisa - yeah, the one I'm working with in Quigney - Katja's guest father's sister, who did her hair.

African women usually get their hair done by arranging extensions, not by just cutting some hair off like Europeans.

In between she looked like this...

And then she was done - looks amazing considering the amount of hair she had last week, doesn't it?

© benste CC NC SA

06 Nov 2011 7:45pm GMT

05 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: My Saturday

Somehow it struck me today that I need to restructure my blog posts a bit - if I only ever report on new places, I'd have to be on a permanent round trip. So here are a few things from my everyday life today.

First of all: Saturday counts as a day off, at least for us volunteers.

This weekend only Rommel and I are on the farm - Katja and Björn are now at their placements, and my housemates Kyle and Jonathan are at home in Grahamstown, as is Sipho, who lives in Dimbaza.
Robin, Rommel's wife, has been in Woodie Cape since Thursday to take care of a few things there.
Anyway, this morning we first treated ourselves to a shared Weetbix/muesli breakfast and then set off for East London. Two things were on the checklist: Vodacom and Ethienne (the estate agent), plus dropping off the missing items at NeedsCamp on the way back.

Just after setting off on the dirt road we realized that we hadn't packed the things for NeedsCamp and Ethienne, but did have the pump for the water supply in the car.

So in East London we first drove to Farmerama - no, not the online game Farmville, but a shop with all sorts of things for a farm - in Berea, a northern part of town.

At Farmerama we got advice on a quick-release coupling that should make life with the pump easier, and also dropped off a lighter pump for repair, so that it isn't such a big effort every time the water runs out again.

Fego Caffé is in the Hemmingways Mall; there we had to get the PIN and PUK for one of our data SIM cards, because we had unfortunately transposed two digits when entering the PIN. In any case, shops in South Africa store data as sensitive as a PUK - which in principle gives access to a locked phone.

In the cafe Rommel then carried out a few online transactions with the 3G modem, which was working again - and which, by the way, now works perfectly in Ubuntu, my Linux system.

On the side I went to 8ta to find out about their new deals, since we want to offer internet in some of Hilltop's centres. The picture shows the UMTS coverage in NeedsCamp, Katja's place. 8ta is a new phone provider from Telkom; after Vodafone bought Telkom's stake in Vodacom, they have to build up their network completely from scratch.
We decided to organize a free prepaid card to test, because who knows how accurate the coverage map above is ... Before signing even the cheapest 24-month deal you should know whether it actually works.

After that we went to Checkers in Vincent, looking for two hotplates for WoodyCape - R 129.00 each, so about 12€ for a two-ring hotplate.
As you can see in the background, the Christmas decorations are already up - at the beginning of November, and that in South Africa at a sunny, warm minimum of 25°C.

We treated ourselves to lunch at a Pakistani curry takeaway - highly recommended!
Well, and after we got back an hour or so ago, I cleaned the fridge, which I had simply put outside to defrost this morning. Now it's clean again and free of its 3 m thick layer of ice...

Tomorrow ... I'll report on that separately ... but probably not until Monday, because then I'll be back in Quigney (East London) with free internet.

© benste CC NC SA

05 Nov 2011 4:33pm GMT

31 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Sterkspruit Computer Center

Sterkspruit is one of Hilltop's computer centres in the far north of the Eastern Cape. On the trip to J'burg we used the opportunity to take a look at the centre.

Pupils in the big classroom


The Trainer


School in Countryside


Adult Class in the Afternoon


"Town"


© benste CC NC SA

31 Oct 2011 4:58pm GMT

Benedict Stein: Technical Issues

What do you do in an internet cafe when your ADSL and fax line have been cut off before month's end? Well, my idea was to sit outside and eat some ice cream.
At least it's sunny and not as rainy as it was on the weekend.


© benste CC NC SA

31 Oct 2011 3:11pm GMT

30 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Nellis Restaurant

For those traveling through Zastron - there is a very nice restaurant serving delicious food at reasonable prices.
In addition they sell home-made juices, jams and honey.




interior


home made specialities - the shop in the shop


the Bar


© benste CC NC SA

30 Oct 2011 4:47pm GMT

29 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: The way back from J'burg

On the 10-12 hour trip from J'burg back to ELS I was able to take a lot of pictures, including these different roadsides

Plain Street


Orange River in its beginnings (near Lesotho)


Zastron Anglican Church


The Bridge in Between "Free State" and Eastern Cape next to Zastron


my new Background ;)


If you listen to GoogleMaps you'll end up traveling 50 km of gravel road - as it had just been resurfaced we didn't have that many problems and saved an hour compared to going the official way with all its construction sites




Freeway


getting dark


© benste CC NC SA

29 Oct 2011 4:23pm GMT

28 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: How does a construction site actually work?

Sure, some things may be different and a lot is the same - but the road construction site, an everyday sight in Germany - how does that actually work in South Africa?

First of all - NO, no natives digging with their hands - even though more manpower is used here, they are hard at work with machinery.

A perfectly normal "federal road"

and how it is being widened

looooads of trucks

because here one side is closed completely over a long stretch, which results in a traffic-light arrangement with, in this case, a 45-minute wait

But at least they seem to be having fun ;) - as did we, since luckily we never had to wait longer than 10 minutes.

© benste CC NC SA

28 Oct 2011 4:20pm GMT