22 Aug 2017

Planet Python

Anwesha Das: The mistakes I made in my blog posts

Today we will be discussing the mistakes I made in my blog posts.
I started (seriously) writing blogs a year ago. A few of my posts got a pretty nice response. The praise put me in seventh heaven, and I thought I was a fairly good blogger. But after almost a year of writing, one day I chanced upon one of my older posts, and reading it sent me crashing down to earth.

There was a huge list of mistakes I had made:

The post was a perfect example of TL;DR. I used to judge a post based on quantity. The larger the number of words, the better! (Typical lawyer mentality!)

The title and the lead paragraph were vague.

The sentences were long (far too long).

There were plenty of grammatical mistakes.

I lost the flow of thought and broke the logical chain in many places.

The measures I took to solve my problem

I was upset. I stopped writing for a month or so.
After the depressed, dispirited phase was over, I got back up, dusted myself off and tried to find ways to become a better writer.

Talks, books, blogs:

I searched for talks, writings, and books on "how to write good blog posts" and started reading and watching videos. I tried to follow their advice while writing my posts.

Earlier I used to take a lot of time (a week) to write each post. I used to flit from one sentence to the next, so that I would not forget the latest idea or thought that popped into my head.
But that caused two major problems:

First, the long writing time also meant long breaks. The interval broke my chain of thought anyway, and I had to start again from the beginning. That resulted in confusing views and unrelated sentences.

Second, it also made the posts hugely long.

Now I dedicate a limited time, a few hours, to each post, depending on the idea.
And I strictly adhere to those hours. I use Tomato Timer to keep a check on the time. During that time I do not go to my web browser, check my phone, or do any household activity, and of course I ignore my husband completely.
But one thing I have not been able to avoid is the "Mamma no working. Let's play" situation. :)
I focus on the sentence I am writing. I do not jump between sentences. I've made peace with the fear of losing one thought and I do not disturb the one I am working on. This keeps my ideas clear.

To finish my work within the stipulated time:
- I write during quieter hours, especially in the morning
- I plan what to write the day before
- I stay caffeinated while writing

Sometimes I cannot finish a post in one go. In that case, before starting again the next day, I read aloud what I wrote previously.

Revision:

Previously after I finished writing, I used to correct only the red underlines. Now I take time and follow four steps before publishing a post:

Respect the readers

This single piece of advice has changed my posts for the better.
Respect the reader.
Don't give them any false hopes or expectations.

With that in mind, I have altered the following two things in my blog:

Vague titles

I always thought out of the box and figured that sarcastic titles would showcase my intelligence; an offhand, humorous title seemed good. How utterly wrong I was.

People search by asking relevant questions on the topic.
For example, for a hardware project with an ESP8266 using MicroPython, people may search with:
- "esp8266 projects"
- "projects with micropython"
- "fun hardware projects"

But no one will search for "mybunny uncle" (it might remind you of your kindly uncle, but definitely not of a hardware project in any sense of the term).

People find your blog posts through RSS feeds or by searching in a search engine.
So be as direct as possible. Give a title that describes the core of the content. In the words of Cory Doctorow, write your headlines as if you were a Wired service writer.

Vague Lead paragraph

The lead paragraph, the opening paragraph of your post, must explain what follows. Many times the lead paragraph is the part shown in the search result.

Avoid conjunctions and past participles

I try not to use conjunctions, connecting clauses, or the past participle tense. These make a sentence complicated to read.

Use simple words

I use simple, easy words instead of hard, heavy and huge words. It was so difficult to make the lawyer (inside me) understand that "simple is better than complicated".

The one thing that is still difficult for me is to let go; to accept the fact that not all of my posts will be good.
There will be faults in them, and that is fine.
Instead of pouring all my effort into making a single piece better, I'd rather move on and work on other topics.

22 Aug 2017 3:18am GMT

21 Aug 2017

Django community aggregator: Community blog posts

CFE CLI

The Coding for Entrepreneurs C...

21 Aug 2017 9:13pm GMT

Planet Python

Doug Hellmann: smtpd — Sample Mail Servers — PyMOTW 3

The smtpd module includes classes for building simple mail transport protocol servers. It is the server side of the protocol used by smtplib. Read more… This post is part of the Python Module of the Week series for Python 3. See PyMOTW.com for more articles from the series.

21 Aug 2017 1:00pm GMT

Mike Driscoll: PyDev of the Week: Katherine Scott

This week we welcome Katherine Scott (@kscottz) as our PyDev of the Week! Katherine was the lead developer of the SimpleCV computer vision library and co-author of the SimpleCV O'Reilly book. You can check out Katherine's open source projects over on GitHub. Let's take a few moments to get to know her better!

Can you tell us a little about yourself (hobbies, education, etc):

A quick summary about me:

I am currently the image analytics team lead at Planet Labs. Planet is one of the largest satellite imaging companies in the world, and my team helps take Planet's daily satellite imagery and turn it into actionable information. We currently image the entire planet every day at ~3m resolution, and not only do I get to see that data, but I also have the resources to apply my computer vision skills to our whole data set. On top of this I get to work on stuff in space! It goes without saying that I absolutely love my job. I am also on the board of the Open Source Hardware Association and I help put together the Open Hardware Summit.

Prior to working at Planet I co-founded two successful start-ups, Tempo Automation and SightMachine. Before founding those two start-ups I worked at a really awesome research and development company called Cybernet Systems. While I was at Cybernet I did computer vision, augmented reality, and robotics research.

Education:
I graduated from the University of Michigan in 2005 with dual degrees in computer engineering and electrical engineering. To put myself through school I worked as a research assistant with a couple of really awesome labs where I did research on MEMS neural prosthetics and the RHex Robot (a cousin to the Big Dog robot you may be familiar with). In 2010 I decided to go back to school to get my masters degree at Columbia University. I majored in computer science with a focus on computer vision and robotics. It was at the tail end of grad school that I got bit by the start-up bug and helped start Sight Machine.

Hobbies:
My hobbies are currently constrained by my tiny apartment in San Francisco, but I like to build and make stuff (art, hardware, software, etc) in my spare time. I am also really into music so I go to a lot of live shows. As I've gotten older I've found that I need to exercise if I want to stay in front of a screen so I like to walk, bike, and do pilates. I am also the owner of three pet rats. I started keeping rats after working with them in the lab during college.

Why did you start using Python?

I was almost exclusively a C/C++/C# user for the first ten years I was an engineer. There was some Lua and Java mixed in here and there but I spent 90% of my time writing C++ from scratch. When I started at SightMachine I switched over to Python to help build a computer vision library called SimpleCV for the company. I fell in love almost immediately. Python allowed me to abstract away a lot of the compiler, linker, and memory management related tasks and focus more on computer vision algorithm development. The sheer volume of scientific and mathematical libraries was also a fantastic resource.

What other programming languages do you know and which is your favorite?
I've been a professional engineer now for twelve years, so I have basically seen it all and done it all. I've done non-trivial projects in C, C++, C#, Java, JavaScript and Python, and I've dabbled in some of the more esoteric languages like Lisp, Lua, CoffeeScript, and OCaml. Python is my favorite because the "batteries are included." With so many libraries and packages out there it is like having a superpower: if I can dream it up, I can code it.

What projects are you working on now?

My job keeps me very busy right now, but it is super rewarding, as I feel like we are giving everyone on Earth the ability to see the planet in real time. In April Planet released a Kaggle competition that focuses on detecting illegal mining and deforestation in the Amazon. More recently I just wrapped up working on my latest PyCon talk and putting together the speaker list for Open Hardware Summit. With this stuff out of the way I am starting a couple of new projects with some far-left activist groups in the Bay Area. We are trying to put together an activist hack-a-thon where we develop tools for Bay Area non-profits. The project I am going to focus on specifically is a tool to systematically mine and analyze the advertising content of hate speech websites in an effort to defund them. These projects are still in the planning stage, but I am hoping to have them up and running by late summer.

Which Python libraries are your favorite (core or 3rd party)?

The whole scientific Python community is amazing and I am a huge fan of Project Jupyter. Given my line of work I use OpenCV, Keras, Scikit, Pandas, and NumPy on a daily basis. Now that I am doing GIS work I have been exploring that space quite a bit. Right now I am getting really familiar with GeoPandas, Shapely, GDAL's Python bindings, and libraries that provide interfaces to OpenStreetMap, just to name a few. I also want to give a big shout out to the Robot Operating System and the Open Source Robotics Foundation.

Is there anything else you'd like to say?

I have a lot of things I could say but most of them would become a rant. I will say I try to make myself available over the internet, particularly to younger engineers just learning their craft. If you have questions about my field or software engineering in general, don't hesitate to reach out.

Thanks for doing the interview!

21 Aug 2017 12:30pm GMT

Django community aggregator: Community blog posts

How to Use Celery and RabbitMQ with Django

Celery is an asynchronous task queue based on distributed message passing. Task queues are used as a strategy to distribute the workload between threads/machines. In this tutorial I will explain how to install and set up Celery + RabbitMQ to execute asynchronous tasks in a Django application.

To work with Celery, we also need to install RabbitMQ because Celery requires an external solution to send and receive messages. Those solutions are called message brokers. Currently, Celery supports RabbitMQ, Redis, and Amazon SQS as message broker solutions.

Table of Contents

Why Should I Use Celery?

Web applications work with request and response cycles. When the user accesses a certain URL of your application, the Web browser sends a request to your server. Django receives this request and does something with it. Usually that involves executing queries in the database and processing data. While Django does its thing and processes the request, the user has to wait. When Django finishes processing the request, it sends a response back to the user, who finally sees something.

Ideally this request and response cycle should be fast; otherwise we would leave the user waiting for way too long. Even worse, our Web server can only serve a certain number of users at a time. So, if this process is slow, it can limit the number of pages your application can serve at a time.

For the most part we can work around this issue using caching, optimizing database queries, and so on. But there are some cases where there's no other option: the heavy work has to be done. A report page, an export of a big amount of data, or video/image processing are a few examples of cases where you may want to use Celery.

We don't use Celery throughout the whole project, but only for specific tasks that are time-consuming. The idea here is to respond to the user as quickly as possible, pass the time-consuming tasks to the queue to be executed in the background, and always keep the server ready to respond to new requests.


Installation

The easiest way to install Celery is using pip:

pip install Celery

Now we have to install RabbitMQ.

Installing RabbitMQ on Ubuntu 16.04

Installing it on a newer Ubuntu version is very straightforward:

apt-get install -y erlang
apt-get install rabbitmq-server

Then enable and start the RabbitMQ service:

systemctl enable rabbitmq-server
systemctl start rabbitmq-server

Check the status to make sure everything is running smoothly:

systemctl status rabbitmq-server

Installing RabbitMQ on Mac

Homebrew is the most straightforward option:

brew install rabbitmq

The RabbitMQ scripts are installed into /usr/local/sbin. You can add it to your .bash_profile or .profile.

vim ~/.bash_profile

Then add it to the bottom of the file:

export PATH=$PATH:/usr/local/sbin

Restart the terminal to make sure the changes are in effect.

Now you can start the RabbitMQ server using the following command:

rabbitmq-server

Installing RabbitMQ on Windows and Other OSs

Unfortunately I don't have access to a Windows computer to try things out, but you can find the installation guide for Windows on RabbitMQ's Website.

For other operating systems, check the Downloading and Installing RabbitMQ page on their website.


Celery Basic Setup

First, consider the following Django project named mysite with an app named core:

mysite/
 |-- mysite/
 |    |-- core/
 |    |    |-- migrations/
 |    |    |-- templates/
 |    |    |-- apps.py
 |    |    |-- models.py
 |    |    +-- views.py
 |    |-- templates/
 |    |-- __init__.py
 |    |-- settings.py
 |    |-- urls.py
 |    +-- wsgi.py
 |-- manage.py
 +-- requirements.txt

Add the CELERY_BROKER_URL configuration to the settings.py file:

settings.py

CELERY_BROKER_URL = 'amqp://localhost'

Alongside the settings.py and urls.py files, let's create a new file named celery.py.

celery.py

import os
from celery import Celery

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'mysite.settings')

app = Celery('mysite')
app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks()

Now edit the __init__.py file in the project root:

__init__.py

from .celery import app as celery_app

__all__ = ['celery_app']

This will make sure our Celery app is imported every time Django starts.


Creating Our First Celery Task

We can create a file named tasks.py inside a Django app and put all our Celery tasks into this file. The Celery app we created in the project root will collect all tasks defined across all Django apps listed in the INSTALLED_APPS configuration.
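
One detail the discovery above relies on (an assumption on my part, since the post doesn't show the setting): the app itself must be listed in INSTALLED_APPS for its tasks.py to be found, for example:

INSTALLED_APPS = [
    # ... Django's default apps ...
    'mysite.core',
]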

Just for testing purposes, let's create a Celery task that generates a number of random User accounts.

core/tasks.py

import string

from django.contrib.auth.models import User
from django.utils.crypto import get_random_string

from celery import shared_task

@shared_task
def create_random_user_accounts(total):
    for i in range(total):
        username = 'user_{}'.format(get_random_string(10, string.ascii_letters))
        email = '{}@example.com'.format(username)
        password = get_random_string(50)
        User.objects.create_user(username=username, email=email, password=password)
    return '{} random users created with success!'.format(total)

The important bits here are:

from celery import shared_task

@shared_task
def name_of_your_function(optional_param):
    pass  # do something heavy

Then I defined a form and a view to process my Celery task:

forms.py

from django import forms
from django.core.validators import MinValueValidator, MaxValueValidator

class GenerateRandomUserForm(forms.Form):
    total = forms.IntegerField(
        validators=[
            MinValueValidator(50),
            MaxValueValidator(500)
        ]
    )

This form expects a positive integer field between 50 and 500. It looks like this:

Generate random users form

Then my view:

views.py

from django.contrib.auth.models import User
from django.contrib import messages
from django.views.generic.edit import FormView
from django.shortcuts import redirect

from .forms import GenerateRandomUserForm
from .tasks import create_random_user_accounts

class GenerateRandomUserView(FormView):
    template_name = 'core/generate_random_users.html'
    form_class = GenerateRandomUserForm

    def form_valid(self, form):
        total = form.cleaned_data.get('total')
        create_random_user_accounts.delay(total)
        messages.success(self.request, 'We are generating your random users! Wait a moment and refresh this page.')
        return redirect('users_list')

The important bit is here:

create_random_user_accounts.delay(total)

Instead of calling create_random_user_accounts directly, I'm calling create_random_user_accounts.delay(). This way we are instructing Celery to execute this function in the background.
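
As a side note (this is standard Celery behavior rather than something specific to this tutorial), delay() is a shortcut for apply_async(), which accepts extra options when you need them; the countdown below is only an illustration:

create_random_user_accounts.delay(500)
# same call via apply_async(); countdown=10 delays execution by roughly 10 seconds
create_random_user_accounts.apply_async(args=(500,), countdown=10)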

Then Django keeps processing my view, GenerateRandomUserView, and returns smoothly to the user.
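
For completeness, here is a hypothetical urls.py wiring for these views. The original post does not show it; UsersListView and the URL names are my assumptions (the view above redirects to 'users_list'), so adapt them to your project:

from django.conf.urls import url

from mysite.core import views

urlpatterns = [
    url(r'^users/$', views.UsersListView.as_view(), name='users_list'),
    url(r'^users/generate/$', views.GenerateRandomUserView.as_view(), name='generate_random_users'),
]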

But before you try it, check the next section to learn how to start the Celery worker process.


Starting The Worker Process

Open a new terminal tab, and run the following command:

celery -A mysite worker -l info

Change mysite to the name of your project. The result is something like this:

Celery Worker

Now we can test it. I submitted 500 in my form to create 500 random users.

The response is immediate:

Meanwhile, checking the Celery Worker Process:

[2017-08-20 19:11:17,485: INFO/MainProcess] Received task:
mysite.core.tasks.create_random_user_accounts[8799cfbd-deae-41aa-afac-95ed4cc859b0]

Then after a few seconds, if we refresh the page, the users are there:

If we check the Celery Worker Process again, we can see it completed the execution:

[2017-08-20 19:11:45,721: INFO/ForkPoolWorker-2] Task
mysite.core.tasks.create_random_user_accounts[8799cfbd-deae-41aa-afac-95ed4cc859b0] succeeded in
28.225658523035236s: '500 random users created with success!'

Managing The Worker Process in Production with Supervisord

If you are deploying your application to a VPS like DigitalOcean you will want to run the worker process in the background. In my tutorials I like to use Supervisord to manage the Gunicorn workers, so it's usually a nice fit with Celery.

First install it (on Ubuntu):

sudo apt-get install supervisor

Then create a file named mysite-celery.conf in the /etc/supervisor/conf.d/ folder:

[program:mysite-celery]
command=/home/mysite/bin/celery worker -A web --loglevel=INFO
directory=/home/mysite/mysite
user=nobody
numprocs=1
stdout_logfile=/home/mysite/logs/celery.log
stderr_logfile=/home/mysite/logs/celery.log
autostart=true
autorestart=true
startsecs=10

; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600

stopasgroup=true

; Set Celery priority higher than default (999)
; so, if rabbitmq is supervised, it will start first.
priority=1000

In the example above, I'm assuming my Django project is inside a virtual environment. The path to my virtual environment is /home/mysite/.

Now reread the configuration and add the new process:

sudo supervisorctl reread
sudo supervisorctl update
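
To confirm the worker was picked up and is running, you can ask Supervisor for its status (the process name matches the [program:mysite-celery] section above):

sudo supervisorctl status mysite-celery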

If you are not familiar with deploying Django to a production server and working with Supervisord, maybe this part will make more sense if you check this post from the blog: How to Deploy a Django Application to Digital Ocean.


Further Reading

Those are the basic steps. I hope this helped you to get started with Celery. I will leave here a few useful references to keep learning about Celery:

And as usual, the code examples used in this tutorial are available on GitHub:

github.com/sibtc/django-celery-example


Referral Link

If you want to try this setup on an Ubuntu cloud server, you can use this referral link to get $10 of free credit from Digital Ocean.

21 Aug 2017 12:54am GMT

20 Aug 2017

Django community aggregator: Community blog posts

Monorepo structure for Django & React Applications

Hello! Today I will guide you through setting up a React application with Django! Let's get started!

The first thing is: where do I place my JavaScript application? Should it be in another repository? Or maybe Django should use webpack to render the JS?

I decided to use a pattern called monorepo. What does it …

20 Aug 2017 1:00pm GMT

17 Aug 2017

Planet Twisted

Duncan McGreggor: NASA/EOSDIS Earthdata

Update

It's been a few years since I posted on this blog -- most of the technical content I've been contributing to in the past couple years has been in the following:

But since the publication of the Mastering matplotlib book, I've gotten more and more into satellite data. The book, it goes without saying, focused on Python for the analysis and interpretation of satellite data (in one of the many topics covered). After that I spent some time working with satellite and GIS data in general using Erlang and LFE. Ultimately though, I found that more and more projects were using the JVM for this sort of work, and in particular, I noted that Clojure had begun to show up in a surprising number of Github projects.

EOSDIS

Enter NASA's Earth Observing System Data and Information System (see also earthdata.nasa.gov and EOSDIS on Wikipedia), a key part of the agency's Earth Science Data Systems Program. It's essentially a concerted effort to bring together the mind-blowing amounts of earth-related data being collected throughout, around, and above the world so that scientists may easily access and correlate earth science data for their research.

Related NASA projects include the following:

The acronym menagerie can be bewildering, but digging into the various NASA projects is ultimately quite rewarding (greater insights, previously unknown resources, amazing research, etc.).

Clojure

Back to the Clojure reference I made above: I've been contributing to the nasa/Common-Metadata-Repository open source project (hosted on GitHub) for a few months now, and it's been amazing to see how all this data from so many different sources gets added, indexed, updated, and generally made so much more available to anyone who wants to work with it. The private sector always seems to be so far ahead of large projects in terms of tech and continuously improving updates to existing software, so it's been pretty cool to see a large open source project in the NASA GitHub org make so many changes that find ways to keep helping their users do better research. Even more so, users are regularly delivered new features in a large, complex collection of libraries and services, thanks in part to the benefits that come from using a functional programming language.

It may seem like nothing to you, but the fact that there are now directory pages for various data providers (e.g., GES_DISC, i.e., Goddard Earth Sciences Data and Information Services Center) makes a big difference for users of this data. The data provider pages now also offer easy access to collection links such as UARS Solar Ultraviolet Spectral Irradiance Monitor. Admittedly, the directory pages still take a while to load, but there are improvements on the way for page load times and other related tasks. If you're reading this a month after this post was written, there's a good chance it's already been fixed by now.

Summary

In summary, it's been a fun personal journey from looking at Landsat data for writing a book to working with open source projects that really help scientists to do their jobs better :-) And while I have enjoyed using the other programming languages to explore this problem space, Clojure in particular has been a delightfully powerful tool for delivering new features to the science community.

17 Aug 2017 2:05pm GMT

16 Aug 2017

Planet Twisted

Itamar Turner-Trauring: The tragic tale of the deadlocking Python queue

This is a story about how very difficult it is to build concurrent programs. It's also a story about a bug in Python's Queue class, a class which happens to be the easiest way to make concurrency simple in Python. This is not a happy story: this is a tragedy, a story of deadlocks and despair.

This story will take you on a veritable roller coaster of emotion and elucidation, as you:

Join me, then, as I share this tale of woe.

Concurrency is hard

Writing programs with concurrency, programs with multiple threads, is hard. Without threads code is linear: line 2 is executed after line 1, with nothing happening in between. Add in threads, and now changes can happen behind your back.

Race conditions

The following counter, for example, will become corrupted if increment() is called from multiple threads:

from threading import Thread

class Counter(object):
    def __init__(self):
        self.value = 0
    def increment(self):
        self.value += 1

c = Counter()

def go():
    for i in range(1000000):
        c.increment()

# Run two threads that increment the counter:
t1 = Thread(target=go)
t1.start()
t2 = Thread(target=go)
t2.start()
t1.join()
t2.join()
print(c.value)

Run the program, and:

$ python3 racecondition.py
1686797

We incremented 2,000,000 times, but that's not what we got. The problem is that self.value += 1 actually takes three distinct steps:

  1. Getting the attribute,
  2. incrementing it,
  3. then setting the attribute.

If two threads call increment() on the same object around the same time, the following series of steps may happen:

  1. Thread 1: Get self.value, which happens to be 17.
  2. Thread 2: Get self.value, which happens to be 17.
  3. Thread 1: Increment 17 to 18.
  4. Thread 1: Set self.value to 18.
  5. Thread 2: Increment 17 to 18.
  6. Thread 2: Set self.value to 18.

An increment was lost due to a race condition.

One way to solve this is with locks:

from threading import Lock

class Counter(object):
    def __init__(self):
        self.value = 0
        self.lock = Lock()
    def increment(self):
        with self.lock:
            self.value += 1

Only one thread at a time can hold the lock, so only one increment happens at a time.

Deadlocks

Locks introduce their own set of problems. For example, you start having potential issues with deadlocks. Imagine you have two locks, L1 and L2, and one thread tries to acquire L1 followed by L2, whereas another thread tries to acquire L2 followed by L1.

  1. Thread 1: Acquire and hold L1.
  2. Thread 2: Acquire and hold L2.
  3. Thread 1: Try to acquire L2, but it's in use, so wait.
  4. Thread 2: Try to acquire L1, but it's in use, so wait.

The threads are now deadlocked: no execution will proceed.
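
For illustration only (this snippet is not from the original post), here is a minimal sketch of that lock ordering; the sleep calls just make the unlucky interleaving reliable:

from threading import Lock, Thread
import time

L1 = Lock()
L2 = Lock()

def one():
    with L1:
        time.sleep(0.1)  # give the other thread time to grab L2
        with L2:
            pass

def two():
    with L2:
        time.sleep(0.1)  # give the other thread time to grab L1
        with L1:
            pass

t1 = Thread(target=one)
t2 = Thread(target=two)
t1.start(); t2.start()
t1.join(); t2.join()  # never returns: each thread waits for the lock the other holds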

Queues make concurrency simpler

One way to make concurrency simpler is by using queues, and trying to have no other shared data structures. If threads can only send messages to other threads using queues, and threads never mutate data structures shared with other threads, the result is code that is much closer to single-threaded code. Each function just runs one line at a time, and you don't need to worry about some other thread interrupting you.

For example, we can have a single thread whose job it is to manage a collection of counters:

from collections import defaultdict
from threading import Thread
from queue import Queue

class Counter(object):
    def __init__(self):
        self.value = 0
    def increment(self):
        self.value += 1


counter_queue = Queue()


def counters_thread():
    counters = defaultdict(Counter)
    while True:
        # Get next command out the queue:
        command, name = counter_queue.get()
        if command == "increment":
            counters[name].increment()

# Start a new thread:
Thread(target=counters_thread).start()

Now other threads can safely increment a named counter by doing:

counter_queue.put(("increment", "shared_counter_1"))

A buggy program

Unfortunately, queues have some broken edge cases. Consider the following program, a program which involves no threads at all:

from queue import Queue

q = Queue()


class Circular(object):
    def __init__(self):
        self.circular = self

    def __del__(self):
        print("Adding to queue in GC")
        q.put(1)


for i in range(1000000000):
    print("iteration", i)
    # Create an object that will be garbage collected
    # asynchronously, and therefore have its __del__
    # method called later:
    Circular()
    print("Adding to queue regularly")
    q.put(2)

What I'm doing here is a little trickery involving a circular reference, in order to add an item to the queue during garbage collection.

By default CPython (the default Python VM) uses reference counting to garbage collect objects. When an object is created the count is incremented, when a reference is removed the count is decremented. When the reference count hits zero the object is removed from memory and __del__ is called on it.

However, an object with a reference to itself, like the Circular class above, will always have a reference count of at least 1. So Python also runs a garbage collection pass every once in a while that catches these objects. By using a circular reference we are causing Circular.__del__ to be called asynchronously (eventually), rather than immediately.

Let's run the program:

$ python3 bug.py 
iteration 0
Adding to queue regularly
Adding to queue in GC

That's it: the program continues to run, but prints out nothing more. There are no further iterations, no progress.

What's going on?

Debugging a deadlock with gdb

Modern versions of the gdb debugger have some neat Python-specific features, including the ability to print out a Python traceback. Setup is a little annoying (see here and here and maybe do a bit of googling), but once you have it set up it's extremely useful.

Let's see what gdb tells us about this process. First we attach to the running process, and then use the bt command to see the C backtrace:

$ ps x | grep bug.py
28920 pts/4    S+     0:00 python3 bug.py
$ gdb --pid 28920
...
(gdb) bt
#0  0x00007f756c6d0946 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x6464e96e00) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x6464e96e00, abstime=0x0) at sem_waitcommon.c:111
#2  0x00007f756c6d09f4 in __new_sem_wait_slow (sem=0x6464e96e00, abstime=0x0) at sem_waitcommon.c:181
#3  0x00007f756c6d0a9a in __new_sem_wait (sem=<optimized out>) at sem_wait.c:29
#4  0x00007f756ca7cbd5 in PyThread_acquire_lock_timed () at /usr/src/debug/Python-3.5.3/Python/thread_pthread.h:352
...

Looks like the process is waiting for a lock. I wonder why?

Next, we take a look at the Python backtrace:

(gdb) py-bt
Traceback (most recent call first):
  <built-in method __enter__ of _thread.lock object at remote 0x7f756cef36e8>
  File "/usr/lib64/python3.5/threading.py", line 238, in __enter__
    return self._lock.__enter__()
  File "/usr/lib64/python3.5/queue.py", line 126, in put
    with self.not_full:
  File "bug.py", line 12, in __del__
    q.put(1)
  Garbage-collecting
  File "/usr/lib64/python3.5/threading.py", line 345, in notify
    waiters_to_notify = _deque(_islice(all_waiters, n))
  File "/usr/lib64/python3.5/queue.py", line 145, in put
    self.not_empty.notify()
  File "bug.py", line 21, in <module>
    q.put(2)

Do you see what's going on?

Reentrancy!

Remember when I said that, lacking concurrency, code just runs one line at a time? That was a lie.

Garbage collection can interrupt Python functions at any point, and run arbitrary other Python code: __del__ methods and weakref callbacks. So can signal handlers, which happen e.g. when you hit Ctrl-C (your process gets the SIGINT signal) or a subprocess dies (your process gets the SIGCHLD signal).
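
A tiny sketch of the signal-handler variant (my example, not from the post), using a queue like the one above:

import signal
from queue import Queue

q = Queue()

def on_sigint(signum, frame):
    # Runs reentrantly: it may interrupt code that is already inside q.put().
    q.put("interrupted")

signal.signal(signal.SIGINT, on_sigint)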

In this case:

  1. The program was calling q.put(2).
  2. This involves acquiring a lock.
  3. Half-way through the function call, garbage collection happens.
  4. Garbage collection calls Circular.__del__.
  5. Circular.__del__ calls q.put(1).
  6. q.put(1) tries to acquire the lock… but the lock is already held, so it waits.

Now q.put(2) is stuck waiting for garbage collection to finish, and garbage collection can't finish until q.put(2) releases the lock.

The program is deadlocked.

Why this is a real bug…

The above scenario may seem a little far-fetched, but it has been encountered by multiple people in the real world. A common cause is logging.

If you're writing logs to disk you have to worry about the disk write blocking, i.e. taking a long time. This is particularly the case when log writes are followed by syncing-to-disk, which is often done to ensure logs aren't lost in a crash.

A common pattern is to create log messages in your application thread or threads, and do the actual writing to disk in a different thread. The easiest way to communicate the messages is, of course, a queue.Queue.

This use case is in fact directly supported by the Python standard library:

from queue import Queue
import logging
from logging.handlers import QueueListener, QueueHandler

# Write out queued logs to a file:
_log_queue = Queue()
QueueListener(
    _log_queue, logging.FileHandler("out.log")).start()

# Push all logs into the queue:
logging.getLogger().addHandler(QueueHandler(_log_queue))

Given this common setup, all you need to do to trigger the bug is to log a message in __del__, a weakref callback, or a signal handler. This happens in real code. For example, if you don't explicitly close a file, Python will warn you about it inside file.__del__, and Python also has a standard API for routing warnings to the logging system.
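
The warnings-to-logging routing mentioned here is the standard logging.captureWarnings() call; a minimal sketch of how it connects to the queue setup above (the unclosed file is just an example trigger, subject to your warning filters):

import logging

logging.captureWarnings(True)  # warnings now go to the "py.warnings" logger

def leak_a_file():
    open("/tmp/some-file", "w")  # never closed; CPython emits a ResourceWarning
                                 # when the object is finalized, i.e. in __del__

# With the QueueHandler configured above, that warning becomes a log record,
# which becomes a q.put() -- potentially from inside garbage collection.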

It's not just logging, though: the bug was also encountered by, among others, the SQLAlchemy ORM.

…and why Python maintainers haven't fixed it

This bug was originally reported in 2012, and in 2016 it was closed as "wont fix" because it's a "difficult problem".

I feel this is a cop-out. If you're using an extremely common logging pattern, where writes happen in a different thread, a logging pattern explicitly supported by the Python standard library… your program might deadlock. In particular, it will deadlock if any of the libraries you're using writes a log message in __del__.

This can happen just by using standard Python APIs like files and warning→logging routing. This happened to one of the users of my Crochet library, due to some logging in __del__ by the Twisted framework. I had to implement my own queuing system to ensure users weren't impacted by this problem. If I can fix the problem, so can the Python maintainers. For example, Queue.get and Queue.put could be atomic operations (which can be done in CPython by rewriting them in C).

Now, you could argue that __del__ shouldn't do anything: it should schedule stuff that is run outside it. But scheduling from reentrant code is tricky, and in fact not that different from mutating a shared data structure from multiple threads. If only there was a queue of some sort that we could call from __del__… but there isn't, because of this bug.

Some takeaways

  1. Concurrency is hard to deal with, but queue.Queue helps.
  2. Reentrancy is hard to deal with, and Python helps you a lot less.
  3. If you're using queue.Queue on Python, beware of interacting with the queue in __del__, weakref callbacks, or signal handlers.

And by the way, if you enjoyed reading this and would like to hear about all the many ways I've screwed up my own software, sign up for my Software Clown newsletter. Every week I share one of my mistakes and how you can avoid it.

Update: Thanks to Maciej Fijalkowski for suggesting actually demonstrating the race condition, and pointing out that __del__ probably really shouldn't do anything. Thanks to Ann Yanich for pointing out a typo in the code.

16 Aug 2017 4:00am GMT

10 Aug 2017

Planet Twisted

Duncan McGreggor: Mastering matplotlib: Acknowledgments

The Book

Well, after nine months of hard work, the book is finally out! It's available both on Packt's site and Amazon.com. Getting up early every morning to write takes a lot of discipline, it takes even more to say "no" to enticing rabbit holes or herds of Yak with luxurious coats ripe for shaving ... (truth be told, I still did a bit of that).

The team I worked with at Packt was just amazing. Highly professional and deeply supportive, they were a complete pleasure with which to collaborate. It was the best experience I could have hoped for. Thanks, guys!

The technical reviewers for the book were just fantastic. I've stated elsewhere that my one regret was that the process with the reviewers did not have a tighter feedback loop. I would have really enjoyed collaborating with them from the beginning so that some of their really good ideas could have been integrated into the book. Regardless, their feedback as I got it later in the process helped make this book more approachable by readers, more consistent, and more accurate. The reviewers have bios at the beginning of the book -- read them, and look them up! These folks are all amazing!

The one thing that slipped in the final crunch was the acknowledgements, and I hope to make up for that here, as well as through various emails to everyone who provided their support, either directly or indirectly.

Acknowledgments

The first two folks I reached out to when starting the book were both physics professors who had published very nice matplotlib problems -- one set for undergraduate students and another from work at the National Radio Astronomy Observatory. I asked for their permission to adapt these problems to the API chapter, and they graciously granted it. What followed were some very nice conversations about matplotlib, programming, physics, education, and publishing. Thanks to Professor Alan DeWeerd, University of Redlands and Professor Jonathan W. Keohane, Hampden Sydney College. Note that Dr. Keohane has a book coming out in the fall from Yale University Press entitled Classical Electrodynamics -- it will contain examples in matplotlib.

Other examples adapted for use in the API chapter included one by Professor David Bailey, University of Toronto. Though his example didn't make it into the book, it gets full coverage in the Chapter 3 IPython notebook.

For one of the EM examples I needed to derive a particular equation for an electromagnetic field in two wires traveling in opposite directions. It's been nearly 20 years since my post-Army college physics, so I was very grateful for the existence and excellence of SymPy which enabled me to check my work with its symbolic computations. A special thanks to the SymPy creators and maintainers.

Please note that if there are errors in the equations, they are my fault! Not that of the esteemed professors or of SymPy :-)

Many of the examples throughout the book were derived from work done by the matplotlib and Seaborn contributors. The work they have done on the documentation in the past 10 years has been amazing -- the community is truly lucky to have such resources at their fingertips.

In particular, Benjamin Root is an astounding community supporter on the matplotlib mail list, helping users of every level with all of their needs. Benjamin and I had several very nice email exchanges during the writing of this book, and he provided some excellent pointers, as he was finishing his own title for Packt: Interactive Applications Using Matplotlib. It was geophysicist and matplotlib savant Joe Kington who originally put us in touch, and I'd like to thank Joe -- on everyone's behalf -- for his amazing answers to matplotlib and related questions on StackOverflow. Joe inspired many changes and adjustments in the sample code for this book. In fact, I had originally intended to feature his work in the chapter on advanced customization (but ran out of space), since Joe has one of the best examples out there for matplotlib transforms. If you don't believe me, check out his work on stereonets. There are many of us who hope that Joe will be authoring his own matplotlib book in the future ...

Olga Botvinnik, a contributor to Seaborn and PhD candidate at UC San Diego (and BioEng/Math double major at MIT), provided fantastic support for my Seaborn questions. Her knowledge, skills, and spirit of open source will help build the community around Seaborn in the years to come. Thanks, Olga!

While on the topic of matplotlib contributors, I'd like to give a special thanks to John Hunter for his inspiration, hard work, and passionate contributions which made matplotlib a reality. My deepest condolences to his family and friends for their tremendous loss.

Quite possibly the tool that had the single-greatest impact on the authoring of this book was IPython and its notebook feature. This brought back all the best memories from using Mathematica in school. Combined with the Python programming language, I can't imagine a better platform for collaborating on math-related problems or producing teaching materials for the same. These compliments are not limited to the user experience, either: the new architecture using ZeroMQ is a work of art. Nicely done, IPython community! The IPython notebook index for the book is available in the book's Github org here.

In Chapters 7 and 8 I encountered a bit of a crisis when trying to work with Python 3 in cloud environments. What was almost a disaster ended up being rescued by the work that Barry Warsaw and the rest of the Ubuntu team did in Ubuntu 15.04, getting Python 3.4.2 into the release and available on Amazon EC2. You guys saved my bacon!

Chapter 7's fictional case study examining the Landsat 8 data for part of Greenland was based on one of Milos Miljkovic's tutorials from PyData 2014, "Analyzing Satellite Images With Python Scientific Stack". I hope readers have just as much fun working with satellite data as I did. Huge thanks to NASA, USGS, the Landsat 8 teams, and the EROS facility in Sioux Falls, SD.

My favourite section in Chapter 8 was the one on HDF5. This was greatly inspired by Yves Hilpisch's presentation "Out-of-Memory Data Analytics with Python". Many thanks to Yves for putting that together and sharing with the world. We should all be doing more with HDF5.

Finally, and this almost goes without saying, the work that the Python community has done to create Python 3 has been just phenomenal. Guido's vision for the evolution of the language, combined with the efforts of the community, have made something great. I had more fun working on Python 3 than I have had in many years.

10 Aug 2017 4:12am GMT

11 Feb 2016

Planet TurboGears

Christopher Arndt: Organix Roland JX-3P MIDI Expansion Kit

Foreign visitors: to download the Novation Remote SL template for the Roland JX-3P with the Organix MIDI Upgrade, see the link at the bottom of this post. For my last birthday I treated myself to a Roland JX-3P, including a DT200 programmer (a PG-200 clone). The JX-3P is a 6-voice analog polysynth from 1983 and […]

11 Feb 2016 8:42pm GMT

13 Jan 2016

Planet TurboGears

Christopher Arndt: Registration for PythonCamp 2016 opens Friday, 15 Jan 2016

PythonCamp 2016: free exchange of knowledge around Python. (The following is an announcement for a Python "un-conference" in Cologne, Germany and is therefore directed at a German-speaking audience.) Dear Python fans, it's that time again: on Friday, 15 January we open online registration for participants of PythonCamp 2016! The now seventh edition of PythonCamp will once again be […]

13 Jan 2016 3:00pm GMT

02 Nov 2015

Planet TurboGears

Matthew Wilson: Mary Dunbar is the best candidate for Cleveland Heights Council

I'll vote for Mary Dunbar tomorrow in the Cleveland Heights election.

Here's why:

02 Nov 2015 5:14pm GMT

03 Aug 2012

PySoy Blog

Juhani Åhman: YA Update

Managed to partially fix the shading rendering issues with the examples. I reckon the rest of the rendering issues are OpenGL ES related, and not something on the libsoy side.
I don't know OpenGL (ES) very well, so I didn't attempt to fix anything further.

I finished implementing a rudimentary pointer controller in pysoy's Client.
There is a pointer.py example program for testing it. Unfortunately it keeps crashing once in a while.
I reckon the problem is something with soy.atoms.Position. Regardless, the pointer controller works.

I started to work on getting the keyboard controller to work too, and of course mouse buttons for the pointer,
but I got stuck when writing the Python bindings for Genie's events (signals). There's no connect method in pysoy, so maybe that needs to be implemented, or perhaps some other solution. I will look into this later.

Plan for this week is to finish documenting bodies, scenes and widgets. I'm about 50% done, and it should be straightforward. Next week I'm finally going to attempt to set up Sphinx and generate readable documentation. I reckon I need to refactor many of the docstrings as well.

03 Aug 2012 12:27pm GMT

10 Jul 2012

PySoy Blog

Mayank Singh: Mid-term and dualshock 3

Now that the SoC mid-term has arrived, here's a bit of an update about what I have done so far. The wiimote xinput driver IR update is almost done. Though, as can be said of any piece of software, it's never fully complete.
I also corrected the code for Sphere in the libsoy repository to render an actual sphere.
For now I have started on integrating the DualShock 3 controller. I am currently studying the code given here: http://www.pabr.org/sixlinux/sixlinux.en.html and trying to understand how the DualShock works. I also need to write a controller class to be able to grab and move objects around without help from the physics engine.

10 Jul 2012 3:00pm GMT

04 Jul 2012

feedPySoy Blog

Juhani Åhman: Weeks 5-7 update

I've mostly finished writing unit tests for atoms now.
I didn't write tests for Morphs, though, since that still seems to be a work in progress.
However, I did encounter a rare memory corruption bug that I'm unable to fix at this point,
because I don't know how to debug it properly.
I can't find the location where the error occurs.

I'm going to spend the rest of this week writing doctests and hopefully getting more examples to work.

04 Jul 2012 9:04am GMT

10 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: King William's Town Station

Yesterday morning I had to go to the station in KWT to pick up our reserved bus tickets for the Christmas holidays in Cape Town. The station itself has been without train service since December for cost reasons, but Translux and co., the long-distance bus companies, have their offices there.



© benste CC NC SA

10 Nov 2011 10:57am GMT

09 Nov 2011

feedPlanet Plone

Andreas Jung: Produce & Publish Plone Client Connector released as open-source

09 Nov 2011 9:30pm GMT

feedPython Software Foundation | GSoC'11 Students

Benedict Stein

Nobody is worried about things like that: by car you just drive straight through, and in the city, near Gnobie: "no, that only gets dangerous once the fire brigade is there." 30 minutes later, on the way back, the fire brigade was there.




© benste CC NC SA

09 Nov 2011 8:25pm GMT

feedPlanet Plone

ACLARK.NET, LLC: Plone secrets: Episode 4 – Varnish in front

This just in from the production department: use Varnish. (And please forgive the heavily meme-laden approach to describing these techniques :-).)

Cache ALL the hosts

Our ability to use Varnish in production is no secret by now, or at least it shouldn't be. What is often less clear is exactly how to use it. One way I like[1] is to run Varnish on your public IP, port 80, and make Apache listen on your private IP, port 80. Then proxy from Varnish to Apache and enjoy easy caching goodness on all your virtual hosts in Apache.

Configuration

This should require less than five minutes of down time to implement. First, configure the appropriate settings. (Well, first install Apache and Varnish if you haven't already: `aptitude install varnish apache2` on Ubuntu Linux[0].)

Varnish

To modify the listen IP address and port, we typically edit a file like /etc/default/varnish (in Ubuntu). However you do it, configure the equivalent of the following on your system:

DAEMON_OPTS="-a 174.143.252.11:80 \
             -T localhost:6082 \
             -f /etc/varnish/default.vcl \
             -s malloc,256m"

This environment variable is then passed to varnishd on the command line. Next, pass traffic to Apache like so (in /etc/varnish/default.vcl on Ubuntu):

backend default {
    .host = "127.0.0.1";
    .port = "80";
}

Now on to Apache.

Please note that the syntax above is for Varnish 3.x; the syntax has (annoyingly) changed between 2.x and 3.x.

Apache

The Apache part is a bit simpler. You just need to change the listen port (on Ubuntu this is done in /etc/apache2/ports.conf), typically from something like:

Listen *:80

to:

Listen 127.0.0.1:80

Restart ALL the services

Now restart both services. If all goes well you shouldn't notice any difference, except better performance, until you make a website change and need to clear the cache[2]. For this, I rely on telnetting to the Varnish port and issuing the `ban.url` command (formerly `url.purge` in 2.x):

$ telnet localhost 6082
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
200 205     
-----------------------------
Varnish Cache CLI 1.0
-----------------------------
Linux,2.6.35.4-rscloud,x86_64,-smalloc,-smalloc,-hcritbit

Type 'help' for command list.
Type 'quit' to close CLI session.

ban.url /
200 0
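
If you'd rather script that than type into a telnet session by hand, something along these lines should work (a sketch only, assuming no admin secret is configured on the management port):

import telnetlib

# Connect to the Varnish management port configured above.
tn = telnetlib.Telnet('localhost', 6082)
tn.read_until("Type 'quit' to close CLI session.")
tn.write('ban.url /\n')   # formerly url.purge in Varnish 2.x
print tn.read_some()      # expect a '200' status in the reply
tn.write('quit\n')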

Cache ALL the disks

This site has Varnish and Apache configured as described in this article. It also has disk caching in Apache enabled, thanks to Elizabeth Leddy's article.

As a result, it's PEPPY AS THE DICKENS™ on a 512MB "slice" (Cloud server) from Rackspace Cloud. And now you know yet another "Plone secret". Now go make your Plone sites faster, and let me know how it goes in the comments section below.

Notes

[0] Using the latest distribution, "oneiric".

[1] I first saw this technique at NASA when NASA Science was powered by Plone; I found it odd at the time but years later it makes perfect sense.

[2] Ideally you'd configure this in p.a.caching, but I've not been able to stomach this yet.


09 Nov 2011 5:50pm GMT

feedPlanet Zope.org

Updated MiniPlanet, now with meta-feed

My MiniPlanet Zope product had been working steadily and stably for some years when suddenly a user request came along. Would it be possible to get a feed of all the items in a MiniPlanet? With this update it became possible. MiniPlanet is an old-styl...

09 Nov 2011 9:41am GMT

08 Nov 2011

feedPlanet Plone

Max M: How to export all redirects from portal_redirection in an older Plone site

Just add the method below to the RedirectionTool and call it from the browser as:

http://localhost:8080/site/portal_redirection/getAllRedirects

Assuming that the site is running at localhost:8080, that is :-S

That will show a list of redirects that can be imported into Plone 4.x


# getToolByName normally comes from Products.CMFCore.utils, and the View
# permission and `security` declaration object are already set up in the
# tool's module, so only the method itself needs to be pasted in.
security.declareProtected(View, 'getAllRedirects')
def getAllRedirects(self):
    "get'm'all"
    result = []
    reference_tool = getToolByName(self, 'reference_catalog')
    for k, uuid in self._redirectionmap.items():
        obj = reference_tool.lookupObject(uuid)
        if obj is None:
            print 'could not find redirect from: %s to %s' % (k, uuid)
        else:
            path = '/'.join(('',) + obj.getPhysicalPath()[2:])
            result.append('%s,%s' % (k, path))
    return '\n'.join(result)

08 Nov 2011 2:58pm GMT

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Braai Party

Braai = barbecue evening, or similar.

They would love to have a technician mend their SpeakOn / jack plug splitters...

The ladies, the "mamas" of the settlement, at the official opening speech.

Even though fewer people were there than expected: loud music and lots of people ...

And of course a fire with real wood for the braai.

© benste CC NC SA

08 Nov 2011 2:30pm GMT

07 Nov 2011

feedPlanet Zope.org

Welcome to Betabug Sirius

It has been quite some time since I announced that I'd be working as a freelancer. Lots of stuff had to be done in that time, but finally things are ready. I've founded my own little company and set up a small website: Welcome to Betabug Sirius!

07 Nov 2011 9:26am GMT

03 Nov 2011

feedPlanet Zope.org

Assertion helper for zope.testbrowser and unittest

zope.testbrowser is a valuable tool for integration tests. Historically, the Zope community used to write quite a lot of doctests, but we at gocept have found them to be rather clumsy and too often yielding neither good tests nor good documentation. That's why we don't use doctest much anymore, and prefer plain unittest.TestCases instead. However, doctest has one very nice feature, ellipsis matching, that is really helpful for checking HTML output, since you can only make assertions about the parts that interest you. For example, given this kind of page:

>>> print browser.contents
<html>
  <head>
    <title>Simple Page</title>
  </head>
  <body>
    <h1>Simple Page</h1>
  </body>
</html>

If all you're interested in is that the <h1> is rendered properly, you can simply say:

>>> print browser.contents
<...<h1>Simple Page</h1>...

We've now ported this functionality to unittest, as assertEllipsis, in gocept.testing. Some examples:

self.assertEllipsis('...bar...', 'foo bar qux')
# -> nothing happens

self.assertEllipsis('foo', 'bar')
# -> AssertionError: Differences (ndiff with -expected +actual):
     - foo
     + bar

self.assertNotEllipsis('foo', 'foo')
# -> AssertionError: "Value unexpectedly matches expression 'foo'."

To use it, inherit from gocept.testing.assertion.Ellipsis in addition to unittest.TestCase.
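
For illustration, here is a minimal sketch of that setup (the page contents are made up; in a real test they would come from browser.contents):

import unittest
import gocept.testing.assertion

class SimplePageTest(unittest.TestCase, gocept.testing.assertion.Ellipsis):

    def test_heading_is_rendered(self):
        contents = '<html><body><h1>Simple Page</h1></body></html>'
        # Only the part between the ellipses has to match.
        self.assertEllipsis('...<h1>Simple Page</h1>...', contents)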


03 Nov 2011 7:19am GMT

19 Nov 2010

feedPlanet CherryPy

Robert Brewer: logging.statistics

Statistics about program operation are an invaluable monitoring and debugging tool. How many requests are being handled per second, how much of various resources are in use, how long we've been up. Unfortunately, the gathering and reporting of these critical values is usually ad-hoc. It would be nice if we had 1) a centralized place for gathering statistical performance data, 2) a system for extrapolating that data into more useful information, and 3) a method of serving that information to both human investigators and monitoring software. I've got a proposal. Let's examine each of those points in more detail.

Data Gathering

Just as Python's logging module provides a common importable for gathering and sending messages, statistics need a similar mechanism, and one that does not require each package which wishes to collect stats to import a third-party module. Therefore, we choose to re-use the logging module by adding a statistics object to it.

That logging.statistics object is a nested dict:

import logging
if not hasattr(logging, 'statistics'): logging.statistics = {}

It is not a custom class, because that would 1) require apps to import a third-party module in order to participate, 2) inhibit innovation in extrapolation approaches and in reporting tools, and 3) be slow. There are, however, some specifications regarding the structure of the dict.

    {
   +----"SQLAlchemy": {
   |        "Inserts": 4389745,
   |        "Inserts per Second":
   |            lambda s: s["Inserts"] / (time() - s["Start"]),
   |  C +---"Table Statistics": {
   |  o |        "widgets": {-----------+
 N |  l |            "Rows": 1.3M,      | Record
 a |  l |            "Inserts": 400,    |
 m |  e |        },---------------------+
 e |  c |        "froobles": {
 s |  t |            "Rows": 7845,
 p |  i |            "Inserts": 0,
 a |  o |        },
 c |  n +---},
 e |        "Slow Queries":
   |            [{"Query": "SELECT * FROM widgets;",
   |              "Processing Time": 47.840923343,
   |              },
   |             ],
   +----},
    }

The logging.statistics dict has strictly 4 levels. The topmost level is nothing more than a set of names to introduce modularity. If SQLAlchemy wanted to participate, it might populate the item logging.statistics['SQLAlchemy'], whose value would be a second-layer dict we call a "namespace". Namespaces help multiple emitters to avoid collisions over key names, and make reports easier to read, to boot. The maintainers of SQLAlchemy should feel free to use more than one namespace if needed (such as 'SQLAlchemy ORM').

Each namespace, then, is a dict of named statistical values, such as 'Requests/sec' or 'Uptime'. You should choose names which will look good on a report: spaces and capitalization are just fine.

In addition to scalars, values in a namespace MAY be a (third-layer) dict, or a list, called a "collection". For example, the CherryPy StatsTool keeps track of what each worker thread is doing (or has most recently done) in a 'Worker Threads' collection, where each key is a thread ID; each value in the subdict MUST be a fourth dict (whew!) of statistical data about each thread. We call each subdict in the collection a "record". Similarly, the StatsTool also keeps a list of slow queries, where each record contains data about each slow query, in order.
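
A minimal sketch of those four levels, with made-up names and numbers:

    import logging
    if not hasattr(logging, 'statistics'): logging.statistics = {}

    # Level 1: a top-level name for modularity; level 2: its namespace.
    ns = logging.statistics.setdefault('My App', {})
    ns['Uptime'] = 12345.6                            # a scalar value
    # Level 3: a collection, keyed (for example) by thread id.
    workers = ns.setdefault('Worker Threads', {})
    # Level 4: a record of statistical data about one thread.
    workers['140014234'] = {'Requests': 42, 'Bytes Written': 65536}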

Values in a namespace or record may also be functions, which brings us to:

Extrapolation

def extrapolate_statistics(scope):
    """Return an extrapolated copy of the given scope."""
    c = {}
    for k, v in scope.items():
        if isinstance(v, dict):
            v = extrapolate_statistics(v)
        elif isinstance(v, (list, tuple)):
            v = [extrapolate_statistics(record) for record in v]
        elif callable(v):
            v = v(scope)
        c[k] = v
    return c

The collection of statistical data needs to be fast, as close to unnoticeable as possible to the host program. That requires us to minimize I/O, for example, but in Python it also means we need to minimize function calls. So when you are designing your namespace and record values, try to insert the most basic scalar values you already have on hand.

When it comes time to report on the gathered data, however, we usually have much more freedom in what we can calculate. Therefore, whenever reporting tools fetch the contents of logging.statistics for reporting, they first call extrapolate_statistics (passing the whole statistics dict as the only argument). This makes a deep copy of the statistics dict so that the reporting tool can both iterate over it and even change it without harming the original. But it also expands any functions in the dict by calling them. For example, you might have a 'Current Time' entry in the namespace with the value "lambda scope: time.time()". The "scope" parameter is the current namespace dict (or record, if we're currently expanding one of those instead), allowing you access to existing static entries. If you're truly evil, you can even modify more than one entry at a time.
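
For instance (a made-up namespace; extrapolate_statistics is the function defined above):

    import time

    ns = {
        'Start': time.time() - 10.0,
        'Current Time': lambda scope: time.time(),
        'Uptime': lambda scope: time.time() - scope['Start'],
    }
    report = extrapolate_statistics(ns)
    # report['Start'] is copied as-is; both lambdas have been replaced by
    # their computed values, e.g. report['Uptime'] is roughly 10.0.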

However, don't try to calculate an entry and then use its value in further extrapolations; the order in which the functions are called is not guaranteed. This can lead to a certain amount of duplicated work (or a redesign of your schema), but that's better than complicating the spec.

After the whole thing has been extrapolated, it's time for:

Reporting

A reporting tool would grab the logging.statistics dict, extrapolate it all, and then transform it to (for example) HTML for easy viewing, or JSON for processing by Nagios etc (and because JSON will be a popular output format, you should seriously consider using Python's time module for datetimes and arithmetic, not the datetime module). Each namespace might get its own header and attribute table, plus an extra table for each collection. This is NOT part of the statistics specification; other tools can format how they like.
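
As a sketch of how small such a tool can be (assuming extrapolate_statistics from above is in scope):

    import json
    import logging

    def statistics_as_json():
        stats = getattr(logging, 'statistics', {})
        # default=str papers over values JSON can't serialize directly.
        return json.dumps(extrapolate_statistics(stats), default=str, indent=2)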

Turning Collection Off

It is recommended each namespace have an "Enabled" item which, if False, stops collection (but not reporting) of statistical data. Applications SHOULD provide controls to pause and resume collection by setting these entries to False or True, if present.

Usage

    import logging
    import time
    # Initialize the repository
    if not hasattr(logging, 'statistics'): logging.statistics = {}
    # Initialize my namespace
    mystats = logging.statistics.setdefault('My Stuff', {})
    # Initialize my namespace's scalars and collections
    mystats.update({
        'Enabled': True,
        'Start Time': time.time(),
        'Important Events': 0,
        'Events/Second': lambda s: (
            (s['Important Events'] / (time.time() - s['Start Time']))),
        })
    ...
    for event in events:
        ...
        # Collect stats
        if mystats.get('Enabled', False):
            mystats['Important Events'] += 1

Original post blogged on b2evolution.

19 Nov 2010 7:08am GMT

12 Nov 2010

feedPlanet CherryPy

Kevin Dangoor: Paver is now on GitHub, thanks to Almad

Paver, the project scripting tool for Python, has just moved to GitHub thanks to Almad. Almad has stepped forward and offered to properly bring Paver into the second decade of the 21st century (doesn't have the same ring to it as bringing something into the 21st century, does it?) :)

Seriously, though, Paver reached the point where it was good enough for me and did what I wanted (and, apparently, what a good number of other people wanted as well). Almad has some thoughts on where the project should go next, and I'm looking forward to hearing more about them. Sign up for the Google Group to see where Paver is going next.

12 Nov 2010 3:11am GMT

09 Nov 2010

feedPlanet CherryPy

Kevin Dangoor: Paver: project that works, has users, needs a leader

Paver is a Python project scripting tool that I initially created in 2007 to automate a whole bunch of tasks around projects that I was working on. It knows about setuptools and distutils, and it has some ideas on handling documentation with example code. It also has users who occasionally like to send in patches. The latest release has had more than 3700 downloads on PyPI.

Paver hasn't needed a lot of work, because it does what it says on the tin: helps you automate project tasks. Sure, there's always more that one could do. But, there isn't more that's required for it to be a useful tool, day-to-day.

Here's the point of my post: Paver is in danger of being abandoned. At this point, everything significant that I am doing is in JavaScript, not Python. The email and patch traffic is low, but it's still too much for someone who's not even actively using the tool any more.

If you're a Paver user and either:

1. want to take the project in fanciful new directions or,

2. want to keep the project humming along with a new .x release every now and then

please let me know.

09 Nov 2010 7:44pm GMT