15 Dec 2019

Planet Python

S. Lott: Functional programming design pattern: Nested Iterators == Flattening

Here's a functional programming design pattern I uncovered. This may not be news to you, but it was a surprise to me. It cropped up when looking at something that needs parallelization to reduce the elapsed run time.

Consider this data collection process.

for h in some_high_level_collection(arg1):
    for l in h.some_low_level_collection(arg2):
        if some_filter(l):
            logger.info("Processing %s %s", h, l)
            some_function(h, l)

This is pretty common in devops world. You might be looking at all repositories in all github organizations. You might be looking at all keys in all AWS S3 buckets under a specific account. You might be looking at all tables owned by all schemas in a database.

It's helpful -- for the moment -- to stay away from taller tree structures like the file system. Traversing the file system involves recursion, and the pattern is slightly different there. We'll get to it, but what made this clear to me was a "simpler" walk through a two-layer hierarchy.

The nested for-statements aren't really ideal. We can't apply any itertools techniques here. We can't trivially change this to use a multiprocessing Pool.map().

In fact, the more we look at this, the worse it is.

Here's something that's a little easier to work with:

def h_l_iter(arg1, arg2):
    for h in some_high_level_collection(arg1):
        for l in h.some_low_level_collection(arg2):
            if some_filter(l):
                logger.info("Processing %s %s", h, l)
                yield h, l

itertools.starmap(some_function, h_l_iter(arg1, arg2))

The data gathering has expanded to a few more lines of code. It gained a lot of flexibility. Once we have something that can be used with starmap, it can also be used with other itertools functions to do additional processing steps without breaking the loops into horrible pieces.
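For instance, here's a minimal sketch, reusing the placeholder names from above (h_l_iter, some_function, arg1, arg2) and assuming we only want a bounded amount of work, of layering one more itertools step onto the flattened iterator:

import itertools

# Cap the flattened stream at 100 (h, l) pairs, then drive the lazy
# starmap to completion by materializing the results.
capped = itertools.islice(h_l_iter(arg1, arg2), 100)
results = list(itertools.starmap(some_function, capped))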

I think the pattern here is a kind of "Flattened Map" transformation. The initial design, with nested loops wrapping a process, wasn't a good plan. A better plan is to think of the nested loops as a way to flatten the two tiers of the hierarchy into a single iterator. Then a mapping can be applied to process each item from that flat iterator.

Extracting the Filter

We can now tease apart the nested loops to expose the filter. In the version above, the body of the h_l_iter() function binds log-writing with the yield. If we take those two apart, we gain the flexibility of being able to change the filter (or the logging) without an awfully complex rewrite.

from typing import Iterable, Iterator, TypeVar

T = TypeVar('T')

def logging_iter(source: Iterable[T]) -> Iterator[T]:
    for item in source:
        logger.info("Processing %s", item)
        yield item

def h_l_iter(arg1, arg2):
    for h in some_high_level_collection(arg1):
        for l in h.some_low_level_collection(arg2):
            yield h, l

raw_data = h_l_iter(arg1, arg2)
filtered_subset = logging_iter(filter(some_filter, raw_data))
itertools.starmap(some_function, filtered_subset)

Yes, this is still longer, but all of the details are now exposed in a way that lets me change filters without further breakage.

Now, I can introduce various forms of multiprocessing to improve concurrency.
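As one hypothetical way to do that, the flattened iterator can feed a process pool directly. This sketch is not from the original post; it assumes some_function is a picklable, module-level callable:

import multiprocessing

if __name__ == "__main__":
    raw_data = h_l_iter(arg1, arg2)
    filtered_subset = logging_iter(filter(some_filter, raw_data))
    with multiprocessing.Pool() as pool:
        # starmap unpacks each (h, l) pair into some_function(h, l)
        results = pool.starmap(some_function, filtered_subset)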

This transformed a hard-wired set of nested loops, an if-statement, and a function evaluation into a "Flattener" that can be combined with off-the-shelf filtering and mapping functions.

I've snuck in a kind of "tee" operation that writes an iterable sequence to a log. This can be injected at any point in the processing.

Logging the entire "item" value isn't really a great idea. Another mapping is required to create sensible log messages from each item. I've left that out to keep this exposition more focused.
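One hypothetical way to add that mapping, shown here only as a sketch, is to let the caller pass a formatter into the logging step; the fmt parameter and the .name attributes are assumptions, not part of the original design:

from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar('T')

def logging_iter(source: Iterable[T], fmt: Callable[[T], str] = repr) -> Iterator[T]:
    for item in source:
        # log a caller-supplied summary instead of the whole item
        logger.info("Processing %s", fmt(item))
        yield item

filtered_subset = logging_iter(
    filter(some_filter, raw_data),
    fmt=lambda pair: f"{pair[0].name}/{pair[1].name}",  # assumed .name attributes
)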

I'm sure others have seen this pattern, but it was eye-opening to me.

Full Flattening

The h_l_iter() function can actually be replaced by a generator expression. A function isn't needed.

h_l_iter = (
    (h, l)
    for h in some_high_level_collection(arg1)
    for l in h.some_low_level_collection(arg2)
)

This simplification doesn't add much value, but it seems to be a general truth. In Python, it's a small change in syntax and, therefore, an easy optimization to make.

What About The File System?

When we're working with a more deeply nested structure, like the file system, we'll make a small change. We'll replace the h_l_iter() function with a recursive_walk() function.

from pathlib import Path
from typing import Iterator

def recursive_walk(path: Path) -> Iterator[Path]:
    # glob() requires a pattern; '*' matches every entry in this directory
    for item in path.glob('*'):
        if item.is_file():
            yield item
        elif item.is_dir():
            yield from recursive_walk(item)

This function has effectively the same signature as h_l_iter(). It walks a complex structure, yielding a flat sequence of items. The other functions used for filtering, logging, and processing don't change, allowing us to build new features from various combinations of these functions.
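Here is a brief sketch of that reuse; is_python_source and process_file are made-up placeholders for the filter and the per-item processing:

from pathlib import Path

def is_python_source(path: Path) -> bool:
    return path.suffix == ".py"

def process_file(path: Path) -> None:
    ...  # whatever per-file processing is needed

raw_paths = recursive_walk(Path.cwd())
interesting = logging_iter(filter(is_python_source, raw_paths))
# single items now, so plain map() instead of starmap()
results = list(map(process_file, interesting))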

tl;dr

The tl;dr version of this is:

Replace for item in iter: process(item) with map(process, iter).

This pattern works for simple, flat items, nested structures, and even recursively-defined trees. It introduces flexibility with no real cost.

The other pattern in play is:

Any for item in iter: for sub_item in item: processing is "flattening" a hierarchy into a sequence. Replace it with (sub_item for item in iter for sub_item in item).

These felt like blinding revelations to me.

15 Dec 2019 10:29am GMT

Anwesha Das: Rootconf Hyderbad, 2019

What is Rootconf?

Rootconf is a conference for sysadmins, DevOps practitioners, SREs, and network engineers. Rootconf started its journey in 2012 in Bangalore, and 2019 was its 7th edition. Over the years, through all the Rootconfs, a community has developed around the conference. Now people come to Rootconf not just to attend the talks but also to meet friends and peers and to discuss projects and ideas.

Need for more Rootconf

Over all these years, we have witnessed changes in networks, infrastructure, and security threats. We have designed Rootconf, all these years, keeping in mind the changing needs of the community. Lately, we have realized that the needs of the community vary based on their geographic locations and cities. In Pune, there is considerable demand for sessions that deal with small-size infrastructure suited for startups and SMEs, as there is a growing startup industry there. In Delhi, there is a demand for discussion around data centers, network designs, and so on. And in Hyderabad, there is a demand for solutions around large-scale infrastructure. The Bangalore event alone did not suffice to meet all these needs. So, the more the merrier: we decided to have more than one Rootconf a year.

Rootconf Pune was the first of this 'outstation Rootconf journey'. The next was Rootconf Hyderabad. It was the first event for which I organized the editorial and community work all by myself.
I joined HasGeek as Community Manager and Editorial Coordinator. After my first Rootconf in Bangalore, Zainab set a goal for me.

Z: 'Anwesha, I want you to organize Rootconf Hyderabad all by yourself; you must do it with no or minimal help from me.'
A: 'Ummm hmmm ooops'
Z: 'Do not worry, I will be there to guide you. We will have our test run with you in Pune. So buck up, girl.'

Rootconf Hyderabad, the conference

The preparation for Rootconf Hyderabad started right then. After months of the editorial process - scouting for proposals, reviewing them, and holding several rehearsals - and after passing the iron test in Pune, I reached Hyderabad to join my colleague Mak. Mak runs sales at HasGeek. Behind the camera, we had our excellent AV captain Amogh. So I was entirely reassured and not worried about those two aspects.

A day before the conference, Damini, our emcee, and I chalked out the plans for navigating the schedule and coordinating the conference. We met the volunteers at the venue after a humongous lunch of Hyderabadi delicacies (honest confession: food is the primary reason why I love to attend conferences in Hyderabad). We had held several volunteer calls in which our volunteer coordinator Jyotsna briefed everyone on their duties, but it is always essential to introduce the volunteers to the ground reality, so we had a meetup at ThoughtWorks.
The day of a conference starts early, much too early, for the organizers and volunteers. Rootconf Hyderabad was no different. We opened the registration, and people started flocking into the auditorium. I opened the conference with the welcome address.

Then our emcee Damini took over. The first half of our schedule was designed keeping in mind the problems of large-scale infrastructure: observability, maintainability, scalability, performance, taming large systems, and networking issues. Piyush, our first speaker, started with a talk on observability and control theory. Next was Flipkart's journey of "Fast object distribution using P2P" by Ankur Jain. After a quick beverage break, Anubhav Mishra shared his take on "Taming infrastructure workflow at scale", the story of HashiCorp. He was followed by Tasdik Rahman and his story of "Achieving repeatable, extensible and self-serve infrastructure" at Gojek.

The second half of the day was planned to address the issues shared by infrastructure of any size or complexity: security, DevSecOps, scaling, and of course microservices (an infrastructure conference seems incomplete without a discussion of moving from monolith to microservices). Our very own security expert Lavakumar started it with "Deploying and managing CSP: the browser-side firewall", describing the security complexities of the post-Magecart-attack days. Jambunathan shared the tale of "Designing microservices around your data design". For the last talk of the day, we had Gaurav Kamboj. He told us what happens to the systems engineers at Hotstar when Virat Kohli is batting in his 90s, in "Scaling hotstar.com for 25 million concurrent viewers".
Birds of a Feather (BOF) sessions have always been a favorite at Rootconf. These non-recorded sessions give the participants a chance to be frank. We had facilitators, rather than presenters, to move the discussions forward. While talks were going on in the main auditorium, there was a dedicated BOF area where we had sessions on

Gauging the popularity of the BOFs, this was the first time we tried something new: a BOF session planned in the primary auditorium. It was on "Doing DevSecOps in your organization," facilitated by Lava and Hari. It was one session that our emcee Damini had a difficult time ending. People had so many stories to share and questions to ask, but there was no time. I also got some angry looks (which I do not mind at all) :).

In India, I have noticed that most conferences fail to have good, up-to-the-mark flash talks. Invariably they are community information, conference, or meetup notifications (the writer is guilty of this herself). So I proposed that we accept proposals for flash talks as well: half of them pre-selected and the rest selected on the spot. Zainab agreed to it. We have been following this rule since Rootconf Pune, and the quality of the flash talks has improved a lot. We had some fantastic flash talks. You can check them out for yourself at https://www.youtube.com/watch?v=AlREWUAEMVk.

Thank you

Organizing a conference is not a one-person job. In a large infrastructure, it is the small tools and microservices that keep the whole system working. Consider the conference as a system and its tasks as microservices: each task needs to be done well for the conference to be successful and flawless. And I am blessed to have an amazing team. Thank you to each of the amazing volunteers; the Null Hyderabad, Mozilla, and AWS communities; our emcee Damini; hall manager Geetanjali; the speakers, sponsors, and attendees; and my team at HasGeek. Last but not least, thank you, Zainab, for trusting me, being by my side, and not letting me fall.

The experience

Organizing a conference has been a journey of estrogen and adrenaline overflow for me. Be it the nightmares, the excitement of each ticket sale, the long chats with the reviewers about talks and BOFs, the discussions with the communities about what they want from Rootconf, the jitters before the conference starts, or the tweets and blog posts from people saying they enjoyed the conference and found it useful. It was an exciting, scary, happy, and satisfying journey for me. And guess what, my life continues to be so, as Rootconf is ready with its Delhi edition. I hope to meet you there.

15 Dec 2019 10:15am GMT

Catalin George Festila: Python 3.7.5 : Simple intro in CSRF.

CSRF, or Cross-Site Request Forgery, is a technique used by cyber-criminals to force users into executing unwanted actions on a web application. To protect web forms against CSRF attacks, it isn't sufficient for web applications to trust authenticated users; they must also be equipped with a unique identifier, called a CSRF token, similar to a session identifier. Django 3.0 can be used with CSRF, see the
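For context, here is a minimal sketch of what that looks like on the Django side; it is not from the original post, and the view and template names are made up. Django's CsrfViewMiddleware issues the token, and the HTML form must echo it back via the {% csrf_token %} template tag.

from django.shortcuts import render
from django.views.decorators.csrf import csrf_protect

@csrf_protect  # redundant when CsrfViewMiddleware is enabled globally, but explicit
def transfer(request):
    if request.method == "POST":
        ...  # handle the form; the middleware has already rejected posts without a valid token
    # transfer.html must contain {% csrf_token %} inside its <form>
    return render(request, "transfer.html")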

15 Dec 2019 6:03am GMT

10 Nov 2011

Planet Java

OSDir.com - Java: Oracle Introduces New Java Specification Requests to Evolve Java Community Process

From the Yet Another dept.:

To further its commitment to the Java Community Process (JCP), Oracle has submitted the first of two Java Specification Requests (JSRs) to update and revitalize the JCP.

10 Nov 2011 6:01am GMT

OSDir.com - Java: No copied Java code or weapons of mass destruction found in Android

From the Fact Checking dept.:

ZDNET: Sometimes the sheer wrongness of what is posted on the web leaves us speechless. Especially when it's picked up and repeated as gospel by otherwise reputable sites like Engadget. "Google copied Oracle's Java code, pasted in a new license, and shipped it," they reported this morning.

Sorry, but that just isn't true.

10 Nov 2011 6:01am GMT

OSDir.com - Java: Java SE 7 Released

From the Grande dept.:

Oracle today announced the availability of Java Platform, Standard Edition 7 (Java SE 7), the first release of the Java platform under Oracle stewardship.

10 Nov 2011 6:01am GMT

28 Oct 2011

Planet Ruby

O'Reilly Ruby: MacRuby: The Definitive Guide

Ruby and Cocoa on OS X, the iPhone, and the Device That Shall Not Be Named

28 Oct 2011 8:00pm GMT

14 Oct 2011

Planet Ruby

Charles Oliver Nutter: Why Clojure Doesn't Need Invokedynamic (Unless You Want It to be More Awesome)

This was originally posted as a comment on @fogus's blog post "Why Clojure doesn't need invokedynamic, but it might be nice". I figured it's worth a top-level post here.

Ok, there's some good points here and a few misguided/misinformed positions. I'll try to cover everything.

First, I need to point out a key detail of invokedynamic that may have escaped notice: any case where you must bounce through a generic piece of code to do dispatch -- regardless of how fast that bounce may be -- prevents a whole slew of optimizations from happening. This might affect Java dispatch, if there's any argument-twiddling logic shared between call sites. It would definitely affect multimethods, which are using a hand-implemented PIC. Any case where there's intervening code between the call site and the target would benefit from invokedynamic, since invokedynamic could be used to plumb that logic and let it inline straight through. This is, indeed, the primary benefit of using invokedynamic: arbitrarily complex dispatch logic folds away allowing the dispatch to optimize as if it were direct.

Your point about inference in Java dispatch is a fair one...if Clojure is able to infer all cases, then there's no need to use invokedynamic at all. But unless Clojure is able to infer all cases, then you've got this little performance time bomb just waiting to happen. Tweak some code path and obscure the inference, and kablam, you're back on a slow reflective impl. Invokedynamic would provide a measure of consistency; the only unforeseen perf impact would be when the dispatch turns out to *actually* be polymorphic, in which case even a direct call wouldn't do much better.

For multimethods, the benefit should be clear: the MM selection logic would be mostly implemented using method handles and "leaf" logic, allowing hotspot to inline it everywhere it is used. That means for small-morphic MM call sites, all targets could potentially inline too. That's impossible without invokedynamic unless you generate every MM path immediately around the eventual call.

Now, on to defs and Var lookup. Depending on the cost of Var lookup, using a SwitchPoint-based invalidation plus invokedynamic could be a big win. In Java 7u2, SwitchPoint-based invalidation is essentially free until invalidated, and as you point out that's a rare case. There would essentially be *no* cost in indirecting through a var until that var changes...and then it would settle back into no cost until it changes again. Frequently-changing vars could gracefully degrade to a PIC.

It's also dangerous to understate the impact code size has on JVM optimization. The usual recommendation on the JVM is to move code into many small methods, possibly using call-through logic as in multimethods to reuse the same logic in many places. As I've mentioned, that defeats many optimizations, so the next approach is often to hand-inline logic everywhere it's used, to let the JVM have a more optimizable view of the system. But now we're stepping on our own feet...by adding more bytecode, we're almost certainly impacting the JVM's optimization and inlining budgets.

OpenJDK (and probably the other VMs too) has various limits on how far it will go to optimize code. A large number of these limits are based on the bytecoded size of the target methods. Methods that get too big won't inline, and sometimes won't compile. Methods that inline a lot of code might not get inlined into other methods. Methods that inline one path and eat up too much budget might push out more important calls later on. The only way around this is to reduce bytecode size, which is where invokedynamic comes in.

As of OpenJDK 7u2, MethodHandle logic is not included when calculating inlining budgets. In other words, if you push all the Java dispatch logic or multimethod dispatch logic or var lookup into mostly MethodHandles, you're getting that logic *for free*. That has had a tremendous impact on JRuby performance; I had previous versions of our compiler that did indeed infer static target methods from the interpreter, but they were often *slower* than call site caching solely because the code was considerably larger. With invokedynamic, a call is a call is a call, and the intervening plumbing is not counted against you.

Now, what about negative impacts to Clojure itself...

#0 is a red herring. JRuby supports Java 5, 6, and 7 with only a few hundred lines of changes in the compiler. Basically, the compiler has abstract interfaces for doing things like constant lookup, literal loading, and dispatch that we simply reimplement to use invokedynamic (extending the old non-indy logic for non-indified paths). In order to compile our uses of invokedynamic, we use Rémi Forax's JSR-292 backport, which includes a "mock" jar with all the invokedynamic APIs stubbed out. In our release, we just leave that library out, reflectively load the invokedynamic-based compiler impls, and we're off to the races.

#1 would be fair if the Oracle Java 7u2 early-access drops did not already include the optimizations that gave JRuby those awesome numbers. The biggest of those optimizations was making SwitchPoint free, but also important are the inlining discounting and MutableCallSite improvements. The perf you see for JRuby there can apply to any indirected behavior in Clojure, with the same perf benefits as of 7u2.

For #2, to address the apparent vagueness in my blog post...the big perf gain was largely from using SwitchPoint to invalidate constants rather than pinging a global serial number. Again, indirection folds away if you can shove it into MethodHandles. And it's pretty easy to do it.

#3 is just plain FUD. Oracle has committed to making invokedynamic work well for Java too. The current thinking is that "lambda", the support for closures in Java 7, will use invokedynamic under the covers to implement "function-like" constructs. Oracle has also committed to Nashorn, a fully invokedynamic-based JavaScript implementation, which has many of the same challenges as languages like Ruby or Python. I talked with Adam Messinger at Oracle, who explained to me that Oracle chose JavaScript in part because it's so far away from Java...as I put it (and he agreed) it's going to "keep Oracle honest" about optimizing for non-Java languages. Invokedynamic is driving the future of the JVM, and Oracle knows it all too well.

As for #4...well, all good things take a little effort :) I think the effort required is far lower than you suspect, though.

14 Oct 2011 2:40pm GMT

07 Oct 2011

Planet Ruby

Ruby on Rails: Rails 3.1.1 has been released!

Hi everyone,

Rails 3.1.1 has been released. This release requires at least sass-rails 3.1.4.

CHANGES

ActionMailer

ActionPack

ActiveModel

ActiveRecord

ActiveResource

ActiveSupport

Railties

SHA-1

You can find an exhaustive list of changes on GitHub, along with the closed issues marked for v3.1.1.

Thanks to everyone!

07 Oct 2011 5:26pm GMT

21 Mar 2011

Planet Perl

Planet Perl is going dormant

Planet Perl is going dormant. This will be the last post there for a while.

image from planet.perl.org

Why? There are better ways to get your Perl blog fix these days.

You might enjoy some of the following:

Will Planet Perl awaken again in the future? It might! The universe is a big place, filled with interesting places, people and things. You never know what might happen, so keep your towel handy.

21 Mar 2011 2:04am GMT

improving on my little wooden "miniatures"

A few years ago, I wrote about cheap wooden discs as D&D minis, and I've been using them ever since. They do a great job, and cost nearly nothing. For the most part, we've used a few for the PCs, marked with the characters' initials, and the rest for NPCs and enemies, usually marked with numbers.

With D&D 4E, we've tended to have combats with more and more varied enemies. (Minions are wonderful things.) Numbering has become insufficient. It's too hard to remember what numbers are what monster, and to keep initiative order separate from token numbers. In the past, I've colored a few tokens in with the red or green whiteboard markers, and that has been useful. So, this afternoon I found my old paints and painted six sets of five colors. (The black ones I'd already made with sharpies.)

D&D tokens: now in color

I'm not sure what I'll want next: either I'll want five more of each color or I'll want five more colors. More colors will require that I pick up some white paint, while more of those colors will only require that I re-match the secondary colors when mixing. I think I'll wait to see which I end up wanting during real combats.

These colored tokens should work together well with my previous post about using a whiteboard for combat overview. Like-type monsters will get one color, and will all get grouped to one slot on initiative. Last night, for example, the two halfling warriors were red and acted in the same initiative slot. The three halfling minions were unpainted, and acted in another, later slot. Only PCs get their own initiative.

I think that it did a good amount to speed up combat, and that's even when I totally forgot to bring the combat whiteboard (and the character sheets!) with me. Next time, we'll see how it works when it's all brought together.

21 Mar 2011 12:47am GMT

20 Mar 2011

Planet Perl

Perl Vogue T-Shirts

Is Plack the new Black?

In Pisa I gave a lightning talk about Perl Vogue. People enjoyed it, and for a while I thought that it might actually turn into a project.

I won't though. It would just take far too much effort. And, besides, a couple of people have pointed out to me that the real Vogue are rather protective of their brand.

So it's not going to happen, I'm afraid. But as a subtle reminder of the ideas behind Perl Vogue I've created some t-shirts containing the article titles from the talk. You can get them from my Spreadshirt shop.

20 Mar 2011 12:02pm GMT