10 Mar 2010
Planet Parrot
Andrew Whitworth: GSoC Idea: GMP Bindings
This conversation happened yesterday on IRC, with some off-topic things edited out:
darbelo: That reminds me. I hate our bignums and want them to die...
whiteknight: darbelo: I agree with the bignums thing 100%. I want bignums out of the repo and moved to their own project
whiteknight: There's no sense keeping them when I suspect a majority of users can't use them because they don't have GMP installed
darbelo: Actually, I wouldn't mind them being in the core if they weren't dependant on a lib I don't have.
darbelo: I actually had started to write a stand-alone BigInteger PMC after last year's SoC.
whiteknight: that would make an awesome project too.
whiteknight: I think we should have lots of projects like that, and for developers to be able to pick which solution they want
whiteknight: as we are now, it's easier to force BigInt to pretend to do what we need instead of just using the best solution, which might be DecNumber or something else
darbelo: Maybe, but GMP is much, much more than just bignums. It's a pretty big library.
darbelo: Our PMCs don't even start to scratch the surface of what GMP can do.
whiteknight: darbelo: so moving those PMCs out to a separate library and adding wrappers for other functionality might be nice
darbelo: I would consider a GMP binding much more valuble to parrot than our current use of the lib, yes.
bubaflub: last year i worked a bit with GMP library and suggested to dukeleto we work on a GMP binding for parrot
bubaflub: last year with GSOC and perl 5
darbelo: bubaflub: That would be nice to have.
bubaflub: though we used an existing perl 5 binding (Math::GMPz)
whiteknight: yes, that would be a wonderful project
bubaflub: but we could nab the test suite and what not
whiteknight: exactly. We have the two PMC types, and we could write wrappers for the rest of the library and get all sorts of additional power
bubaflub: i think access to the GMP library in general would be nice; the stuff i worked on last year was setting some foundational stuff for cryptography libraries
Parrot has two PMC types that wrap GMP: BigInt and BigNum. I think, and apparently a few people agree, that these two types have no business being in the core Parrot repository and should be moved to another project. The immediate benefit of this would be that the bindings for GMP could be improved and expanded independently, instead of only providing what little functionality Parrot actually makes direct use of.
A good GSoC project for this year would be to move (or fork) the current BigInt and BigNum PMC types to a new project and use them as the cornerstone for writing a more comprehensive interface for the GMP library. This could include other PMC types, NCI function wrappers, PMC methods, ops, and other things to allow access to the power of the GMP library. Adding custom Integer-like and Float-like PMCs that autopromote to their Big- counterparts would be nice too.
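As a rough illustration of what autopromotion would have to detect, here is a minimal C sketch of an overflow-checked native addition. The function name add_or_flag_overflow and the use of long as a stand-in for Parrot's INTVAL are assumptions made for this example only; a real Integer-like PMC would presumably hand the operands to a GMP-backed BigInt at the point where this check fires.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if a + b cannot be represented in a long,
 * which is where an autopromoting Integer-like type would have to switch
 * to a big integer. Otherwise stores the sum in *out and returns 0. */
int add_or_flag_overflow(long a, long b, long *out)
{
    assert(out != NULL);
    /* Test against the limits before adding, so the signed addition
     * itself never overflows. */
    if ((b > 0 && a > LONG_MAX - b) || (b < 0 && a < LONG_MIN - b))
        return 1;   /* caller should promote to a big integer */
    *out = a + b;
    return 0;
}
```

The check is done before the addition because signed overflow in C is undefined behaviour, so it cannot be detected reliably after the fact.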
For more info about this project, you could probably get in touch with me, darbelo, or bubaflub.
10 Mar 2010 4:00pm GMT
09 Mar 2010
Andrew Whitworth: Weekend Hackathon
This weekend we were supposed to have a hackathon to get the new PCC branch up and running. The purpose of this branch is, as I have discussed before, to rearrange the call sequence so return values are processed after the function invocation instead of before. In the grand scheme of things, especially in comparison to the previous PCC refactors, this ends up being a minor change characterized mostly by massive code deletions instead of needing to write huge new functions or rewrite tons of existing functions. A few bugs stymied completion of the branch, but I have high hopes that the remaining bugs will get worked out soon. This branch was worked on primarily by allison, though I lent an eye as time permitted and chromatic lent some major debugging support as well.
Very few people ended up working on the PCC branch, even though that was the "official" target of the hackathon. A large amount of effort instead went to work on other branches. I'm certainly not complaining about the division of effort. In fact, I want to celebrate it. I'm extremely happy to see other worthwhile projects getting extra manhours devoted to them. It's very good to get people working on Parrot in any capacity, and as I mentioned above the PCC work was not a huge project that would have required a dozen developers focusing on it anyway.
cotto and bacek focused their considerable talents on the ops_pct branch, which aims to replace the Perl5-based ops parser with a bootstrapped version written with PCT. I'm not sure about the exact status of that branch, but there was a huge flurry of commits and I have to believe things are progressing rapidly.
plobsing started a new branch to tackle ticket #1015. Using some of the new mechanisms he's developed to find and prevent cycles in the freeze/thaw code, he decided to try and fix the problems with cycles in deep clones as well. I don't know the current status, but last I saw his work was going well.
Coke has started a new branch to continue the makefile cleanups, this time focusing on the recursive makefile for the dynops. He seems to be running into some bugs this morning, but hopefully nothing that cannot be quickly overcome.
Overall I would label the hackathon a great success. A lot of people came out to IRC to follow progress and work on various projects, and it is all much-appreciated.
09 Mar 2010 12:01pm GMT
Andrew Whitworth: GSoC: Parrot in Summer 2010
Jonathan Leto sent out a great email to the list today about Parrot's involvement in GSoC this year. Parrot will be joining forces with The Perl Foundation again and entering as a single organization. I very much like this arrangement, under the blind assumption that we do better together in terms of student allotment than we do apart. I have no reason to doubt that.
Mentors: If you want to sign up to be a potential mentor, you can do it on the Perl foundation wiki.
Project Ideas: If you have any project ideas (I know I do!), list them on the Perl foundation wiki. If you tell me about the ideas as well, I'll feature them in a blog post and hopefully drum up some interest among prospective students.
09 Mar 2010 9:10am GMT
Jonathan Leto: Google Summer of Code 2010
I am working on the application for The Perl Foundation and Parrot
Foundation to participate in Google Summer of Code 2010. GSoC is a
program where Google funds eligible students to hack on open source
projects for a summer. It is a great opportunity for the students and
the communities that mentor them. You also may be interested in this
summary of our involvement last year. Our application will be
submitted by the end of this week.
Please join us in getting prepared for this year. There is a page for
possible mentors to volunteer as well as a page for
09 Mar 2010 7:24am GMT
02 Mar 2010
Andrew Whitworth: Difference between PMCs and Objects
There has been lots of talk and activity lately that has to do with Parrot Objects. My rant about exceptions in Parrot has incited Tene to begin a flurry of development on that system, and Austin's Kakapo project has been regularly pushing the boundaries of what kinds of operations are and should be possible (and finding lots of bugs along the way!). Other people have been bringing up the topic as well, and lots of people are asking lots of questions about the implementation. I'm going to use this post to explain a bit about how Objects and PMCs work in Parrot, and maybe later I'll devote a post or two to ideas for fixing this system.
PMCs are basically objects, though extremely simple, flexible, and low-level. PMCs are interacted with, primarily, through the VTABLE interface. VTABLEs in Parrot are long lists of C function pointers that implement various behaviors. Calling the in-place addition VTABLE, add_i, is done like this in C:
VTABLE_add_i(interp, pmc, 5);
...Which translates to this:
pmc->vtable->add_i(interp, pmc, 5);
By pointing to a per-type VTABLE structure, PMCs with the same type can access a common list of function behaviors without overlapping or needing to do expensive switch/cases over a list of direct function calls. Likewise, determining the type of a PMC means finding the type of the VTABLE it points to:
pmc->vtable->base_type; // type number
pmc->vtable->whoami; // type name (Parrot STRING)
pmc->vtable->class; // Class or PMCProxy PMC for the type
Also, if we have the type number, we can look up the particular VTABLE in an array:
VTABLE * tbl = interp->vtable[index];
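The dispatch pattern described above can be sketched in a few lines of standalone C. This is not Parrot's real definition: the Sketch* names, the three-field table, and the missing interpreter argument are all simplifications assumed for illustration (the real VTABLE has many more slots and every call takes an interp).

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for Parrot's types (hypothetical names). */
typedef struct SketchPMC SketchPMC;

typedef struct SketchVTable {
    int         base_type;                        /* type number */
    const char *whoami;                           /* type name */
    void      (*add_i)(SketchPMC *self, long n);  /* in-place addition */
} SketchVTable;

struct SketchPMC {
    const SketchVTable *vtable;   /* shared by every PMC of this type */
    long                value;    /* per-instance data */
};

static void integer_add_i(SketchPMC *self, long n) { self->value += n; }

/* One table per type, indexed by type number, like interp->vtable[]. */
static const SketchVTable sketch_vtables[] = {
    { 0, "Integer", integer_add_i },
};

/* Same shape as the VTABLE_add_i macro: dispatch through the pointer. */
#define SKETCH_add_i(pmc, n) ((pmc)->vtable->add_i((pmc), (n)))

/* Builds an Integer-like PMC holding 37, adds 5 in place, returns the value. */
long sketch_demo(void)
{
    SketchPMC p = { &sketch_vtables[0], 37 };
    assert(p.vtable->base_type == 0);
    SKETCH_add_i(&p, 5);
    return p.value;
}
```

Because the function pointer lives in the shared table rather than in each PMC, adding a new type means adding one table entry, not another branch in a switch.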
In a sense, that's all there is to a PMC. All interactions with a PMC happen through this interface of about 185 function pointers. A PMC, by itself, doesn't have things that we would normally associate with "objects" in higher-level systems: Attributes and Methods. Sure, PMCs do have a way to associate a C structure, and therefore maintain a list of what we call "attributes", but those aren't directly accessible from PIR without adding some kind of lookup routine to find them and maybe wrap them into one of the Parrot register types (INTVAL, FLOATVAL, STRING, PMC). PMCs also appear to have methods, but this really isn't the case when you look at it closely.
As I described in a previous post, the long way to invoke a method on a PMC is like this:
$P0 = new ['Foo']
$P1 = find_method $P0, "bar"
callmethodcc $P0, $P1
The find_method opcode is a thin wrapper around the VTABLE_find_method interface function. If I translate this to an extremely condensed and wildly inaccurate pseudo-C listing, we get:
PMC * p0 = Parrot_pmc_new(interp, type_Foo);
PMC * p1 = VTABLE_find_method(interp, p0, "bar");
setup_method_call(interp, p0);
VTABLE_invoke(interp, p1);
This is obviously an extremely inaccurate listing, but should do well to illustrate my point. The method is actually a separate PMC type. It can be either a Sub (a .sub written in PIR) or an NCI (a wrapper type around a C function call). To make the call we set up the argument list (the invocant, $P0, is treated sort of like an argument but is kept distinct) and then invoke the method.
Before they are invoked, methods are stored inside either a Class or PMCProxy PMC associated with that type. When we call VTABLE_find_method(interp, p0, "bar"), we go through this machination:
PMC * class = pmc->vtable->class;
PMC * methods = class->data->methods;
PMC * method = VTABLE_get_pmc_keyed_str(interp, methods, "bar");
What we think of as an "object" and a "class" is actually a small collection of interoperating PMCs. The PMC itself contains a long list of VTABLEs and a small amount of data stored in a C structure, which cannot be directly accessed from PIR code. The PMCProxy PMC (like Class, which I will describe later, but designed to work with PMC types written in C) contains a hash of methods and a variety of other data. Methods themselves are their own PMCs, complete with their own type data. To really blow your mind consider that, as a PMC, you can call a method on a method, or even a method on a method on a method.
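To make the lookup path concrete, here is a small standalone C sketch of "follow the object to its class, then search the class's method table by name". The Method, ClassInfo and Obj types and the linear array search are assumptions for illustration; Parrot stores methods in a Hash PMC and the methods themselves are Sub or NCI PMCs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a Sub or NCI PMC. */
typedef struct Method { const char *name; int id; } Method;

/* Hypothetical stand-in for a Class or PMCProxy PMC. */
typedef struct ClassInfo {
    const Method *methods;        /* real Parrot uses a Hash PMC here */
    size_t        num_methods;
} ClassInfo;

typedef struct Obj {
    const ClassInfo *cls;         /* plays the role of pmc->vtable->class */
} Obj;

/* Look the name up in the class's method table; NULL means "not found". */
const Method *sketch_find_method(const Obj *obj, const char *name)
{
    size_t i;
    assert(obj != NULL && obj->cls != NULL && name != NULL);
    for (i = 0; i < obj->cls->num_methods; i++) {
        if (strcmp(obj->cls->methods[i].name, name) == 0)
            return &obj->cls->methods[i];
    }
    return NULL;
}
```

Note that the object holds no methods of its own: everything it can "do" by name is reached through the class it points to.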
In short, a PMC is sort of like the building block that is used to create objects and a type system, though the PMCs themselves are not what we normally think of as "objects". The only way to interact with a PMC is through VTABLEs, not attributes or methods. Luckily, VTABLEs exist that allow us to query the object for related attributes and methods, though the PMC itself may not necessarily respond to these requests.
Using PMCs, Parrot does provide a proper Object system through the use of two special PMC types: Object and Class. Class, as can be guessed, is a "metaobject" that defines type information for objects of a single type. The Class uses a series of PMCs internally to manage things like method PMCs and attributes. The Object PMC is the basic building block of a class instance object. It provides a series of default vtables that allow it to interact with Class the way we expect (to find methods that are stored in the class reliably, for instance) and to provide a set of attributes that are available for access from PIR. PMCs are the almost formless building blocks; Object is a very specific PMC type that provides behaviors that we expect from an OO type system.
Now that we've covered basic definitions, what are the big operational differences between the two systems? Here's a short list:
- Object types are defined by Class PMCs. PMCs are defined by PMCProxy PMCs.
- Class PMCs are created whenever we do a "newclass" or "subclass" operation from PIR. PMCProxy PMCs are created lazily, only when we actually need to introspect a built-in PMC type.
- Objects must be created from a Class, which means the Class PMC must exist before any Objects of that type can be created. PMCs can be created by themselves and generally don't require instantiation from another PMC.
- Objects have very regimented behavior: You can (and should) expect certain things when you access a named attribute or named Method. In a PMC these behaviors may be overridden to do different and unexpected things. Specifically, it can be very difficult to get access to named attributes on a PMC unless they are explicitly made visible from PIR (which can be a lot of work, and not a lot of PMC types do it completely)
- Inheritance between PMCs happens at the C level, so C-level attribute structures are merged together and made visible from C code. Inheritance between objects happens at the PIR level: method and attribute lists are combined and made visible as expected when accessed from PIR code. Inheritance from a PMC to an object is almost always broken if you expect the attributes and methods from the PMC to magically become visible as attributes and methods on the Object. I've never seen inheritance from an Object to a PMC subclass, but I suspect it is broken even worse.
- The VTABLEs in the Object PMC all provide an option to use a PIR-based override routine to implement the behavior. To do this, every VTABLE function in the Object PMC searches the associated Class for a similarly named VTABLE Sub PMC and, if one is found, calls that. PMC types almost never search for an override in the Proxy, and if you define one it will never be called (unless you specifically implement the logic to search for and execute it). On a related note the VTABLEs of an Object, because they are stored as PMCs in a Hash in the Class, can be modified at runtime. The VTABLEs of a PMC cannot be (well, I guess you could change the pointer to call a different function if your C-foo is strong, but I would prepare for fire and brimstone. Also, I won't fix any "bugs" that arise from this misguided behavior). I estimate at least 10% of reported bugs or feature requests in Parrot come from the "this sucks worse than I would expect" behavior of subclassing Objects from PMCs. If you can get away with it, it is almost always better to delegate to a built-in type instead of inheriting from it directly. But, I can talk more about problems and workaround solutions like this in another post.
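The override behavior in the last point can be sketched as follows. This is a minimal standalone C illustration, assuming a single hypothetical get_integer slot and made-up SketchClass/SketchObject types; in Parrot the override would be a Sub PMC stored in a Hash in the Class, not a raw function pointer.

```c
#include <assert.h>
#include <stddef.h>

typedef long (*get_integer_fn)(long attr);

/* Hypothetical class: holds an optional override for one vtable slot. */
typedef struct SketchClass {
    get_integer_fn override_get_integer;  /* NULL when no override is installed */
} SketchClass;

typedef struct SketchObject {
    SketchClass *cls;
    long         attr;
} SketchObject;

static long default_get_integer(long attr) { return attr; }

/* Object-style vtable slot: consult the class first, then fall back
 * to the built-in default behavior. */
long sketch_get_integer(const SketchObject *obj)
{
    assert(obj != NULL && obj->cls != NULL);
    if (obj->cls->override_get_integer != NULL)
        return obj->cls->override_get_integer(obj->attr);
    return default_get_integer(obj->attr);
}

/* Example override that a class might install at runtime. */
long doubled_get_integer(long attr) { return attr * 2; }
```

Because the override lives in the class and is looked up on every call, setting override_get_integer after objects already exist changes their behavior immediately. A C-level PMC that calls its own function directly, without this lookup, never sees such an override.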
So there you have a guide to the differences between Objects and PMCs. PMCs are the low-level building blocks of an object system, and Objects are combinations of several PMCs and a large number of default VTABLEs to implement an expected set of OO behaviors. In a sense, Objects are PMCs, but in another sense they really aren't.
02 Mar 2010 8:00am GMT
chromatic: Perl 6 Design Minutes for 24 February 2010
The Perl 6 design team met by phone on 24 February 2010. Larry, Allison, Patrick, and chromatic attended.
Larry:
- my work last week was almost entirely responsive to various discussions on irc and p6l, even when it doesn't seem like it
- clarified that LEAVE-style phasers do not trip till after an exception is handled (and not resumed)
- the implementation of take is specifically before unwinding even if implemented with a control exception
- simplified series operator by moving generator function to the left side (any function on right side will now be a limiting conditional)
- a * is no longer required to intuit the series on the left; the absence of generator before the ... operator is sufficient
- first argument on the right of ... is now always a limiter argument
- for convenience and consistency, added a new ...^ form to exclude a literal limiter from the generated series
- unlike ranges, however, there is no leading exclusion ^... or ^...^
- series is a list associative list infix, and each ... pays attention only the portion of the list immediately to its left (plus the limit from the right)
- an "impossible" limit can terminate a monotonic intuited series even if the limit can never match exactly
- variables now default to a type of Any, and must explicitly declare Mu or Junction type to hold junctions
- this is to reduce pressure to duplicate many functions like == with Mu arguments; most of our failure values should be derived from Any in any case
- a Mu result is more indicative of a major malfunction now, and is caught at first assignment to an Any variable
- Instant/Duration types are biased away from Num and towards Rat/FatRat semantics
- Instant is now completely opaque; we no longer pretend to be the same as TAI, numerically speaking
- Instants are now considered a more basic type than epochs, which are just particular named instants
- all culturally aware time can be based on calculations involving instants and durations
- list associative operators now treat non-matching op names as non-associative rather than right-associative, forcing parens
- Whatever semantics now autocurry any prefix, postfix, or infix operator that doesn't explicitly declare that it handles whateverness itself
- WhateverCode objects now take a signature to keep clear how many args are not yet curried
- so *+* is now more like WhateverCode:($x,$y)
- autocurrying is still transitive so multiple ops can curry themselves around a *
- added semilists as Slicel type to go with Parcel
- this allows us to bind @array[1,2,3] differently from @array[1,2,3;4,5,6], for instance
- the Matcher type now excludes Bool arguments to prevent accidental binding to outer $_ when closure is needed
- when and ~~ will now warn of always/never matching on direct use of True or False names as matcher
- STD generalizes \w lookahead to all twigils now
- STD now treats non-matching list associatives as non-associative
- things like 1 min 2 max 3 are now illegal, and require parenthesization for clarity
- STD now treat invocant colon as just a comma variant so it does not fall afoul of the list associativity change
- CORE now recognizes the TrigBase enumeration
Patrick:
- first release of the new branch of Rakudo last week
- passing ~25,000 tests at the release
- thanks to optimizations from chromatic, Jonathan, and Vasily, Rakudo has a lot of speed improvements
- in particular, it can run those tests in under 10 minutes, non-parallel, depending on your hardware
- older releases took 25 minutes and more
- the regex tests will slow things down
- ultimately, we're seeing a big speed improvement over the past releases
- cleaned up lists and slices, now they work pretty well
- worked with Solomon Foster and others to speed up trig operations
- fixed a bug related to lexicals declared in classes
- fixed the long-standing and often recurring problem with curlies ending a line/statement causing the next statement to be a statement modifier
- easy to fix in the new grammar
- that was nice
- made an initial implementation of the sort method
- it's very short, because Parrot provides one
- there are a few bugs in Rakudo there still, but I'll get them
- planning for the Copenhagen hackathon on March 5 - 9
- Jonathan and I have been updating the Rakudo roadmap
- will check that in in the next couple of hours
- so far, every time we review it, we surprise ourselves at how much we've accomplished
- we're meeting all of the top priority goals without making any heroic efforts
- we'll put those goals in as well as timelines
- most of the major tasks from previous roadmaps have happened
Allison:
- working on Python this week
- attended Python VM summit, Python language summit, and PyCon
- Parrot's on good track to support what Python needs
- useful to make community connections
- when I reviewed Pynie, I was surprised to see how close it is to supporting the whole Python syntax
- some of those features are big, like objects
- but we should support them soon
- Debian packages delayed by the absence of a sponsor
- they should go into Debian soon though
- I put in a request for feature-freeze exception for Ubuntu 10.4
- Parrot 2.0 should go in
- haven't made any commits to the PCC branch
- that'll be a top priority for next week
c:
- fixed a Parrot GC bug for last week's Rakudo release
- made some optimizations in Rakudo and Parrot
- helped Jonathan find a few more
- fixed a long-standing math MMD bug
- still working on HLL subclassing; more tricky than you think
- may be some conflicting design goals about vtable overriding and MMD
Allison:
- Patrick, do we need an explicit deprecation for old PGE and NQP?
Patrick:
- I think Will already added one for NQP
- we can add one for PGE if we need
- they don't necessarily have to disappear at the next release
- but no one's planning to maintain them
Allison:
- no reason not to put in the notice now
- we don't have to remove them at the earliest possible date
02 Mar 2010 5:12am GMT
01 Mar 2010
Andrew Whitworth: NQ-NQP Blog
A few days ago I mentioned an interesting new project called NQ-NQP, an implementation of the NQP language with a flex/bison frontend and an LLVM code generating backend. I've heard tonight that he's started blogging about it. Anybody who is interested in NQP or LLVM stuff might do well to give it a read.
01 Mar 2010 7:07pm GMT
28 Feb 2010
Andrew Whitworth: Proposal to Change find_method
Austin Hastings, as part of his Kakapo project (which I now have a commit bit to!) has started creating a mock object framework. We were talking about how to implement expected method calls, so I took a look at the find_method VTABLE of the Object PMC for some inspiration. What I saw was absolutely horrible, so I promptly created a branch to fix it. However, the more I looked and edited, the bigger I found the problems to be. I'll talk more about Kakapo in another post.
When I do code like this:
$P0 = new ['Foo']
$P0.'Bar'()
What is really happening is something similar to this:
$P0 = new ['Foo']
$P1 = find_method $P0, 'Bar'
callmethodcc $P0, $P1
Internally, the find_method opcode calls the VTABLE_find_method function on the given object. The object itself is then expected to walk the method resolution order (MRO) of its inheritance hierarchy to find a suitable method and return it. Along the way, the Object PMC needs to completely violate the encapsulation of the Class PMC to gather information about the MRO and then to search the list of methods in the Class for an entry with the given name. In short, the C code from Object.find_method looks like this:
int num_classes = VTABLE_elements(interp, class->all_parents);
int i;
/* Walk the MRO in order; return the first method with a matching name. */
for (i = 0; i < num_classes; i++) {
    cur_class = VTABLE_get_pmc_keyed_int(interp, class->all_parents, i);
    if (VTABLE_exists_keyed_str(interp, cur_class->methods, name))
        return VTABLE_get_pmc_keyed_str(interp, cur_class->methods, name);
}
So Object reads the attributes of its Class PMC directly, and manually traverses the MRO looking for the proper method. This causes a few problems. First, as a mostly stylistic point, this completely breaks encapsulation. We can't make a change to the MRO or the method storage and lookup mechanism in Class without likewise changing the behavior in Object.
Second, since Object needs to know how to traverse the MRO and look up methods, and requires intimate internal knowledge of the classes in the MRO, we are extremely limited in the types of objects that can be in the inheritance hierarchy. That is, we can't define our own metaobject types; we must use Class or PMCProxy, or a subclass thereof (and a careful reading of the code suggests that even subclasses will not work). This seems to be a remarkable limitation when you consider some of the diverse high-level languages that Parrot aims to support.
One thing I tried to do was create a find_method VTABLE in the Class PMC, and then delegate traversal of the MRO to Class instead of Object. This helped improve encapsulation greatly, but created another problem: Now I couldn't call methods on Class itself. Here's example code that broke:
$P0 = getclass 'Foo'
$P0.'add_vtable_override'("bar")
What we want to do is call a method on the class object itself, but what we end up doing is finding a method on objects of that type, and then trying to call that method on the class object. Problems.
Let's recap some issues:
- Find_method searches for a method to use on a given invocant
- The Class type has methods that need to be accessible through find_method
- Object has to break encapsulation and monkey around in Class's internals, which means we can only use Class objects, and objects strictly isomorphic to Class (like PMCProxy) in an MRO
- We cannot delegate the method lookup operation to the Class object, where it arguably belongs.
With these things in mind, I had an idea that I sent to the list which aims to fix all this: Create a new VTABLE function that searches for a method in a metaobject, instead of searching for the method on the invocant (like find_method does now). In terms of PIR, I'm thinking of enabling this kind of sequence:
$P0 = new ['Foo']
$P1 = getclass 'Foo'
$P2 = find_class_method $P1, 'Bar'
callmethodcc $P0, $P2
I don't want to remove find_method or change it in any way. But what I want to have is a way to delegate method lookup to the Class object as well. I think we will find that when we have a way to delegate lookup to the Class object that we will use it much more frequently and to greater effect than we use find_method now. I also think we will find that find_method can eventually be deprecated entirely, but that's another issue for another time.
One other problem that I failed to mention above is that every class has its own completely linearized resolution order. So if Foo is a Bar, and Bar is a Baz, the Foo class has the MRO ("Foo", "Bar", "Baz"), Bar would have the MRO ("Bar", "Baz"), and Baz would have the MRO ("Baz"). Asking the Foo Class object for a method "Frobulate" would look in Bar, which would ask Baz. Then, Foo would move to the next item in its MRO, Baz, and ask it. The net result is that Baz would be queried twice, since the Foo Class item doesn't necessarily know that Baz is in Bar's MRO, and Bar doesn't know that it is being queried from Foo (maybe Bar was being queried directly). So what we need is some kind of way to keep track of the MRO up front, and avoid re-defining the search MRO for each new delegation.
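The double query is easy to reproduce in a toy model. The C sketch below is an illustration only: the NaiveClass type, its queries counter, and naive_lookup are invented for this example, and the method is assumed to be missing everywhere so that the whole search runs.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PARENTS 4

/* Hypothetical class record holding its own linearized MRO. */
typedef struct NaiveClass {
    const char        *name;
    struct NaiveClass *mro_rest[MAX_PARENTS]; /* linearized MRO minus the class itself */
    size_t             num_rest;
    int                queries;               /* how often this class was searched */
} NaiveClass;

/* Naive delegation: every class searches itself and then re-walks its
 * own full MRO, unaware of who is asking. */
void naive_lookup(NaiveClass *cls)
{
    size_t i;
    assert(cls != NULL);
    cls->queries++;                        /* search this class's own methods */
    for (i = 0; i < cls->num_rest; i++)
        naive_lookup(cls->mro_rest[i]);    /* delegate to each later MRO entry */
}
```

With Foo's remaining MRO set to (Bar, Baz) and Bar's to (Baz), one call to naive_lookup on Foo leaves Foo and Bar with one query each and Baz with two: once via Bar's delegation and once directly from Foo.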
I think we could solve this issue if we defined a new VTABLE like this:
VTABLE PMC * find_class_method(STRING *name, PMC *mro_iterator)
In this conception, SELF would be the metaobject currently being searched, name would be the string name of the method to find, and mro_iterator would be an iterator object for the MRO list. When we do the PIR code:
$P0 = getclass "Foo"
$P1 = find_class_method $P0, "Frobulate"
The first call to the Foo class object would be VTABLE_find_class_method("Frobulate", NULL). Foo would then create an iterator over its MRO (removing itself from the front of the list to avoid direct recursion) and pass that MRO iterator to Bar, which then calls the next item on the list (Baz). This has a few major advantages which are not necessarily obvious up front. First, any object that defines find_class_method can be inserted into the MRO. This includes things that aren't really classes like Roles, Mixins, extension methods, and even autoloaders. Second, we gain more flexibility to modify the MRO of a class, because that class (and its super-classes) can add additional search parents to the iterator as needed. We would also gain the ability to have more manual control over the MRO, because we could add a find_class_method_p_p_s_p op variant that also takes an existing MRO iterator. This would enable us to better implement something like a super() call, where we take the MRO iterator, manually pop the top item off it, and then call find_class_method with it. I've got several bonus points available to whoever can explain how to call a method in a super class when it's overridden in the subclass, without having to hard-code in the name of the parent class. With the new VTABLE and a new op, this becomes trivial.
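The shared-iterator idea can be sketched in standalone C. Only the name find_class_method and the "NULL iterator on the first call" convention come from the proposal above; the MetaObj and MroIter types, the one-method-per-class simplification, and the C signature are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct MetaObj MetaObj;

/* Hypothetical iterator over an MRO list, shared by the whole search. */
typedef struct MroIter {
    MetaObj **items;
    size_t    count;
    size_t    pos;
} MroIter;

struct MetaObj {
    const char *method;   /* the single method this class defines, or NULL */
    MetaObj   **mro;      /* linearized MRO, starting with the class itself */
    size_t      mro_len;
    int         queries;  /* how often this class was searched */
};

/* Returns the metaobject that defines `name`, or NULL. A NULL iterator marks
 * the first call, which builds the iterator and skips the class itself. */
MetaObj *find_class_method(MetaObj *self, const char *name, MroIter *iter)
{
    MroIter fresh;
    assert(self != NULL && name != NULL);
    self->queries++;
    if (self->method != NULL && strcmp(self->method, name) == 0)
        return self;
    if (iter == NULL) {
        fresh.items = self->mro;
        fresh.count = self->mro_len;
        fresh.pos   = 1;           /* drop self from the front of the list */
        iter = &fresh;
    }
    if (iter->pos >= iter->count)
        return NULL;               /* MRO exhausted */
    {
        /* Hand the same iterator to the next entry instead of letting it
         * start over with its own MRO. */
        MetaObj *next = iter->items[iter->pos++];
        return find_class_method(next, name, iter);
    }
}
```

Because every delegate advances the same iterator, each entry in Foo's MRO is searched once. A super()-style call fits the same shape: the caller builds an iterator positioned past the current class and passes it in.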
So that's my idea for method lookups. I've sent a mail to the list with the idea, and I'm going to raise the idea at #ps if I can make it to the meeting. I think it has a lot of merit, enables a few cool new abilities and doesn't take away any existing functionality. I would like to hear any other ideas, but I'm becoming convinced that this one is a winner.
28 Feb 2010 4:50pm GMT
25 Feb 2010
chromatic: Perl 6 Design Minutes for 17 February 2010
The Perl 6 design team met by phone on 17 February 2010. Larry, Allison, Patrick, and chromatic attended.
Larry:
- much work clarifying relationship of parcels to everything else (<a b>, assignment, arguments, captures, parameters, signatures, gather/take, and loop returns)
- we now list all scope declarators in one spot
- conjectured some ideas on how to handle the allomorphism of literals more dwimmily
- had already specced some of this behavior for literals found inside qw angles.
- literals that exceed a Rat64's denominator automatically keep the string form around for coercion to other types
- clarified that anon declarator allows a name but simply doesn't install it in the symbol table
- respecced the trig functions to use a pragma to imported fast curried functions
- still uses enum second argument for the general case (rakudo is still stuck on slow strings there)
- on iterators, renamed .getobj to .getarg since arguments are the typical positional/slicey usage
- signatures are never bound against parcels anymore, only against captures
- we now use "argument" as a technical term meaning either a real parcel or an object that can be used independent of context as an argument
- anything that would stay discrete when bound to a positional, basically
- return, take, and loop return objects are also arguments in that sense
- they all return either a parcel or anything that can stand on its own as an argument
- STD now adds a shortname alias on adverbialized names, ignores collisions on the shortname for now, which is okay for multis
- STD now complains about longname (adverbialized) collisions
- STD no longer carps about duplicate anonymous routine declarations
- made the undeclared type message the same for parameters as for other declarations
- clarify the error message about anonymous variables
- no longer report a $) variable error where ) is the $*GOAL
- add WHAT etc. to list of functions that require an argument
Allison:
- working on two HLL implementations
- one is Pynie, the other is Camle
- nothing to do with Caml or ML
- I've noticed huge improvements in NQP-rx from the previous NQP
- can't say which feature improvements make the most difference, but I'll migrate Pynie pretty soon to take advantage of the new version
- continuing to shepherd Debian and Ubuntu packages
Patrick:
- essentially all I did was unify things
- previously it had been two or three tools
- it's just one
Allison:
- even the syntax seems more regular
Patrick:
- there are more pieces available in NQP-rx
- Rakudo's -ng is now master
- the old master is now -alpha
- we took a big hit on spectests, but they seem to be coming back quickly
- 5000 tests pass on trunk now
- we have 16k or 17k we haven't re-enabled; they make the spectest slower
- Jonathan thinks we may pass 25,000 tests now
- that's great, considering where we were a week ago
- I redid Rakudo's container, value, and assignment model
- previously variables held values directly
- now they contain reference PMCs
- that cleaned up many things
- we use more PMCs, but now we don't clone and copy as much
- we move references around more
- seems closer to how Perl 6 handles things
- was much easier than I expected
- updated the NQP-rx regex engine and built in constant types
- handles Unicode character names
- reclaims plenty of tests
- answered lots of questions for people adding things into Rakudo
- prioritizing other people writing code over writing code
- increases our developer pool; seems to be working well
- new release of Rakudo planned for tomorrow
- don't know how many tests we'll pass, but it should go well
- plan to put in a few things like sort and grammars over the next week
- then I'll review the RT queue to find bugs and (hopefully) closeable bugs
c:
- working on GC tuning
- also working on String PMC tuning
- working on built-in types and their behavior as classes and parent classes
- the multidispatch bugs in particular I hope to solve
25 Feb 2010 12:27am GMT
24 Feb 2010
Andrew Whitworth: PDD23 Exceptions Critique
Following my post a few days ago, I would like to take a more in-depth look at PDD23, which lays out the specification for the exceptions subsystem. I hadn't intended to go through it line by line, but in a lot of places I have to.
[Update: I wrote this post at the same time as I wrote the last one on the topic, but I delayed posting this one until now. In the interim, Austin created a page on the wiki to plan out a major refactor of the system and Tene started a branch to do some work. I'll post updates on both those things as they happen.]
Exceptions are indications by running code that something unusual -- an "exception" to the normal processing -- has occurred. When code detects an exceptional condition, it throws an exception object. Before this occurs, code can register exception handlers, which are functions (or closures) which may (but are not obligated to) handle the exception. Some exceptions permit continued execution immediately after the throw; some don't.
Exceptions transfer control to a piece of code outside the normal flow of control. They are mainly used for error reporting or cleanup tasks.
This is, essentially, the preamble to the rest of the document and already shows some disconnect with reality. High level languages are already using exceptions to handle normal control flow in some cases. In this case they are less "exception" and more "expection". I could go on and talk about how bad an idea it is to use exceptions for normal control flow for a variety of reasons, but I won't. I know that Parrot's control flow model still isn't mature enough to tackle all the cases that HLLs have been digging up, so exceptions are the only available mechanism to implement some structures. Also, as far as I am aware, no exception prevents resuming after the point of the throw. I believe that determination is left up to the handler.
When an exception is thrown, Parrot walks up the stack of active exception handlers, invoking each one in turn, but still in the dynamic context of the exception (i.e. the call stack is not unwound first).
I need to carefully read through some of the code again, but I'm pretty certain that this is patently false. ExceptionHandlers are implemented as Continuations which do rewind the call stack and are executed in the dynamic context of the function that contains the handler. Again, I need to look at all the code and the semantics in greater detail, but at the very least this is highly suspect.
Exception handlers can resume execution after handling the exception by invoking the continuation stored in the 'resume' slot of the exception object. That continuation must be invoked with no parameters; in other words, throw never returns a value.
Not a problem here so much as a little nit. Why can't exception resumes return values? If you think about common exception uses in some popular programming languages this is never used. But when you consider that exceptions in Parrot are currently used, as I mention above, to implement complex control flow, you start to see that there is maybe some utility to it. Slightly more to the point, what if the resume object wasn't just a continuation pointing to the opcode after the throw instruction, but was instead a Sub object representing a lexically-scoped finally{} block that needed to be invoked? I can come up with a few ideas of places where the functionality to pass parameters to the resume continuation might be nice to have. It's interesting to consider that maybe we resume to a multi-sub, which dispatches to a post-handler routine based on signature? I have several ideas like this, and while they are all a little bit off the radar of current programming languages they are by no means unthinkable or undesirable in the long run. If it's possible to provide this, and Parrot's internal mechanics should certainly make it so, I don't see why we would artificially limit it.
The die opcode throws an exception of type exception;death and severity except_error with a payload of message. The exception payload is a string PMC containing message.
I have been accused of being anti-Perl, and I maintain that I am not. Maybe I'll devote another blog post to the topic later. But I don't think Parrot needs a "die" op that does what it does here. I can understand and appreciate that Perl is very motivated by linguistic factors, and that Parrot has been traditionally very influenced by Perl. But Parrot's opcodes represent an assembly language, and using these kinds of linguistic features seems a little bit out of place. Why have "die", when we have "exit"?
The routines to search the op library are not linear. I think they use a skip list, but I haven't studied the implementation enough to be able to say so definitively. What I do know is that the time it takes to search the oplist for a valid op name is proportional in some measure to the number of op "short" names. I think it's O(log n). As an example, die_s, die_p, and die_i_i all have the short name "die". IMCC, during lexical analysis, looks to determine whether an opcode exists in the library using its short name. Later in the process, IMCC hunts down the exact long name of the op, which again uses the same algorithm (skip list?) but looks at long names instead of short names. I'll spare more details on this point, but the lesson is clear: Having fewer ops is better for IMCC's code generation performance. Having fewer short names (even if the number of ops remains the same) improves parsing performance in IMCC. For a PIR-based benchmark, we would see some improvement (though admittedly it would be very small) if we did nothing besides rename all "die" opcodes to "exit" instead.
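To make the two lookups concrete, here is a rough Python sketch. It assumes a sorted table searched with binary search, which is only a stand-in for whatever structure the op library really uses; the helper names `short_name` and `contains` are hypothetical, and the suffix-stripping rule only works for the handful of ops named in this post.

```python
import bisect

# Long op names mentioned in this post; a real op library holds many more.
LONG_NAMES = sorted(["die_s", "die_p", "die_i_i", "exit_i", "throw_p", "throw_p_p"])

def short_name(long_name):
    """Strip the argument suffix, e.g. 'die_i_i' -> 'die' (hypothetical rule)."""
    return long_name.split("_", 1)[0]

# Several long names collapse onto one short name.
SHORT_NAMES = sorted({short_name(n) for n in LONG_NAMES})

def contains(sorted_names, name):
    """Binary search: O(log n) in the number of names in the table."""
    i = bisect.bisect_left(sorted_names, name)
    return i < len(sorted_names) and sorted_names[i] == name
```

The lexer-stage check corresponds to `contains(SHORT_NAMES, "die")`, and the later exact match to `contains(LONG_NAMES, "die_i_i")`. Either table gets cheaper to search as it shrinks, which is the point being made about having fewer ops and fewer short names.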
When I see the word "die", it seems to me like it should do what it says: Kill the program. Do not pass go. Do not collect 200 dollars. Die. I can't imagine having any other preconception about it. Why would the "die" opcode not make the program...die? At least, why not without an explicitly-defined mechanism to prevent it, such as how Perl5 uses eval()? So you can imagine my surprise that die seems to throw just another exception that can be caught. You can imagine how perplexed I was when calling
die 'Program is closing'
or
exit 0
didn't exit my program! Instead, I had to use
die 5, 0
to tell the system that yes, I actually wanted the program to shut down. Of course, now I can't supply a helpful message about why we need to die. It's also surprising to me that, for some reason, the exit opcode seems to have the same general behavior. It doesn't actually exit if you have a handler active, and doesn't have an overload that lets you manually specify a severity that forces an exit. So that seems pointless to me. Again, what else could the word "exit" mean besides "get out of my damn program"?
All exceptions will have at least message, severity, resume, and payload attributes.
There are three forms of the die opcode: die_s, die_p, and die_i_i. The first two basically throw a normal, catchable exception with the given argument treated as the string message to display to the user. The third form throws a normal, catchable exception with a user-definable severity and error code. The exit opcode has form exit_i, which throws a normal, catchable exception with only the given error code. The throw opcode has flavors throw_p, and throw_p_p, which let you throw a given pre-constructed exception, optionally with a given resume continuation. This all seems like a hugely redundant waste of opcodes which all essentially do the same thing but each of which only lets you specify a subset of the parameters that every exception object is supposed to provide. None of the opcodes allow you to specify a payload, even though the spec suggests (as I will discuss below) the payload should be used for type filtering by HLLs, and the current implementation prevents proper type subclassing!
"die" lets you specify a message or a severity and error code, but doesn't actually make the program die. "exit" lets you specify an error code only, and doesn't necessarily make the program exit. "throw" lets you specify a pre-built exception and optionally a custom resume continuation only. Considering that every exception must have a message, severity, resume, and payload, this assortment of opcodes really doesn't make any sense at all.
I won't harp on opcodes any further in this post, but I think I've made my point: The ops we do have are a stupid mish-mash of the kinds of ops we need to work with exceptions. If every Exception must have resume, severity, type, and payload, why do our ops not support that? Why do we have die, when we have throw, rethrow, and exit? I highly suggest we slim down these opcodes. I think an exit_i opcode is fine, if it forces an exit in lieu of a specifically-defined exit-handler. That is, most handlers would not handle exit events by default, allowing the exit op to do what we expect. To catch and handle these types, which would be necessary in some places involving embedding or nesting, we could specifically define an exit-handler type that is capable of catching them.
I think a throw_p opcode is all we really need to throw other types of exceptions. Maybe, if we were worried about writing out all the PIR for constructing elaborate Exception objects, we could have a throw_p_s_i_i, which would set all four required attributes at once, and throw it.
Anyway, that's enough on this particular subtopic. As a tangent, though, I would like to suggest again that we try to find a good way to specify aggregate literals in PIR code. In this way we could specify exception constants (or proto-exception initializer objects) to reduce the runtime cost of constructing exceptions where things like the severity, type, and message are the same. The ability to specify ExceptionHandler constants in the code likewise would create a huge performance savings, especially when you consider that in a normally-operating program more ExceptionHandlers are created and registered than Exceptions.
count_eh Return the quantity of currently active exception handlers.
I'm not certain that we need an opcode for this, especially since I think it's used pretty infrequently. A method call on the current context object could provide the same info. A series of methods would allow fine-grained manipulation of the handler stack, which would be even better.
If no handler is found, and the exception is non-fatal (such as a warning), and there is a continuation in the exception record (because the throwing opcode was throw), invoke the continuation (resume execution). Whether to resume or die when an exception isn't handled is determined by the severity of the exception.
I'm not sure if the implementation follows the letter of the spec in regards to the "exception record". As far as I am aware, an unhandled exception doesn't automatically cause the program to resume normal control flow no matter what type it is. I need to check on this, but I have never witnessed this behavior. If it does exist, I apologize for not knowing about it, of course.
typedef enum {
    EXCEPT_normal = 0,
    EXCEPT_warning = 1,
    EXCEPT_error = 2,
    EXCEPT_severe = 3,
    EXCEPT_fatal = 4,
    EXCEPT_doomed = 5,
    EXCEPT_exit = 6
} exception_severity;
As Austin mentioned, there are way too many of these. Also, as I've found out experimentally, only EXCEPT_doomed actually causes Parrot to exit despite other severities having harmful-sounding names like "fatal", and "exit". In my mind we need only four severities, at most: Trivial, Normal, Fatal and Control. Anything else is superfluous, not just in theory but also in the code as it currently exists. Trivial exceptions can automatically resume if unhandled. Normal exceptions are ones that represent an error. They can be handled by any default handler, but cause a program exit when unhandled. Fatal exceptions mark an error that is typically unrecoverable unless a special exit handler has been specifically configured to catch such events. Control exceptions bypass the error-reporting system and are used to implement non-error control flow. I'm hard-pressed to come up with any other designations we would ever need for this mechanism.
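A minimal Python sketch of the four-severity proposal above, meant only to pin down the rules as stated. The names `Severity`, `when_unhandled`, and `needs_exit_handler` are hypothetical. The paragraph does not say what happens to an unhandled Control exception, so the sketch leaves that case as None rather than guess.

```python
from enum import Enum

class Severity(Enum):
    TRIVIAL = 1   # may resume automatically if nobody handles it
    NORMAL = 2    # an ordinary error
    FATAL = 3     # unrecoverable unless a dedicated exit handler exists
    CONTROL = 4   # non-error control flow, bypasses error reporting

def when_unhandled(severity):
    """What the runtime does when no handler claims the exception."""
    if severity is Severity.TRIVIAL:
        return "resume"
    if severity in (Severity.NORMAL, Severity.FATAL):
        return "exit"
    return None  # Control: not specified in the proposal

def needs_exit_handler(severity):
    """Only Fatal exceptions require a specially configured exit handler."""
    return severity is Severity.FATAL
```

With only four cases, the whole policy fits in two small functions, which is roughly the argument against keeping seven severities.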
typedef enum {
    EXCEPTION_BAD_BUFFER_SIZE,
    EXCEPTION_MISSING_ENCODING_NAME,
    EXCEPTION_INVALID_STRING_REPRESENTATION,
    EXCEPTION_ICU_ERROR,
    EXCEPTION_UNIMPLEMENTED,
    EXCEPTION_NULL_REG_ACCESS,
    EXCEPTION_NO_REG_FRAMES,
    EXCEPTION_SUBSTR_OUT_OF_STRING,
    EXCEPTION_ORD_OUT_OF_STRING,
    ...
} exception_type_enum;
There are a huge number of exception types, and they really seem superfluous when you consider that every exception must contain a message field with a human-readable message that describes it and a payload field that can contain any arbitrary object with additional data. I know that the intention with this huge list is to implement exception types without using subclasses. The reason for this is that subclasses can be quite expensive, because each subclass needs to have its own VTABLE and other information, which can become prohibitive if we want to have more than a few types. I've recently put forward an idea for allowing extremely inexpensive subclasses which was inspired by exactly this problem. My idea was not without its caveats, of course, but it's not the only possible route to take to make the subclassing operation less expensive. That said...
The payload more specifically identifies the detailed cause/nature of the exception. Each exception class will have its own specific payload type(s). See the table of standard exception classes for examples.
So every Exception has a payload, which can be a user-defined object type with information about the exception type, and it needs to have one of these dozens of enum values that indicates its type? This is all highly redundant, and there are at least two paths we could follow to make this system sane:
- Only have one type of Exception PMC with no subclasses. Get rid of the type enums. The Exception "type" can be determined from the user-specified payload, if any. Add opcodes or methods that better facilitate throwing an exception with a custom payload. We're likely going to need to define several "Payload" PMC types to handle those exceptions thrown by core. This would require implementing cheap subclasses, but has the benefit that built-in types can be overridden by HLL types if needed.
- Have many subclasses of Exception. Get rid of type enums. We only need a throw_p opcode and can construct "new ['ICUError']" objects or whatever we need. This is going to require implementation of cheap subclasses, and will allow HLL type overrides if needed.
Either way, a major improvement over what we have now.
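The second path, filtering handlers by subclass instead of by enum value, can be sketched in Python as follows. This is an illustration of the idea and not Parrot code: `ParrotError`, `ICUError`, `NullRegAccess`, and `find_handler` are hypothetical names chosen to echo the enum entries above.

```python
class ParrotError(Exception):
    """Base class standing in for the Exception PMC."""

class ICUError(ParrotError):
    """Stands in for EXCEPTION_ICU_ERROR."""

class NullRegAccess(ParrotError):
    """Stands in for EXCEPTION_NULL_REG_ACCESS."""

def find_handler(handlers, exc):
    """Return the first handler whose declared class matches the exception.

    handlers is a list of (exception_class, callable) pairs, innermost first.
    """
    for exc_class, handler in handlers:
        if isinstance(exc, exc_class):
            return handler
    return None
```

A handler registered for `ParrotError` catches every subclass, while one registered for `ICUError` catches only that type, and an HLL can add new types without touching a central enum.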
Exceptions have been incorporated into built-in opcodes in a limited way. For the most part, they're used when the return value is either impractical to check (perhaps because we don't want to add that many error checks in line), or where the output type is unable to represent an error state (e.g. the output I register of the ord opcode).
Color me stupid, but isn't consistency of interface a good thing? How do we know, without having to memorize the behavior of all 1302 ops, which throw exceptions to signal errors and which do not?
Other opcodes respond to an errorson setting to decide whether to throw an exception or return an error value.
I think this should be the default behavior. All ops should throw exceptions on error if "ops throw exceptions" is turned on. Otherwise, no ops do. This setting is cheap enough to toggle.
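As a sketch of what one uniform switch could look like, here is a small Python model. Everything in it is hypothetical (`Interp`, `errors_on`, `substr_op`, and the choice of None as the error value); it only illustrates a single setting that every op consults in the same way.

```python
class OpError(Exception):
    """Raised by an op when the interpreter's errors_on flag is set."""

class Interp:
    def __init__(self, errors_on=True):
        # One interpreter-wide switch consulted by every op.
        self.errors_on = errors_on

    def substr_op(self, text, start, length):
        """Return a substring, or signal an out-of-range request."""
        if 0 <= start < len(text) and length >= 0:
            return text[start:start + length]
        if self.errors_on:
            raise OpError("substr: start out of string")
        return None  # hypothetical error value when exceptions are off
```

With the flag on, a bad call raises; with it off, the same call quietly returns the error value. No op needs its own policy, so nobody has to memorize which ops throw.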
{{ TODO: "errorson" as specified is dynamically rather than lexically scoped; is this good? Probably not good. Let's revisit it when we get the basic exceptions functionality implemented. }}
Good point! Maybe an opcode for this isn't a great idea. Methods on the ParrotInterpreter object (to set global settings) and methods on the CallContext PMC (to set local settings) would be a good alternative. When is the basic implementation expected?
{{ NOTE: There are a couple of different factors here. One is the ability to globally define the severity of certain exceptions or categories of exceptions without needing to define a handler for each one. (e.g. Perl 6 may have pragmas to set how severe type-checking errors are. A simple "incompatible type" error may be fatal under one pragma, a resumable warning under another pragma, and completely silent under a third pragma.) Another is the ability to "defang" opcodes so they return error codes instead of throwing exceptions. We might provide a very simple interface to catch an exception and capture its payload without the full complexity of manually defining exception handlers (though it would still be implemented as an exception handler internally)
Another warning in the same vein as the previous note. The point here is that we may want to say that some opcodes throw exceptions, but that we may want those exceptions to have different effects under different "pragmas". This kind of system can be hugely expensive if every error-capable opcode needs to check not only whether to return an error code or throw an exception, but also what the severity of that exception is depending on a series of pragmata that, most likely, would need to be lexically-scoped anyway. Way too complicated. Far better is to enable cheap subclasses of Exception, and have the HLL hot-swap type-maps at runtime with different behaviors such as different severities. Or better yet, forget hot-swapping and instead introspect on the Exception subclasses' Class object to change the default severity values and behaviors. That way when the new Exception object is created, the initialization routine sets a different default severity, the op throws it no matter what, and the exceptions system handles things like it is supposed to do.
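The last suggestion, changing the default severity on the exception's class so that ops never need to consult pragmas, can be sketched in Python. `TypeCheckError` and `default_severity` are hypothetical names, and the severity strings are placeholders.

```python
class TypeCheckError(Exception):
    # Class-level default; an HLL could change this when a pragma is set.
    default_severity = "fatal"

    def __init__(self, message):
        super().__init__(message)
        # Each instance copies the class default at creation time.
        self.severity = type(self).default_severity

before = TypeCheckError("incompatible type")

# Stand-in for a pragma switching to a more lenient mode.
TypeCheckError.default_severity = "warning"

after = TypeCheckError("incompatible type")
```

Here `before.severity` is "fatal" and `after.severity` is "warning", yet the code that creates and throws the exception is identical in both cases. The op stays dumb and the policy lives in the class.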
So that's my in-depth critique of the Exceptions PDD. I may make it a regular feature to go through other PDDs as well, and I'm sure I'll post other ideas, proposals, and insights for this system in the future.
24 Feb 2010 8:08am GMT
23 Feb 2010
Andrew Whitworth: Cheap Subclasses
I had an idea the other night when reading over PDD23. That PDD talks about the intention to have an entire hierarchy of exception types, but then mentions a caveat that having too many types is expensive. That got me to thinking, does it really have to be so expensive to make subtypes?
In Parrot when we create a subtype we first create a new VTABLE struct. This struct contains function pointers to all the VTABLE interface functions, plus a small amount of metadata about the class. The VTABLE structure contains a string that is the class name, and a pointer to the Class or PMCProxy PMC that defines the type. There are several function pointers in the VTABLE structure. On a very quick count tonight it looks like there are about 184 of them, and before the vtable_massacre branch merged there were significantly more. Plus other fields, there are over 200 pointers (or fields with equivalent size) in that structure. It's a huge amount of memory to hold for every type, especially if HLLs are expecting to be able to create large amounts of their own types.
Now, consider a case like what is described in PDD23, where we have several exception subtypes which appear to differ from each other only by name. It's a huge waste to give each of these subtypes its own 184-pointer VTABLE structure, when they are all going to be mostly identical. It's absurd to do it that way, and this is probably a big reason why we don't support the subtypes as described in PDD23.
Consider now the case of user-defined classes and subclasses. This is, I suspect, the largest set of types for most applications. Every PIR-defined object type is an Object PMC, which means the VTABLE structure in C for every user-defined type is 99% identical to the VTABLE structure of Object. All the function pointers, all 184 of them, are identical. The associated NameSpace PMC (after chromatic's refactor the Class PMC instead) contains a list of all the :vtable and :method Sub PMCs. The VTABLEs in Object all search the NameSpace for an override and then launch that override if provided. So for types defined in PIR, we don't need the whole VTABLE struct: just the pointer to the Class PMC that contains the info. We can point the VTABLE pointer to Object's VTABLE and use it without needing an expensive copy.
Instead of creating a Class PMC and a VTABLE structure with over 200 pointers, we only define the Class and the handful of defined overrides that we already define anyway. This is significant memory savings for applications that define many types.
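A back-of-the-envelope calculation shows the scale of the savings. The figure of 200 pointers comes from the count above; the 8-byte pointer size (a 64-bit platform) and the function name `vtable_bytes` are assumptions made for illustration.

```python
POINTERS_PER_VTABLE = 200   # "over 200" per the count above, so a lower bound
POINTER_SIZE = 8            # assumption: 64-bit platform

def vtable_bytes(num_types, shared=False):
    """Bytes spent on VTABLE structs for num_types user-defined types."""
    tables = 1 if shared else num_types
    return tables * POINTERS_PER_VTABLE * POINTER_SIZE

per_type_copy = vtable_bytes(1000)               # one struct per type
shared_table = vtable_bytes(1000, shared=True)   # all types reuse Object's
```

Under these assumptions, 1000 PIR-defined types cost about 1.6 MB in VTABLE structs today, versus 1,600 bytes if they all point at Object's table.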
There are two options to implement this kind of idea:
- Add a PMC* pointer to every PMC that points to the Class or PMCProxy object that controls it. This could create a mess in GC if Class and PMCProxies weren't marked constant.
- Define a new "PMCType" structure. PMCType would contain pointers like a string name, a Class PMC pointer, and maybe a VTABLE pointer. If we add this structure, PMCs get larger by one pointer. If we replace the VTABLE struct and include a pointer to a VTABLE in the PMCType, we have to suffer an additional pointer dereference per VTABLE call (with opportunities to cache).
So this system is not without its tradeoffs. With it in place, however, we gain the ability to define large numbers of cheap subclasses of built-in types, like what is specified in PDD23. We also significantly simplify the process of creating new classes in PIR and reduce the amount of memory required for each type.
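The second option can be modelled in a few lines of Python. This is a sketch of the data layout only: `VTable`, `PMCType`, `OBJECT_VTABLE`, and `make_subtype` are hypothetical stand-ins for the C structures, and the single `get_string` slot represents the 184 real function pointers.

```python
from dataclasses import dataclass

@dataclass
class VTable:
    slots: dict            # stands in for the table of function pointers

@dataclass
class PMCType:
    name: str              # the class name
    class_pmc: object      # pointer to the Class or PMCProxy PMC
    vtable: VTable         # pointer to a (possibly shared) VTable

# One table for every PIR-defined type, as with Object's VTABLE.
OBJECT_VTABLE = VTable(slots={"get_string": lambda self: "<object>"})

def make_subtype(name, class_pmc):
    """A new type costs one small PMCType record; the VTable is reused."""
    return PMCType(name=name, class_pmc=class_pmc, vtable=OBJECT_VTABLE)
```

Two subtypes created this way differ in `name` and `class_pmc` but share the same `vtable` object. The price is the extra hop from the PMC through its `PMCType` to reach the table on each call.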
23 Feb 2010 8:00pm GMT
Andrew Whitworth: Haskell with LLVM
[Update 23 Feb 2010: I've been informed that this was not a JIT, but instead a native-code generation backend for LLVM demonstrating LLVM's aggressive optimization potential. These numbers are not representative of JIT performance.]
Several people sent me a link to a very interesting blog post yesterday about using LLVM to provide native code generation for Haskell in GHC. I recommend it as an interesting read.
One thing I will point out is that the blog post doesn't really explain the whole situation. He shows plenty of examples where LLVM improved performance, but only mentions briefly that this isn't typical of larger programs and that most programs won't experience as much speed up, if any. So to anybody who reads this, remember the caveat that the results aren't typical. JIT speeds some things up and slows some things down. It's not a magic bullet that makes everything better.
23 Feb 2010 8:41am GMT
Andrew Whitworth: Parrot's Exceptions System
I've been vaguely unhappy with the exceptions system for a while now. Everybody knows that the implementation really hasn't caught up with the spec, and until now I've been pretty happy to write off all my problems as being an artifact of an incomplete implementation. Plus, I've seen some of the great work that some of our developers have done fixing various bugs and implementing various changes, and I'm always willing to let problems slide under the rug if I know good minds are working on them. Today, however, I was talking to Austin and he expressed some criticisms on IRC that really do a great job of expressing the thoughts I (and others) have had, and show that maybe it's the spec that's the problem, not the implementation:
I was going to embark on a rant about this, but then I read the PDD, and i realized the entire exception subsystem is a farce.
That which is documented is inadequate and poorly thought out. And that which is implemented doesn't do even remotely what is documented.
The pdd makes the assumption that exception filtering will be done based on 'type', but provides no mechanism for extending the 'types'. The logical (and widely popular) alternative is to filter based on subclass. The pdd's answer to that is that you can throw anything, if you just stuff it in the payload. So naturally, the parameters to the exception handler objects are the...
...exception and its *message*.
The throw/rethrow ops differ in that rethrow marks the exception unhandled. IMO, rethrow should be transparent - particularly, the exception backtrace should still point at the original location where the exception occurred. The pdd makes nothing of this, and naturally parrot gets it wrong.
There are too many categories of severity, too many attributes (backtrace versus resume versus thrower; severity versus exit code versus type versus class).
So there you have it, a pretty succinct criticism of Parrot's exception system. I'll be elaborating on some of these ideas in the next few days.
23 Feb 2010 8:00am GMT
22 Feb 2010
Andrew Whitworth: ParrotProjects: February 2010 Edition
I haven't posted a ParrotProjects update in a while, but that doesn't mean development of new projects has slowed down at all. Quite the contrary, there are plenty of new projects popping up left and right.
Fun
I can't speak to how enjoyable it might be, but Fun is an implementation of the Joy language on Parrot. It's still early in development, but it is exciting to have more functional languages targeting Parrot like this.
Digest-Dynpmcs
The Parrot repo currently contains a few dynamic PMCs (dynpmcs) for calculating digests such as MD2, MD4, MD5, and various SHA sums. It has been decided that these kinds of things should find a new home, so our own enterprising developer darbelo has forked them to a new home on Gitorious. Copies of these PMCs are still in the repo pending a deprecation cycle, but after the 2.3 release they will only live on Gitorious.
ParrotSDL
If you need to write any SDL applications, you might be excited to hear that bindings for the multimedia library for Parrot are in active development. ParrotSDL is still a new project and is navigating through some difficulties with Parrot's NCI system. The lead developer, Parrot newcomer kthakore, is also working on SDL bindings for Perl5, so he's very familiar with the whole system.
kthakore is looking for PIR coders to help with the project. Chat about ParrotSDL and the Perl port happens at irc://irc.perl.org/#sdl
NQ-NQP
You might think of NQP as being a language that only runs on Parrot, but you'd be wrong. Developer ash_ has been working on a variant of NQP that runs on top of LLVM instead. This compiler, which is not quite NQP, is a very interesting project and may help to inform our future use of LLVM for Parrot's JIT system.
22 Feb 2010 12:14pm GMT
21 Feb 2010
Andrew Whitworth: Opcode and OpLib PMCs
A few days ago, after some discussion with NotFound and others on #parrot, I started a small branch to experiment with some new PMC types. The results of that work were the two new experimental PMC types Opcode and OpLib. The branch merged into trunk shortly after the 2.1.0 release, so now they are available--experimentally--for people to test and use.
OpLib provides an introspective accessor layer over the interpreter's op table. The OpLib allows us to get a count of the opcodes currently loaded in the system. It can also be used to return the index number of an opcode specified by name, or the name of an opcode given its number. On one hand it's important to hide these kinds of details from the average PIR user for reasons of backwards-compatibility and encapsulation. However, for the people writing PIR assemblers and disassemblers in PIR, the information is vital.
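The three queries described here (count, name to number, number to name) can be modelled in Python. This is not the OpLib interface: `OpLibSketch`, `index_of`, and `name_of` are hypothetical names, and the op names in the usage note are just examples from elsewhere on this page.

```python
class OpLibSketch:
    """Read-only view over a table of op names (hypothetical model)."""

    def __init__(self, op_names):
        self._names = tuple(op_names)   # immutable: lookups only
        self._index = {name: i for i, name in enumerate(self._names)}

    def __len__(self):
        """Number of ops currently in the table."""
        return len(self._names)

    def index_of(self, name):
        """Op number for a given op name."""
        return self._index[name]

    def name_of(self, number):
        """Op name for a given op number."""
        return self._names[number]
```

For a table built from `["die_s", "exit_i", "throw_p"]`, `index_of("exit_i")` is 1 and `name_of(1)` is `"exit_i"`. An assembler needs the first direction and a disassembler needs the second.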
These PMC types are read-only types. You can use them to read information about the opcodes in the system, but you can't manipulate that information. However, I'm not against that capability entirely. Imagine the ability to remap an op number to a new custom opcode at runtime. This would allow us to write tools that can attach to live PIR code such as memory usage analyzers, profilers, watchdog monitors, etc. Of course, in most cases this capability would horribly crash the program if used incorrectly, but in the right hands it has much potential. This, if it happens at all, is a long way off.
These two PMC types are still immature but they, along with the ever-improving Packfile PMCs, are already starting to enable some cool new applications. We don't quite expose all the information yet that we need to do complete compilation or decompilation, and some improvements are needed in Parrot itself to fill in some of the remaining gaps, but we are getting closer.
Before the 3.0 release I think we will have a PIR/PASM compiler that runs on top of Parrot natively. This could be written in PIR, of course, or one of the other cool developing languages such as NQP, Winxed, or something else. With this, we could cut IMCC out of the loop almost entirely if we wanted. We could also easily come up with new assembly languages or language dialects for interacting with Parrot. My dislike for PIR is not a secret, so the ability to come up with another, better, assembly language for working with Parrot is an idea that makes me very happy.
21 Feb 2010 2:05pm GMT
20 Feb 2010
Jonathan Worthington: Unpacking data structures with signatures
My signature improvements Hague Grant is pretty much wrapped up. I wrote a couple of posts already about the new signature binder and also about signature introspection. In this post I want to talk about some of the other cool stuff I've been working on as part of it.
First, a little background. When you make a call in Perl 6, the arguments are packaged up into a data structure called a capture. A capture contains an arrayish part (for positional parameters) and a hashish part (for smok^Wnamed parameters). The thing you're calling has a signature, which essentially describes where we want the data from a capture to end up. The signature binder is the chunk of code that takes a capture and a signature as inputs, and maps things in the capture to - most of the time, anyway - variables in the lexpad, according to the names given in the signature.
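For readers more at home in Python, the standard library has a loose analogue of this step: `inspect.signature(...).bind(...)` takes positional and named arguments and maps them onto parameter names. It is only an analogy and not how the Perl 6 binder works; the function `greet` and its arguments are made up for the example.

```python
import inspect

def greet(name, greeting="Hello", *, punctuation="!"):
    return f"{greeting}, {name}{punctuation}"

positional = ("Camelia",)            # the "arrayish" part of the call
named = {"punctuation": "?"}         # the "hashish" part of the call

# Bind the arguments against the signature without calling the function.
bound = inspect.signature(greet).bind(*positional, **named)
bound.apply_defaults()               # fill in parameters that were not passed

result = greet(*bound.args, **bound.kwargs)
```

After binding, `bound.arguments` maps each parameter name to its value, much as the binder fills the lexpad. If the arguments did not fit the signature, `bind` would raise a `TypeError` instead.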
Where things get interesting is that if you take a parameter and coerce it to a Capture, then you can bind that too against a signature. And it so turns out that Perl 6 allows you to write a signature within another signature just for this very purpose. Let's take a look.
multi quicksort([$pivot, *@values]) {
    my @before = @values.grep({ $^n < $pivot });
    my @after = @values.grep({ $^n >= $pivot });
    (quicksort(@before), $pivot, quicksort(@after))
}
multi quicksort( [] ) { () }
Here, instead of writing an array in the signature, we use [...] to specify we want a sub-signature. The binder takes the incoming array and coerces it into a Capture, which essentially flattens it out. We then bind the sub-signature against it, which puts the first item in the incoming array into $pivot and the rest into @values. We then just partition the values and recurse.
The second multi candidate has a nested empty signature, which binds only if the capture is empty. Thus when we have an empty list, we end up there, since the first candidate requires at least one item to bind to $pivot. Multi-dispatch is smart enough to know about sub-signatures and treat them like constraints, which means that you can now use multi-dispatch to distinguish between the deeper structure of your incoming parameters. So, to try it out...
my @unsorted = 1, 9, 28, 3, -9, 10;
my @sorted = quicksort(@unsorted);
say @sorted.perl; # [-9, 1, 3, 9, 10, 28]
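Because sub-signatures act as constraints, the same trick can pick a candidate purely by how many items arrive. The following is a small sketch along those lines, assuming current Rakudo behaviour. The routine name shape is invented for the example, and the three candidates are written so that no two of them can match the same list.

```raku
# Each candidate's nested signature only binds to a list of a matching shape.
multi shape([])                         { 'empty' }
multi shape([$only])                    { "one item: $only" }
multi shape([$first, $second, *@rest])  { "starts with $first and $second" }

say shape([]);          # empty
say shape([7]);         # one item: 7
say shape([1, 2, 3]);   # starts with 1 and 2
```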
It's not just for lists either. An incoming hash can be unpacked as if it had named parameters; for that, write the nested signature in (...) rather than [...] (we could have used (...) above too, but [...] implies we expect to be passed a Positional). For any other object, we coerce to a capture by looking at all of the public attributes (things declared has $.foo) up the class hierarchy and making those available as named parameters. Here's an example.
class TreeNode { has $.left; has $.right; }
sub unpack(TreeNode $node (:$left, :$right)) {
    say "Node has L: $left, R: $right";
}
unpack(TreeNode.new(left => 42, right => 99));
This outputs:
Node has L: 42, R: 99
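The hash case mentioned earlier looks much the same. This is a sketch assuming current Rakudo. The routine show_point and the keys x and y are hypothetical, chosen only to show the keys of the incoming hash being bound as named parameters.

```raku
# The nested (...) signature unpacks the incoming hash: each key is bound
# to the named parameter of the same name.
sub show_point(%point (:$x, :$y)) {
    "x=$x y=$y"
}

say show_point({ x => 3, y => 4 });   # x=3 y=4
```

As far as I know, a hash carrying keys the nested signature doesn't mention will fail to bind, just as an unexpected named argument would in a normal call.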
You can probably imagine that a multi and some constraints on the branches gives you some interesting possibilities in writing tree traversals. Also fun is that you can unpack return values. When you write things like:
my ($a, $b) = foo();
Then you get list assignment. No surprises there. What maybe will surprise you a bit is that Perl 6 actually parses a signature after the my, not just a list of variables. There are a few reasons for that, not least that you can put different type constraints on the variables too. I've referred to signature binding a lot, and it turns out that if instead of writing the assignment operator you write the binding operator, you get signature binding semantics. Which means...you can do unpacks on return values too. So assuming the same TreeNode class:
sub foo() {
    return TreeNode.new(left => 'lol', right => 'rofl');
}
my ($node (:$left, :$right)) := foo();
say "Node has L: $left, R: $right";
This, as you might have guessed, outputs:
Node has L: lol, R: rofl
Note that if you didn't need the $node, you could just omit it (but keep the things that follow nested in another level of parentheses). This works for some built-in types with accessors too:
sub frac() { return 2/3; }
my ((:$numerator, :$denominator)) := frac();
say "$numerator, $denominator";
Have fun, be creative, submit bugs. :-)
20 Feb 2010 12:21am GMT