19 Aug 2014


Parrot 6.7.0 "Grey-headed Lovebird" Released

On behalf of the Parrot team, I'm proud to announce Parrot 6.7.0, also known as "Grey-headed Lovebird". Parrot is a virtual machine aimed at running all dynamic languages.

Parrot 6.7.0 is available on Parrot's FTP site, or by following the download instructions. For those who want to hack on Parrot or languages that run on top of Parrot, we recommend our organization page on GitHub, or you can go directly to the official Parrot Git repo on GitHub.

Parrot 6.7.0 News:

- Core
+ find_codepoint: added more name aliases for perl6 (LF, FF, CR and NEL)
+ Optimize internal pcc CallContext calls from VTABLE to direct calls
and inline CallContext ATTR accessors to omit the obj check. [GH #1083]
- Documentation
+ Updated documentation for StringHandle.readall and FileHandle.readall, which
reads just the rest of the buffer if tell > 0. [GH #1084]
- Tests
+ Improve test plan for t/library/pg.t
- Community
+ Our GSoC student Chirag Agrawal successfully passed the final evaluation.
All three project parts have been merged already.

read more

19 Aug 2014 1:10pm GMT

18 Aug 2014


GSoC 2014 - Final Report

Hey everyone!

I am extremely happy to announce that I have successfully completed my GSoC project!

I would like to take this opportunity to thank the community for this wonderful learning experience. It has really helped me to add a new dimension to my knowledge while at the same time introducing me to the world of open-source projects.

Most importantly, I would like to thank my mentor Reini Urban (rurban) for the tremendous help he offered me, ever since my first contact with the community. As a matter of fact, without his ideas and invaluable guidance, completing the project would have been an impossible task. I would also like to thank my co-mentor Bruce Gray (Util) for guiding us throughout the project and providing us with valuable inputs to overcome the many hurdles we faced.

In this blog post, I will attempt to summarize the work I completed through my project.

My project's primary objective was to improve the performance of method signatures. The project was divided into three tasks:

read more

18 Aug 2014 5:18pm GMT

13 Aug 2014


GSoC 2014 - Report 12

Hey everyone!

This week I ran the bench test on the three tasks to verify the performance gain. But due to my slow machine, I could only generate varied and unreliable data.

However, rurban confirms that all the tests are passing and the branch pcc-gh1083 is ready to be merged for the Parrot release scheduled on 19 August. He also plans to test it on a 32-bit machine soon.

read more

13 Aug 2014 3:14pm GMT

06 Aug 2014


GSoC 2014 - Report 11

Hey everyone!

I will be sharing what I have completed this week. Last week, I had mentioned that I would be fixing further problems in my branch to optimize pmc2c (https://www.parrot.org/zyroz10).
I am happy to share that I have successfully completed this work and believe that the issue GH #1083 (https://github.com/parrot/parrot/issues/1083) is now resolved.

To elaborate on the work done --- a couple of weeks back, I started by omitting the obj check in the CallContext ATTR accessors in include/pmc/pmc_callcontext.h.

read more

06 Aug 2014 4:11pm GMT

30 Jul 2014


GSoC 2014 - Report 10

Hey everyone!

This week I completed the part of my task that required me to export the internally needed CallContext vtable methods (https://github.com/parrot/parrot/issues/1083). These exported methods are now used in src/call/*.c to improve performance, since they are now called directly.

In the previous week, I had finished fixing the headers to omit the obj check for CallContext ATTR accessors. But I had made these changes manually by fixing include/pmc/pmc_callcontext.h.

read more

30 Jul 2014 2:49pm GMT

23 Jul 2014


GSoC 2014 - Report 9

Hey everyone!

Through this post, I will be sharing my progress towards the task - https://github.com/parrot/parrot/issues/1083.

I have completed the part of the task that required me to inline the CallContext ATTR accessors to omit the obj check. For now, I have manually edited include/pmc/pmc_callcontext.h to fix the macro definitions.
However, after completing the remaining task and ascertaining an improvement in speed performance, I plan to improve pmc2c to automatically generate this fix.

read more

23 Jul 2014 4:13pm GMT

16 Jul 2014


GSoC 2014 - Report 8

Hey everyone!

Let me share this week's progress.

Earlier this week, I finished my tests with Parrot for its releases 2.7 - 3.0. Including rurban's profiling, we now have data (reliable to some extent) for the commits in this range. The main objective for the profiling was to determine the highest overhead to be targeted next for refactoring.

(For the list of all the identified overheads, please take a look at - http://wiki.enlightenedperl.org/gsoc2014/ideas/improve_performance_of_me...)

read more

16 Jul 2014 4:16pm GMT

09 Jul 2014


GSoC 2014 - Report 7

Hey everyone!

This week's work involves testing. With this work, we are trying to figure out which commits slowed down Parrot during its releases 2.7 - 3.0. To do this, I am running the bench.sh tool provided in parrot-bench. rurban is helping me out with these tests to save time (since my machine is slow) and also to cross-check results.

read more

09 Jul 2014 3:39pm GMT

02 Jul 2014


GSoC 2014 - Report 6

Hey everyone!

I am happy to announce that my task #2 (https://github.com/parrot/parrot/issues/1080) is now complete and the issue has been closed.

To give a gist of what has been done:

The goal was to optimize the pmc2c compiler, more specifically the PCCMETHODs, by avoiding the run-time overhead of having to call two costly C functions per method call. These C functions were:

Parrot_pcc_fill_params_from_c_args(interp, _call_object, sig, &_self, args...);
Parrot_pcc_set_call_from_c_args(interp, _call_object, rettype, result);

read more

02 Jul 2014 2:28pm GMT

25 Jun 2014


GSoC 2014 - Report 5

Hey everyone!

I will catch you up on my work this week. As I had mentioned in my last post (http://www.parrot.org/zyroz4), I have already started working on a new task (https://github.com/parrot/parrot/issues/1080).

Since this work requires me to make changes to the Pmc2c compiler, I am required to code in Perl. I am, however, new to Perl, and thus spent quite some time this week getting used to basic coding in it.

read more

25 Jun 2014 3:50pm GMT

16 Feb 2013


It's Been Quiet

It's been quiet. Too quiet.

Interest in Parrot has waned over the past 18 months. The most recent flurry of activity happened when Allison Randal brought up the fact that The Parrot Foundation was in shambles and suggested shutting it down. This naturally brought up the state of Parrot itself and what the future holds for it, if anything. The situation is perhaps less than ideal. The short answer is that Parrot's immediate prospects are iffy at best, but there is at least one niche where Parrot still has a chance to shine.

The surface problem with Parrot is that there's a lack of people who can find the tuits to hack on it these days. Different people have their own analyses as to why this is happening. My best answer is that Parrot doesn't have a compelling value proposition. Hosting every dynamic language was pretty revolutionary around the time Parrot was started more than a decade ago. Today that's no longer the case and the bigger language runtimes like the JVM, CLR and JavaScript (not a VM but a very popular compilation target) can run circles around Parrot on most of the axes that matter.

Those of us who care about Parrot need to find a way to make it matter and to do so quickly.

Rakudo is currently the most complete and active language implementation that runs on Parrot, and even *it* is moving toward running on many backends. Parrot's best bet is to focus exclusively on supporting Rakudo and give it a reason to stick around. If supporting all dynamic languages was ever a good idea for Parrot, that's no longer the case. The reality of Parrot's effective niche has become much harder to ignore. The best move is to adapt accordingly.

Parrot has been inactive (among many reasons) because its developers can see that the goal of hosting all dynamic languages isn't realistically attainable given Parrot's current resources. With a new and more tightly defined plan, Parrot has a fighting chance to find a useful niche.

Parrot's new niche and reason for existence needs to be to support Rakudo and nqp until those languages either fail, succeed, or have no further use for Parrot.

This will be a liberating shift for Parrot. The official policy is now "make nqp and Rakudo better". Within that constraint, any change is welcome. In a bit more detail, the two goals by which any potential change should be judged are:

1) Does it provide a benefit to Rakudo, especially a *measurable* *non-theoretical* benefit?

If a change makes Rakudo happy, sold! This includes requested features, optimizations, bug fixes and the like. This is *the* primary concern and the best way to provide value to nqp and Rakudo.

2) Does it make Parrot's code simpler without increasing complexity elsewhere?

Simplifying Parrot is valuable, but only in a much more indirect way. This goal is a distant second in importance to performance improvements. That said, simplifying Parrot is still helpful. Some of Parrot's problems come from the decade of accumulated cruft. A simpler Parrot is more approachable and easier to profile, maintain and debug. Simplicity should be pursued as long as that simplicity doesn't mean shuffling complexity elsewhere and *especially* if the simplification comes with a performance bump.

That's all there is to it. With simple and immediate rules rather than a slow and deliberate deprecation policy, half-done features that were kept around for years "just in case" can safely be removed.

Another implication of all this is that our deprecation and support policy are going away. They were well-intentioned but appropriate for a project in a much more mature and stable state. Our new support policy is "we'll try to fix bugs and keep nqp running". We'll continue to make monthly releases but they will not be labelled as "supported" or "developer" as in the past.

Observers of Parrot will note by now that this isn't the first time that Parrot has tried something radical. This isn't even the first time that *I've* tried something radical. What's different this time is that we're no longer trying to be all things to all languages; we're trying to be one thing to one language that's already our customer. This will still involve a ton of work, but the scope reduction shrinks the task from Herculean to merely daunting.

So here's where you, the reader come in. Whether you've hacked on Parrot in the past or came for the lulz and accidentally got interested, you can help. The big goals are to make Parrot (and by extension nqp and Rakudo) smaller and faster. Below are a few specific ways you can help. Whatever you do though, don't make any changes that will be detrimental to nqp and Rakudo, and coordinate any backwards-incompatible changes before they get merged into Parrot master.

Grab a clone of Parrot and nqp. Build and install them. Play with the sixparrot branch, where some initial work is already in progress. Already there? Great! The next steps are a little harder.

Remove code paths that nqp doesn't exercise. This can be single if statements or it can be whole sections of the source tree. Tests are the same as code; if nqp's and Rakudo's tests don't exercise them, out they go. Tests exist to increase inertia, but are only useful to the degree that they test useful features. When in doubt, either ask in #parrot or just rip it out and see what happens.

Relatedly, profile and optimize for nqp. If you like C, break out valgrind, build out a useful benchmark and see how fast you can make it run. If you find some code that doesn't seem to be doing anything, you've just found an optimization!

Learn nqp and Perl 6. There's been a lack of tribal knowledge about nqp's inner workings ever since Parrot started distancing itself from Rakudo. We need to reverse that tendency so that nqp is regarded as an extension of Parrot.

Overall, the next few months will be interesting. I don't know if they'll result in success for Parrot, but I'm willing to give it one more shot.

16 Feb 2013 7:51am GMT

21 Nov 2012


More IO Work?

I might not be too bright. Either that or I might not have a great memory, or maybe I'm just a glutton for punishment. Remember the big IO system rewrite I completed only a few weeks ago? Remember how much of a huge hassle that turned into and how burnt-out I got because of it? Apparently I don't because I'm back at it again.

Parrot hacker brrt came to me with a problem: After the io_cleanup merge he noticed that his mod_parrot project doesn't build and pass tests anymore. This was sort of expected, he was relying on lots of specialized IO functionality and I broke a lot of specialized IO functionality. Mea culpa. I had a few potential fixes in mind, so I tossed around a few ideas with brrt, put together a few small branches and think I've got the solution.

The problem, in a nutshell, is this: In mod_parrot brrt was using a custom Winxed object as an IO handle. By hijacking the standard input and output handles he could convert requests on those handles into NCI calls to Apache and all would just work as expected. However, with the IO system rewrite, IO API calls no longer redirect to method calls. Instead, they are dispatched to new IO VTABLE function calls which handle the logic for individual types.

First question: How do we recreate brrt's custom functionality, by allowing custom bytecode-level methods to implement core IO functionality for custom user types?

My Answer: We add a new IO VTABLE, for "User" objects, which can redirect low-level requests to PMC method calls.

Second Question: Okay, so how do we associate this new User IO VTABLE with custom objects? Currently the get_pointer_keyed_int VTABLE is used to get access to the handle's IO_VTABLE* structure, but bytecode-level objects cannot use get_pointer_keyed_int.

My Answer: For most IO-related PMC types, the kind of IO_VTABLE* to use is statically associated with that type. Socket PMCs always use the Socket IO VTABLE. StringHandle PMCs always use the StringHandle IO VTABLE, etc. So, we can use a simple map to associate PMC types with specific IO VTABLEs. Any PMC type not in this map can default to the User IO VTABLE, making everything "just work".

Third Question: Hold your horses, what do you mean "most" IO-related PMC types have a static IO VTABLE? Which ones don't and how do we fix it?

My Answer: The big problem is the FileHandle PMC. Due to some legacy issues the FileHandle PMC has two modes of operation: normal File IO and Pipe IO. I guess these two ideas were conflated long ago because internally the details are kind of similar: Both files and pipes use file descriptors at the OS level, and many of the library calls to use them are the same, so it makes sense not to duplicate a lot of code. However, there are some nonsensical issues that arise because pipes and files are not the same: Files don't have a notion of a "process ID" or an "exit status". Pipes don't have a notion of a "file position" and cannot do methods like seek or tell. Parrot uses the "p" mode specifier to tell a FileHandle to be in Pipe mode, which causes the IO system to select between the File and the Pipe IO VTABLE for each call. Instead of this terrible system, I suggest we separate out this logic into two PMC types: FileHandle (which, as its name suggests, operates on Files) and Pipe. By breaking up this one type into two, we can statically map individual IO VTABLEs to individual PMC types, and the system just works.

Fourth Question: Once we have these maps in place, how do we do IO with user-defined objects?

My Answer: The User IO VTABLE will redirect low-level IO requests into method calls on these PMCs. I'll break IO_BUFFER* pointers out into a new PMC type of their own (IOBuffer) and users will be able to access and manipulate these things from any level. We'll attach buffers to arbitrary PMCs using named properties, which means we can attach buffers to any PMC that needs them.

So that's my chain of thought on how to solve this problem. I've put together three branches to start working on this issue, but I don't want to get too involved in this code until I get some buy-in from other developers. The FileHandle/Pipe change is going to break some existing code, so I want to make sure we're cool with this idea before we make breaking changes and need to patch things like NQP and Rakudo. Here are the three branches I've started for this:

Like I said, these are all very rough drafts so far. All these three branches build, but they don't necessarily pass all tests or look very pretty. If people like what I'm doing and agree it's a good direction to go in, I'll continue work in earnest and see where it takes us.

21 Nov 2012 8:00am GMT

14 Sep 2012


September Status

First, some personal status:

Personal Status

I haven't blogged in a little while, and there's a few reasons for that. I'll list them quickly:

  1. Work has been…tedious lately and when I come home I find that I want to spend much less time looking at a computer, especially any computer that brings more stress into my life. Also,
  2. My computer at home generates a huge amount of stress. In addition to several physical problems with it, and the fact that I effectively do not have a working mouse (the built-in trackpad is extremely faulty, and the external USB mouse I had been using is now broken and the computer won't even boot if it's plugged into the port), I've been having some software problems with lightdm and xserver crashing and needing to be restarted much more frequently than should be necessary. We are planning to buy me a new one, but the budget won't allow that until closer to xmas.
  3. The io_cleanup1 work took much longer than I had anticipated. I wrote a lot more posts about that branch than I ever published, and the ones I did publish were extremely repetitive ("It's almost finished, any day now!"). Posting less meant I got out of the habit of posting, and getting back into it takes some effort.

I'm going to do what I can to post something of a general Parrot update here, and hopefully I can get back in the habit of posting a little bit more regularly again.

io_cleanup1 Status

io_cleanup1 did indeed merge with almost no problems reported at all. I'm very happy about that work, and am looking forward to pushing the IO subsystem to the next level. Before I started io_cleanup1, I had some plans in mind for new features and capabilities I wanted to add to the VM. However, I quickly realized that the house had some structural problems to deal with before I could slap a new coat of paint on the walls. The structure is, I now believe, much better. I've still got that paint in the closet and eventually I'm going to throw it on the walls.

The io_cleanup branch did take a lot of time and energy, much more than I initially expected. But, it's over now and I'm happy with the results so now I can start looking on to the next project on my list.

Threads Status

Threads is very very close to being mergable. I've said that before and I'm sure I'll have occasion to say it again. However there's one remaining problem pointed out by tadzik, and if my diagnosis is correct it's a doozie.

The basic threads system, which I outlined in a series of blog posts ages ago goes like this: We cut out the need to have (most) locks, and therefore we cut out many possibilities of deadlock, by making objects writable only from the thread that owns them. Other threads can have nearly unfettered read access, but writes require sending a message to the owner thread to perform the update in a synchronized, orderly manner. By limiting cross-thread writes, we cut out many expensive mechanisms that would need to be used for writing data, like Software Transactional Memory (STM) and locks (and, therefore, associated deadlocks). It's a system inspired closely by things like Erlang and some functional languages, although I'm not sure there's any real prior art for the specifics of it. Maybe that's because other people know it won't work right. The only thing we can do is see how it works.

The way nine implemented this system is to setup a Proxy type which intercepts and dispatches read/write requests as appropriate. When we pass a PMC from one thread to another, we instead create and pass a Proxy to it. Every read on the proxy redirects immediately to a read on the original target PMC. Every write causes a task to dispatch to the owner thread of the target PMC with update logic.

Here's some example code, adapted from the example tadzik had, which fails on the threads branch:

function main[main](var args) {
    var x = 1;
    var t = new 'Task'(function() { x++; say(x); });
    ${ schedule t };
    ${ wait t };
}

Running this code on the threads branch creates anything from an assertion failure to a segfault. Why?

This example creates a closure and schedules that closure as a task. The task scheduler assigns that task to the next open thread in the pool. Since it's dispatching the Task on a new thread, all the data is proxied. Instead of passing a reference to Integer PMC x, we're passing a Proxy PMC, which points to x. This part works as expected.

When we invoke a closure, we update the context to point to the "outer" context, so that lexical variables ("x", in this case) can be looked up correctly. However, instead of having an outer which is a CallContext PMC, we have a Proxy to a CallContext.

An overarching problem with CallContexts is that they get used, a lot. Every single register access (and almost all opcodes access at least one register) goes through the CallContext. Lexical information is looked up through the CallContext. Backtrace information is looked up in the CallContext. A few other things are looked up there as well. In short, CallContexts are accessed quite a lot.

Because they are accessed so much, CallContexts ARE NOT dealt with through the normal VTABLE mechanism. Adding in an indirect function call for every single register access would be a huge performance burden. So, instead of doing that, we poke into the data directly and use the raw data pointers to get (and to cache) the things we need.

And there's the rub. For performance we need to be able to poke into a CallContext directly, but for threads we need to pass a Proxy instead of a CallContext. And the pointers for Proxy are not the same as the pointers for CallContext. See the problem?

I identified this issue earlier in the week and have been thinking it over for a few days. I'm not sure I've found a workable solution yet. At least, I haven't found a solution that wouldn't impose some limitations on semantics.

For instance, in the code example above, the implicit expectation is that the x variable lives on the main thread, but is updated on the second thread. And those updates should be reflected back on main after the wait opcode.

The solution I think I have is to create a new dummy CallContext that would pass requests off to the Proxied LexPad. I'm not sure about some of the individual details, but overall I think this solution should solve our biggest problem. I'll probably play with that this weekend and see if I can finally get this branch ready to merge.

Other Status

rurban has been doing some great cleanup work with native PBC, something that he's been working on (and fighting to work on) for a long time. I'd really love to see more work done in this area in the future, because there are so many more opportunities for compatibility and interoperability at the bytecode level that we aren't exploiting yet.

Things have otherwise been a little bit slow lately, but between io_cleanup1, threads and rurban's pbc work, we're still making some pretty decent progress on some pretty important areas. If we can get threads fixed and merged soon, I'll be on to the next project in the list.

14 Sep 2012 7:00am GMT

27 Aug 2012


io_cleanup1 Lands!

FINALLY! The big day has come. I've just merged whiteknight/io_cleanup1 to master. Let us rejoice!

When I started the project, months ago, I had intended to work on the branch for maybe a week or two at the most. Get in, clean what I could, get out. Wash, rinse, repeat. That's exactly why I named the branch "io_cleanup1": I intended it to be just the first of what would be a large series of small branches. Unfortunately, as I started cleaning I was led to other things that needed to go. And those things led elsewhere. Before I knew it I had deleted just about all the code in all the files in src/io/* and started rewriting from the ground up.

Sometimes sticking with a plan and breaking up projects into small milestones is a good thing. Other times, when you know what the final goal is and you're willing to put in the effort, it's good to just go there directly. That's what I ended up doing.

To give you an idea of what my schedule was originally, I had intended to get this first branch wrapped up and merged before GSoC started, so that I could keep my promise of implementing 6model concurrently with that program. With GSoC over last week (I'll write a post-mortem blog entry about it soon), I've clearly failed at that. I'm extremely happy with the results so far and given the choice I would not go back and do things any differently. The IO system was in terrible condition and it desperately needed this overhaul. I wish it hadn't taken me so long, but with a system that's so central and important, it was worthwhile taking the extra time to make sure things were correct.

Where to go from here? My TODO list for the near future is very short:

  1. Threads
  2. 6model
  3. More IO work

The Threads branch, the magnum opus of Parrot hacker nine, is 99.9% of the way there. If we can just push it up over the cliff, we should be able to merge soon and open up a whole new world of functionality and cool features for Parrot. I'm already planning out all the cool additions to Rosella I'm going to make once threads are merged: a parallel test harness, asynchronous network requests, an IRC client library. The addition of a real, sane threading system opens up so many avenues to us that really haven't been available before. Sure there are going to be plenty of hiccups and speedbumps to deal with as we really get down and start to use this system for real things, but the merge of the threads branch represents a huge step forward and a great foundation to build upon.

I'm going to be putting forward as much effort as I can to getting this branch wrapped up and merged. Some of the remaining problems only manifest on hard-to-test platforms, which is where things start to get tricky. As I mentioned in an email to parrot-dev a while ago, test reports on rare platforms are great, but if we can't take action on the reported failures we can get ourselves into something of a bind. The capability to find problems on those platforms and the capability to fix problems on those platforms are two very different capabilities. But, most of the time that's a small issue and we're going to just have to find a way to muscle through and get this branch merged one way or the other. If we can merge it without purposefully excluding any platforms, that would be great.

Before anybody thinks that I'm done with IO and that system is now complete, think again. There is still plenty of work to be done on the IO subsystem, and all sorts of cool new features that become possible with the new architecture and unified type semantics. I want to separate out Pipe logic from FileHandle into a new dedicated PMC type. Opening FileHandles in "p" mode for pipes is clumsy at best, and I want a more sane system. And while I'm at it, 2-way and 3-way pipes would make for a great feature addition (we can't currently do these in any reliable way).

The one thing that has changed most dramatically in the new IO system is buffers. The buffering subsystem has not only been rewritten but completely redesigned. Instead of being type-specific, buffers are now unified and type-independent. Buffers are their own struct with their own API. Instead of having a single buffer that is used for both read and write, handles now have separate read and write buffers that can be created and managed independently. I want to create a new PMC type to wrap these buffers and give the necessary management interface so they can be used effectively from the PIR level and above.

Finally, the whiteknight/io_cleanup1 branch tried to stay as backwards compatible as possible, so many breaking changes I wanted to make had to wait until later. In the future expect to see many smaller branches to remove old broken features, old crufty interfaces, and old bad semantics. We'll make these kinds of disruptive changes in much smaller batches, with more space between them.

27 Aug 2012 7:00am GMT

22 Aug 2012


Parrot 4.7.0 "Hispaniolan" Released!

On behalf of the Parrot team, I'm proud to announce Parrot 4.7.0, also known as "Hispaniolan". Parrot is a virtual machine aimed at running all dynamic languages.

Parrot 4.7.0 is available on Parrot's FTP site, or by following the download instructions at http://parrot.org/download. For those who would like to develop on Parrot, or help develop Parrot itself, we recommend using Git to retrieve the source code to get the latest and best Parrot code.

Parrot 4.7.0 News:

- Core
    + Added .all_tags() and .all_tagged_pmcs() methods to PackfileView PMC
    + Several build and coding standards fixes

The SHA256 message digests for the downloadable tarballs are:

4360ac3dffafffaa00bce561c1329df8ad134019f76930cf24e7a875a4422a90 parrot-4.7.0.tar.bz2
c0bffd371dea653b9881ab2cc9ae5a57dc9f531dfcda0a604ea693c9d2165619 parrot-4.7.0.tar.gz

Many thanks to all our contributors for making this possible, and our sponsors for supporting this project. Our next scheduled release is 18 September 2012.

The release is indeed out a day late. It's not that I forgot about it, it's just that I can't read a calendar and HOLY CRAP, IT'S WEDNESDAY ALREADY? When did that happen? So, and I can't stress this enough, Mea Culpa.

22 Aug 2012 7:00am GMT

22 Jul 2012


io_cleanup1 Done?

This morning I made a few last commits on my whiteknight/io_cleanup1 branch, and I'm cautiously optimistic that the branch is now ready to merge. The last remaining issue, which has taken the last few days to resolve, has been fixing readline semantics to match some old behavior.

A few days ago I wrote a post about how complicated readline is. At the time, I thought I had the whole issue under control. But then Moritz pointed out a problem with a particular feature unique to Socket that was missing in the new branch.

In master, you could pass in a custom delimiter sequence as a string to the .readline() method. Rakudo was using this feature like this:

str = s.readline("\r\n")

Of course, as I've pointed out in the post about readline and elsewhere, there was no consistency between the three major builtin types: FileHandle, Socket and StringHandle. The closest thing we could do with FileHandle is this:

f.record_separator("\r\n");
str = f.readline();

Notice two big differences between FileHandle and Socket here: First, FileHandle has a separate record_separator method that must be called separately, and the record separator is stored as state on the FileHandle between .readline() calls. Second, FileHandle's record separator sequence may only be a single character. Internally, it's stored as an INTVAL for a single codepoint instead of as a STRING*, even though the .record_separator() method takes a STRING* argument (and extracts the first codepoint from it).

Initially in the io_cleanup1 branch I used the FileHandle semantics to unify the code because I wasn't aware that Socket didn't have the same restrictions that FileHandle did, even if the interface was a little bit different. I also didn't think that the Socket version would be so much more flexible despite the much smaller size of the code to implement it. In short, I really just didn't look at it closely enough and assumed the two were more similar than they actually were. Why would I ever assume that this subsystem ever had "consistency" as a driving design motivation?

So I rewrote readline. From scratch.

The new system follows the more flexible Socket semantics for all types. Now you can use almost any arbitrary string as the record separator for .readline() on FileHandle, StringHandle and Socket. In the whiteknight/io_cleanup1 branch, as of this morning, you can now do this:

var f = new 'FileHandle';
f.open('foo.txt', 'r');
string s = f.readline();

…And you can also pass an arbitrary separator string:

var f = new 'FileHandle';
f.open('foo.txt', 'r');
string s = f.readline("TEST");

The same two code snippets should work the same for all built-in handle types. For all types, if you don't specify a record separator by either method, it defaults to "\n".

Above I mentioned that almost any arbitrary string should work. I use the word "almost" because there are some restrictions. First and foremost, the delimiter string cannot be larger than half the size of the buffer. Since buffers are sized in bytes, this is a byte-length restriction, not a character-length restriction. In practice we know that delimiters are typically things like "\n", "\r\n", ",", etc. So if the buffer is a few kilobytes this isn't a meaningful limitation. Also, the delimiter must be in the same encoding as the handle uses, or it must be convertible to that encoding. So if your handle uses ascii, but you pass in a delimiter which is utf16, you may see some exceptions raised.

I think that the work on this branch, save for a few small tweaks, is done. I've done some testing myself and have asked for help to get it tested by a wider audience. Hopefully we can get this branch merged this month, if no other problems are found.

22 Jul 2012 7:00am GMT