Daily Archives: November 29, 2006

What if? What not? Testing … testing … testing

We continue our occasional series of posts on the Sun/FSF alliance and the release of Java as open-source by looking at the process of language definition and standardization, and the role of testing in enforcing conformance to a language specification.

Open-source is closely associated with open standards. For example, Linux implements the POSIX standards that were created around Unix, and the Apache HTTPD server implements the key standards associated with HTTP.

Usually the implementor has some freedom of action, and one of the great appeals of open-source is the opportunity it affords to use that freedom to innovate on top of open standards. For example, I use the Amarok program to play my digitally-encoded music on my SuSE box. There are many such players but I’ve found Amarok works the best for me. Amarok relies on code from others to play mp3 files as well as flac files. I prefer flac since it is based on an open standard, while mp3 is encumbered by patents.

However, Java is much more precisely defined, in the form of the Java Language Specification (JLS) and the Java Virtual Machine Specification (JVMS). These are precisely written and are also available under a license that lets you implement the specifications.

This of course made the job of writing a Java compiler a well-defined task. Our compiler, Jikes, takes as input source programs written in Java and produces as output files in a precisely-defined format called “class files” that serve as the input to the Java virtual machine.

Philippe Charles and I were very careful from the start of our project never to expose ourselves to any of the source-code from Sun or to any of the tests that we knew Sun had produced, and which were available to IBM under its license from Sun. Accordingly, we relied on such tests as we could put together, and were also one of the first users of the “Modena” test suite.

However, once word got out that we would be releasing Jikes in binary form at IBM’s alphaWorks site, we were instructed that IBM did not want to release any Java-based technology that did not conform to the applicable standards, and we were thus told we had to pass Sun’s tests before we could release the code, and were granted access to the test suite. In those days it was called the JCK (Java Compatibility Kit).

We weren’t concerned with the virtual machine tests, only with the compiler tests. There were two key parts. The “b” tests tested for diagnosing errors; that is, each test had one or more errors and to pass the test was to recognize these errors. By the way, though Jikes is perhaps best known for its speed, I think its greatest strength is in the quality of the error messages. Philippe’s Ph.D. thesis was in the area of parsing, and his two major advances were (1) to advance the state of the art in compressing the tables that drove the parse, and (2) to provide code that, given the grammar as input, would automatically generate code that would detect and repair errors.

We used Philippe’s parser-generator as part of the Jikes work. It was released as part of the Jikes work, under the name JikesPG. It is also the parser-generator used by Eclipse’s Java parser. One indication of its quality is that JikesPG was able to accept the Java grammar exactly as it was written in the JLS. We noticed in a later release of the JLS that the grammar had been changed. The new grammar accepted the same language; the changes were made so Sun’s own parser-generator could process the grammar.

The “c” tests consist of correct programs; to pass a test was to compile the source, execute the resulting program, and produce the expected results.
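The compare step of such a test can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class `ExpectedOutputTest` and its methods `capture` and `passes` are hypothetical names, not part of the JCK or of Jikes, and a `Runnable` stands in for the compiled test program, so the compile step itself is not modeled.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

// Hypothetical sketch: these names are illustrative and do not come
// from the JCK or from Jikes.
public class ExpectedOutputTest {

    /** Runs the program with System.out redirected and returns what it printed. */
    static String capture(Runnable program) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        System.setOut(new PrintStream(buffer, true));
        try {
            program.run();
        } finally {
            System.out.flush();
            System.setOut(original);   // always restore the real System.out
        }
        return buffer.toString();
    }

    /** A test passes only when the captured output matches the expected text exactly. */
    static boolean passes(Runnable program, String expected) {
        return capture(program).equals(expected);
    }

    public static void main(String[] args) {
        // Stand-in for a compiled test program: Java integer division
        // truncates toward zero, so -7 / 2 must print -3.
        Runnable program = () -> System.out.print(-7 / 2);
        System.out.println(passes(program, "-3") ? "PASS" : "FAIL");
    }
}
```

The comparison is exact on purpose: a conformance test of this kind has no notion of "close enough."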

It took us a few weeks to pass the “c” tests. As I recall the tests uncovered about as many errors in Jikes as Jikes uncovered in the tests.

We also uncovered an error in one of the vendor run-times; I think it was in Microsoft’s C++ compiler. To compile Java you need to be able to evaluate expressions involving constants at compile-time, and the language is so precisely defined that every bit in the result is known. As a result we had to include our own code (or it might have been code written by someone else at IBM) to do 64-bit arithmetic. (The only other code in Jikes that we didn’t write ourselves was some of the info-zip code we used to unpack Java “jar” files. By the way, this is one of the reasons Jikes makes such a good example for a compiler course: it was written from the ground up, bit-by-bit, byte-by-byte.)
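A small Java example shows what “every bit in the result is known” means in practice. The class and constant names below are made up for illustration and have nothing to do with the Jikes source; the point is only that each of these constant expressions has exactly one correct value, which a compiler has to reproduce when it folds the expression at compile-time.

```java
// Illustrative only: ConstantFolding and its constants are hypothetical
// names, not taken from Jikes.
public class ConstantFolding {
    // Each initializer is a compile-time constant expression.
    // Integer overflow wraps around in two's complement; it is not an error.
    static final long WRAPPED      = Long.MAX_VALUE + 1;   // wraps to Long.MIN_VALUE
    static final long SHIFTED      = 1L << 63;             // only the sign bit set
    static final int  INT_PRODUCT  = 65536 * 65536;        // 2^32 truncated to 32 bits: 0
    static final long LONG_PRODUCT = 65536L * 65536L;      // 2^32 in 64 bits

    public static void main(String[] args) {
        System.out.println(WRAPPED == Long.MIN_VALUE);  // true
        System.out.println(SHIFTED == Long.MIN_VALUE);  // true
        System.out.println(INT_PRODUCT);                // 0
        System.out.println(LONG_PRODUCT);               // 4294967296
    }
}
```

A compiler written in C or C++ cannot simply lean on the host compiler’s arithmetic for this unless that arithmetic is itself correct for 64-bit values, which is the situation described above.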

I did all the testing thereafter, and I spent thousands of hours from early 1997 to the end of 1999 running the “c” tests. It was fortunate that Jikes was so fast, as it routinely compiled the tests about 10 times faster than Sun’s own compiler, javac. Indeed, the standard demo I gave when showing Jikes at conferences during 1999 was to have two windows open, one compiling the tests using Jikes, the other using javac, as the difference in speed was easy to see.

That test suites tend to mimic the behavior of what is being tested is a well-known phenomenon in the world of testing. For example, perhaps the best book ever written about the C programming language is the classic C: A Reference Manual by Samuel Harbison and Guy Steele (the same Steele who is also one of the authors of the JLS). It reflects the work they did back in the 1980’s while at Tartan Labs. I recall an anecdote about their experience; perhaps it can be found in the book or perhaps I heard it elsewhere. Their first version attempted to reflect the language spec as precisely as possible. However, they started getting complaints from users who were getting different results using the standard Unix C compiler. They tracked the problems down, and realized the differences were due to a number of bugs in the Unix compiler, so they then modified their compiler to have precisely the same bugs so compiled programs would exhibit the same behavior.

We had similar experiences in the early days of Jikes. Here’s an excerpt from the Jikes FAQ:

Why does Jikes reject a program that another compiler accepts, or accept one that it rejects?

You may find that Jikes accepts a program that another compiler rejects (or can’t compile), or rejects programs that another compiler accepts.

Each version of Jikes represents our best effort at the proper interpretation of the language specification. Although Jikes is designed to work with all but the earliest versions of the JDK, we make no claim that any particular version supports precisely the same language as any particular version of the JDK. Since some products are designed to work with specific versions of the JDK, the compilers associated with them may not always recognize the same programs as Jikes.

This section contains some examples of issues related to interpreting the specification.

Extraneous Semicolons

Your program may contain extraneous semicolons that are silently ignored by many compilers. For example, given

public class Test {
   void f() {};          // first extra semicolon
};                       // second extra semicolon

Jikes accepts the program but issues:

     2.       void f() {};       // first extra semicolon
*** Warning: An EmptyDeclaration is a deprecated feature that 
             should not be used - ";" ignored

     3.    };                   // second extra semicolon
*** Warning: An EmptyDeclaration is a deprecated feature that
             should not be used - ";" ignored

The first extra semicolon is erroneous, the second is allowed by section 7.6. Jikes treats each as cause to issue a warning. You can use the -nowarn option to suppress these (and other) warnings, or, even better, you can use your editor to delete those extra semicolons.

Unreachable Statements

It is a compile-time error if a statement cannot be executed because it is unreachable (section 14.19). When Jikes first appeared, some other compilers didn’t properly detect unreachable statements, and accepted programs such as the following:

class Test {
   void method(boolean b) {
      if (b) return;
      else return;
      b = !b;
   }
}

Jikes rejected this program:

        b = !b;

*** Semantic Error: This statement is unreachable

(This is the example referred to in PC Week (April 14, 1997): IBM, Netscape Up Web Ante.)

Another example, and one that confused many users, is shown by the following program:

class Test {
   public static void main(String[] args) {
      try {
      }
      catch (Exception e) {
      }
   }
}

Jikes accepts this program but issues the warning:

        catch (Exception e) {
*** Caution: This catch block is unreachable: there is no
             exception whose type is assignable to
             "java/lang/Exception" that can be thrown during
             execution of the body of the try block

This was the most frequently reported problem when Jikes first appeared. It took several months to confirm that Jikes was right all along. See Query #2 to Sun for the full story.

By the way, “Query to Sun” refers to a series of posts we made to Sun about the language spec. These and many other documents were once available on the web, but are no longer so. However, the Notes database containing them is still around and I will endeavor to have the documents made web-accessible.

The best example I have ever seen of the design and implementation of a programming language is Ada. The effort began in the mid 1970’s when the U.S. Department of Defense (DOD) realized that their vendors were using scores of different programming languages, and, thanks mainly to some visionary leadership, DOD undertook an effort to create a new programming language that would meet their needs.

First, they enlisted experts to draft a series of requirement documents, known as “Strawman,” “Ironman,” and I think the last one was “Steelman.” Then they funded two teams and charged them with the task of designing a language that met the requirements. The winner was the “Green Team,” led by Jean Ichbiah of Alsys. (This is why so many Ada books have green covers.)

DOD then funded two efforts, one at NYU to produce an executable form of the specification, another at Softech to create a set of tests to test conformance to the specification. I was part of the NYU team (as was Philippe); the tests were written by Softech, under the leadership of John Goodenough (a ferociously skilled test-writer.)

Ada was unveiled to the public in 1982 in the form of a joint announcement that the compiler was available, as were the tests, and the compiler had passed all the tests.

DOD then set up a group to supervise the language standard going forward. I believe it was called the ARB (Ada Review Board). It was the final arbiter on language specification as well as any proposed language changes.

My last job at NYU was to translate the original Ada compiler from SETL, a programming language developed at NYU, into C, with the goal — that was met — of being able to compile and pass the tests on an IBM PC/AT. [1]

The key people from the NYU team went on to found a company called ACT (Ada Core Technologies). They translated the compiler into Ada and it has become the industry standard. It is licensed under the GPL.

But it is one until-now untold story from the Jikes days that really brought home to me the importance of testing in maintaining an implementation’s conformance to a specification.

Sometime well after Jikes had been released in open-source form I was approached by a group at IBM. They had been approached by a vendor. The vendor said they were acting on behalf of a client, and that the client wanted a commercial license to the Jikes source code, as it stood just before the release as open-source.

I pointed out this didn’t make much sense, since the code was freely available, but the vendor persisted, as they wanted to be able to build a commercial offering on top of the Jikes code. I then pointed out that even if they did get the code in its state when it was released, they would be unable to make use of any of the bug fixes that had been received from our contributors, since those fixes came to us via the open-source process.

They persisted, and there were a few more calls. Finally, the senior IBMer involved said they were welcome to the code, so long as they agreed to make no changes that would cause their product to fail any of the compatibility tests.

We never heard from them again — that was the last call.

Now I am well aware of the Sun/FSF alliance and the recent announcements of Java as open-source, though I think they prefer to call it “free software.” But I don’t know all the details. I think some code has been released, some more has been promised.

However, here is my own view, based on my experience designing and implementing programming languages:

Any release by Sun of all or part of the Java source-code is of value if and only if Sun also releases ANY and ALL of the associated tests they have developed to test that code.

For example, suppose you get the code and compile it on Linux. How will you know you got it right without testing? Perhaps you have some secret sauce, but the only way I know to find out is to run tests.

I’m hoping in my ignorance that Sun has already vowed to release all these tests. If so, I congratulate them.

If not, we shall see if Sun is up to the test.

Licensing and Policy Summit for Software Sharing in Higher Education: Trip Report

This post is a slightly-edited trip report I wrote following my participation in the Licensing and Policy Summit for Software Sharing in Higher Education conference in Indianapolis in mid-October. I’ll speak more about it in future posts.

Dan Greenstein, the co-chair, from the University of California, was unable to attend, so Bradley Wheeler, CIO of Indiana University (all campuses), led the conference.

I confirmed with Brad that all the conference discussions are open. I took almost 20 pages of notes, and will post them to the conference portal soon. Brad volunteered the services of Tina Howard, a technical writer on the staff of IU, to take notes and help organize the conference discussions. I offered my help to Tina in answering any questions she might have about open-source, as she is new to this area. I also took along a digital camera, and handed it to Brad Thursday after lunch, suggesting he might want to take some pictures. He did; I will post them to flickr shortly and then post the URL’s to the portal.

There are two key open-source projects in this area currently underway. Both are enterprise-level applications: Sakai for course management, and Kuali for managing a university’s financial operations.

There were only two commercial firms represented: IBM and rsmart.com, a firm that provides services and support for Sakai and Kuali, and which is also playing a key role in development. Key figures here are Chris Coppola from rsmart (Kuali) and Joseph Hardin from University of Michigan (Sakai).

There was representation from the following universities: Indiana University (all campuses), University of California (all campuses), MIT, University of Michigan (Paul Courant, former provost was present along with Joe Hardin), Cornell and Penn State. Over a third of the attendees were attorneys. There were also many folks involved in technology transfer and university operations.

Cliff Schmidt of the Apache Software Foundation was present Wednesday night and all day Thursday. His input was greatly appreciated.

There was significant representation from outside the U.S.:

Randy Smith from the Tech Transfer Dept. of the University of British Columbia;

Naomi Korn and Ralph Weeden (counsel) from JISC, http://www.jisc.ac.uk/ , a group in the U.K. that, in their words, “provide world-class leadership in the innovative use of Information and Communications Technology to support education and research”;

Malcolm Bain from Legistics, a law firm in Barcelona that is currently providing guidance to a consortium of universities in Catalonia that is developing a system that will support 10,000+ concurrent users;

John Norman from Cambridge University;

Robin Stanton, CIO of Australian National University (and former department chair of Andrew Tridgell, author of Samba and a current member of IBM’s Linux Technology Center);

Randy Metcalfe from OSS-Watch, http://www.oss-watch.ac.uk/ . In their words, “OSS Watch promotes awareness and understanding of the legal, social, technical and economic issues that arise when educational institutions engage with free and open source software. It does this by providing unbiased advice and guidance to UK higher and further education.”

I individually spoke during the conference with all the folks named above, as well as most of the other attendees.

Present also were folks from the Mellon Foundation and Ithaka.org. Mellon has provided key funding for the projects as well as funding for this conference (including paying for all the meals). Mellon is very sophisticated in their operations. They don’t just give away money but try to do so strategically, providing initial funding so projects can become self-sustaining. Ithaka specializes in services in this area. For example, Matthew Rascoff and Paul Courant wrote much of a very well-done report on open-source and academia available from ithaka; Barnaby Gibson, counsel to ithaka, was a very active participant in the legal discussions. He also serves as pro bono counsel to the Sakai Foundation. Matthew, Barnaby and I were on the same flight home to New York, and I spoke with them while waiting for our delayed flight to take off. (Tip to all for the future: avoid Northwest Airlines if at all possible. My original flight to Indiana was canceled, and my returning flight was delayed. A flight to Boston that was supposed to take off about the time of the flight back to LGA was also canceled.)

Someone from a major university approached me and said they were interested in using IBM’s own “dogear” technology. I mentioned that while Dogear is very good, it provides a tagging function very similar to that offered by http://del.icio.us . He said he knew that. They have a university portal and he had suggested folks use del.icio.us to record their bookmarks, but many were hesitant to do so since the bookmarks were public, while Dogear could be used to keep the bookmarks private inside their firewall. I said I would put him in touch with someone in IBM who could help.

Most of the conference was about very specific issues related to Contributor License Agreements (CLA’s) and providing some statements about patents in the open-source license for Sakai and Kuali.

The CLA used to date has been that used by Apache. It has been accepted by about 6 universities, including Indiana and Stanford, but both UC (University of California, all campuses) and MIT requested that additional language be included before they would approve. These changes have been accepted. The language is similar, and relates to the following problem.

Existing CLA’s (in my view) are directed towards two groups: the open-source developers who write and maintain the code, and the corporations who contribute open-source code or use open-source code in their products (IBM does both). The developers just want some assurance they won’t be sued. The companies want to be comfortable using open-source code in their products. The open-source folks recognize the need for a more rigorous review process; that is a price they must pay to get more corporations to use their code and also to get those corporations to help contribute their own developers to repair and refine the code.

Corporations own the IP of their employees. But universities are in a different situation. For example, in some (most?) universities the situation is as follows:
a) Undergraduates own their work;
b) Graduate students who are paying for their own education own their work;
c) Graduate research assistants don’t own their work; some universities require they sign an agreement to this effect, as does IBM with each of its employees;
d) Faculty have a more complex relationship. Indeed this is the main concern.

Faculty by tradition and also in most cases “by policy” have their own rights to their work. For example, an English professor who writes a book gets the proceeds of its sales. EE and Computer Science professors have asked for similar rights for the code they write; and in some cases have gotten the same rights. “Policy” is especially tricky in that university policy is “black letter policy,” based on explicit documents that in some cases are over a hundred years old. Revising these documents is a very non-trivial process, one not to be undertaken lightly.

All this is made more complicated by the special value associated with patents in the area of genetics, pharmaceuticals and bioengineering. And it gets even more complicated because universities are more and more focusing on the money to be made via technology transfer, and so some are less willing to make their work available in as open a form as has been the norm in the past.

When a professor gets a patent, a university typically has the first right to license that patent, sharing the proceeds with the inventor. If the university passes, then all the patent rights remain with the faculty member in the form of an exclusive license. The Apache CLA, however, says that a contributor grants rights to all of its patents that relate to the code being contributed. Both MIT and UC are unwilling to accept this language, as they say they can’t determine at the time of contribution if a member of their faculty may have a patent that reads on the contributed code. So if they sign the CLA they are potentially reducing the potential income to the holder of the patent, and if this indeed proves to be the case then they can be (and have been) sued by the faculty member. As one attorney put it simply, “What is the language that says we don’t know what we don’t know?”

The net of the added language — as I can best recall — from UC and MIT is to say the contributions are not from the institution but from the contributors as individuals.

There was some discussion of revising policy to define the notion of an “enterprise software box,” by which is meant that faculty would have to agree that the university reserved the right to license their patents for use in open-source code being developed by the university and jointly with other universities for their own purposes, as is the case with Sakai and Kuali. Both UC and MIT agreed to attempt some policy revision, though this is by no means easy. For example, UC may have to deal with over 165,000 signed agreements in effect with various faculty members in the past and present.

Cliff Schmidt, the “legal advisor/member” of the Apache Software Foundation (ASF) was present all day Thursday. He is not an attorney but specializes in legal issues on behalf of ASF. He mentioned they were in the process of revising their CLA, and might be amenable to adding language suitable for academic contributions. That is, ASF might add language that would apply only to academic contributions, so the changes wouldn’t affect the existing language for corporate contributions. However, he did say it was unlikely such changes would be put into effect before mid-2007.

The universities had already agreed before the meeting to try to move to the Apache License 2 if possible. Some of their current work is licensed under the Educational Community License, the ECL: http://www.opensource.org/licenses/ecl1.php. As an aside, I pointed out that in my personal view ECL is incompatible with Apache V2, BSD, or MIT because of the requirement to give, “Notice of any changes or modifications to the Original Work, including the date the changes were made.” (The Perl Artistic license has similar language.) Many seemed surprised by this observation.

Brad Wheeler, the chair, sent the attorneys out of the room Friday morning to see if they could come up with some actual language, as all agreed it was hard to discuss how to best proceed without some actual language at hand. They labored for just over an hour (I timed them) and returned to say they had come to an agreement, but were unwilling to share the language until they had had a chance to review it with their own organizations.

By the way, attorneys Charles Drucker of UC and Karin Rivard were very impressive and quite eloquent in expressing their concerns. As an aside, before dinner on Thursday, I said to Karin that I found it ironic that the creators of the BSD and MIT licenses were now moving to Apache 2. Karin said she preferred BSD in its current form to MIT’s license (another irony.)

There were a few votes and such taken during the discussion. I made a point of abstaining from any vote as I didn’t think it proper for me on behalf of IBM to participate in this way. I did participate in the discussion, particularly in some aspects of open-source and project management where I sensed that the folks involved had limited experience, but I made clear when the conference started that I would only be giving my own views based on my experience, and should not be perceived as offering any official views from IBM.

There was a consensus reached near the end of the meeting that it was important to have some patent-related language in the outbound license. The ECL license doesn’t have any. When I pointed out that they were providing a very high level of Due Diligence (one that went beyond that which might be needed for commercial entities to use their code, as the code was primarily meant for use just by their own and other universities) they said it was important to have a very rigorous process so they can sell their work to the university CFO’s and Chancellors.

The final consensus was:

1) The community would attempt to incorporate the UC and MIT changes into the existing CLA as soon as possible;

2) Patent-related language would be added to the outbound license, which would be Apache 2 (preferred), modified Apache 2 (less preferred), or modified ECL (least preferred);

3) The optimal solution would be for ASF to revise its CLA to recognize the special nature of academic contributions, and perhaps even revising the outbound license, but it was understood that any changes on the outbound side by ASF could not weaken the existing language, that is it would still have to be acceptable to the corporate community;

4) That, since ASF changes might not be available until mid-2007, the community might proceed by revising ECL and seeking approval of the revised ECL from the Open Source Initiative (OSI). The revised ECL would be a substantial revision; indeed, it would be basically just Apache 2 with language added to make it acceptable to the academic community, at least that part of the community developing enterprise-level applications for its own use.

I told the group as a whole that many of the folks involved in managing IBM’s Open Source activities were located in Westchester and I would be happy to arrange informal meetings, either with Research or with some of our attorneys. Barnaby Gibson from ithaka is based in NYC, and while we were chatting when waiting for our flight to board said he might take me up on the offer.

Friday morning I sat between Randy Metcalfe of OSS-Watch, an open-source information provider based in the U.K., and Jacqueline Ewenstein, an attorney from Mellon. I will try to build a relationship with Randy; I can already tell by the incoming searches to my blog that he has started reading it. I congratulated Jacqueline, Ira Fuchs, and Don Waters from Mellon for their support of the conference.

A highlight of the trip was meeting Paul Courant from University of Michigan. He is a wonderful and very wise man, and I was especially pleased to learn, when I asked him if he was any relation to Richard Courant, that he was his grandson. I asked Brad to take a picture of the two of us and plan to post it on my blog soon. I spent almost twenty years at the Courant Institute of Mathematical Sciences (CIMS), the world’s foremost school of applied mathematics, a reputation due principally to Richard Courant and the many colleagues from Germany he enlisted to join him when he fled Nazi Germany to find a new life. I almost crossed paths with Paul in that he worked there the summer of ’66 as a computer operator and I came to CIMS in early September of that year.

While waiting for my return flight I learned that Matthew Rascoff of Ithaka is from New Rochelle, which I knew was the home of Richard Courant and many of his CIMS colleagues, including Kurt Friedrichs. Matthew said he was a good friend of one of Kurt’s sons. I mentioned I had had Kurt as one of my professors. My most vivid memory of him is that one morning back in the 60’s when he was into his 80’s I saw him waiting impatiently for the elevator to arrive. He just couldn’t bear to wait an extra minute or two to get to his office so he could start work on mathematics!

On my flight to Indianapolis I fell into a conversation with a fellow passenger and he said I should make every effort to visit the Indianapolis Speedway, home of the “Indy 500.” When the conference ended early Friday afternoon, Julie Dreesen, one of Brad’s people who did a great job managing the logistics of the conference, helped me confirm the Speedway was open. I visited the Speedway on my way to the airport and, being unfamiliar with my rental car controls, managed to leave the lights on when I went to visit the Indy 500 Museum. The net of this was that I had to ask the help of the kind folks who run the Speedway (it is a private company), so they could “start my engine.” This is the topic of my most recent blog post, https://daveshields.wordpress.com/2006/10/22/gentlemen-start-my-engine/

My thanks to Brad Wheeler of Indiana University who sought IBM participation, and to Mike King of IBM for inviting me to participate in the conference last week in Indiana. It was a great honor and privilege to be able to attend.
