Oil 0.8.pre5 - Progress in C++

  • Time: 2020-05-27 08:36:41

This is the latest version of Oil, a Unix shell:

Oil version 0.8.pre5 - Source tarballs and documentation.

To build and run it, follow the instructions in INSTALL.txt. If you're new to the project, see Why Create a New Shell? and the 2019 FAQ.

Table of Contents

Semi-Automatic Translation to C++

Two Analogies: Go Compiler and TeX

DSLs and Code Generation

Wrapping Shell Dependencies

Appendix: Selected Metrics

Highlights

  • As of this release, we run spec tests against the oil-native binary! In other words, we're measuring how well the semi-automatic translation to C++ works.
    • Here are the results. The Python version of OSH passes 1560 tests (+), while the C++ version passes 420 tests. This is significant progress, but there's more to do, which I discuss below.
  • Koichi Murase made over a dozen fixes to OSH, motivated by running ble.sh (full changelog).
  • I made a few fixes to run the ShellSpec project. Notably, shopt -s extglob is now respected.
  • Internal: we have proper C++ unit tests and run them on our continuous build. I started using the greatest.h test framework, and it's simple and effective (Zulip thread).

I'd still like more bug reports! See How To Test OSH.

(+) Test harness bug that will be fixed: 1539 should be 1560.

Closed Issues

#758 Incorrect fnmatch due to extended glob syntax
#754 Implement test -u and test -g
#753 ${var+foo} shouldn't cause error when 'set -o nounset'
#727 1 ? (a=42) : b shouldn't require parentheses

Semi-Automatic Translation to C++

Two Analogies: Go Compiler and TeX

What's all this about C++? Here are two analogies to help explain what's going on.

  1. GopherCon 2014: Go from C to Go by Russ Cox (YouTube, 31 minutes).

    "It's time for the Go compilers to be written in Go, not in C. I'll talk about the unusual process the Go team has adopted to make that happen: mechanical conversion of the existing C compilers into idiomatic Go code."

    (Grind is the one-off tool that helped with translation, analogous to mycpp.)

    The flavor of the work is similar to what I'm doing with Oil, but there's a key difference: Oil's source will remain in statically typed Python and DSLs like Zephyr ASDL for the foreseeable future. We won't be writing C++ by hand.

    Static types play an important role in both translations.

  2. How to compile the source code of TeX. Knuth wrote TeX in a dialect of Pascal, but it's not compiled with a Pascal compiler. Instead, it's translated to C and compiled with a C compiler.

The common thread is that we want to preserve the correctness of an existing codebase. Oil runs thousands of lines of existing bash scripts, including some of the biggest shell programs in the world.

Rewriting by hand would introduce a lot of bugs, so instead we write a custom translator and apply it to the codebase. In Oil's case, there are more code generators to remove dynamic typing and reflection, discussed below.

Recap

In addition to the new spec test metrics, these line counts give a feel for recent progress:

  • The 0.7.pre9 release in December.
    • osh_parse.cc has 9,867 lines of code (raw data). I showed that the OSH parser can be gradually refactored and translated to C++. Notably, the result is as fast as hand-written C code.
  • The 0.8.pre2 release in March.
    • osh_eval.cc has 16,491 lines of code. In addition to the parser, we translate the word and arithmetic evaluators.
  • This release, 0.8.pre5.
    • osh_eval.cc has 20,875 lines of code. We translate the command evaluator, including assignments. So the resulting C++ interpreter can run code like readonly x=y; echo $x. Details below.

For comparison, the slow OSH interpreter consists of about 30K lines of Python code. This doesn't include the Oil language, which I haven't started translating.

The translation isn't going as quickly as I'd like it to, but it's working, and I'm solving interesting technical problems along the way.

As far as I can tell, this unusual process is the shortest path to a fast shell. (As mentioned in January, I encourage parallel efforts. Feel free to ask me about this.)

Details

I keep a log of the translation process on Zulip.

  • Static typing of flag parsing was a big deal (Zulip thread). A common theme of translation is turning Python reflection into textual code generation, and this was another instance of it.
    • Assignment builtins like declare -g foo=bar now work, so we have a path to translate more shell builtins to C++.
  • Zephyr ASDL is turning into half of a programming language (Zulip thread). Specifically, it's a language for describing typed data, which Python is missing. It now supports dicts/maps with the syntax map[string, int].
  • The interpreter is still "pure", which is why only 420 tests pass. The nascent osh_eval.cc doesn't even run ls, because it's an external process! But it understands the hairy details of word evaluation ${}, arithmetic evaluation $(( )), brace expansion {a,b}, and more.
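
To make the "reflection into textual code generation" theme concrete, here is a minimal sketch in Python. It is not Oil's actual flag parser: FLAG_SPEC, reflective_flags, and generate_flag_class are hypothetical names, and the two flags are made up for illustration. The reflective version builds attributes at runtime with setattr(), which a static type checker (and a translator like mycpp) can't see through. The generated version emits plain source text with one typed field per flag.

```python
from typing import Dict, Tuple

# Hypothetical flag spec: flag name -> (type name, default value as source text).
FLAG_SPEC = {
    "g": ("bool", "False"),
    "n": ("int", "0"),
}  # type: Dict[str, Tuple[str, str]]


def reflective_flags(spec: Dict[str, Tuple[str, str]]) -> object:
    """Dynamic style: attributes only exist at runtime, so they can't be typed."""

    class Flags(object):
        pass

    flags = Flags()
    for name, (_, default) in spec.items():
        setattr(flags, name, eval(default))
    return flags


def generate_flag_class(class_name: str, spec: Dict[str, Tuple[str, str]]) -> str:
    """Codegen style: emit source text with one statically visible field per flag."""
    lines = [
        "class %s(object):" % class_name,
        "    def __init__(self):",
        "        # type: () -> None",
    ]
    for name, (typ, default) in sorted(spec.items()):
        lines.append("        self.%s = %s  # type: %s" % (name, default, typ))
    return "\n".join(lines) + "\n"


# The generated text is ordinary Python, so it can be type checked and translated.
source = generate_flag_class("DeclareFlags", FLAG_SPEC)
namespace = {}  # type: Dict[str, object]
exec(source, namespace)
generated = namespace["DeclareFlags"]()
```

Both versions behave the same at runtime. The difference is that the second one leaves behind a class definition that tools can read.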

More background: the March recap had a similar section with Zulip threads: mycpp: The Good, the Bad, and the Ugly.

TODO on Translation

Even though about two-thirds of OSH translates to C++ and compiles, and much of it runs correctly, there's still a lot of work left.

Oil is simply a big project: recall that bash consists of over 140K lines of code. I estimate that OSH implements 80% of bash, with significant fixes. And Oil is a new language with many features on top.

DSLs and Code Generation

Oil's source code will remain in high-level languages for the foreseeable future, so we need to enhance the code generators to produce correct and fast C++.

  • mycpp
    • The OSH interpreter uses Python's try/finally for scoped destruction, but C++ doesn't have finally. We should probably use Python's context managers, and have mycpp translate such blocks into constructors and destructors.
  • Zephyr ASDL
    • The translation process deals with exceptions in a messy way, using something approximating #ifdef. Exceptions are more like structs than classes, so they could logically be expressed with ASDL schemas.
  • The pgen2 parser generator
    • The syntax of the Oil language is expressed with pgen2, and we don't have a C++ code generator for it yet. After discussion with Jason Miller, I think we should borrow the original code generator and runtime from CPython rather than try to translate the slow Python implementation.
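
Here is a rough sketch of the try/finally issue from the mycpp item, assuming a made-up option table. ScopedOption and both run_* functions are hypothetical, not code from the OSH interpreter. The point is that a context manager has an explicit "enter" and "exit", which line up with a C++ constructor and destructor, while a bare finally block has no obvious counterpart.

```python
from typing import Callable, Dict


def run_with_try_finally(opts: Dict[str, bool], body: Callable[[], int]) -> int:
    """Scoped state change using try/finally, which C++ has no direct form of."""
    saved = opts["errexit"]
    opts["errexit"] = False
    try:
        return body()
    finally:
        opts["errexit"] = saved  # always restored, even if body() raises


class ScopedOption(object):
    """Same idea as a context manager.

    __enter__ plays the role of a C++ constructor, __exit__ the destructor.
    """

    def __init__(self, opts: Dict[str, bool], name: str, value: bool) -> None:
        self.opts = opts
        self.name = name
        self.value = value
        self.saved = False

    def __enter__(self) -> "ScopedOption":
        self.saved = self.opts[self.name]
        self.opts[self.name] = self.value
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> bool:
        self.opts[self.name] = self.saved
        return False  # don't swallow exceptions


def run_with_context_manager(opts: Dict[str, bool], body: Callable[[], int]) -> int:
    with ScopedOption(opts, "errexit", False):
        return body()
```

In a translated version, the with block would presumably become a C++ scope holding a stack-allocated object whose destructor restores the option, but that mapping is the proposal above rather than something mycpp is shown doing here.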

Wrapping Shell Dependencies

In the January blog roadmap, I mentioned that there are two technical problems with translation.

One of them was wrapping native C code, which I no longer see as a risk. It's just work. The shell has three main dependencies:

  1. libc. I've wrapped pure functions like fnmatch() in C++, and this is straightforward.
  2. The Unix kernel. Wrapping functions like execve() is similar to wrapping libc, but errno handling is an issue I want to revisit. (These Unix comics are relevant.)
  3. GNU readline for interactive features. To be honest, I'd rather punt interactive features to Oil code, analogous to ble.sh. But Oil should have basic readline support.
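
The difference between the first two kinds of wrapper can be sketched in Python. This is only an illustration, not how Oil's C++ bindings are written: glob_match and try_exec are hypothetical names, and the ctypes lookup assumes a Unix-like system where the running process already links libc. A pure function like fnmatch() just returns an answer, while a kernel call like execve() can fail, so the wrapper has to decide how errno reaches the caller.

```python
import ctypes
import errno
import os
from typing import List

# Assumption: on a Unix-like system, CDLL(None) exposes the libc symbols
# that are already loaded into this process.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.fnmatch.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_int]
_libc.fnmatch.restype = ctypes.c_int


def glob_match(pattern: str, text: str) -> bool:
    """Pure wrapper: libc fnmatch() returns 0 on a match, nonzero otherwise."""
    return _libc.fnmatch(pattern.encode(), text.encode(), 0) == 0


def try_exec(path: str, argv: List[str]) -> int:
    """Kernel wrapper: report failure as an errno value instead of raising.

    os.execv() only gives control back to the caller when the exec failed.
    """
    try:
        os.execv(path, argv)
    except OSError as e:
        return e.errno
    return 0  # not reached: a successful exec replaces this process
```

Returning the errno value is just one possible convention. Raising an exception or storing the error in an out-parameter are alternatives, and which one fits translated code best is the open question.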

Open Problems

  • The interpreter's memory management is probably the biggest open issue. I have ideas, but I haven't tested them with an implementation.
  • The autocompletion code makes good use of Python's yield, which I can't (or don't want to) use in C++. I might rewrite it with fork() and write() to a pipe.
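
As a sketch of that last idea, here is one way a yield-based producer could be replaced by a child process that writes to a pipe. The function names and the newline-delimited format are assumptions for illustration. This is not Oil's completion code, and it assumes a Unix system with os.fork().

```python
import os
from typing import Iterator, List


def candidates_gen(prefix: str, words: List[str]) -> Iterator[str]:
    """Generator style: the consumer pulls one candidate at a time."""
    for w in words:
        if w.startswith(prefix):
            yield w


def candidates_via_pipe(prefix: str, words: List[str]) -> List[str]:
    """Process style: a forked child writes candidates, the parent reads them."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: produce candidates, one per line
        status = 1
        try:
            os.close(read_fd)
            for w in words:
                if w.startswith(prefix):
                    os.write(write_fd, w.encode() + b"\n")
            os.close(write_fd)
            status = 0
        finally:
            os._exit(status)  # never return into the parent's code

    # parent: close our copy of the write end so read() sees EOF
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as f:
        data = f.read()
    os.waitpid(pid, 0)
    return data.decode().splitlines()
```

Unlike the generator, this version can't be stopped early without extra work, and it pays for a fork() per completion request, so it's a tradeoff rather than a drop-in replacement.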

Plan for 2020

As mentioned in January, the bare minimum for "success" is when OSH replaces bash for my own use.

After reviewing all this work, I still feel like OSH can be "finished" in 2020. I won't be extremely surprised if it isn't, but it seems reasonable.

On the other hand, it seems clear that the Oil language will remain a prototype for all of 2020. I haven't gotten much feedback on it, probably because there isn't much documentation.

This is disappointing, but I don't have a solution to this problem.

In short, the project's focus has necessarily narrowed. The only two goals on my radar are:

  1. The OSH language should be translated to C++, tested, and optimized.
  2. The Oil language should be divorced from the Python runtime and similarly translated. This will almost certainly bleed into 2021.

I should write a longer blog post about this, but almost everything else is cut. Oil will be more like a library than a shell. (As mentioned, I'll need basic GNU readline support for my own use.)

The docs are another sore point. I've mostly been writing them "on demand" (whenever anyone asks). It seems like that pattern will continue, given all the other work that needs to be done.

What's Next?

  • Continue translating Oil to C++, guided by metrics.
    • Increase the number of spec tests passing from 430, shown in spec.wwz/cpp/osh-summary.html.
    • Increase the number of lines of code translating and compiling from 20,875.
  • Fix bugs reported by users. Bug reports really help! Again, see How to Test OSH.
  • Improve the OSH interpreter, especially with regard to errexit (issue 709). I'd also like to resume work on Running ble.sh With Oil.

Feel free to ask questions in the comments or on Zulip!

Appendix: Selected Metrics

Let's compare this release with the previous one, version 0.8.pre4.

Native Code Metrics

We have nearly 70K lines of C++ code, including over 20K translated by mycpp.

The size of the osh_eval.opt.stripped executable differs between GCC and Clang, and I don't yet know why. In any case, the increase is consistent with translating and compiling more lines of code.

Test Results

OSH spec tests:

There was no work on the Oil language! I'm a bit concerned by that, which is one reason for the scope reduction mentioned above.

Line Counts

We have ~300 new significant lines of code in OSH:

And ~500 new physical lines of code:

Benchmarks

The parsing benchmark didn't change much:

Nor did the runtime benchmark: