What’s new with Awk?

Awk is a powerful scripting tool that makes it easy to process text. Awk scripts use a pattern-action syntax, where Awk performs an action for each line in a file that matches a pattern. This provides a flexible yet powerful scripting language for dealing with text. For example, the one-line Awk script /error/ {print $1, $2, $3} will print the first three whitespace-delimited fields of any line that contains the text error.
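To see the pattern-action model at work, here is a minimal sketch that runs that one-liner against three made-up log lines. The sample_log function and its contents are assumptions for illustration only.

```shell
# Hypothetical sample log; the contents are invented for illustration.
sample_log() {
    printf '%s\n' \
        '2024-01-05 10:15:02 error disk full on /var' \
        '2024-01-05 10:15:07 info backup finished' \
        '2024-01-05 10:16:40 error network timeout'
}

# The pattern /error/ selects lines; the action prints fields 1 to 3.
sample_log | awk '/error/ {print $1, $2, $3}'
```

Only the two lines containing error pass the pattern, so the output is their first three fields: the date, the time, and the word error.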

While we also have the GNU variant of Awk, called Gawk, the original Awk remains under development. Recently, Brian Kernighan started a project to add Unicode support to Awk. I met with Brian to ask about the origins of Awk and his recent development work on Awk.

Jim Hall: Awk is a great tool to parse and process text. How did it start?

Brian Kernighan: The most direct influence was a tool that Marc Rochkind developed while working on the Programmer’s Workbench system at Bell Labs. As I remember it now, Marc’s program took a list of regular expressions and created a C program that would read an input file. Whenever the program found a match for one of the regular expressions, it printed the matching line. It was designed for creating error checking to run over log files from telephone operations data. It was such a neat idea. Awk is just a generalization.

Jim: AWK stands for the three of you who created it: Al Aho, Peter Weinberger, and Brian Kernighan. How did the three of you design and create Awk?

Brian: Al was interested in regular expressions and had recently implemented egrep, which provided a very efficient lazy-evaluation technique for a much bigger class of regular expressions than what grep supported. That gave us a syntax and working code.

Peter had been interested in databases, and as part of that he had some interest in report generation, like the RPG language that IBM provided. And I had been trying to figure out some kind of editing system that made it possible to handle strings and numbers with more or less equal ease.

We explored designs, but not for a long time. I think Al may have provided the basic pattern-action paradigm, but that was implicit in a variety of existing tools, like grep, the stream editor sed, and in the language tools YACC and Lex that we used for implementation. Naturally, the action language had to be C-like.

Jim: How was Awk first used at Bell Labs? When was Awk first adopted into Unix?

Brian: Awk was created in 1977, so it was part of Seventh-edition Unix, which I think appeared in about 1979. I wouldn’t say it was adopted, so much as it was just another program included because it was there. People picked it up very quickly, and we soon had users all over the Labs. People wrote much bigger programs than we had ever anticipated, too, even tens of thousands of lines, which was amazing. But for some kinds of applications, the language was a good match.

Jim: Has Awk changed over the years, or is Awk today more or less the same Awk from 1977?

Brian: Overall, it’s been pretty stable, but there have been a fair number of small things, mostly to keep up with at least the core parts of Gawk. Examples include things like functions to do case conversion, shorthands for some kinds of regular expressions, or special filenames like /dev/stderr. Internally, there’s been a lot of work to replace fixed-size arrays with arrays that grow. Arnold Robbins, who maintains Gawk, has also been incredibly helpful with Awk, providing good advice, testing, code, and help with Git.
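Two of the additions Brian mentions, case conversion and the special filename /dev/stderr, can be sketched in a few lines. The helper name upper_nonblank and the sample input are hypothetical. The sketch assumes an Awk that provides toupper() and recognizes /dev/stderr as an output target, as Gawk and current versions of the original Awk do.

```shell
# upper_nonblank: hypothetical helper. Uppercases each non-empty line on
# stdout and reports empty lines on stderr instead of printing them.
upper_nonblank() {
    awk '
    NF == 0 { print "warning: line " NR " is empty" > "/dev/stderr"; next }
            { print toupper($0) }
    '
}

printf 'abc\n\nDef\n' | upper_nonblank
```

Because the warning goes to standard error, the uppercased text can still be piped onward or redirected to a file without the diagnostic mixed in.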

Jim: You’re currently adding Unicode support to Awk. This is one of those projects that seems obvious when you hear it, because Unicode is everywhere, but not every program supports it yet. Tell us about your project to add Unicode to Awk.

Brian: It’s been sort of embarrassing for a while now that Awk only handled 8-bit input, though in fairness it predates Unicode by 10 or 20 years. Gawk, the GNU version, has handled Unicode properly for quite a while, so it’s good to be up to date and compatible.

Jim: How big of a project is adding Unicode support? Did this require many changes to the source code?

Brian: I haven’t counted, but it’s probably 200 or 300 lines, primarily concentrated in either the regular expression recognizer or in the various built-in functions that have to operate in characters, not bytes, for Unicode input.

Jim: How far along are you in adding Unicode to Awk?

Brian: There’s a branch of the code at GitHub that’s pretty up to date. It’s been tested, but there’s always room for more testing.

One thing to mention: it handles UTF-8 input and output, but for Unicode code points, which are not the same thing as Unicode graphemes. This distinction is important but technically very complicated, at least as I understand it. As a simple example, a letter with an accent could be represented as two code points (letter and accent) or as a single character (grapheme). Doing this right, whatever that means, is very hard.
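The difference between bytes, code points, and graphemes can be made concrete with a small shell sketch. It writes the same accented letter two ways, precomposed (U+00E9) and decomposed (U+0065 followed by the combining accent U+0301), and counts bytes and code points for each. The helper names count_bytes and count_code_points are hypothetical, and the counting trick assumes well-formed UTF-8 input. It does not use Awk's own Unicode code.

```shell
# Count the bytes on stdin (tr strips the padding some wc versions add).
count_bytes() { wc -c | tr -d ' '; }

# Count UTF-8 code points on stdin. Each code point has exactly one byte
# that is not a continuation byte (0x80-0xBF, octal 200-277), so deleting
# the continuation bytes leaves one byte per code point.
count_code_points() { LC_ALL=C tr -d '\200-\277' | wc -c | tr -d ' '; }

# Precomposed form: U+00E9, encoded as the bytes C3 A9.
printf '\303\251' | count_bytes
printf '\303\251' | count_code_points

# Decomposed form: U+0065 (e) plus U+0301 (combining acute), bytes 65 CC 81.
printf 'e\314\201' | count_bytes
printf 'e\314\201' | count_code_points
```

Both strings display as a single accented letter, that is, one grapheme, yet the first is 2 bytes and 1 code point while the second is 3 bytes and 2 code points. A tool that counts code points, as Brian describes, would therefore be expected to report different lengths for the two forms.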

Jim: In a Computerphile video, you mention adding support for comma-separated values (CSV) parsing to Awk. How is that project going?

Brian: While I had my hands in the code again, I did add support for CSV input, since that’s another bit of the language that was always clunky. I haven’t done anything for CSV output, since that’s easy to do with a couple of short functions, but maybe that should be revisited.
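The short functions Brian has in mind are not shown in the interview, but CSV output along those lines might look like the following sketch. The names csv_field, csv_record, and tsv_to_csv are hypothetical, the quoting follows the common convention of doubling embedded quotes, and tab-separated input is assumed purely so that there is something to convert.

```shell
# tsv_to_csv: hypothetical helper that reads tab-separated lines on stdin
# and writes them to stdout as CSV.
tsv_to_csv() {
    awk -F '\t' '
    # Quote a field only if it contains a comma, double quote, or newline;
    # embedded double quotes are doubled.
    function csv_field(s) {
        if (s ~ /[,"\n]/) {
            gsub(/"/, "\"\"", s)
            s = "\"" s "\""
        }
        return s
    }
    # Join the fields of the current record into one CSV line.
    function csv_record(    i, out) {
        out = ""
        for (i = 1; i <= NF; i++) {
            if (i > 1) out = out ","
            out = out csv_field($i)
        }
        return out
    }
    { print csv_record() }
    '
}

printf 'name\tquote\nAda\tsaid "hi", then left\n' | tsv_to_csv
```

The header line passes through as name,quote, while the second field of the data line is wrapped in quotes with its inner quotes doubled, since it contains both a comma and double quotes.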

Jim: What kinds of things do you use Awk for in your day-to-day work?

Brian: Everything. Pretty much anything that fiddles text is a target for Awk. Certainly, the Awk program I use most is a simple one to make all lines in a text document the same length. I probably used it 100 times while writing answers to your questions.
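Brian's line-length program is not reproduced in the interview. As a rough idea of what such a tool can look like, the sketch below refills words into lines no longer than a given width. The helper name fill_lines, the default width of 60, and the treatment of blank lines as paragraph breaks are all assumptions.

```shell
# fill_lines [WIDTH]: hypothetical helper. Refills the words read from
# stdin into lines of at most WIDTH characters (default 60). A blank
# input line ends the current paragraph and is kept in the output.
fill_lines() {
    awk -v width="${1:-60}" '
    # Print the pending output line, if any, and start a new one.
    function flush_line() {
        if (line != "") print line
        line = ""
    }
    NF == 0 { flush_line(); print ""; next }
    {
        for (i = 1; i <= NF; i++) {
            if (line == "") {
                line = $i
            } else if (length(line) + 1 + length($i) <= width) {
                line = line " " $i
            } else {
                print line
                line = $i
            }
        }
    }
    END { flush_line() }
    '
}

printf 'one two three four five six\n' | fill_lines 10
```

With a width of 10, the six words come out on three lines: one two, then three four, then five six. A word longer than the width is simply placed on a line of its own.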

Jim: What’s the coolest (or most unusual) thing you have used Awk to do?

Brian: A long time ago, I wrote a C++ program that converted Awk programs into C++ that looked as close to Awk as I could manage, by doing things like overloading brackets for associative arrays. It was never used, but it was a fun exercise.
