Dumb Stuff In Parsing Js

Some stuff about parsing JS

This comes from a conversation with Jason Orendorff.

The real thing to read is this

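Unpaired surrogates are allowed. This is because JS strings started out as UCS-2, a fixed-width 16-bit encoding, and are now treated as UTF-16. A surrogate is one half of the two-unit pair that UTF-16 uses to encode a code point above U+FFFF. Since a JS string is just a sequence of 16-bit code units, nothing stops one half from showing up without the other.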
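A small sketch of what that looks like in practice. The variable names are only for illustration, and the emoji code point is an arbitrary choice of something above U+FFFF.

```javascript
// A lone high surrogate: half of a pair, with no low surrogate after it.
const lone = "\uD83D";
// A well-formed pair: together the two code units encode U+1F600.
const pair = "\uD83D\uDE00";

console.log(lone.length);                       // 1
console.log(pair.length);                       // 2 (length counts 16-bit code units)
console.log([...pair].length);                  // 1 (string iteration goes by code point)
console.log(pair.codePointAt(0).toString(16));  // 1f600
console.log(lone.codePointAt(0).toString(16));  // d83d (the surrogate comes back as-is)
```

The lone surrogate is a perfectly legal string value; it just does not correspond to any character.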

Another fun one is that Unicode escape sequences can be used in identifiers. Unless… they spell a keyword. For example, let’s say you have let. The Unicode escape for e is \u0065. If you replace the e in let with that escape sequence, you get l\u0065t, which is no longer treated as the let keyword, so in sloppy mode you can use it as a variable name and assign to it.
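A rough sketch of how escapes in identifiers behave. The parses helper is hypothetical (not from the conversation); it leans on the fact that new Function compiles its body as sloppy-mode code without running it.

```javascript
// Hypothetical helper: true if `src` parses as a sloppy-mode function body.
function parses(src) {
  try {
    new Function(src); // compiles the body, does not execute it
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// \u0061 is "a", so this declares a variable named abc.
var \u0061bc = 1;
console.log(abc); // 1

// An escaped reserved word is rejected rather than treated as a plain name.
console.log(parses("var \\u0069f = 1;")); // false

// let is only reserved in strict mode, so sloppy code can use the escaped
// spelling as a variable name.
console.log(new Function("var l\\u0065t = 2; return let;")()); // 2

// The escaped spelling is not the let keyword, so it cannot start a declaration.
console.log(parses("let x = 1;"));       // true
console.log(parses("l\\u0065t x = 1;")); // false
```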

Conditional keywords: in a sense all keywords are conditional. If you think about property names, you can use “if” or “catch” as a property name or method name on an object. But some keywords are more conditional than others, for example yield. yield - 1 outside of a generator means something different than yield - 1 inside a generator. Specifically, the first one is a subtraction, the second yields -1. We also have strict mode, which introduces a number of new reserved words.
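Something like the following shows the three cases side by side. The names obj, outside, inside and strictOk are made up for the example, and new Function is used only because its body is sloppy-mode code regardless of where the snippet itself runs.

```javascript
// Keywords are fine as property and method names.
const obj = { if: 1, catch() { return 2; } };
console.log(obj.if, obj.catch()); // 1 2

// Outside a generator (sloppy mode), yield is an ordinary identifier,
// so `yield - 1` is a subtraction.
const outside = new Function("var yield = 5; return yield - 1;");
console.log(outside()); // 4

// Inside a generator, the same tokens yield the value -1.
function* inside() {
  yield - 1;
}
console.log(inside().next().value); // -1

// In strict mode yield is reserved, so using it as a variable name
// fails to parse.
let strictOk = true;
try {
  new Function('"use strict"; var yield = 5;');
} catch (e) {
  strictOk = false;
}
console.log(strictOk); // false
```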

If you want one particularly fun example, try writing a for loop using three “of”s: for (of of of).
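Spelled out, it can look like this. The array contents and the seen variable are just for illustration. The first and third of are a variable, and only the middle one is the keyword.

```javascript
var of = [1, 2, 3];
const seen = [];

// Loop target: the variable `of`. Keyword: of. Iterated value: the variable `of`.
for (of of of) {
  seen.push(of);
}

console.log(seen); // [ 1, 2, 3 ]
console.log(of);   // 3 (the loop overwrote the variable on each iteration)
```

The loop still visits every element because the iterator was created from the original array before the variable was reassigned.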

We recently had this come up in committee: consider a loop that starts with for (async of …). Both async and of are conditional keywords that could be either a keyword or an identifier here. Say you write it as a C-style loop, the kind with semicolons, usually i = 0; i < whatever; i++. If for the first term you write an async arrow function whose parameter is named “of”, then of is an identifier. However, if you write it as a for-of loop instead, async is the identifier and of is the keyword. The parser can’t tell which one it is looking at until it gets further along. We addressed this in committee by making it illegal for a for-of loop to start with async of.
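A quick way to see both readings, assuming a reasonably recent engine that implements the committee’s restriction (older ones may accept the second loop). The parses helper is hypothetical and only checks whether the source compiles; nothing is executed.

```javascript
// Hypothetical helper: true if `src` parses as a sloppy-mode function body.
function parses(src) {
  try {
    new Function(src); // compiles the body, does not execute it
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// C-style loop: `async of => {}` is an async arrow function whose
// parameter is named of, so of is an identifier here.
console.log(parses("for (async of => {}; false;) {}")); // true

// for-of loop starting with `async of`: async would be the identifier
// and of the keyword. This is the form that was made illegal.
console.log(parses("for (async of []) {}")); // false

// An ordinary for-of loop with some other target is unaffected.
console.log(parses("var x; for (x of []) {}")); // true
```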

Finally, automatic semicolon insertion is really hard. The fact that JavaScript doesn’t require semicolons means that we might parse an entire expression first, find that there is an error, then have to reparse it with a semicolon at the end. So for example, you can’t assign to yield in a generator. If you have let on one line, and then a yield expression on the next line, this would be parsed as an assignment to yield.
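A minimal sketch of the basic mechanism, again using a hypothetical parses helper that only compiles the source. The first two cases are the textbook rule: a semicolon is inserted only when the token that breaks the parse sits on a new line. The last line is the let/yield case from above, printed without an expected result so you can check what your own engine does with it.

```javascript
// Hypothetical helper: true if `src` parses as a sloppy-mode function body.
function parses(src) {
  try {
    new Function(src); // compiles the body, does not execute it
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// The second `var` cannot continue the first statement, and it follows a
// newline, so a semicolon is inserted before it.
console.log(parses("var x = 1\nvar y = 2")); // true

// Same tokens on one line: no newline, so no semicolon is inserted.
console.log(parses("var x = 1 var y = 2")); // false

// let on one line, yield on the next, inside a generator.
console.log(parses("function* g() { let\nyield 0; }"));
```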
