Internationalizing source code

You will be hard pressed to find a programming language whose keywords are not in English. As a native English speaker, I have never had to look at a blob of apparent gibberish and try to make some sense of it. When I see the following code, I know exactly what each keyword means because it is in my native tongue.

while (index < 10) {
    index = index + 1;
}

Ideally, a Russian developer (one who speaks only Russian) would write the same code in his or her native tongue, as in the following example:

russian_while_loop

Programming languages are defined by a series of character-based tokens with character-based separators. A parser in the compiler reads these characters and matches them against predefined sequences. To support multiple spoken languages (for lack of a better phrase), multiple language specifications would need to be written, each with its own set of keywords.
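To make the cost of the purely textual approach concrete, here is a minimal sketch of a tokenizer that needs a separate keyword set for every spoken language. The names KEYWORD_SETS and tokenize, and the Russian keyword "пока", are my own illustrative assumptions; they are not taken from any real compiler.

```python
import re

# Hypothetical keyword sets, one per "language specification".
# The Russian keyword is an illustrative choice only.
KEYWORD_SETS = {
    "en": {"while"},
    "ru": {"пока"},
}

# A word (letters, digits, underscore) or a single punctuation character.
TOKEN_PATTERN = re.compile(r"\s*(\w+|[^\w\s])")


def tokenize(source, spoken_language):
    """Split source text into (kind, text) pairs using one keyword set."""
    keywords = KEYWORD_SETS[spoken_language]
    tokens = []
    for match in TOKEN_PATTERN.finditer(source):
        text = match.group(1)
        if text in keywords:
            kind = "keyword"
        elif text[0].isdigit():
            kind = "number"
        elif text[0].isalpha() or text[0] == "_":
            kind = "identifier"
        else:
            kind = "symbol"
        tokens.append((kind, text))
    return tokens


print(tokenize("while (index < 10)", "en"))
print(tokenize("пока (index < 10)", "ru"))
```

Note that the same text is classified differently depending on which specification is loaded: under the "ru" set, the word while is just another identifier.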

What if, rather than storing the literal string while and having a language specification that lists it as a keyword, we stored something that represented the concept of a while-loop? Each developer would then be free to define how the concept is presented in an editor. If source code presentation were decoupled from source code storage, editors could be localized. Compilers would no longer need to parse out keywords; they would read the concept directly from the storage format.
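A rough sketch of what that decoupling might look like follows. The stored form holds a concept tag rather than keyword text, and a per-locale table decides how it is shown. Everything here (Concept, KEYWORDS, WhileLoop, render, and the Russian keyword) is a hypothetical illustration under my own assumptions, not a proposed storage format.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Concept(Enum):
    """Language concepts that are stored instead of literal keyword text."""
    WHILE = auto()


# Hypothetical presentation tables, one per locale. A developer could
# swap in whatever word they prefer for each concept.
KEYWORDS = {
    "en": {Concept.WHILE: "while"},
    "ru": {Concept.WHILE: "пока"},
}


@dataclass
class WhileLoop:
    """Stored form of a loop: a condition, a body, and a concept tag."""
    condition: str
    body: list
    concept: Concept = Concept.WHILE


def render(node, locale):
    """Present the stored node using the keyword chosen for the locale."""
    keyword = KEYWORDS[locale][node.concept]
    lines = [f"{keyword} ({node.condition}) {{"]
    lines += [f"    {statement}" for statement in node.body]
    lines.append("}")
    return "\n".join(lines)


loop = WhileLoop(condition="index < 10", body=["index = index + 1;"])
print(render(loop, "en"))
print(render(loop, "ru"))
```

The stored loop never changes; only the lookup table does. A compiler working from this form would inspect node.concept and never see either keyword.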

This notion could be extended even further if desired. It may be possible to store identifier information (e.g. variables, methods, etc.) generically and provide various translations. This would allow the JDK API, for example, to use identifiers meaningful to each locale. This is obviously not needed for every project, but for more global APIs (e.g. Apache Commons) it may be highly desirable.
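The identifier idea could be sketched the same way: code refers to a stable ID, and each locale supplies a display name. The IDs, the IDENTIFIER_NAMES table, and the display_name function below are invented for illustration; no real API ships such a table.

```python
# Hypothetical translation table: a stable identifier ID maps to one
# display name per locale. The IDs and names are made up for this sketch.
IDENTIFIER_NAMES = {
    "list.size": {"en": "size", "ru": "размер"},
    "loop.index": {"en": "index", "ru": "индекс"},
    "list.isEmpty": {"en": "isEmpty"},  # not yet translated
}


def display_name(identifier_id, locale, fallback="en"):
    """Return the locale's name for an identifier, or the fallback name."""
    names = IDENTIFIER_NAMES[identifier_id]
    return names.get(locale, names[fallback])


print(display_name("list.size", "ru"))
print(display_name("list.isEmpty", "ru"))
```

Falling back to a default locale means a partially translated API stays usable while translations are filled in.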

This entry continues the thread on separating the presentation (view) of a programming language from its storage format (model). There are also entries on annotating, adding images to, and de-textifying source code, as well as on simplifying the understanding of code structure.

Thanks to Igor Fedulov for the translations.

8 comments

  1. Am sure you have thought about this, but the approach you outline would be useful when the number of constructs is not too large. Otherwise one would end up having to learn the language of these constructs: how many there are, how they interact, the rules of composition …
    On another note, some languages don't map one to one with English; take any phonetic language. What would happen if multiple words represented the equivalent of a “while”?
    Also, there are some interesting issues with languages written right to left.
    Intriguing idea though. Thanks for putting it up.

  2. Anand, thanks for your comment.
    I think that I know what you’re getting at with your first statement, but would you clarify it a bit? Thanks!
    When I was writing the entry I assumed that the multiple word problem could be handled with a suitable concatenation convention such as “twoWords” or “two_words”. These work well for the English-like (i.e. Latin alphabet) languages but I honestly don’t know about others (e.g. Kanji). Anyone have any thoughts?
    BiDi is another story. I have to assume that there are well known techniques for mapping BiDi text into a format that has a single direction (I’m only guessing at this point). It would be the job of the presentation layer to map from a single direction concept-based storage format to a language specific (possibly BiDi) presentation. Unfortunately I know very little about BiDi except that there are APIs available to simplify my life. If someone has a background in BiDi, give us a shout!

  3. What I was getting at is this: let's say you had a few hundred “concepts” in there (wouldn't be too surprised if a language like C++ had a very large number). Where does that leave the programmer? If this is a new paradigm of working, then shouldn't the number be few and simple? Otherwise, one is simply exchanging six for half a dozen.
    One could of course argue that most C++ programmers have a fixed set of concepts they use. Maybe they will do the same in this approach.
    The word concatenation approach won't work; in some languages that I know, a concatenated word takes on a whole new meaning…
    Have you worked with “graphical” languages? They somehow petered out sometime in the early 90s. Conceptually similar approach to what you are talking about?

  4. Thanks again for your comments, Anand.
    At this stage of my discussion, I’m not claiming to change existing languages (at the concept level) in any way. Specifically, for the internationalization discussion, I’m simply addressing locale issues. Why should a Russian, Japanese, Indian, etc. developer who is only ever going to work on code for their region *ever* have to learn English to do their job?
    Introducing new concepts is not currently in my scope. Intentional Programming (IP) or Sergey Dmitriev’s discussions on Meta Programming System (as well as others) are headed down this domain specific concept path. I can certainly understand their needs and desires but I’m starting from the core and working my way out. At this point, the core is de-textifying languages (or at least removing the fact that they’re completely textual).
    I have done a bit of research on graphical languages. I think that that goes to the other extreme — completely textual is on one end and completely graphical is on the other. My current feelings are somewhere in a happy medium. I will admit that I started this whole process by attempting to display all of the concepts of Java graphically. I quickly realized that it would take something succinct and turn it into a page of seemingly meaningless symbols. In other words, it made the problem more complex rather than showing it in a new light. I then back-pedaled to find the happy medium.

  5. Pingback: Simplifying code maintenance and understanding program flow « Rob's Random Ramblings

  6. Pingback: Annotating source code « Rob's Random Ramblings

  7. Pingback: Picture this « Rob's Random Ramblings

  8. Pingback: De-textifying programming languages « Rob's Random Ramblings

