## Tuesday, May 20, 2008

### Typing mechanisms redux - Part 1

Everybody and their grandmother has a has a definition of the following terms:

• Static typing
• Dynamic typing
• Strong typing
• Weak typing
• Strict typing
• Loose typing

A quick search on Google will reveal several different (and usually mutually exclusive) definitions for each of the terms, with some egregious offenders offering conflicting definitions on the same site.

# Confusion

Here are just some random examples I found using Google:

• "[Strong typing means] Strict enforcement of type rules with no exceptions. All types are known at compile time." - [dictionary.die.net]
• "Static typed programming languages are those in which variables need not be defined before they're used." - [sitepoint.net]
• "From my POV, strong typing prevents mixing operations between mismatched types" - [artima.com weblogs]
• "[Ruby is] dynamically but strongly typed (strongly typed as: in Ruby "6" + "2" will give you "62" but in a loosely typed language like Perl or TCL that would give you 8)" - [blogs.codehaus.org]
• "MATLAB is dynamically typed, meaning that variables can be assigned without declaring their type, and that their type can change." - [wikipedia.org]
• "Weak typing means that a language implicitly converts (or casts) types when used." - [wikipedia.org]

Awesome, now that we've started off totally confused, I'll try to tidy things up. Rather than starting from the terms and shoehorning definitions, I'll take a look at each of the different definitions, and decide how useful and precise they are at describing certain languages. It turns out that while there is widespread disagreement on what each of the terms mean, everybody seems to "know it when they see it". That is, everybody can agree that Java is static and strong, PHP is dynamic and weak, etc. This instills a modicum of hope that these definitions actually mean something, and we can find them useful.

# Definitions

First, I need to get some definitions out of the way.

• When I say symbol, I mean either a value or expression used in describing a context-free grammar.
• Coercion and casting has its own article.
• Finally, by type checking, I am referring to an algorithm for ensuring that a symbol has the correct type.

Now, I'll discuss each definition in turn, and label each one so I can refer to them later:

# (S-1) Type checking is done during compile time.

Associated languages: Java, C++, Haskell, etc.
Associated terms: Static typing, Strong typing, Strict typing

If we know the types at compile time, this means that the language we are discussing makes use of type annotations to describe symbols, or type can be inferred from the symbols themselves without running the program. These type annotations can be used to perform type checking. However, it should be noted that type checking during compile time cannot completely eliminate run-time errors. For example, while it is possible to check that arithmetic operators only operate on numbers, it is not possible to check for division by zero.

Is a good definition for: Static typing

This property is sufficient for static type checking in Types and Programming Languages (which, by the way, makes no mention of any of the terms above, except for a note acknowledging that "static typing" and "dynamic typing" have become synonymous with "static type checking" and "dynamic type checking").

No surprise here, as people normally agree that this property is necessary for static typing, but bicker endlessly about whether or not an additional property (usually referred to as strong typing) is also necessary. I do not include strong typing as a candidate term, because definitions (T-1) and (T-2) which normally describe strong typing do not require static typing.

Strict typing also applies to this, but just like the term "loose typing", strict typing seems to apply to any of static or strong typing, or both. As such, I will not offer strict typing as an accurate term to describe any of these concepts.

# (D-1) Type checking is done during run time.

Associated languages: Erlang, PHP, Ruby, etc.
Associated terms: Dynamic typing, Weak typing, Loose typing

In this case, absolutely no type checking is done during compile time; it is all done during run time using run time type tags. As a consequence, type annotations are not useful or required in the language. Guards are typically used to test for type during execution.

Is a good definition for: Dynamic typing

By Types and Programming Languages, and from intuition, this would make a good definition for dynamic typing. It is essentially the opposite of definition (S-1).

Loose typing also applies, but just like strict typing, loose typing seems to apply to any of dynamic or weak typing, or both. I will avoid using this term from now on as well.

# (S-2) Symbols can be annotated without being defined.

Associated languages: Java, C++, Haskell, etc.
Associated terms: Static typing, Strong typing, Strict typing

It seems that only languages that have the property (S-1) also have this property, so I am treating this as a possible alternate definition for static typing.

Is a good definition for: None

I would say static typing, but I've changed my mind. Java and C++, both generally regarded as statically typed, allow this. However, in Scala (also statically typed), you cannot do this except in a class definition. Since undefined symbols are then automatically given a default value (as in Java), this definition would not seem to cover Scala. Additionally, the important point is that type checking is done during compilation, and this definition doesn't hit the mark as closely as the previous one.

# (D-2) Symbols cannot be annotated without being defined.

Associated languages: Erlang, PHP, Ruby, etc.
Associated terms: Dynamic typing, Weak typing, Loose typing

As the complement to (S-2), I will consider this as a possible definition for dynamic typing.

Is a good definition for: None

Every single language generally described as dynamically typed has this property. And it also implies that compile time type checking is not possible. However, I could augment any dynamically typed language with (worthless) type annotations that would break this definition. The important bit is that type checking is done during run time, and this definition does not immediately capture this fact.

# (T-1) A symbol's type can at best change to a covariant type.

Associated languages: Java, C++, Haskell, Erlang, etc.
Associated terms: Static typing, Strong typing, Strict typing

A lot of the time, people just say that an identifier's type cannot change. This is trivially the case in Erlang as reassignments are not even allowed in the first place, and Haskell programmers will notice similar effects as there is no Object class that all objects inherit from.

However, in Java and Scala (both supposedly statically and strongly typed), reassignments to covariant types are allowed, while reassignments to other types are not:

// Java example
Object foo = new Integer(5);  // No error
foo = new String("Hello!");   // No error
foo = new Bar();              // No error


Is a good definition for: Strong typing

Note that this is only a good definition for strong typing, not static typing. Disagree? Read on; here comes the confusion.

For one thing, lots of people will use an example like the following in PHP (which is basically the same as the one in Java) to show that PHP is weakly typed:

// PHP example
$foo = 5;$foo = "Hello!"
$foo = new Bar();  ...suggesting that if PHP is weakly typed, then Java must be weakly typed (yet the same people will not agree on this, revealing major inconsistencies in their reasoning). Obviously, looking at covariant types only won't give us the full story. Let's look at assignments to neither covariant nor contravariant types, which PHP allows, but Java does not: // Java example Integer foo = 5; foo = "Hello!" // Compile time error  A problem like this can easily be caught by the compiler, which is another source of confusion; it causes people to think that all such type errors can be caught by the compiler, so that this property and static type checking always go hand in hand. This is simply not true. For example, casting to a contravariant type is allowed in C++ and Java, but in Java, they will result in run time errors (the C++ program may just crash): // Java example Integer foo = 5; foo = (Integer) new Object(); // Run time error  With C++ regarded as a weakly typed language and Java as a strongly typed language (yet they are both static), the fact that Java raises a type error and C++ seems to suggest we have reached a good definition for "strongly typed". One final note: do not confuse shadowing with weak typing. For example, the following Scala code typed in the interpreter raises no errors: // Scala example var x: Int = 5 var x: String = "Hello!"  In fact, I could have done the same thing even if x were a val! This is because the two x's are in different scopes, and the second declaration shadows the previous one. # (W-1) A symbol's type can be changed to any other type. Associated languages: C++, PHP, Ruby Associated terms: Dynamic typing, Weak typing, Loose typing In these languages, no coercion is necessary, although in C++, this behavior may require a cast (with Java, a strongly typed language, the cast will fail at run time). Ruby and PHP will allow you to change a symbols type without a cast or coercion. This behavior is distinct from dynamic typing, as Erlang is dynamically typed, but does not allow changing of a symbol's type at all (suggesting that PHP and Ruby are weakly typed, while Erlang is strongly typed). Is a good definition for: Weak typing Is near the polar opposite of definition (T-1), and deserves the moniker "weak typing". Most people will disagree with this, preferring the following definitions, which I find either subjective or inaccurate... # (T-2) Few or no coercions are automatically applied by the compiler or interpreter. Associated languages: Java, Haskell, Ruby, Erlang, C++, etc. Associated terms: Static typing, Strong typing, Strict typing Many people would ascribe the term "strongly typed" to this definition. Before continuing, read this if you don't know what type coercion is. Now, note that this is a very subjective definition; who is to say what "few" means? Also, in all of the example languages, the compiler provides at least one coercion. For example, in Java, Objects are always automatically coerced into Strings. Is a good definition for: None Despite the subjective issue pointed out above, a bigger problem with this definition exists, and I will explain it in the next section. However, I do feel that there is value in saying that in general, statically typed languages provide few or no coercions. # (W-2) Many coercions are automatically applied by the compiler or interpreter. Associated languages: PHP Associated terms: Dynamic typing, Weak typing, Loose typing Again, a very subjective definition. Is a good definition for: None Many people would ascribe the term "weakly typed" to this definition. ## Saturday, May 17, 2008 ### Using the hosts file and lighttpd to block ads Using the hosts file to block ad sites is more efficient than say, using the adblock plugin for Firefox. With adblock, your computer will still send a request to the spam/advertising site; adblock just hides those ads from your view (at least it did the last time I checked). Also, you are out of luck if a non-web application decides to communicate with a spam site. # Setting up the hosts file The hosts file can be modified to get around these limitations; you can force all spam sites to resolve to your own IP, and Firefox (or any other application) will request some ad from your own machine (and chances are, you won't have it). This cuts down on network traffic and keeps other pesky applications from contacting their parents. On linux, just head over to a site like this and grab an updated hosts file. Just in case you don't know what you're doing, backup your current /etc/hosts, then append the contents of this to it. Finally, delete these lines: 127.0.0.1 localhost 127.0.0.1 localhost.localdomain 255.255.255.255 broadcasthost ::1 localhost  # Fixing the ugly page not found errors Now, wherever you would have seen an ad in Firefox, you get an ugly "Page not found" error instead. To fix this, configure the web server on your computer (assuming you're running one, assuming you're a web developer) to return an empty file on a 404. Assuming you're not a web developer and/or you don't have a web server running, grab lighttpd, or sudo apt-get install lighttpd. Now edit the lighttpd.conf (located at /etc/lighttpd/lighttpd.conf on Ubuntu), and make sure you have these settings by either uncommenting, adding, or editing the config file. Don't touch any of the other settings. server.error-handler-404 = "/404.html" server.bind = "localhost"  Check the server.username and server.groupname parameters. They should both be www-data; keep this in mind. Now go to the /var/www directory and create an empty 404.html: cd /var/www touch 404.html  Then check the permissions of the /var/www directory, and make sure they are owned by www-data (or whatever the server username/groupname for your machine should be). If not, change the ownership: chown -R www-data:www-data /var/www  Now restart lighttpd (or restart your computer), and watch those page not found errors go away. ## Friday, May 16, 2008 ### What's a type conversion? What is a type conversion? What's the difference between an implicit conversion, an explicit conversion, a coercion, or a cast? You've no doubt heard these terms thrown around and you probably know they all loosely describe one (or several) methods to enable a programmer to treat one type as another type. But what do they mean, precisely? Even seasoned programmers may be a little confused about this topic, and this may be a root cause of why many programmers don't know the difference between static, dynamic, strong, and weak typing. Basically, if you think of types in a computer program as mathematical sets, its much easier to understand and remember what all these terms mean. This is not a far stretch if you consider that Type Theory is just one of several versions of Set Theory invented to circumvent Russel's Paradox. Basically, suppose you have two elements a and b, where: a ∈ A b ∈ B  This is congruous to saying an object c is of class C, and an object d is of class D. In Java, we would write: C c = new C(); D d = new D();  Note: The equivalent of "member of" in Java would be the instanceOf operator, as a redditor has pointed out. C c = new C(); is sufficient for c instanceOf C to be true. # Coercions A function f:A->B is an implicit conversion of an element in A to an element in B. This is also called a coercion, because you are starting with an element in A but you really want an element of type B. You can apply f to obtain that element (call it b). In Java, we have a toString method that converts (or more precisely coerces) an Object to a String: System.out.println(c.toString());  # Casts On the other hand, an explicit conversion (or cast) is slightly different, but also converts one type to another. Except imagine sets A and C, where A is a subset of C: a ∈ A A ⊆ C  Obviously, if a is a member of A, and A is a subset of C, then a must be a member of C, and you may use a whenever a member of set A or set C is expected. In programming, when you treat an object as one of its sub-types or super-types, you must perform an action called a cast, and the object itself never changes - just like how the element a never changes. In the following Java example, suppose we have a String. Well, obviously that String is also an Object (a superclass of String), so we can perform a cast to convert a String into an Object: String s = new String("Hello World!"); Object o = (Object) s;  # Confusion Knowing all this, it probably doesn't make sense to say "explicit coercion" or "implicit casting"; however, I do see these terms thrown around a lot. Specifically, "explicit coercion" is usually construed to mean "manual coercion"; eg the developer must manually coerce an object of one type to another. When the compiler does it, people say it is an "implicit coercion" or "implicit conversion"; these terms seem redundant and confusing, so I urge readers to use the terms "automatic coercion" and "manual coercion" instead (unless your intention is to confuse). Some people also draw a distinction between conversion and coercion (defining conversion to mean casting, and coercion to mean ... well, coercion), as if we computer scientists haven't already made enough of a mess by sloppily defining "=" to mean "member of" in Big-Oh notation. Sigh. # Summary So in summary, we know that implicit conversions aka coercions change one type to another type, explicit conversions aka casting change one type to another sub-type or super-type. This is important because the way languages handle type conversions play a part in determining what kind of type system they are using - be it static, dynamic, strong, or weak. I know I wrote a little about type systems earlier, but I'd like to revisit and expand on the topic in the future, with this article at hand to reference. ## Thursday, May 8, 2008 ### Fun with the scala interactive shell Instead of running irb as a calculator, I find myself using the scala interactive shell instead. With the J9 JVM, performance is better, plus you can automatically load files on startup (just like irb). You can also add your own jars to the classpath so they are available in the interactive shell, and automatically load files at startup. Add this to your .bash_profile for the goodness: export CLASSPATH=$CLASSPATH:/path/to/my/lib.jar
alias iss='scala -i ~/.scalarc'


### Better performance with Scala with IBM J9 JVM

The title says it all; Sun's JDK is more stable but slow. I've been using the J9 JDK for a while now and scala (especially the interactive scala shell) performs much faster. You grab the linux JDK from IBM's site here (registration required).

One note though: I couldn't run Maven 2.0.9 or the embedded Maven in Netbeans 6.x (it's the 2.1-SNAPSHOT version of Maven) with the J9 JVM; I have to use Sun's JDK instead.