MathJax

SyntaxHighlighter

Highlight

Custom CSS

Tuesday, May 20, 2008

Typing mechanisms redux - Part 1

Everybody and their grandmother has a has a definition of the following terms:

  • Static typing
  • Dynamic typing
  • Strong typing
  • Weak typing
  • Strict typing
  • Loose typing

A quick search on Google will reveal several different (and usually mutually exclusive) definitions for each of the terms, with some egregious offenders offering conflicting definitions on the same site.

Confusion

Here are just some random examples I found using Google:

  • "[Strong typing means] Strict enforcement of type rules with no exceptions. All types are known at compile time." - [dictionary.die.net]
  • "Static typed programming languages are those in which variables need not be defined before they're used." - [sitepoint.net]
  • "From my POV, strong typing prevents mixing operations between mismatched types" - [artima.com weblogs]
  • "[Ruby is] dynamically but strongly typed (strongly typed as: in Ruby "6" + "2" will give you "62" but in a loosely typed language like Perl or TCL that would give you 8)" - [blogs.codehaus.org]
  • "MATLAB is dynamically typed, meaning that variables can be assigned without declaring their type, and that their type can change." - [wikipedia.org]
  • "Weak typing means that a language implicitly converts (or casts) types when used." - [wikipedia.org]

Awesome, now that we've started off totally confused, I'll try to tidy things up. Rather than starting from the terms and shoehorning definitions, I'll take a look at each of the different definitions, and decide how useful and precise they are at describing certain languages. It turns out that while there is widespread disagreement on what each of the terms mean, everybody seems to "know it when they see it". That is, everybody can agree that Java is static and strong, PHP is dynamic and weak, etc. This instills a modicum of hope that these definitions actually mean something, and we can find them useful.

Definitions

First, I need to get some definitions out of the way.

  • When I say symbol, I mean either a value or expression used in describing a context-free grammar.
  • Coercion and casting has its own article.
  • Finally, by type checking, I am referring to an algorithm for ensuring that a symbol has the correct type.

Now, I'll discuss each definition in turn, and label each one so I can refer to them later:

(S-1) Type checking is done during compile time.

Associated languages: Java, C++, Haskell, etc.
Associated terms: Static typing, Strong typing, Strict typing

If we know the types at compile time, this means that the language we are discussing makes use of type annotations to describe symbols, or type can be inferred from the symbols themselves without running the program. These type annotations can be used to perform type checking. However, it should be noted that type checking during compile time cannot completely eliminate run-time errors. For example, while it is possible to check that arithmetic operators only operate on numbers, it is not possible to check for division by zero.

Is a good definition for: Static typing

This property is sufficient for static type checking in Types and Programming Languages (which, by the way, makes no mention of any of the terms above, except for a note acknowledging that "static typing" and "dynamic typing" have become synonymous with "static type checking" and "dynamic type checking").

No surprise here, as people normally agree that this property is necessary for static typing, but bicker endlessly about whether or not an additional property (usually referred to as strong typing) is also necessary. I do not include strong typing as a candidate term, because definitions (T-1) and (T-2) which normally describe strong typing do not require static typing.

Strict typing also applies to this, but just like the term "loose typing", strict typing seems to apply to any of static or strong typing, or both. As such, I will not offer strict typing as an accurate term to describe any of these concepts.

(D-1) Type checking is done during run time.

Associated languages: Erlang, PHP, Ruby, etc.
Associated terms: Dynamic typing, Weak typing, Loose typing

In this case, absolutely no type checking is done during compile time; it is all done during run time using run time type tags. As a consequence, type annotations are not useful or required in the language. Guards are typically used to test for type during execution.

Is a good definition for: Dynamic typing

By Types and Programming Languages, and from intuition, this would make a good definition for dynamic typing. It is essentially the opposite of definition (S-1).

Loose typing also applies, but just like strict typing, loose typing seems to apply to any of dynamic or weak typing, or both. I will avoid using this term from now on as well.

(S-2) Symbols can be annotated without being defined.

Associated languages: Java, C++, Haskell, etc.
Associated terms: Static typing, Strong typing, Strict typing

It seems that only languages that have the property (S-1) also have this property, so I am treating this as a possible alternate definition for static typing.

Is a good definition for: None

I would say static typing, but I've changed my mind. Java and C++, both generally regarded as statically typed, allow this. However, in Scala (also statically typed), you cannot do this except in a class definition. Since undefined symbols are then automatically given a default value (as in Java), this definition would not seem to cover Scala. Additionally, the important point is that type checking is done during compilation, and this definition doesn't hit the mark as closely as the previous one.

(D-2) Symbols cannot be annotated without being defined.

Associated languages: Erlang, PHP, Ruby, etc.
Associated terms: Dynamic typing, Weak typing, Loose typing

As the complement to (S-2), I will consider this as a possible definition for dynamic typing.

Is a good definition for: None

Every single language generally described as dynamically typed has this property. And it also implies that compile time type checking is not possible. However, I could augment any dynamically typed language with (worthless) type annotations that would break this definition. The important bit is that type checking is done during run time, and this definition does not immediately capture this fact.

(T-1) A symbol's type can at best change to a covariant type.

Associated languages: Java, C++, Haskell, Erlang, etc.
Associated terms: Static typing, Strong typing, Strict typing

A lot of the time, people just say that an identifier's type cannot change. This is trivially the case in Erlang as reassignments are not even allowed in the first place, and Haskell programmers will notice similar effects as there is no Object class that all objects inherit from.

However, in Java and Scala (both supposedly statically and strongly typed), reassignments to covariant types are allowed, while reassignments to other types are not:

// Java example
Object foo = new Integer(5);  // No error
foo = new String("Hello!");   // No error
foo = new Bar();              // No error

Is a good definition for: Strong typing

Note that this is only a good definition for strong typing, not static typing. Disagree? Read on; here comes the confusion.

For one thing, lots of people will use an example like the following in PHP (which is basically the same as the one in Java) to show that PHP is weakly typed:

// PHP example
$foo = 5;
$foo = "Hello!"
$foo = new Bar();

...suggesting that if PHP is weakly typed, then Java must be weakly typed (yet the same people will not agree on this, revealing major inconsistencies in their reasoning). Obviously, looking at covariant types only won't give us the full story. Let's look at assignments to neither covariant nor contravariant types, which PHP allows, but Java does not:

// Java example
Integer foo = 5;
foo = "Hello!" // Compile time error

A problem like this can easily be caught by the compiler, which is another source of confusion; it causes people to think that all such type errors can be caught by the compiler, so that this property and static type checking always go hand in hand. This is simply not true.

For example, casting to a contravariant type is allowed in C++ and Java, but in Java, they will result in run time errors (the C++ program may just crash):

// Java example
Integer foo = 5;
foo = (Integer) new Object();  // Run time error

With C++ regarded as a weakly typed language and Java as a strongly typed language (yet they are both static), the fact that Java raises a type error and C++ seems to suggest we have reached a good definition for "strongly typed".

One final note: do not confuse shadowing with weak typing. For example, the following Scala code typed in the interpreter raises no errors:

// Scala example
var x: Int = 5
var x: String = "Hello!"

In fact, I could have done the same thing even if x were a val! This is because the two x's are in different scopes, and the second declaration shadows the previous one.

(W-1) A symbol's type can be changed to any other type.

Associated languages: C++, PHP, Ruby
Associated terms: Dynamic typing, Weak typing, Loose typing

In these languages, no coercion is necessary, although in C++, this behavior may require a cast (with Java, a strongly typed language, the cast will fail at run time). Ruby and PHP will allow you to change a symbols type without a cast or coercion. This behavior is distinct from dynamic typing, as Erlang is dynamically typed, but does not allow changing of a symbol's type at all (suggesting that PHP and Ruby are weakly typed, while Erlang is strongly typed).

Is a good definition for: Weak typing

Is near the polar opposite of definition (T-1), and deserves the moniker "weak typing". Most people will disagree with this, preferring the following definitions, which I find either subjective or inaccurate...

(T-2) Few or no coercions are automatically applied by the compiler or interpreter.

Associated languages: Java, Haskell, Ruby, Erlang, C++, etc.
Associated terms: Static typing, Strong typing, Strict typing

Many people would ascribe the term "strongly typed" to this definition.

Before continuing, read this if you don't know what type coercion is. Now, note that this is a very subjective definition; who is to say what "few" means? Also, in all of the example languages, the compiler provides at least one coercion. For example, in Java, Objects are always automatically coerced into Strings.

Is a good definition for: None

Despite the subjective issue pointed out above, a bigger problem with this definition exists, and I will explain it in the next section. However, I do feel that there is value in saying that in general, statically typed languages provide few or no coercions.

(W-2) Many coercions are automatically applied by the compiler or interpreter.

Associated languages: PHP
Associated terms: Dynamic typing, Weak typing, Loose typing

Again, a very subjective definition.
Is a good definition for: None

Many people would ascribe the term "weakly typed" to this definition.
Post a Comment