There's an interesting notion going around in some circles of syntactic theorizing, especially here at the University of Maryland thanks to Norbert Hornstein: that a fundamental aspect of grammaticality is that, all else being equal, of the so-called "convergent" structures that can be generated from an initial set of words and morphemes in a typical Minimalist Program framework, the acceptable sentences are the ones that delay move operations as long as possible (a preference usually called Merge over Move).

The idea originated with Chomsky in The Minimalist Program and, I think, until Hornstein came along, was relegated to accounting for otherwise rare and possibly not enormously important phenomena, like existential constructions in English. Basically, what Hornstein seems to have done is suggest that it could account for all sorts of things, say binding and control facts (what can and can't be the controller, or the raised item, and so forth). Alex Drummond's 895 here at UMD also brought Merge over Move, along with Hornstein's conception of Shortest Move (a formulation of various notions of "minimality" that originate with the A-over-A and superiority constraints back in the day and, more recently, with Rizzi's relativized minimality), to bear on the issue of sidewards movement, in an attempt to show that with these constraints in place, sidewards movement doesn't overgenerate nearly as wildly as one would expect. There's also some conjecture that Merge over Move could be reduced to Shortest Move, or that Shortest Move can be reduced to something like "minimize copies", so that you keep things lying around for as little time as possible, or maybe both.

There's something deeply tantalizing about this idea. It reaches into the realm of similar phenomena in physics. Classical mechanics, for instance, can be reformulated as Lagrangian mechanics, in which the path an object travels between two points, assuming you know what the two points are, is always a path of least effort of some sort (strictly, one that makes the action stationary) given the initial setup of things. Or light reflecting off a surface, or refracting through the interface between two media, will, if the origin and destination are known, take the fastest path between the two points given the properties of the media it travels through. Or the force of gravity in General Relativity turns out to be merely straight-line, least-effort (geodesic) motion through curved spacetime, even for the accelerating motion of falling objects. Or, perhaps most exotically, there are the ideas of Julian Barbour and Lee Smolin and those guys, who are trying to discover unifying principles that treat the observed physics of the universe as related to paths of least effort (of a sort) through the space of all possible configurations of the universe (Barbour supposedly can derive even such slippery concepts as time and space as emergent, non-fundamental macroproperties in this way). It's interesting, then, that such powerful ideas in physics might be related to ideas in linguistics, though perhaps a crucial difference between the last idea listed and the rest is that the Barbourian ideas are all what we might describe as "inherently non-local". That is to say, the properties under investigation are properties of configurations of the universe as a whole, and of sets of such configurations, and there's no way of reducing or reproducing them without reference to the universe as a whole (if there were such a way, we'd describe it as "local").
This is in direct contrast to the other theories there, say, Lagrangian mechanics, which can be formulated perfectly well without the whole picture available to us to investigate: it can be made local. Heck, Lagrangian mechanics is just a reformulation of earlier Newtonian mechanics, which is entirely local. This demands that we ask a simple question: are Merge over Move, Shortest Move, etc. inherently non-local properties of language, or can they be reformulated into a strictly local theory? If the latter, what insights can we gain from doing so? If the former, how can such a complex phenomenon be connected at all to the brain's processing of language, given that the brain surely isn't computing the staggeringly many alternative structures that can be constructed from any given set of words and morphemes? For just six items (two determiners, two nouns, a verb, and an agreement head) there can be roughly 800 configurations in the configuration space (about 580 of them terminal), assuming no significant a priori constraints on what the move operation applies to, never mind "least effort" of the relevant sort, and this isn't even a full sentence, just a piece of one. Such an astonishingly large number of possible structures makes it dubious that the brain is doing anything remotely like a search through the derivation space in this fashion.
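To get a feel for how fast this grows, here is a minimal sketch under a much simpler model than the one behind the figures above: merge only, no move at all, every item treated as distinct, and linear order ignored, so its numbers are not comparable to the 800/580 figures. Even in this toy model, the number of possible final trees over n items is (2n - 3)!!, and the number of distinct merge histories (which pair gets merged at each step) grows faster still. The function names are made up for illustration.

```python
from math import comb, prod


def merge_only_trees(n: int) -> int:
    """Distinct unordered binary trees over n distinct items: (2n - 3)!!.

    Toy model (assumption): merge only, no move, no labels, no linear order.
    """
    if n < 1:
        raise ValueError("need at least one item")
    # Product of the odd numbers 1, 3, ..., 2n - 3; the empty product (n == 1) is 1.
    return prod(range(1, 2 * n - 2, 2))


def merge_histories(n: int) -> int:
    """Distinct sequences of merges that reduce n distinct items to one tree.

    With k separate pieces in the workspace there are comb(k, 2) pairs to merge next.
    """
    if n < 1:
        raise ValueError("need at least one item")
    return prod(comb(k, 2) for k in range(2, n + 1))


for n in (3, 6, 10):
    print(n, merge_only_trees(n), merge_histories(n))
```

At ten items the merge-only tree count is already in the tens of millions, before a single move operation is allowed.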

Some proposals have been made to get around this combinatorial explosion. The aforementioned 895 by Drummond proposes searching the derivations in order of their amount of effort, by abstracting away from which items the operations involve and using only the kinds of operations involved, namely merge and move. For example, we know that any derivation for five items involves at least four operations (four merges, in this case), possibly more; more generally, n items require at least n-1 operations. So we can start with the derivations of length four and work through them in a sort of binary fashion. If the most recent operation is written on the right, and we use "M" for merge operations and "O" for move operations, the "least effort" derivation is MMMM (four merges in a row), the next least effort derivation is MMMO (three merges and then a final move), then MMOM, MMOO, MOMM, and so on, up to the derivations that begin with a move operation, which can be ruled out entirely as impossible. This certainly makes search easier: if we simply stop at the first sequence of operations that converges, we know that the convergent derivations with that sequence are the least effort ones.
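The ordering just described can be sketched as a generator. This is a minimal reading of the scheme as summarized above, not Drummond's actual procedure: shapes are strings over "M" and "O", shorter strings come first starting at n - 1 operations, strings of the same length are ordered like binary numbers with M = 0 and O = 1 (rightmost = most recent, so a late move costs less than an early one), and strings that begin with a move are skipped. The `converges` argument is a hypothetical stand-in for actually checking whether some derivation with that shape yields a convergent structure.

```python
from itertools import count, islice, product
from typing import Callable, Iterator, Optional

MERGE, MOVE = "M", "O"


def shapes_by_effort(n_items: int) -> Iterator[str]:
    """Yield operation sequences ("shapes") from least to most effort.

    Assumed ordering: shorter first, starting at n_items - 1 operations; within
    one length, binary order with M = 0 and O = 1, the rightmost operation being
    the most recent; shapes that begin with a move are skipped as impossible.
    """
    if n_items < 2:
        raise ValueError("need at least two items")
    for length in count(n_items - 1):
        # product() varies the rightmost position fastest, i.e. binary counting
        for ops in product((MERGE, MOVE), repeat=length):
            if ops[0] == MOVE:
                continue
            yield "".join(ops)


def least_effort_shape(n_items: int, converges: Callable[[str], bool],
                       max_len: int) -> Optional[str]:
    """Return the first shape, in effort order, that `converges` accepts.

    `converges` is a hypothetical placeholder for a real convergence check.
    Gives up and returns None once shapes exceed max_len operations.
    """
    for shape in shapes_by_effort(n_items):
        if len(shape) > max_len:
            return None
        if converges(shape):
            return shape
    return None  # not reached: shapes_by_effort never runs out


def toy(shape: str) -> bool:
    """Purely illustrative predicate: exactly four merges plus at least one move."""
    return shape.count(MERGE) == 4 and MOVE in shape


print(list(islice(shapes_by_effort(5), 5)))
print(least_effort_shape(5, toy, max_len=6))
```

For five items the first five shapes come out as MMMM, MMMO, MMOM, MMOO, MOMM, the same order as the list above. With the toy predicate, nothing of length four qualifies, and the search returns MMMMO, the shape that puts the move as late as possible.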

But search like this is laborious and thoroughly unenlightening in the grand scheme of things, and only feasible here because the system is discrete. Try search in a continuous domain, like, say, integration of functions over the reals, and there's no general method for finding the full set of solutions by search, only by deduction, i.e. calculus. So ideally what we would want is some set of axioms, employing local constraints, that guarantees without search that blind application of the rules will yield either no converging structures or only the least effort converging structures. What that set of rules is, I don't know, and I don't think it's a simple question to answer, but it's definitely an interesting and important one to at least try to answer, if for no other reason than simple curiosity about how Merge over Move etc. could be true to the extent that they are. The strictly local formulation might not turn out to be the more useful one, in the same way that Lagrangian mechanics is more useful than Newtonian mechanics for certain kinds of physics theorizing, but it would at least be interesting to understand the other side of the coin, if it exists.
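The physics cases mentioned earlier show what having both sides of the coin looks like. For refraction, the global statement (given both endpoints, take the fastest path) and a local one (march forward from the source and bend at the interface by Snell's law, sin θ1 / v1 = sin θ2 / v2) pick out the same path, but only the global one needs to know the destination and compare alternatives. A minimal sketch of that equivalence, with made-up function names and illustrative numbers:

```python
import math


def trace_ray(theta1: float, a: float, b: float, v1: float, v2: float):
    """Local rule: start at (0, a) heading down at angle theta1 from the normal,
    bend at the interface y = 0 by Snell's law, and stop at the line y = -b.

    Returns (x where the ray crosses y = 0, x where it reaches y = -b).
    No destination is given; it falls out of the forward march.
    Assumes v2 <= v1, so asin never sees a value above 1 (no total internal reflection).
    """
    x_cross = a * math.tan(theta1)
    theta2 = math.asin(v2 / v1 * math.sin(theta1))  # sin(t1)/v1 == sin(t2)/v2
    return x_cross, x_cross + b * math.tan(theta2)


def least_time_crossing(a: float, b: float, d: float, v1: float, v2: float,
                        steps: int = 200_000) -> float:
    """Global rule: knowing both endpoints (0, a) and (d, -b), try every crossing
    point on a grid over [0, d] and keep the one with the shortest travel time."""
    def travel_time(x: float) -> float:
        return math.hypot(x, a) / v1 + math.hypot(d - x, b) / v2
    return min((d * i / steps for i in range(steps + 1)), key=travel_time)


a, b, v1, v2 = 1.0, 1.0, 1.0, 0.75  # illustrative numbers only
x_cross, d = trace_ray(math.radians(40), a, b, v1, v2)
x_best = least_time_crossing(a, b, d, v1, v2)
print(f"local rule: {x_cross:.5f}  global search: {x_best:.5f}")
```

The grid search plays the role of searching the derivation space; Snell's law reaches the same crossing point one step at a time without ever looking at the alternatives.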

