On Wed, 31 Oct 2001, Gants, Mark E. (UMC-Student) wrote:
> English is similar to a programming language - it has a very distinct
> syntax or grammar. So you can parse it like it was a programming
> language depending on the grammar you define for it.
In practice, English text is full of challenges. One of the big ones is
that there is so much ambiguity in the language with respect to things as
basic as the grammatical categories of the lexical items. A word like
"that" can be a demonstrative article, a relative pronoun, a deictic
pronoun, or what is known in the business as a complementizer. Consider:
I know that.
I know that stereo, and it really rocks.
I know a stereo that really rocks.
I know a woman that really rocks. (yes, people do say this)
I told a woman that really rocks.
I told a woman that really rocks a story.
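To put a rough number on that ambiguity, here is a minimal sketch that counts how many category assignments a tagger would have to consider before looking at any context. The lexicon is a hypothetical toy written for this example: the four readings of "that" are the ones listed above, and the noun/verb split for "rocks" is an added assumption.

```python
from math import prod

# Hypothetical toy lexicon: each word maps to the categories it might take.
# The entries for "that" follow the four readings named in the text; the
# rest are assumptions made only for this illustration.
LEXICON = {
    "I": {"pronoun"},
    "told": {"verb"},
    "a": {"article"},
    "woman": {"noun"},
    "that": {"demonstrative", "relative pronoun",
             "deictic pronoun", "complementizer"},
    "really": {"adverb"},
    "rocks": {"noun", "verb"},
    "story": {"noun"},
}


def count_tag_sequences(sentence):
    """Count the category sequences that are possible before any
    grammatical context is used to rule some of them out."""
    words = sentence.rstrip(".").split()
    return prod(len(LEXICON[word]) for word in words)


print(count_tag_sequences("I told a woman that really rocks a story."))
```

Only two of the nine words are ambiguous in this toy lexicon, yet the candidates already multiply; a realistic lexicon is far less tidy.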
Then there are the zillions of cases where the lexical categories are clear,
but the higher-order structure is not:
I saw the spy with binoculars (but not the one with a telescope).
I saw the spy with binoculars (but not with just my naked eye).
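The two readings fall out of even a tiny grammar. The following is a sketch only: the grammar and lexicon are made up for this one sentence (not a serious grammar of English), and the function counts distinct parse trees with a CYK-style chart.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form, invented for this example:
# (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP modifies the verb phrase (seeing with binoculars)
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # PP modifies the noun phrase (the spy has binoculars)
    ("PP", "P", "NP"),
]

# One category per word, so all ambiguity here is structural.
LEXICON = {
    "I": ["NP"],
    "saw": ["V"],
    "the": ["Det"],
    "spy": ["N"],
    "with": ["P"],
    "binoculars": ["NP"],
}


def count_parses(words, root="S"):
    """Return the number of distinct parse trees for `words`."""
    n = len(words)
    # chart[start, end, category] = number of trees of that category
    # covering words[start:end]
    chart = defaultdict(int)
    for i, word in enumerate(words):
        for category in LEXICON[word]:
            chart[i, i + 1, category] += 1
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for parent, left, right in BINARY_RULES:
                    chart[start, end, parent] += (
                        chart[start, mid, left] * chart[mid, end, right]
                    )
    return chart[0, n, root]


print(count_parses("I saw the spy with binoculars".split()))  # prints 2
```

Nothing in the grammar says which of the two trees is meant; that choice depends on the kind of context given in the parentheses above.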
And then there are things like unbounded dependencies, and the increasing
use of clitics like "'s" to do weird things:
I know the guy with the weird job in St. Joseph's girlfriend.
And then there is the fun we're having with the amazing phenomena
surrounding forms ending with "-ing". (That's a self-referential
sentence.)
I could go on and on. English certainly does have grammatical rules (not
anything like the ones you learned in school), but English, the language,
is not a context-free language, and there ain't a reserved word in sight.
:-) In limited situations, you can often sleaze by with a solution that
gets the job done *without* going through the hell of real parsing, but
that does *not* make English easy to parse. Really parsing English makes
parsing perl seem like complete child's play. It's such a nasty problem
that most usable systems basically punt on it and rely on statistical
regularities to extract whatever is really needed.
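As a rough illustration of that kind of punt, here is a sketch that decides where a prepositional phrase attaches by counting labeled examples and never building a parse. The training examples are invented for this illustration (they are not real corpus data), and the names TRAINING, train, and guess_attachment are hypothetical.

```python
from collections import Counter

# Toy, hand-made examples, invented for illustration only:
# (verb, noun, preposition, attachment), where attachment is
# "V" (modifies the verb) or "N" (modifies the noun).
TRAINING = [
    ("saw", "spy", "with", "V"),
    ("saw", "bird", "with", "V"),
    ("saw", "man", "with", "N"),
    ("ate", "pizza", "with", "V"),
    ("met", "guy", "in", "N"),
    ("know", "guy", "in", "N"),
    ("put", "book", "in", "V"),
]


def train(examples):
    """Count how often each preposition attaches to the verb or the noun."""
    counts = Counter()
    for _verb, _noun, prep, label in examples:
        counts[prep, label] += 1
    return counts


def guess_attachment(counts, prep):
    """Pick the more frequent attachment for this preposition.
    Ties and unseen prepositions fall back to "N" (an arbitrary choice)."""
    return "V" if counts[prep, "V"] > counts[prep, "N"] else "N"


counts = train(TRAINING)
print(guess_attachment(counts, "with"))
print(guess_attachment(counts, "in"))
```

A guesser like this will get individual sentences wrong, but it needs no grammar at all, which is the trade such systems make.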
jking