> moved this msg from the main list since this is getting way OT.
Probably should have started here but tis kewl.
> seems to me you're concentrating on things of little relevance to nlp
> here. mapping tokens into dictionary words is a largely mechanical task.
Umm, and what would you do? Map them to nothing whatsoever? A word without
meaning means nothing to anybody. You can work off pure semantics but then
you'll just be babbling like ELIZA.
> this is the area where i'd like to see much more explained before i can
> accept your claim that "parsing nl is easy". if i am correct, you're
> suggesting a phrase lookup kind of nlp system here (perhaps with some
> lemmatization, i.e. stripping words down to their stems). this is all
> fine if you have a domain limited to people asking for directions to
> known places on the map (see yahoo map engine). the moment you're gonna
> try to map a larger domain in this manner your nlp system will start
> printing "42" in response to every other question, because it'll be
> something you have not predicted and put in the lookup table. not to
> mention that once your table gets large, it'll take forever to maintain
> and use. that's why the symbolic models were developed - to translate
> spoken language into some reduced symbol set for the purpose of
> extracting meaning from it. so efficiency in the long run is weakness
> number one.
All you're doing is mapping from a very generic representation of a word
down to an easy-to-work-with form. Known words will already have a key in
memory and likely some sort of definition as to whether they are a verb,
noun, etc., regardless of whether you're using a provided db or one
generated by a learning process (or both). Mapping the words to keys in
this manner saves effort in processing and lets you try to match a known
grammar.
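
A minimal sketch of that word-to-key mapping in Python, assuming a tiny
hand-made lexicon. The LEXICON table, the key numbers and the to_keys name
are all made up for illustration; a real bot would load or learn its own.

```python
# Hypothetical lexicon: word -> (integer key, part of speech).
# A real bot would load this from a db or build it while learning.
LEXICON = {
    "find": (1, "verb"),
    "file": (2, "noun"),
    "files": (2, "noun"),  # crude stemming: the plural shares the key
    "the": (3, "article"),
    "my": (4, "pronoun"),
}

UNKNOWN = (0, "unknown")  # key reserved for words not in the lexicon


def to_keys(line):
    """Lowercase, strip punctuation, and map each word to (key, pos)."""
    words = [w.strip(".,!?;:\"'") for w in line.lower().split()]
    return [LEXICON.get(w, UNKNOWN) for w in words if w]


print(to_keys("Find my files!"))
# [(1, 'verb'), (4, 'pronoun'), (2, 'noun')]
```

Everything after this step works on the small keys instead of raw text.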
What evidence do you have that it breaks down in a large system? I've had
web agents that could take very generic requests and go find the needed
files for the user etc. Also had bots with much higher understanding
abilities than anything else I've seen.
Umm this does translate down into a symbolic representation. What's your
point?
*shrugs* The whole concept of parsing is to take something from a hard to
work with format and translate it into an easy to work with format. What
you do with it after you've parsed it is a whole different issue. A
compiler's efficiency certainly isn't based only on how well it parses the
source code.
> the system you describe also has one other basic weakness - it'll fail
> miserably on simple tasks of the type:
> - given the fact that I have a truck, 4x4, a bicycle and a horse, how
> many cars do i have?
> or
> - i left the passenger-side window down today, and it was raining. is
> the left car seat wet or dry?
> ...or any other type of task where it is required that meaning be
> extracted from nl information.
> i may be incorrect on my assumptions as to what you were attempting to
> present here, but i'd very much like to see how you'd go about tackling
> this sort of tasks with the system you described.
It depends on the processing engine you put behind the parser. With a
simple scripted bot, unless you've designed it to answer such questions,
then yes, it'll fail. If these were the questions you were targeting, it
shouldn't be hard to handle them. If you're using something like a neural
net, it should do fairly well with such questions, especially if it has
the ability to learn.
> symbolic systems do that as well. unassigned symbols are called "free
> variables" and the proportion of free variables is a measure of degree
> of nlp system's success.
You'll always have to deal with odd input. To see if you can totally screw
over a bot, type nonsense lines and see what happens. Dr. Seuss or garbly
goop like "DSdds dfs to dsfdsf here fgsgg wow!" seems to exercise it
pretty well. While you want the bot to comprehend Dr. Seuss (if that's
possible for anyone), you probably don't want it to comprehend the
garbled crap. A good bot can pick out the parts of a garbled string it
understands and toss the rest away.
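
A rough sketch of that "keep what you understand, toss the rest" step in
Python. The KNOWN vocabulary and the keep_known name are assumptions for
the example, not anything from a real bot.

```python
# Hypothetical vocabulary of words the bot has keys for.
KNOWN = {"to", "here", "wow", "go", "come"}


def keep_known(line):
    """Return only the words the bot recognizes, in their original order."""
    words = [w.strip(".,!?").lower() for w in line.split()]
    return [w for w in words if w in KNOWN]


print(keep_known("DSdds dfs to dsfdsf here fgsgg wow!"))
# ['to', 'here', 'wow']
```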
> once again, you're trivializing the most important part about natural
> language processing. i remember that your original statement was about
> easiness of "parsing," but you cannot parse a statement into a stream of
> words and expect the expert system to make sense of it - how you parse
> determines what kind of information you'll be able to extract. and your
> argument here cannot be verified because you're trivializing the process
> of extracting meaning (while devoting several paragraphs to discussion
> on tokenizing).
I'm not explaining processing at all. The discussion was on parsing. I
find a stream of keys a very easy method to use. It's reasonably easy
to match groups of keys in a scripted bot using regular expressions, if
nothing else, and keys make easy input to pass on to various other
processing engines. Since the original discussion was on IRC bots, I'll
assume we can simplify it down to a scripted bot.
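
For what matching groups of keys with regular expressions might look like,
here is a small Python sketch. It assumes the key stream is flattened into
one space-separated string; the "g:" and "w:" prefixes and the has_all
helper are invented for the example.

```python
import re

# Hypothetical key stream: "g:" marks grammar keys, "w:" marks word keys.
stream = "g:question w:how w:conditional w:statement w:work"

# Both keywords in this order, with any keys in between. The lookarounds
# make sure we match whole keys, not pieces of longer ones.
IN_ORDER = re.compile(
    r"(?<!\S)w:conditional(?!\S).*(?<!\S)w:statement(?!\S)"
)


def has_all(stream, keys):
    """True if every key occurs somewhere in the stream, in any order."""
    return all(
        re.search(r"(?<!\S)" + re.escape(k) + r"(?!\S)", stream)
        for k in keys
    )


print(bool(IN_ORDER.search(stream)))                         # True
print(has_all(stream, ["w:statement", "w:conditional"]))     # True
print(bool(IN_ORDER.search("w:statement w:conditional")))    # False
```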
If you know that you're looking for a question about programming, you may
first want to check the input stream for a list of keywords (in reality
keys would usually be ints, but this is just an example of how it might
look; I'm not good at pseudocode tho, so oh well):
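# pseudocode: keys shown as strings for readability
if ( "word:conditional" in $stream && "word:statement" in $stream ) {
    # both keywords are present, so check the sentence type
    if ( "grammar:1" in $stream ) {
        # pull out the words that matched this grammar
        $words = pull_words ( "grammar:1", $stream );
        my_function ( $words );
    }
}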
Does that make any sense? Basically you're usually looking for certain
words in the stream, sometimes in order, sometimes not, and will usually
use a regexp to check. Then you want to check the first grammar key in the
stream so that you know what type of sentence you're working with. This
would mean checking if it's a statement or a question, if a word is being
used as a noun or verb, etc. Then, working from both those conditions,
you'll probably want to output your list of words so that your function
can do something with them, such as remembering what was said or
answering a question.
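
Put together, the flow described above could look something like this in
Python. This is only a sketch: the key names ("grammar:question",
"grammar:statement", "word:...") and the handle function are made up, and
the stream is assumed to be a plain list of string keys.

```python
def handle(stream):
    """Check keywords, read the first grammar key, then pick an action.

    Returns (action, words), or None if this handler doesn't apply.
    """
    wanted = {"word:conditional", "word:statement"}
    if not wanted.issubset(stream):
        return None  # not a topic this handler cares about

    # The first grammar key tells us what type of sentence this is.
    kind = next((k for k in stream if k.startswith("grammar:")), None)

    # Pull the plain words out so the action has something to work on.
    words = [k.split(":", 1)[1] for k in stream if k.startswith("word:")]

    if kind == "grammar:question":
        return ("answer", words)
    if kind == "grammar:statement":
        return ("remember", words)
    return None


print(handle(["grammar:question", "word:how",
              "word:conditional", "word:statement"]))
# ('answer', ['how', 'conditional', 'statement'])
```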
If you have studied neural nets then the concept of feeding them strings
of keys (symbols) shouldn't seem very surprising. Not sure what else I
could say about it that wouldn't be obvious.