On Sun, 2 Dec 2007, Michael wrote:
Wget isn't really a scraper. It's really just a tool for downloading,
although they are working on a new version that promises to be more
powerful. For example, wget doesn't do anything with style sheets right
now, but the new version plans to add support. Wikipedia is also a
difficult app to scrape because it does funky things with URLs. :)
I just figured out a couple of things. In the wget man page, look for
the section on the -k (--convert-links) option. That option does not work
correctly, at least in wget 1.10.2 and earlier. It fails when the -O
option is used to name the output file. It also puts the file name into
"name" links that point to other parts of the same file, which is a
problem because those links break if you rename the file. And it does not
handle style sheets correctly (as you said), even though the man page
says it will convert links to style sheets.
So wget is a little buggy.
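
For what it's worth, the "name" link problem can be patched up after the
download. Below is a minimal sketch in Python, assuming the saved page
contains links of the form href="FILENAME#name" where FILENAME is the name
wget saved the page under. The function name and the sample HTML are my own
illustration, not anything from wget, and a regex is only a rough stand-in
for a real HTML parser.

```python
import re


def strip_filename_from_anchors(html: str, filename: str) -> str:
    """Rewrite href="<filename>#frag" to href="#frag".

    Assumption: in-page links embed the saved file's own name, so they
    break when the file is renamed. Links to other files are left alone.
    """
    pattern = re.compile(
        r'(href\s*=\s*)(["\'])' + re.escape(filename) + r'(#[^"\']*)\2',
        re.IGNORECASE,
    )
    # Keep the attribute name, the quote character, and the fragment;
    # drop only the file name in between.
    return pattern.sub(r'\1\2\3\2', html)


if __name__ == "__main__":
    # Hypothetical sample, standing in for a page saved as index.html.
    sample = '<a href="index.html#History">History</a>'
    print(strip_filename_from_anchors(sample, "index.html"))
```

After this, the in-page links no longer mention the file name, so the file
can be renamed without breaking them.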
Mike
_______________________________________________
discussion mailing list
http://mlug.missouri.edu/mailman/listinfo/discussion