> More data:
>
> perl -pe 's/=\n// ; s/=20/ /g ; s/=EA/\352/g ; s/=E5/\345/g ;
> s/=E6/\346/g' msg00032.html > ~/www/test_junk.html
>
> I used perl to translate some of the '=' junk and this is what I got:
>
> http://taxa.epi.umn.edu/~mbmiller/test_junk.html
>
> Which seems to be properly repaired.
>
> Note that when I had an '=' at the end of a line, I removed both the
> '=' and the newline. I converted '=20' to a single space, and I
> converted each '=XX' to the corresponding octal escape from the extended ASCII set.
>
> Another interesting fact that you should note, Michael, is that
> MHonArc (written in perl and GPL'd) seems usually to handle some of
> this weirdness correctly. There are many other messages in my BGnews
> archive that are full of '=' and '=20' in the e-mail text (seen if I
> 'less' the mbox), but they are translated perfectly in the MHonArc
> archive so that it is perfectly readable. So there must be some good
> documentation out there for this craziness.
Looks like you got a good bead on it. That is similar to what I was
doing except I wasn't bothering to convert those special characters
since URLs aren't supposed to have them anyway. Did you try any strings
that have normal '=' characters in them?
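
On the question of normal '=' characters: in quoted-printable encoding a literal '=' is itself written as '=3D', so a decoder that handles every '=XX' pair (not just the few listed in the one-liner) should restore it without special treatment. A minimal sketch of that idea follows, in Python rather than Perl and using the standard library's quopri module. The function name decode_qp and the latin-1 default charset are assumptions made for illustration; the charset was picked because the octal values in the one-liner above match ISO-8859-1.

```python
import quopri


def decode_qp(raw: bytes, charset: str = "latin-1") -> str:
    """Decode quoted-printable bytes into text.

    decode_qp is a hypothetical helper name; latin-1 is an assumed
    charset, not something stated in the thread.
    """
    # quopri.decodestring removes soft line breaks ('=' at end of line)
    # and turns every '=XX' hex pair back into the byte it stands for,
    # including '=3D', which is how a literal '=' is encoded.
    return quopri.decodestring(raw).decode(charset)


if __name__ == "__main__":
    sample = b"a=3Db=\nc=20d\n"
    print(repr(decode_qp(sample)))
```

The sample string contains an encoded '=', a soft line break and an encoded space, which are the three cases discussed above.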
I'll keep messing with it. I probably have more stored email than just
about anyone (hundreds of gigs' worth), so I have a lot of source data to
test against.
--
Michael <EMAIL:PROTECTED>
http://kavlon.org
_______________________________________________
discussion mailing list
EMAIL:PROTECTED
http://mlug.missouri.edu/mailman/listinfo/discussion