MLUG: Re: [MLUG - DISCUSSION] headers gone wild!

> More data:
>
> perl -pe 's/=\n// ; s/=20/ /g ; s/=EA/\352/g ; s/=E5/\345/g ; s/=E6/\346/g' msg00032.html > ~/www/test_junk.html
>
> I used perl to translate some of the '=' junk and this is what I got:
>
> http://taxa.epi.umn.edu/~mbmiller/test_junk.html
>
> It seems to be properly repaired.
>
> Note that when I had an '=' at the end of a line, I removed both the
> '=' and the newline.  I converted '=20' to a single space, and I
> converted each remaining '=XX' code to the matching character from the
> extended ascii set, written as an octal escape (e.g. '=EA' -> \352).
>
> Another interesting fact that you should note, Michael, is that
> MHonArc (written in perl and GPL'd) usually seems to handle this
> weirdness correctly.  There are many other messages in my BGnews
> archive that are full of '=' and '=20' in the e-mail text (visible if I
> 'less' the mbox), but they are translated perfectly in the MHonArc
> archive, so they are fully readable.  So there must be some good
> documentation out there for this craziness.
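For reference, the '=' codes are quoted-printable encoding (RFC 2045, section 6.7), which a message declares with a "Content-Transfer-Encoding: quoted-printable" header; that is presumably what MHonArc is decoding. The rules match what the one-liner does by hand: an '=' at the end of a line is a "soft" line break to be dropped, and '=' followed by two hex digits stands for that single byte. Below is a minimal Python sketch (not MHonArc's code) that generalizes the one-liner to every =XX code instead of a hand-picked few. The names QP_TOKEN and decode_qp are hypothetical, and it assumes the input really is quoted-printable.

```python
import re

# One regex pass over the raw bytes; each alternative starts with "=":
#   "=" + optional trailing blanks + end of line -> soft line break, dropped
#   "=" + two hex digits (e.g. "=20", "=EA")      -> that single byte
QP_TOKEN = re.compile(rb"=(?:[ \t]*\r?\n|([0-9A-Fa-f]{2}))")


def _replace(match):
    hex_digits = match.group(1)
    if hex_digits is None:  # soft line break: remove "=" and the newline
        return b""
    return bytes([int(hex_digits, 16)])


def decode_qp(data: bytes) -> bytes:
    """Hypothetical helper: decode a quoted-printable body.

    Returns raw bytes; interpreting them (e.g. as Latin-1, the
    "extended ascii" used above) depends on the message's charset.
    """
    return QP_TOKEN.sub(_replace, data)


sample = b"caf=E9 au lait,=20served=\n hot"
print(decode_qp(sample))  # b'caf\xe9 au lait, served hot'
```

Handling soft breaks and hex codes in a single pass avoids an ordering trap: if '=3D' were decoded to '=' first, a later end-of-line rule could mistake it for a soft break.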

Looks like you got a good bead on it. That is similar to what I was
doing, except that I wasn't bothering to convert those special
characters, since URLs aren't supposed to contain them anyway. Did you
try any strings that have normal '=' characters in them?
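On the question of normal '=' characters: in genuine quoted-printable text a literal '=' is always sent as '=3D', so decoding real QP data is unambiguous. The risk is running the decoder over text that was never encoded, where something like "?id=42" in a URL gets read as the hex code for "B". A rough Python sketch of both cases using the standard quopri and email modules follows. The URL is illustrative and decode_body is a hypothetical helper that assumes a single-part message. It only decodes when the Content-Transfer-Encoding header says quoted-printable.

```python
import email
import quopri

# 1. Encoding turns a literal "=" into "=3D", and decoding reverses it.
url = b"http://example.com/page?id=42"
encoded = quopri.encodestring(url)
print(encoded)  # b'http://example.com/page?id=3D42'
assert quopri.decodestring(encoded) == url

# 2. Decoding text that was NOT quoted-printable corrupts it:
#    "=42" is taken as the hex code for "B".
print(quopri.decodestring(url))  # b'http://example.com/page?idB'


# 3. So decode only when the header says the body is quoted-printable.
def decode_body(raw: bytes) -> bytes:
    """Hypothetical helper: return a single-part message's body as bytes,
    decoded according to its Content-Transfer-Encoding (if any)."""
    msg = email.message_from_bytes(raw)
    return msg.get_payload(decode=True)
```

get_payload(decode=True) leaves the body untouched when no quoted-printable or base64 encoding is declared, so a plain "id=42" survives.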

I'll keep messing with it. I probably have more stored email than just
about anyone (hundreds of gigs' worth), so I have a lot of source data
to work against.

-- 
Michael <EMAIL:PROTECTED>
http://kavlon.org

_______________________________________________
discussion mailing list
EMAIL:PROTECTED
http://mlug.missouri.edu/mailman/listinfo/discussion