- To: "MLUG Off-Topic Discussion" <EMAIL:PROTECTED>
- Subject: Re: [MLUG - DISCUSSION] Storage Challenge Competition winner
- From: Michael <EMAIL:PROTECTED>
- Date: Sun, 2 Dec 2007 23:47:30 -0700
- Delivery-date: Mon, 03 Dec 2007 00:47:39 -0600
I wonder how repetitive their data was. You can often compress a 1GB log file down to a couple of megabytes with plain gzip. That seems amazing until you remember how repetitive log data is. (It is also a good reason to rotate and compress log files.)
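To make the log file point concrete, here is a minimal Python sketch. The log format, host names, and line count are all made up for illustration, and the exact ratio will vary with real data. It compresses a synthetic, repetitive log with the standard gzip module and compares that against random bytes of the same size, which barely shrink at all.

```python
import gzip
import os


def make_log(n_lines: int) -> bytes:
    """Build a synthetic, highly repetitive log (hypothetical format)."""
    lines = [
        f"2007-12-02 {i // 3600 % 24:02d}:{i // 60 % 60:02d}:{i % 60:02d} "
        f"host sshd[1234]: Accepted publickey for user{i % 5} "
        f"from 10.0.0.{i % 20}\n"
        for i in range(n_lines)
    ]
    return "".join(lines).encode("ascii")


raw = make_log(20_000)
packed = gzip.compress(raw)

# Random bytes have no repetition for gzip to exploit.
noise = os.urandom(len(raw))
noise_packed = gzip.compress(noise)

print(f"log:   {len(raw)} -> {len(packed)} bytes "
      f"(ratio {len(raw) / len(packed):.1f}x)")
print(f"noise: {len(noise)} -> {len(noise_packed)} bytes "
      f"(ratio {len(noise) / len(noise_packed):.2f}x)")
```

The gap between the two ratios is the point: an impressive compression figure says as much about the input as about the algorithm.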
I remember playing around with representing data as an algorithm: treat the data as numbers and find simple formulas that can regenerate the original from just a couple of starting numbers. The compression was great, but it was horribly slow, often slower than producing the original data in the first place or just transmitting it uncompressed. Speed is an important factor when evaluating compression algorithms, I think.
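A toy Python sketch of that idea follows. It is not the scheme described above, whose details are not given; it assumes a small linear congruential generator (the constants and the names generate and find_seed are chosen here for illustration) and "compresses" a byte string by brute-force searching for a seed that regenerates it.

```python
import time
from typing import Optional

MODULUS = 2**16  # assumed: tiny state space so the search finishes quickly


def generate(seed: int, length: int) -> bytes:
    """'Decompress': expand one starting number into `length` bytes."""
    state = seed
    out = bytearray()
    for _ in range(length):
        # Illustrative LCG constants, not taken from the original post.
        state = (state * 25173 + 13849) % MODULUS
        out.append(state >> 8)  # emit the high byte of the 16-bit state
    return bytes(out)


def find_seed(data: bytes) -> Optional[int]:
    """'Compress': try every seed until one reproduces the data exactly."""
    for seed in range(MODULUS):
        if generate(seed, len(data)) == data:
            return seed
    return None  # no seed in this family produces the data


original = generate(54321, 16)  # data that is known to be representable

start = time.perf_counter()
seed = find_seed(original)
search_time = time.perf_counter() - start

start = time.perf_counter()
restored = generate(seed, len(original))
expand_time = time.perf_counter() - start

print(f"search took {search_time:.4f}s, expansion took {expand_time:.6f}s")
print("round trip ok:", restored == original)
```

Sixteen bytes shrink to a single 16-bit seed, but only because the input was produced by the generator in the first place. Arbitrary data will almost never match any seed, and widening the family of formulas to cover more inputs makes the search grow with the number of candidates, which is where the slowness comes from.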
I will look later when my screen is bigger, but this makes it sound
like the petabyte was an intermediate result; maybe they had to store
huge matrices in the middle of the algorithm to optimize the
compression. Most compression algorithms out there are heuristic
rather than optimal for big datasets.