- To: "MLUG Members" <EMAIL:PROTECTED>
- Subject: Re: Re: [MLUG] Linux time command -- "kernel mode" and "user mode"
- From: "Jonathan King" <EMAIL:PROTECTED>
- Date: Fri, 3 Nov 2006 22:24:43 -0500
On 11/3/06, Stephen Montgomery-Smith <EMAIL:PROTECTED> wrote:
On Fri, 3 Nov 2006, Stephen Montgomery-Smith wrote:
>
>
> On Fri, 3 Nov 2006, Mike Miller wrote:
>
>> On Fri, 3 Nov 2006, Stephen Montgomery-Smith wrote:
>>
>>> On Fri, 3 Nov 2006, Mike Miller wrote:
>>>
>>>> We have written a program in C++ and we have also written it with a
>>>> slightly different algorithm in Python. We want to see how much slower
>>>> the Python program is than the C++ program. We get these results:
>>>>
>>>> Python:
>>>> ------
>>>> real 10m43.357s
>>>> user 9m51.050s
>>>> sys 0m23.420s
>>>>
>>>> C++:
>>>> ------
>>>> real 8m16.668s
>>>> user 5m38.360s
>>>> sys 2m37.780s
>>
>>> This is my understanding.
>>>
>>> You want to add the user and kernel times together.
>>>
>> [snip good info]
>>>
>>> What does your program do?
>>
>> It reads in a large amount of data (a few million lines) and processes
>> every line by replacing certain strings with others based on a hash table.
>> Then it writes the reformatted data into a collection of about 300 files.
>> So one big file comes in and 300 smaller gzipped files go out.
>>
>> The C++ program reads the whole big file into memory, reformats it and
>> writes it out to 300 files, one at a time. The Python program reads in the
>> big file one line at a time and writes each processed line immediately to
>> one of the 300 files, keeping all 300 open at once.
>>
>> The Python script surely uses minimal memory while the C++ program uses
>> lots of memory. The Python program opens many files at once, but this
>> seems not to be a problem under most conditions, while the C++ program
>> keeps only one output file open at a time.
>>
>> Thanks for the tips, Stephen.
>>
>> Mike
>
> I'm trying to account for the huge kernel time. Is the file you read in very
> big? Maybe the C++ program uses a lot of swap space on the drive. That would
> contribute to a huge kernel time.

Or another possibility: the reformatting of the data in the C++ program
involves a lot of calls to malloc, which in turn will probably call things
like mmap. This would push a lot of the work from user mode into the
kernel. But this would not be a bad thing in and of itself, if it actually
leads to slightly better performance. (Whereas using the swap space a lot
strikes me as excessively ugly.)
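
One rough way to tell the two explanations apart (swap versus lots of malloc/mmap) is to look at page-fault counts alongside the CPU times. The sketch below is only an illustration, not code from either program: `measure` is a made-up helper name, and the command it runs is a stand-in that you would replace with the real C++ or Python invocation. It uses the standard `resource.getrusage` call, so it is Unix only. Major faults needed disk I/O (swap-ins count, but so do file-backed pages), while minor faults are what you would expect from freshly mapped memory, so treat the numbers as a hint rather than proof.

```python
import resource
import subprocess
import sys


def measure(cmd):
    """Run cmd to completion and return its CPU times and page-fault counts."""
    # RUSAGE_CHILDREN is cumulative over all waited-for children,
    # so take the difference around this one run.
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(cmd, check=True)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    user = after.ru_utime - before.ru_utime
    system = after.ru_stime - before.ru_stime
    return {
        "user": user,                # seconds spent in user mode
        "sys": system,               # seconds spent in kernel mode
        "cpu_total": user + system,  # the sum suggested earlier in the thread
        "major_faults": after.ru_majflt - before.ru_majflt,
        "minor_faults": after.ru_minflt - before.ru_minflt,
    }


if __name__ == "__main__":
    # Stand-in command (an assumption); substitute the program under test.
    stats = measure([sys.executable, "-c", "sum(range(10**6))"])
    for name, value in stats.items():
        print(name, value)
```

A large `major_faults` count together with real time well above `cpu_total` would point toward swapping; a large `minor_faults` count with high `sys` would be more consistent with the malloc/mmap idea.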

Something does seem pretty screwy here. He says he's keeping *300*
files open in the Python version, while the C++ version is writing them
one at a time. I'm thinking the only thing that's keeping Python in the
ballpark is that it's doing something better with caching writes and
then doing all of them at once. Would it be possible to post the code,
or at least a skeleton that shows the I/O? My guess is that as soon as
they saw the code, half of the C++ dudes on the list could point to
some crucial inefficiency you could fix.
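
For reference, the kind of skeleton being asked for might look something like the following on the Python side. This is a guess at the shape of the I/O from the description quoted above, not the actual script: the function name `split_lines`, the whitespace tokenising, the `part000.gz` file naming, and the rule for choosing an output file (line number modulo the file count) are all assumptions made for illustration.

```python
import gzip
import os
import tempfile


def split_lines(in_path, out_dir, table, n_outputs=300):
    """Stream in_path one line at a time into n_outputs gzipped files."""
    # Keep every output file open for the whole run, as described.
    outputs = [
        gzip.open(os.path.join(out_dir, f"part{i:03d}.gz"), "wt")
        for i in range(n_outputs)
    ]
    try:
        with open(in_path) as src:
            for lineno, line in enumerate(src):
                # Replace tokens found in the hash table; keep the others.
                tokens = [table.get(tok, tok) for tok in line.split()]
                # Hypothetical routing rule: round-robin by line number.
                outputs[lineno % n_outputs].write(" ".join(tokens) + "\n")
    finally:
        for out in outputs:
            out.close()


if __name__ == "__main__":
    # Tiny demonstration with made-up data in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "big.txt")
        with open(sample, "w") as f:
            f.write("a b\nc a\nb b\n")
        split_lines(sample, tmp, {"a": "X"}, n_outputs=2)
        with gzip.open(os.path.join(tmp, "part000.gz"), "rt") as f:
            print(f.read(), end="")
```

Each gzip file object buffers and compresses in user space before anything reaches the kernel, which would fit the guess that the writes are being batched, though only the real code can confirm that.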

Or, maybe Python really can be pretty clever here.

jking
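
To put numbers on the "add the user and kernel times together" advice, here is a small sketch that does the arithmetic on the figures quoted at the top of this message. The helper name `to_seconds` is made up for the example; the timings are copied from the original post.

```python
import re


def to_seconds(stamp):
    """Convert a `time`-style stamp such as '9m51.050s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Timings as posted in the thread.
runs = {
    "Python": {"real": "10m43.357s", "user": "9m51.050s", "sys": "0m23.420s"},
    "C++": {"real": "8m16.668s", "user": "5m38.360s", "sys": "2m37.780s"},
}

cpu = {}
for name, t in runs.items():
    sys_time = to_seconds(t["sys"])
    cpu[name] = to_seconds(t["user"]) + sys_time
    print(f"{name}: user+sys={cpu[name]:.2f}s "
          f"real={to_seconds(t['real']):.2f}s "
          f"sys share={sys_time / cpu[name]:.0%}")

print(f"Python/C++ CPU ratio: {cpu['Python'] / cpu['C++']:.2f}")
```

By this measure the Python run used about 614 s of CPU against about 496 s for C++, a ratio of roughly 1.24, and kernel time is about a third of the C++ total but only a few percent of the Python total. For what it's worth, the C++ run's real time is within a second of its user+sys total.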