On 11/3/06, Stephen Montgomery-Smith <EMAIL:PROTECTED> wrote:
Jonathan King wrote:
> On 11/3/06, Stephen Montgomery-Smith <EMAIL:PROTECTED> wrote:
>> Or another possibility. The reformatting of the data in the C++ program
>> involves a lot of calling malloc, which in turn will probably call things
>> like mmap. This would push a lot of the tasks from the user to the
>> kernel. But this would not be a bad thing in and of itself, if it actually
>> leads to slightly better performance. (Whereas using the swap space a lot
>> strikes me as excessively ugly.)
>
> Something does seem pretty screwy here. He says he's keeping *300*
> files open in the Python version, while the C++ is doing things one at
> a time. I'm thinking the only thing that's keeping Python in the
> ballpark is that it's doing something better with caching writes and
> then doing all of them at once. Would it be possible to post the code,
> or at least a skeleton that shows the i/o? My guess is that as soon as
> they saw the code, half of the C++ dudes on the list could point to
> some crucial inefficiency you could fix.
>
> Or, maybe Python really can be pretty clever here.
My suspicion is that this is the kind of problem where C++ isn't going to
give you much of an edge. So I think that writing it in Python is a
good way to go.
I'm not sure what the program does, but you could certainly be right.
I've seen situations (especially involving string processing) where
naive Perl programs could be faster than the obvious C implementation.
The caching you talk of does take place, but actually it takes place in
just about any decent file-io implementation. For example, 'printf' in
C is buffered. It only writes the data to the file once it has filled
its buffer (BUFSIZ bytes, typically a few kilobytes).
Yes, but that's not an especially large buffer. I don't do very much
in Python, but I know in Perl you can adjust the caching to some
extent (or at least prevent flushes from happening too often).
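For what it's worth, Python lets you adjust the buffer size as well. The
following is only a minimal sketch, not anything taken from Mike's program:
CountingRaw and count_raw_writes are hypothetical names made up for the
illustration, and the in-memory sink stands in for a real file so that the
number of low-level writes can be counted. It uses io.BufferedWriter with
its buffer_size argument, which is the same layer that open() puts in front
of a binary file.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical in-memory sink that counts low-level write() calls."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.data = bytearray()

    def writable(self):
        return True

    def write(self, b):
        # Each call here stands in for one trip to the operating system.
        self.calls += 1
        self.data += bytes(b)
        return len(b)


def count_raw_writes(buffer_size, record=b"x" * 100, n_records=1000):
    """Write n_records through a buffer of the given size.

    Returns (number of low-level writes, total bytes written).
    """
    raw = CountingRaw()
    buffered = io.BufferedWriter(raw, buffer_size=buffer_size)
    for _ in range(n_records):
        buffered.write(record)
    buffered.flush()
    return raw.calls, len(raw.data)


if __name__ == "__main__":
    for size in (128, 4096, 65536):
        calls, total = count_raw_writes(size)
        print(f"buffer={size:6d}  low-level writes={calls:5d}  bytes={total}")
```

The same bytes arrive at the sink in every case; only the number of
low-level writes changes, and it drops as the buffer grows. The exact
counts depend on the interpreter's buffering strategy, so none are quoted
here.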
The only function that can actually write a byte at a time is the core
function 'write' (or printf, if you first call 'setbuf' or 'setvbuf',
which let you tune or disable the buffering).
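A rough Python analogue of switching the buffering off, offered as a sketch
rather than as a translation of the C calls: passing buffering=0 to open()
in binary mode returns a raw io.FileIO object, so every write() goes
straight to the operating system, while the default returns an
io.BufferedWriter. The helper name write_bytes_one_at_a_time is made up for
this example.

```python
import os
import tempfile


def write_bytes_one_at_a_time(path, payload, buffering):
    """Write payload one byte per write() call.

    Returns the type name of the file object that open() handed back.
    """
    with open(path, "wb", buffering=buffering) as f:
        kind = type(f).__name__
        for i in range(len(payload)):
            f.write(payload[i:i + 1])
    return kind


if __name__ == "__main__":
    payload = b"hello, buffered world\n"
    with tempfile.TemporaryDirectory() as tmp:
        # 0 turns buffering off; -1 asks for the default buffer.
        for buffering in (0, -1):
            path = os.path.join(tmp, f"out_{buffering}.bin")
            kind = write_bytes_one_at_a_time(path, payload, buffering)
            print(buffering, kind, os.path.getsize(path))
```

Both files end up with identical contents. The difference is only in how
many times the operating system is asked to write.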
True story: an earlier version of Hugs (the interpreted Haskell
environment) actually implemented i/o by writing single characters at
a time. When I pointed out on a mailing list that this was just insane
from an efficiency point of view, one reply was that, well, this was
semantically transparent whereas doing fancier caching was not.
That was about the time I bailed on using Hugs for anything much.
Getting back to Mike's code, it's still hard to tell what's going on (I
haven't even seen him post a trace of system call usage or anything)
but it would be interesting to see if tweaking things with setbuf
would or would not improve the C++ times substantially.
jking
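Short of a real system call trace (from a tool such as strace on Linux),
one crude way to see how a program's time splits between user and kernel
is os.times(). The sketch below assumes nothing about Mike's code: cpu_split
and many_small_unbuffered_writes are hypothetical names, and the workload is
just a stand-in that makes many tiny unbuffered writes.

```python
import os
import tempfile


def cpu_split(work):
    """Run work() and return (user_seconds, system_seconds) it consumed."""
    before = os.times()
    work()
    after = os.times()
    return after.user - before.user, after.system - before.system


def many_small_unbuffered_writes(n=20000):
    """Hypothetical workload: n one-byte writes with buffering turned off."""
    with tempfile.TemporaryFile(mode="wb", buffering=0) as f:
        for _ in range(n):
            f.write(b"x")


if __name__ == "__main__":
    user, system = cpu_split(many_small_unbuffered_writes)
    print(f"user={user:.3f}s  system={system:.3f}s")
```

Running the same measurement around a buffered version of the workload
would show whether the kernel share moves, which is roughly the experiment
suggested above for the C++ program.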
_______________________________________________
members mailing list
EMAIL:PROTECTED
http://mlug.missouri.edu/mailman/listinfo/members