Blenderstorm Polls and Ideas:

idea #128: Faster rendering with sub-N tile-splitting



Written by mapesdhs the 14 Sep 08 at 09:15. Category: Rendering. Status: New
Description
I've been doing some benchmark tests using SGIs (using
test.blend from www.eofw.org/bench), during which I got
to wondering about how Blender renders the sub-regions
of an image with multiple threads.

Assuming N threads, once fewer than N areas remain (call
that number K), some threads go unused, so the tail end of
the render is not as fast. The worst case is when the final
region happens to be a complex part of the image: only one
thread is running and it takes much longer than normal.
This wouldn't matter if every subregion were equally
complex, but in real images that is almost never the case.
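
To put rough numbers on the tail-end effect described above, here
is a small Python sketch. It is only a toy model, not Blender's
actual scheduler: it assumes each free thread simply takes the next
tile in order, and the tile costs (15 cheap tiles plus one tile that
is 8x as expensive) are made up for illustration.

```python
import heapq

def simulate(tile_costs, n_threads):
    """Greedy scheduling: whichever thread is free first takes the next tile.

    Returns (makespan, utilisation), where utilisation is the fraction of
    total thread time spent doing useful work.
    """
    # Min-heap of the times at which each thread becomes free.
    free_at = [0.0] * n_threads
    heapq.heapify(free_at)
    for cost in tile_costs:
        start = heapq.heappop(free_at)          # earliest free thread
        heapq.heappush(free_at, start + cost)   # busy until start + cost
    makespan = max(free_at)
    utilisation = sum(tile_costs) / (n_threads * makespan)
    return makespan, utilisation

# Hypothetical 4 x 4 render: 15 tiles cost 1 unit, the last costs 8 units.
costs = [1.0] * 15 + [8.0]
makespan, util = simulate(costs, 8)
print(makespan)          # 9.0
print(round(util, 2))    # 0.32
```

In this made-up case the 8 threads are busy for only about a third
of the total thread time, because seven of them sit idle while the
last expensive tile finishes.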

In other words, with N threads the parallelism drops off
as soon as only N-1 areas are left to render. Unless the
overhead kills it, surely it would be better, once K < N,
to halve the width and height of the remaining K areas.
Each area then becomes 4 smaller ones, so as long as
4K >= N all N threads can be used again, i.e. maximum
speed. Depending on the resolution of the image, this
could be done once or twice and should speed up the
rendering of the final N-1 pieces quite a lot.

Example: 8 threads (very common these days with the latest
dual/quad-core CPUs), image split into the default 4 x 4
pieces. When 7 pieces remain, halve the width and height of
each piece, so that 7 x 4 = 28 remain and all 8 threads can
be used again. As before, when only 7 pieces of this smaller
size remain the efficiency will slide, but the whole render
will finish sooner than without the split. If the image is
large enough, a further halving would still be effective. At
some point the thread-management overhead would make
resplitting the remaining pieces not worthwhile (perhaps
this could be monitored in some way and dealt with
automatically), but I reckon even 2 splitting stages would
be very beneficial for a typical PAL/NTSC render.
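
The splitting step in the example above could look something like
the following Python sketch. Everything here is hypothetical: the
(x, y, width, height) tile tuples, the function names and the
min_size cut-off are my own assumptions and are not taken from
Blender's source. The cut-off stands in for the point where
overhead makes further splitting pointless.

```python
def split_tile(tile):
    """Split an (x, y, w, h) tile into four quadrants; odd sizes are handled."""
    x, y, w, h = tile
    hw, hh = w // 2, h // 2
    quads = [
        (x, y, hw, hh),
        (x + hw, y, w - hw, hh),
        (x, y + hh, hw, h - hh),
        (x + hw, y + hh, w - hw, h - hh),
    ]
    # Drop degenerate quadrants (only possible for 1-pixel-wide/high tiles).
    return [q for q in quads if q[2] > 0 and q[3] > 0]

def resplit_if_starved(pending, n_threads, min_size=16):
    """If fewer tiles than threads are pending, quarter each pending tile.

    Tiles whose quadrants would be smaller than min_size pixels on a side
    are left alone (assumed cut-off, to be tuned against real overhead).
    """
    if not 0 < len(pending) < n_threads:
        return pending
    result = []
    for tile in pending:
        _, _, w, h = tile
        if w >= 2 * min_size and h >= 2 * min_size:
            result.extend(split_tile(tile))
        else:
            result.append(tile)
    return result

# PAL frame (720 x 576) in 4 x 4 tiles of 180 x 144 pixels each.
tiles = [(c * 180, r * 144, 180, 144) for r in range(4) for c in range(4)]
pending = tiles[-7:]                      # 7 tiles left, 8 threads
after = resplit_if_starved(pending, 8)
print(len(after))                         # 28
```

Calling resplit_if_starved again once only 7 of the smaller tiles
are left would give the second splitting stage.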

Alternatively, start the render with a larger number of
pieces. However, Blender's overhead when pieces/threads
start and stop looks fairly high (if so, it is better to
start with, say, 4 x 4 and then subdivide at the end). Or
is it just my perception that there's a bit of a pause
every time one thread stops and another starts?
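
Whether starting with more pieces wins depends on how big that
per-piece pause really is. The Python sketch below is a toy model
with invented numbers (15 cheap tiles, one 8x tile, and a fixed
overhead added to every piece), so it shows only the shape of the
trade-off, not measured Blender behaviour.

```python
import heapq

def finish_time(costs, n_threads, overhead):
    """Greedy scheduling with a fixed start/stop overhead added per piece."""
    free_at = [0.0] * n_threads
    heapq.heapify(free_at)
    for cost in costs:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + cost + overhead)
    return max(free_at)

def scene_costs(sub):
    """Costs for a 4 x 4 grid where each tile is cut into sub x sub pieces.

    Hypothetical scene: 15 tiles cost 1 unit, the last one costs 8 units,
    and the work inside a tile is assumed to divide evenly.
    """
    coarse = [1.0] * 15 + [8.0]
    return [c / (sub * sub) for c in coarse for _ in range(sub * sub)]

for overhead in (0.05, 1.0):
    t16 = finish_time(scene_costs(1), 8, overhead)   # 16 pieces
    t64 = finish_time(scene_costs(2), 8, overhead)   # 64 pieces
    print(overhead, round(t16, 2), round(t64, 2))
# prints:
# 0.05 9.1 4.15
# 1.0 11.0 11.75
```

With a small overhead the finer grid finishes in less than half the
time; with a large one it is slower than plain 4 x 4, which is the
case where subdividing only at the end would be the better choice.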

Hmm, would it be possible to render the scene line by
line instead, using a much higher thread limit (maximum
threads equal to the vertical resolution of the image)?
This would maintain maximum parallelism, with at worst
only a 1% inefficiency if the N-1 remaining lines happen
to be more complex. I've been collating C-Ray render test
times as well, and this is how C-Ray works (it's just a
small experimental ray-tracing program with a tiny dataset,
but its performance scales very well with threads/cores).
I'm not familiar with how Blender does its rendering
though, so perhaps line-by-line isn't possible.
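
As a rough illustration of the line-by-line idea, here is a Python
sketch that treats every row as its own work item. Note two
assumptions: it uses a fixed pool of worker threads pulling rows
from a queue rather than literally one thread per line, and
render_row is a made-up stand-in for real shading. It is not meant
to describe how C-Ray or Blender is implemented.

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 64, 48   # tiny hypothetical image

def render_row(y):
    """Hypothetical stand-in for shading one scanline (a simple pattern)."""
    return [(x * y) % 256 for x in range(WIDTH)]

def render_by_rows(n_workers):
    # Each row is one work item. A worker that finishes a row picks up the
    # next queued row, so no worker idles while rows are still waiting.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() yields results in row order, whatever order they finish in.
        return list(pool.map(render_row, range(HEIGHT)))

image = render_by_rows(8)
print(len(image), len(image[0]))   # 48 64
```

Because of Python's global interpreter lock this pure-Python toy
won't actually run faster with more threads; it only shows the
work-distribution pattern, where the idle tail can never be longer
than the time taken by a single row.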

Yours,

Ian.


