GridFlow 0.7.7 - Profiling Execution Speed

     
 

What is profiling?

Profiling means collecting empirical measurements about the execution of a program, for example finding out which parts of a program consume the most time or memory. Usually it is about time, and time is what GridFlow lets you measure.
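As a rough illustration of the idea (in Python, not in GridFlow, and not how GridFlow itself is implemented), a profiler can be sketched as a set of per-name counters that accumulate elapsed time and can be reset or listed busiest first with percentages. The class `TinyProfiler` and its method names are hypothetical, made up for this sketch.

```python
import time
from collections import defaultdict


class TinyProfiler:
    """Accumulates elapsed time per named part of a program (illustrative only)."""

    def __init__(self):
        self.totals = defaultdict(int)  # name -> nanoseconds spent so far

    def reset(self):
        """Set all counters back to zero."""
        self.totals.clear()

    def measure(self, name, func, *args):
        """Run func(*args) and charge the elapsed wall-clock time to `name`."""
        start = time.perf_counter_ns()
        try:
            return func(*args)
        finally:
            self.totals[name] += time.perf_counter_ns() - start

    def dump(self):
        """Return (name, nanoseconds, percent) rows, busiest first."""
        grand_total = sum(self.totals.values()) or 1  # avoid dividing by zero
        rows = sorted(self.totals.items(), key=lambda kv: kv[1], reverse=True)
        return [(name, ns, 100.0 * ns / grand_total) for name, ns in rows]


profiler = TinyProfiler()
profiler.measure("heavy", sum, range(200_000))
profiler.measure("light", sum, range(200))
for name, ns, percent in profiler.dump():
    print(f"{name:>6} {ns:>12} ns {percent:5.1f}%")
```

The timer used here measures wall-clock time, so time taken by other tasks running on the machine is charged to whatever was being measured when they ran.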

How to get those stats from GridFlow?

  • Create a "@global" object and connect two message boxes to it, "profiler_reset" and "profiler_dump". The first one resets all counters to zero. The second one lists the busiest objects, with percentages.
  • Note that those results are global to a process. That is, if you load several patches in the same process (program instance), then all those patches are monitored together. But if you run several instances of jMax (or PD) at once, each profiler sees only its own process, not everything happening on that machine.

How do I interpret those stats?

  • Note that some operations may not be monitored, and some of the monitoring may be buggy. I think it's not buggy as it is now, but I may be wrong.
  • The current profiler uses the RDTSC instruction (Pentium only). It reads a very high precision clock that is very fast to use. However, *major* inaccuracies may arise because an ordinary multitasking OS runs other tasks without stopping and resuming the clock. This may happen at random; however, it is much more likely to happen in [@in] or [@out], because that is where all the communication with the outside takes place (files, sockets, windows, etc).
  • Even if you make sure that only the bare minimum is actively running on your computer, [@out] (using x11) still includes the time spent in the x11 server, except in some conditions. This applies to every kind of window output too, because whichever libraries the data goes through (sdl, aalib), it has to reach the x11 server and the display driver.
  • The profiler has an impact on the results of the profiler. The profiler includes half of its own influence in its own results, and disregards the other half (or so). Profiling shouldn't add more than 100-300 ticks per message (of which half is counted).
  • Message-passing time is not counted at all. Only time actually spent inside GridFlow objects is counted. This may skew results. Transmission of a grid requires one message, thus we may speak of "grid messages". However, when the message is received, one or several packets may get transmitted, which is done outside of the message system. Each packet contains at most 2048 numbers (adjustable limit), and normally a packet should be at least one quarter of that size unless it is the last one. On RGB grids of widths 640,320,160, the packet size will usually be 1920.
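The packet sizes quoted above can be reproduced with a small calculation. This is only a sketch in Python under one assumption that the text does not state: that a packet holds the largest whole number of rows that fits in the 2048-number limit. The function name `packet_size` and its parameters are hypothetical.

```python
def packet_size(width, channels=3, max_packet=2048):
    """Numbers per packet, ASSUMING a packet holds a whole number of rows."""
    row_size = width * channels  # numbers in one row of the grid
    if row_size > max_packet:
        # The source does not say what happens in this case.
        raise ValueError("row does not fit in one packet; not covered by this sketch")
    rows = max_packet // row_size  # largest whole number of rows that fits
    return rows * row_size


for width in (640, 320, 160):
    size = packet_size(width)
    # Each size is also at least one quarter of the limit, as described above.
    print(width, size, size >= 2048 // 4)
# prints:
# 640 1920 True
# 320 1920 True
# 160 1920 True
```

Under that assumption, a 640-wide RGB grid sends one row per packet, a 320-wide grid two rows, and a 160-wide grid four rows, which gives 1920 numbers in all three cases.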

Getting a frames-per-second measure

This section formerly described what can now be obtained using the [fps] object class.

Acceleration tricks

  • Try the profiler and see what it says.
  • I mean it, really.
  • You can waste a lot of your time accelerating something that isn't really taking much execution time.
  • Working on big grids is faster than working on small grids, relative to the amount of number-crunching done.
  • About numbertypes: uint8 is the fastest, followed by int16, int32, float32 (and the first two are faster when MMX is enabled). However, it may be difficult to make some effects use int16 or smaller without overflow happening.
  • [@ <<] is a very fast multiplication by powers of two (1, 2, 4, 8, 16, ...). [@ >>] is a very fast division by powers of two.

    From my limited experience, ordinary integer multiplication and division are rather slow, especially on Intel CPUs. The gap between *,/ and <<,>> is smaller on Cyrix/AMD CPUs, but still, try it yourself. (My experience is with specific models and may not reflect currently common ones.)

  • [@ & 255] is a very fast [@ % 256], and likewise for other powers of two.
  • For do-nothing operations, "ignore" and "put" are faster than "+ 0" and such.
  • Remember that an image half as high and half as wide will be processed four times as fast (for most effects), so you can get four times as many frames per second. It's the "rows*columns*channels" value that makes the biggest difference (usually).
  • If all else fails, you may recode a jMax/PD/Ruby abstraction into plain Ruby code or C++ code. If your new class is of general usefulness, then maybe it should be added to the releases of GridFlow. Contact me if you need help extending GridFlow.
  • Put often-used files on fast drives. This means don't use NFS (networked file system) for them. The file-to-RAM cache can compensate up to a point, but the larger the file is, and the more often it is used, the more important it is to put it on a local drive.
 

GridFlow 0.7.7 Documentation
by Mathieu Bouchard matju@sympatico.ca