   PROFILING VISION (mostly achieved)

   One subdir can encompass all the profiling counts for any notionally grouped
   activity.  Even Linux/BSD kernel activity can be kept there via a difference
   of before/after /proc/profile.  The vmlinux path will need to be hacked in
   specially at .pct section creation time.  Line-number granularity profiling
   of the kernel should be do-able with vmlinux, or symbol-level granularity
   with System.map.

   This kernel profiling *in conjunction* with any number of grouped user-level
   processes and shared objects can give as full an accounting as is possible
   with sampling-based profiling.  The *whole system activity* of some vaguely
   defined CPU-bound "task" can be analyzed.

   The fact that loadable kernel modules are not usually tracked by kernel
   profiling systems casts a slight blemish on this.  Need kernel hacks to add
   multiple kernel profiling buffers.

   This vision induces some utility functionalities for count file manipulation.

o  Need a feature to allow creating cumulative and differential sections,
   summing and differencing counts.  For example, extract libc.so counts from
   several different cooperating programs and accumulate them into a new .pct
   file.

   This is pretty cool.  No CPU-dominated activity will go untracked, and
   separating and combining profiling counts will be flexible.  Even
   user-level programs have "phases" of activity and one might want to use
   a count-differencing tool to make sense of what costs what when...
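   The accumulation side of this can be sketched with awk, assuming the
   "count hex-address" text form that pct-pr emits; pct_sum and the .cnt
   file names here are made up for illustration:

```shell
# Sum "count address" lines from any number of extracted count files.
# Differencing is the same loop with -= applied to the second file.
pct_sum() {
    awk '{ sum[$2] += $1 }
         END { for (a in sum) print sum[a], a }' "$@" |
    sort -k2,2
}

# Example: libc.so counts extracted from two cooperating programs.
printf '3 0x100\n1 0x104\n' > progA.libc.cnt
printf '2 0x100\n5 0x108\n' > progB.libc.cnt
pct_sum progA.libc.cnt progB.libc.cnt
```

   The real tool would of course write a new .pct section rather than
   text, but the per-address sum/difference is the whole trick.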

* pct-getln
	Append lines from src files.  Assume pct-ln|pct-lnmrg|sort input.
	Does this even work?  Is it even desirable...?

o gdb/adb script to generate file:line data from object files and PCs on stdin?
	  This may be more portable than addr2line.  Not sure how easy this is.

o pct-% needs to optionally print out standard errors on the bin count percent.
        E.g., if the count was 81 out of 810 it should be 10% +/- about 1%
	Hmm.  It would be wonderful to rely on the Latin-1 +/- character...
	But retain other possibilities for portability.
	Also keep in mind that a few things depend on our first column being
	just one field, so maybe it should look like 10.0+/-0.9% or even the
	particle physics style of 10.0(9)%  Maybe an underscore could replace
	the parenthesized notation... i.e. 10.0_9%  I did have this old idea
	of retaining both percentages and raw counts, too, as in 10.0%@81
	Hmm.
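   For reference, the usual binomial standard error on a sampled proportion
   p = k/N is sqrt(p*(1-p)/N).  A sketch in awk, using the single-field
   +/- form (pct_se is a made-up name):

```shell
# Print "percent+/-error%" for a bin count k out of a total n,
# using the binomial standard error sqrt(p*(1-p)/n).
pct_se() {
    awk -v k="$1" -v n="$2" 'BEGIN {
        p  = k / n
        se = sqrt(p * (1 - p) / n)
        printf "%.1f+/-%.1f%%\n", 100 * p, 100 * se
    }'
}

pct_se 81 810     # -> 10.0+/-1.1%
```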

o Need a "count totaller" and a way to compare the total accumulated counts
  against the elapsed and/or scheduled real-time quanta which expired over
  the activity of interest.  This allows one to sanity check things.
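   The totaller itself is a one-line awk reduction; the sanity check is then
   total counts vs. HZ times the elapsed seconds (HZ=100 below is an
   assumption, as is the made-up pct_total name):

```shell
# Sum the count column over any number of count files.
pct_total() { awk '{ t += $1 } END { print t + 0 }' "$@"; }

# Sanity check: at 100 samples/sec, a fully CPU-bound 30-second
# activity should total roughly 100 * 30 = 3000 counts.
printf '1000 0x100\n1990 0x104\n' > run.cnt
pct_total run.cnt     # -> 2990, i.e. ~3000 ticks; looks sane
```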

o  A compact format intended for forward linear access only would be very easy
   via the simplest possible elimination of bins with a zero count.  This
   would be a big disk space win since 4..8K chunks will be allocated by the
   kernel for rarely sampled program areas.  Unfortunately it will also make
   the .pct file unusable for further data collection.  This can be easily
   accomplished by combining pct-stat and pct-pr with a trivial translator
   program that just re-emits counts and hex addresses.  Hmm.  What about a
   binary/fixed length record protocol for pct-pr?  Could elide bin->ASCII.
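   The trivial translator stage might be no more than a zero-bin filter
   (a sketch, again assuming pct-pr's "count hex-address" text output;
   pct_compact is a made-up name):

```shell
# Drop zero-count bins and re-emit "count address" pairs.  The result
# is compact, but usable for forward linear access only.
pct_compact() { awk '$1 + 0 != 0 { print $1, $2 }' "$@"; }

printf '0 0x100\n7 0x104\n0 0x108\n2 0x10c\n' > full.cnt
pct_compact full.cnt
```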

o  An Emacs mode that is like 'grep mode' would be nice.  Emacs could load and
   warp and maybe colorize the file/line for any percentage report that the
   user hits return on.  It should be trivial to adapt grep-mode to this end.
   In fact, using pct-ln $* | pct-% | awk '{print $2 ":" $1}' instead of
   'grep -n' and ordinary grep-mode should do it.  Could probably be even more
   efficient/clever with "pct-pr -3"

o  Files may have differing PC translation requirements.  Maybe the "level" or
   a type mask of debugging information should be a field in 'struct pct'?  This
   program could take a list of translators in some preference order.   Without
   the field, the translator program would have to somehow indicate (via early
   exit) that debugging data is inadequate.  Hmm.  addr2line doesn't do this.
   This is tricky.  Perhaps the right notion is for the translator program to
   always "work", but yield an identity translation to ASCII format when debug
   data is too poor.  Then a script wrapper around addr2line would be easy.
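   Such a wrapper might look like this (pct_a2l is a made-up name; the
   "??" reply is what GNU addr2line prints when it has no line data):

```shell
# Always-"working" translator: a real file:line when addr2line can
# give one, an identity translation (the raw hex PC) when it cannot.
pct_a2l() {
    exe=$1
    while read -r pc; do
        loc=$(addr2line -e "$exe" "$pc" 2>/dev/null)
        case $loc in
            ''|'??'*) printf '%s\n' "$pc" ;;   # no debug data
            *)        printf '%s\n' "$loc" ;;
        esac
    done
}
```

   Usage would be e.g. echo 0x4004d6 | pct_a2l ./a.out, and a downstream
   consumer never has to care whether translation actually happened.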

o  There is also the per-file offset subtraction issue.  Do we need a more
   general per-file command-line option syntax?  The easiest thing to do is
   allow stamping a type onto files.  If there is an automatic way to infer
   things, a separate program could set the right stamps.  Otherwise users
   will have to know what sort of debugging level is available for a given
   binary. 

o  Also, in addition to numeric types things like "shared lib with debugging"
   could be used so that pct-stat could be more informative.  'file' may be
   usable to have a script/program automagically set the object types.

*  Another final thought along these lines is that differential '-g' compiling
   of object files in a library mandates multiple translators per file.

** Automatic PC->Source Data Amount Inference:
	run nm: no symbols -> stamp nosym
		symbols -> stamp symbol
		run addr2line over each symbol:
			at least one match -> stamp debugging

o  Add stdin input iteration for send_data in pct-pr, so that input can come
   either from PCT files or from parsing stdin stream.

   One issue with this is that multiple coprocesses will have to be started as
   the input stream's filenames change around.  To keep efficiency we'll
   probably have to mandate that the input stream is sorted, but it could work
   correctly (just more slowly) if it wasn't sorted by shutting down and
   re-starting translator coprocesses.

   A possibly thorny issue is the more complex IO model induced by having input
   come from a file descriptor which is likely attached to a pipe instead of
   from an mmap()d file.  Is waiting for input inside send_data ok?  Maybe.

o  It would seem nice to abstract both input and output iteration a little more
   and just have a 'coprocess IO library -- libcoproc fast, flexible, portable,
   easy to use delegation of serial computation to subprocesses'.  At first
   glance it'd seem that pipelining IO for multiple independent streams would
   be easy and that only cross pipeline dependencies would make things icky.

o  Can/should we parse replies from coprocess with a sscanf "fmt_rp"?  A regex
   matcher would be even better.
 
o  Need to impl _tab handling options

o  Really need /etc/pctrc, $HOME/.pctrc to specify the type-translator table
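   The table could be as simple as one line per type stamp, tried in
   preference order (a purely hypothetical format; every name below is
   illustrative):

```
# /etc/pctrc -- overridden by $HOME/.pctrc
# type        translator command (reads PCs on stdin)
debugging     addr2line -e
symbol        pct-sym2line
nosym         cat            # identity translation
```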

