{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "afe43763-c985-4bca-9f45-90393b456f22",
   "metadata": {},
   "source": [
    "# Benchmarking and Profiling"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "09fcf501-2a32-4662-91f0-b59c9a43c0e6",
   "metadata": {},
   "source": [
    "## Definitions\n",
    "\n",
    "Understanding and locating the bottlenecks of a program is crucial for its optimization, since these bottlenecks (or hot spots) are the parts where most of the time is spent. Depending on the specific goal, different methods can be used that give us details about different aspects of a program or code:\n",
    "\n",
    "__(Micro-)benchmarks__ usually mean obtaining a simple measurement of a certain facet of a part of the code, an algorithm or even a whole application (e.g. the runtime of a function, the PFLOPS of a cluster). This allows people to easily gauge and compare an overall figure. However, benchmarks usually don't provide detailed insight into e.g. how the code/program works.\n",
    "\n",
    "__Profiling__ in general means a dynamic in-depth analysis of a program. For example, a developer can obtain CPU time and memory requirements, the frequency of certain calls or even the occurrence of particular instructions to aid in optimization. For this, a **profiler** is used to record different parameters in a program *profile*. Multiple approaches to profiling exist, each with its own advantages and drawbacks. They differ not only in the precision of their results but also in the overhead they add to the runtime and in how they gain their insights into the resource consumption of a program.\n",
    "\n",
    "__Statistical profiling__ samples the target properties at regular intervals. It can only provide approximate, statistical information since the resolution depends on the sampling period. Still, this kind of profiler allows a developer to extract meaningful metrics of the profiled program. Statistical profilers are usually not intrusive to the target code and don't affect the execution speed of the program as much as other methods; as such they are able to detect problems that would be hidden by other profiling methods.\n",
    "\n",
    "__Deterministic profiling__ makes use of trace functions that are inserted into the program that is to be profiled. They enable the profiler to measure the exact elapsed time between certain events, such as function calls, function returns or thrown exceptions, in great detail. Depending on where and how the instrumentation of the target code happens (program, library, runtime/interpreter, binary, ...), deterministic profiling can be further subdivided into different categories.\n",
    "\n",
    "For python itself it is only important to know that the interpreter already includes all mechanisms necessary for deterministic profiling. Every *function call*, *function return*, *exception*, etc. can be monitored by providing a hook for the event, which profiler tools can then use to extract profiling information (a minimal sketch of such a hook is shown below). According to the authors of the python `profile` module, deterministic profiling adds comparatively little overhead in python, since the language is interpreted and therefore already slower than compiled languages. In addition, it is usually not necessary to modify the program code in any way, so this is typically a good starting point.\n",
    "\n",
    "Typical __metrics__ that shed light on program characteristics are the __call count__ (how often a certain function is called), the __execution time__ and the __memory usage__. This way\n",
    "* __Hot loops__, i.e. loops with a high number of iterations, can be identified and subsequently optimized\n",
    "* Functions that potentially benefit from __in-lining__ can be spotted by their high call count\n",
    "* Atypical call counts can indicate __bugs__\n",
    "* Unexpectedly high or growing memory usage can point to __memory leaks__"
   ]
  },
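  {
   "cell_type": "markdown",
   "id": "3c2a1f00-9d41-4b7a-8c55-aa01b2c3d401",
   "metadata": {},
   "source": [
    "A minimal sketch of the hook mechanism mentioned above: `sys.setprofile()` registers a profile function that the interpreter then invokes for every call and return event (the helper function `greet` below is only a made-up example). The standard library profilers build on exactly this mechanism (`profile` uses the python-level hook, `cProfile` its C counterpart)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4d3b2e11-0a52-4c8b-9d66-bb12c3d4e502",
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "\n",
    "def trace_hook(frame, event, arg):\n",
    "    # invoked by the interpreter for every 'call', 'return', 'c_call', ... event\n",
    "    if event in ('call', 'return'):\n",
    "        print(f'{event:>6}: {frame.f_code.co_name}')\n",
    "\n",
    "def greet(name):\n",
    "    # small helper function, only used for this illustration\n",
    "    return f'Hello {name}'\n",
    "\n",
    "sys.setprofile(trace_hook)   # install the hook\n",
    "greet('profiler')\n",
    "sys.setprofile(None)         # remove the hook again"
   ]
  },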
  {
   "cell_type": "markdown",
   "id": "d5cdbaa2-783b-4c1b-80d7-cf398dd9f5b9",
   "metadata": {},
   "source": [
    "## Measuring time\n",
    "\n",
    "Most time-measuring code snippets just use a variant that is more or less similar to the following code:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a452694f-c3e0-498c-8d7a-128e9875d356",
   "metadata": {},
   "outputs": [],
   "source": [
    "def sum_numbers(num=1_000_000):\n",
    "    # simple CPU-bound workload: sum the integers 0 .. num-1 in a plain python loop\n",
    "    total = 0\n",
    "    for i in range(num):\n",
    "        total = total + i\n",
    "    return total"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "920ff983-bbc5-400b-b19e-59a4d209fee7",
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "\n",
    "start = time.time()\n",
    "sum_numbers()\n",
    "end = time.time()\n",
    "print('Took:', (end-start), 's')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e89c4277-570b-4ee3-8e81-69501d6c5748",
   "metadata": {},
   "source": [
    "The above code is not wrong per se, but it also isn't an ideal approach. This becomes apparent when looking at the function documentation in the [python docs](https://docs.python.org/3/library/time.html#time.time):\n",
    "> Note that even though the time is always returned as a floating point number, not all systems provide time with a better precision than 1 second. While this function normally returns non-decreasing values, it can return a lower value than a previous call if the system clock has been set back between the two calls.\n",
    "\n",
    "For time measurements, many processors nowadays come with built-in **performance counters**. Performance counters *may* have a higher resolution than the system clock on some systems; in any case they always return non-decreasing values and cannot be adjusted by the system.\n",
    "\n",
    "Luckily python exposes these performance counters via the functions `time.perf_counter()` and `time.perf_counter_ns()` (also see [time.perf_counter](https://docs.python.org/3/library/time.html#time.perf_counter)). This high resolution clock source is especially suitable for measuring small code blocks that only take a short time to execute, and it should be **the default** when measuring performance characteristics manually."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bc3d6542-f812-4da1-b872-197aa1f131de",
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "print('time: ', time.get_clock_info('time'))\n",
    "print('perf_counter: ', time.get_clock_info('perf_counter'))"
   ]
  },
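  {
   "cell_type": "markdown",
   "id": "5e4c3d22-1b63-4d9c-8e77-cc23d4e5f603",
   "metadata": {},
   "source": [
    "As a small sketch, the manual measurement from above can simply be repeated with `time.perf_counter()` instead of `time.time()`, re-using the `sum_numbers` function defined earlier:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6f5d4e33-2c74-4ead-9f88-dd34e5f60704",
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "\n",
    "# same measurement as before, but based on the performance counter clock\n",
    "start = time.perf_counter()\n",
    "sum_numbers()\n",
    "end = time.perf_counter()\n",
    "print('Took:', (end - start), 's')"
   ]
  },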
  {
   "cell_type": "markdown",
   "id": "d0b14e41-8a47-488d-b4a3-dce8564b73be",
   "metadata": {},
   "source": [
    "## (Micro-)benchmarks using [timeit](https://docs.python.org/3/library/timeit.html)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "22e685fa-9aad-4bd7-9fa0-0e3389490693",
   "metadata": {
    "tags": []
   },
   "source": [
    "`timeit` is another very useful module that ships with the python standard library. It allows the user to easily measure the execution time of code snippets or even small programs.\n",
    "\n",
    "It can be used from the terminal as well as from within the code itself. If timeit is used from the command line it automatically determines the number of loops. This can be overridden by e.g. specifying `--number` (how many times the statement is executed) or `--repeat` (how many times the measurement is repeated).\n",
    "\n",
    "Regarding time measurement, the `timeit` module automatically uses `time.perf_counter()`, which utilizes the processor's **performance counters** to measure time.\n",
    "\n",
    "`timeit` **temporarily turns off garbage collection** while measuring. This can lead to different results than those obtained by simply measuring the execution time. If garbage collection is an important part of the performance being measured, it can be re-enabled by adding `gc.enable()` as the first statement in the setup string (see the example below).\n",
    "\n",
    "Note that `timeit` can be used for __benchmarking__ purposes only, i.e. the program in execution is neither inspected nor instrumented."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c7621296-16d5-4d3e-a38c-24e6b73f216b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import timeit\n",
    "print('Execution took: ', timeit.timeit(sum_numbers, number=100), 's')"
   ]
  },
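  {
   "cell_type": "markdown",
   "id": "7a6e5f44-3d85-4fbe-a099-ee45f6071805",
   "metadata": {},
   "source": [
    "A small sketch of the garbage collection note from above, again re-using `sum_numbers`: by default `timeit` disables garbage collection during the measurement; re-enabling it via the setup string includes its cost in the timing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8b7f6a55-4e96-40cf-b1aa-ff5607182906",
   "metadata": {},
   "outputs": [],
   "source": [
    "import timeit\n",
    "\n",
    "# garbage collection disabled (the default behaviour of timeit)\n",
    "print('gc disabled:', timeit.timeit(sum_numbers, number=100), 's')\n",
    "\n",
    "# garbage collection re-enabled as the first statement of the setup string\n",
    "print('gc enabled: ', timeit.timeit(sum_numbers, setup='import gc; gc.enable()', number=100), 's')"
   ]
  },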
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9a69baf8-5afe-407d-8c03-8f945d6c4fc7",
   "metadata": {},
   "outputs": [],
   "source": [
    "!python3 -m timeit \"'-'.join(str(n) for n in range(100))\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9ab94141-1d34-43c6-b7c6-01219c49f50e",
   "metadata": {},
   "source": [
    "Please note that timeit itself also comes with a small overhead:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "99a65646-26c2-46ef-ae41-4fea935be6b7",
   "metadata": {},
   "outputs": [],
   "source": [
    "import timeit\n",
    "print('Empty timeit: ', timeit.timeit(\"pass\", number=1000), 's')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "61b3cb0b-d6c6-48b6-b413-dfc3f7ce8a72",
   "metadata": {},
   "source": [
    "### `timeit` in Jupyter Notebooks\n",
    "\n",
    "Jupyter already comes with a built-in magic that lets the user easily time code from within a notebook. It is not necessary to import anything explicitly and it is available as both a cell and a line magic.\n",
    "\n",
    "Practically, `%timeit` also automatically adjusts the number of repetitions, unless specified otherwise by using the arguments `-n` (number of times to run the statement in a loop) and `-r` (number of times to repeat the `n` runs)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "33e08b49-3566-4e96-abb4-c0837799c526",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "%timeit?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c33d0633-f8bf-49b3-b2ee-11a50e6e6654",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "s0 = 'This is a string'\n",
    "s1 = 'and this is another string'\n",
    "s2 = 'with the number'\n",
    "i = 1337"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2f0d9a44-50af-499a-899a-9002d8b2b35b",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "%timeit -n 1_000_000 -r 10 s = f'{s0}, {s1} {s2} {i}.'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "26179def-1d73-442c-8b4b-dcf35db08042",
   "metadata": {},
   "outputs": [],
   "source": [
    "%timeit -n 1_000_000 -r 10 s = s0 + ', ' + s1 + ' ' + s2 + ' ' + str(i) + '.'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "42f87c72-e64a-4aeb-a3c5-fe19803df813",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "%timeit -n 1_000_000 -r 10 s = '{}, {} {} {}.'.format(s0, s1, s2, str(i))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "76c31280-8680-45fd-b558-618caf66fec2",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "%%bash\n",
    "# note: we have to use different variable names here since otherwise the notebook\n",
    "# would already do the string interpolation before passing it to python and timeit\n",
    "python3 -m timeit -n 1_000_000 -r 10 \\\n",
    "   -s \"s0 = 'This is a string'; \\\n",
    "       s1 = 'and this is another string'; \\\n",
    "       s2 = 'with the number'; \\\n",
    "       i = 1337\" \\\n",
    "   \"s = f'{s0}, {s1} {s2} {i}.'\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "55c1ddf6-b423-4918-9525-4ed79301de0f",
   "metadata": {},
   "source": [
    "When run as a module from the command line (`python3 -m timeit`), `timeit` only reports the best time out of all repeats, to give the user a lower bound.\n",
    "\n",
    "The argument here is that the higher values are typically not caused by variability in the speed of the python interpreter itself, but rather by other processes and task switching interfering with the execution of the statements."
   ]
  },
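  {
   "cell_type": "markdown",
   "id": "9c807b66-5fa7-41d0-82bb-0067182a3a07",
   "metadata": {},
   "source": [
    "With the module API, `timeit.repeat()` returns the raw list of all repeat timings instead, so the lower bound can be computed explicitly with `min()` (a small sketch, again re-using `sum_numbers`):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ad918c77-60b8-42e1-93cc-1178293b4b08",
   "metadata": {},
   "outputs": [],
   "source": [
    "import timeit\n",
    "\n",
    "# repeat the measurement 5 times, each repeat executing the function 100 times\n",
    "times = timeit.repeat(sum_numbers, number=100, repeat=5)\n",
    "\n",
    "print('all repeats:', ['%.4f s' % t for t in times])\n",
    "print('lower bound:', min(times), 's')"
   ]
  },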
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e9dc24ef-3115-48b8-8d68-c30b537dff66",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}