%% Cell type:markdown id:7b8b3b7c-e15e-466d-b324-a5ce3fd18359 tags:
# MPI - Let's play ping pong!
<details>
<summary markdown="span"><b>Jupyter notebook quick start (click to expand)</b></summary>
- \<Shift>+\<Return> --> runs a cell
- ! --> shell escape [! Linux command line]
- use a (above) and b (below) left of the [ ] to open new cells
- use m (Markdown) or r (Raw) left of the [ ] to change the cell mode
<details>
<summary markdown="span"><b>Click to expand</b></summary>
If you see a little triangle you can click it to expand further explanations.
%% Cell type:markdown id:cf431f67-2057-4a85-9030-6c6f1efd193f tags:
<details>
<summary markdown="span"><b>Author, acknowledgment, copyright, and license for this notebook</b></summary>
- <b>Author:</b> Claudia Blaas-Schenner (VSC Research Center, TU Wien), 17 November 2024
- <b>Based on</b> the [MPI course developed by Rolf Rabenseifner, HLRS](https://www.hlrs.de/training/self-study-materials/mpi-course-material) that is under a quite restrictive copyright by Rolf Rabenseifner and HLRS. The copyrighted material (some images, some exercise descriptions, some code snippets) is used with permission. Some parts taken from the HLRS material are modified and the Jupyter Notebook is extended with own material of the Notebook authors.
- <b>License:</b> [CC BY-SA 4.0 (Attribution-ShareAlike)](https://creativecommons.org/licenses/by-sa/4.0/)
- <b>Contributing and error reporting:</b> Please send an email to: [training@vsc.ac.at](mailto:training@vsc.ac.at)
%% Cell type:markdown id:114509ac-ec55-4c6c-96e3-002f9d571a86 tags:
#### MPI Standard: [MPI: A Message-Passing Interface Standard Version 4.1 (PDF)](https://www.mpi-forum.org/docs/mpi-4.1/mpi41-report.pdf)
%% Cell type:markdown id:5d223967-4a35-407d-a2f7-bdbb8c23f2b9 tags:
#### You can edit the exercises directly in the cells.
&nbsp;
%% Cell type:markdown id:01ac1316-a516-477c-b5f0-ae32bbc0bf9e tags:
### Contents <br>- explore point-to-point communication with two MPI processes playing ping pong:
%% Cell type:markdown id:8a6701a3-8e8a-4c96-81c5-0da837632879 tags:
- 0. Step by step (according to the pictures below):<br>
- 1. ping - rank 0 sends a message (ping) to rank 1 and rank 1 receives it <br>
- 2. pingpong - after receiving the ping, rank 1 sends a message (pong) back to rank 0 and rank 0 receives it<br>
- 3. timing - repeat the ping pong in a loop and add timing calls before and after the loop<br>
- 4. warmup - don't forget to warmup and do one ping pong before starting the timed loop<br>
- 5. finish - who wins the race?
%% Cell type:markdown id:32aed258-f7e0-4b18-a676-1d228f82a644 tags:
![02_pingpong](../images/02_pingpong.png "02_pingpong")
%% Cell type:markdown id:7a0618a1-9194-48ae-9955-d28847eef050 tags:
&nbsp;
%% Cell type:markdown id:55c185ae-cc9d-4fef-a837-fa43ce81db70 tags:
## 0. Step by step (according to the pictures above)
%% Cell type:markdown id:e89a732e-5e1b-471d-b13e-89988fd92f42 tags:
##### Two MPI processes will play ping pong:
- Let's write a ping pong benchmark (to measure the latency) step by step
- Only two MPI processes (rank 0 and rank 1) will be needed (mpirun -np 2 ...)
- For the ping pong exercise we'll adopt an MPMD (multiple program multiple data) style: the two ranks execute different code branches within one program
- The ping pong ball will be 1 float (but the value is not of interest)
- Be careful: in the end the two MPI processes should play ping pong with only one ball <br>(not ping-ping pong-pong with two balls)
<details>
<summary markdown="span"><b>Further reading - deep dive (blocking point-to-point communication)</b></summary>
<br>[MPI 4.1 Chapter 3](https://www.mpi-forum.org/docs/mpi-4.1/mpi41-report.pdf#page=73) - Point-to-Point Communication
- pages 31-69: recommended reading - selected pages from 3.1, 3.2, 3.4, 3.5 <br>
%% Cell type:markdown id:396a4a40-545a-4086-a9e9-eecb686ad89d tags:
<details>
<summary markdown="span"><b>MPI_Send(&buf, count, datatype, dest, tag, comm)</b></summary>
- **blocking send procedure** (other send modes have the same syntax)
- source rank sends the message defined by (buf, count, datatype) to the dest(ination) rank
<br>&nbsp;<br>
- IN &nbsp; &nbsp; &nbsp; buf &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; initial address of send buffer (choice)
- IN &nbsp; &nbsp; &nbsp; count &nbsp; &nbsp; &nbsp; &nbsp; number of elements in send buffer (non-negative integer)
- IN &nbsp; &nbsp; &nbsp; datatype &nbsp; datatype of each send buffer element (handle)
- IN &nbsp; &nbsp; &nbsp; dest &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; rank of destination (integer)
- IN &nbsp; &nbsp; &nbsp; tag &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; message tag (integer)
- IN &nbsp; &nbsp; &nbsp; comm &nbsp;&nbsp; &nbsp; &nbsp; communicator (handle)
<br>&nbsp;<br>
- C binding
<br> int **MPI_Send**(const void ***buf**, int **count**, MPI_Datatype **datatype**, int **dest**, int **tag**, MPI_Comm **comm**)
- *Usage: &nbsp; MPI_Send(&buffer, 1, MPI_FLOAT, 1, 17, MPI_COMM_WORLD);*
<br>&nbsp;<br>
- *Note:*
- ***MPI_Send** (standard send) is recommended **for production** runs (best speed)*
<br>--> *let the MPI library decide how to best transfer the message (same risks as MPI_Ssend)*
- ***MPI_Ssend** (synchronous send) is recommended **for debugging*** (helps to detect deadlocks)
<br>--> *completes only when the receive has started --> risk of deadlocks and serializations*
- ***MPI_Bsend** (buffered send) --> not recommended since unnecessarily complicated*
<br>--> *it's recommended to use MPI_Send or nonblocking communication instead*
- ***MPI_Rsend** (ready send) --> not recommended because it's highly dangerous to get it wrong*
<br>--> *it may be started only after the matching receive is already posted (requires additional guarantees)*
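<br>&nbsp;<br>
- *A minimal sketch (our own illustration, not part of the exercise): since all four send modes share the same argument list, switching between MPI_Send and MPI_Ssend for debugging can be a one-token change (the SEND macro below is hypothetical):*

``` c
#include <mpi.h>
/* hypothetical SEND macro: compile with -DDEBUG to turn every standard
   send into a synchronous send */
#ifdef DEBUG
#define SEND MPI_Ssend   /* synchronous send: helps to expose deadlocks */
#else
#define SEND MPI_Send    /* standard send: best speed for production */
#endif
int main(int argc, char *argv[])
{
  int rank;
  float buffer[1] = {0.0f};
  MPI_Status status;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  if (rank == 0)
    SEND(buffer, 1, MPI_FLOAT, 1, 17, MPI_COMM_WORLD);
  else if (rank == 1)
    MPI_Recv(buffer, 1, MPI_FLOAT, 0, 17, MPI_COMM_WORLD, &status);
  MPI_Finalize();
}
```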
%% Cell type:markdown id:74e26161-4151-4c97-9c03-18128ec78fe0 tags:
<details>
<summary markdown="span"><b>MPI_Recv(&buf, count, datatype, source, tag, comm, &status)</b></summary>
- **blocking receive procedure**
- dest(ination) rank receives a message from the source rank and stores it at (buf, count, datatype)
<br>&nbsp;<br>
- OUT &nbsp; buf &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; initial address of receive buffer (choice)
- IN &nbsp; &nbsp; &nbsp; count &nbsp; &nbsp; &nbsp; &nbsp; number of elements in receive buffer (non-negative integer)
- IN &nbsp; &nbsp; &nbsp; datatype &nbsp; datatype of each receive buffer element (handle)
- IN &nbsp; &nbsp; &nbsp; source &nbsp; &nbsp; &nbsp; rank of source or MPI_ANY_SOURCE (integer)
- IN &nbsp; &nbsp; &nbsp; tag &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; message tag or MPI_ANY_TAG (integer)
- IN &nbsp; &nbsp; &nbsp; comm &nbsp;&nbsp; &nbsp; &nbsp; communicator (handle)
- OUT &nbsp; status &nbsp;&nbsp; &nbsp; &nbsp; status object (status)
<br>&nbsp;<br>
- C binding
<br> int **MPI_Recv**(void ***buf**, int **count**, MPI_Datatype **datatype**, int **source**, int **tag**, MPI_Comm **comm**, MPI_Status ***status**)
- *Usage: &nbsp; MPI_Recv(&buffer, 1, MPI_FLOAT, 0, 17, MPI_COMM_WORLD, &status);*
<br>&nbsp;<br>
- *Note:*
- ***MPI_Recv** completes when the message has arrived*
<br>--> *only one receive mode is needed that works together with all 4 send modes*
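<br>&nbsp;<br>
- *A minimal sketch (beyond the exercise): the status object can be queried for the actual source, tag, and message size:*

``` c
#include <stdio.h>
#include <mpi.h>
int main(int argc, char *argv[])
{
  int rank, count;
  float buffer[1] = {42.0f};
  MPI_Status status;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  if (rank == 0)
  {
    MPI_Send(buffer, 1, MPI_FLOAT, 1, 17, MPI_COMM_WORLD);
  }
  else if (rank == 1)
  {
    /* wildcards: accept any sender and any tag, then inspect the status */
    MPI_Recv(buffer, 1, MPI_FLOAT, MPI_ANY_SOURCE, MPI_ANY_TAG,
             MPI_COMM_WORLD, &status);
    MPI_Get_count(&status, MPI_FLOAT, &count);
    printf("received %i element(s) from rank %i with tag %i\n",
           count, status.MPI_SOURCE, status.MPI_TAG);
  }
  MPI_Finalize();
}
```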
%% Cell type:markdown id:d6640ca9-8cfa-4c9c-b222-6c1773fb3bed tags:
**[MPI 4.1 Table 3.2](https://www.mpi-forum.org/docs/mpi-4.1/mpi41-report.pdf#page=76) - Predefined MPI datatypes corresponding to C datatypes**
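Common pairs: float --> MPI_FLOAT, double --> MPI_DOUBLE, int --> MPI_INT, char --> MPI_CHAR.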
%% Cell type:markdown id:fbaa8ca6-6fdc-432c-950d-e8eac2d687c3 tags:
&nbsp;
%% Cell type:markdown id:d8383b74-f20b-4864-903a-ab7cd45e25c9 tags:
## 1. ping
%% Cell type:markdown id:03d9d0ce-79ef-4766-9d0a-beb0ff4c9ffc tags:
##### The very first ping:
- rank 0 sends a message (ping) to rank 1
- rank 1 receives the message (ping) from rank 0
- the message (ping pong ball) shall be 1 float and please use tag=17 for the ping
<details>
<summary markdown="span"><b>What happens if you do NOT modify the cell below? Try it out!</b></summary>
You can run all cells of part 1 without modifying ping.c in the cell below.<br>
Give it a try before you actually modify ping.c in the cell below.<br>
**What happens here? Why is this possible at all?**<br>
Of course, before you can proceed to the next step (2), you have to modify ping.c in the cell below.<br>
%% Cell type:code id:adc73ff7-6447-45ab-9344-8ea2967b841d tags:
``` python
%%writefile tmp/ping.c
#include <stdio.h>
#include <mpi.h>
int main(int argc, char *argv[])
{
  int i, rank;
  float buffer[1];
  MPI_Status status;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  printf("I am %i before send ping \n", rank);
  printf("I am %i after recv ping \n", rank);
  MPI_Finalize();
}
```
%% Cell type:markdown id:6e1c939f-db58-4c80-8871-aab667ada929 tags:
##### Compile:
%% Cell type:code id:5bf06703-8012-4291-ad59-71d83be99107 tags:
``` python
!cd tmp; mpicc ping.c -o ping
```
%% Cell type:markdown id:5112674f-c2d7-4516-b3c7-ed1121fc4ca4 tags:
##### Run:
%% Cell type:code id:8dd01859-ef72-40d8-a434-b6a499ef2a6c tags:
``` python
!cd tmp; mpirun -np 2 ./ping
```
%% Cell type:markdown id:4e3f19cb-cd03-40f3-a7a4-47ad53c18a93 tags:
#### Seeing more than 2 output lines?
%% Cell type:markdown id:8ed5866b-a63a-409c-a388-fa7923b76cad tags:
If you are seeing more than 2 output lines, please modify / correct ping.c above.
<br>&nbsp;<br>
If you have not yet modified ping.c you will see 4 (2 x number of MPI processes) lines of output, i.e.,<br>each MPI process runs the whole code in ping.c which has 2 print statements.
%% Cell type:markdown id:912708e2-cb48-47d4-bcb7-ab9864d1bd92 tags:
#### Solution (please try to solve the exercise by yourself before looking at the solution)
%% Cell type:code id:10a3ae35-acf5-4cf4-9c94-db94e353637a tags:
``` python
%%writefile tmp/ping_solution.c
#include <stdio.h>
#include <mpi.h>
int main(int argc, char *argv[])
{
  int i, rank;
  float buffer[1];
  MPI_Status status;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  if (rank == 0)
  {
    printf("I am %i before send ping \n", rank);
    MPI_Send(buffer, 1, MPI_FLOAT, 1, 17, MPI_COMM_WORLD);
  }
  else if (rank == 1)
  {
    MPI_Recv(buffer, 1, MPI_FLOAT, 0, 17, MPI_COMM_WORLD, &status);
    printf("I am %i after recv ping \n", rank);
  }
  MPI_Finalize();
}
```
%% Cell type:markdown id:34788895-5728-43df-8d90-12a4685d231b tags:
##### Compile the solution:
%% Cell type:code id:6f38f54b-4e24-4d21-bba2-37c322dce283 tags:
``` python
!cd tmp; mpicc ping_solution.c -o ping_solution
```
%% Cell type:markdown id:7566ecac-6878-45aa-b545-36fc0d7e97e3 tags:
##### Run the solution:
%% Cell type:code id:0af3970c-8691-4b43-ae81-3db977614638 tags:
``` python
!cd tmp; mpirun -np 2 ./ping_solution
```
%% Cell type:markdown id:613f39de-a970-4121-981e-b8b13e537296 tags:
#### Expected output:
%% Cell type:raw id:28b4a2d7-32e7-4b8b-b579-c80eca0b56b4 tags:
I am 0 before send ping
I am 1 after recv ping
%% Cell type:markdown id:e0faeaae-2c12-423e-bd19-c650a73d5584 tags:
#### Unexpected output - but still correct - do you remember why this might happen?
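<details>
<summary markdown="span"><b>Hint (click to expand)</b></summary>
Each process writes to its own stdout; mpirun merges these streams in arbitrary order. Lines of one process keep their relative order, but lines of different processes may interleave arbitrarily.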
%% Cell type:raw id:09a0065c-449e-4866-ae1f-6b698e16f633 tags:
I am 1 after recv ping
I am 0 before send ping
%% Cell type:markdown id:2be9fcee-1be8-47ef-ad53-6ea650bb4365 tags:
&nbsp;
%% Cell type:markdown id:2e3ccf5a-5a3c-4d46-acf1-cf371adb8c92 tags:
## 2. pingpong
%% Cell type:markdown id:ef02270b-aa74-4387-976d-5124fedf084f tags:
##### Sending back the pong:
- after receiving the ping, rank 1 sends a message (pong) back to rank 0
- rank 0 receives the message (pong) from rank 1
- the message (ping pong ball) shall be 1 float and please use tag=23 for the pong
%% Cell type:code id:c9258630-975c-41ca-aab4-4e15c76d1e42 tags:
``` python
%%writefile tmp/pingpong.c
#include <stdio.h>
#include <mpi.h>
int main(int argc, char *argv[])
{
  int i, rank;
  float buffer[1];
  MPI_Status status;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  if (rank == 0)
  {
    printf("I am %i before send ping \n", rank);
    MPI_Send(buffer, 1, MPI_FLOAT, 1, 17, MPI_COMM_WORLD);
    printf("I WILL BE / am %i after recv pong \n", rank);
  }
  else if (rank == 1)
  {
    MPI_Recv(buffer, 1, MPI_FLOAT, 0, 17, MPI_COMM_WORLD, &status);
    printf("I am %i after recv ping \n", rank);
    printf("I WILL BE / am %i before send pong \n", rank);
  }
  MPI_Finalize();
}
```
%% Cell type:markdown id:187c78ba-9dc8-4e43-853a-21a6f4b001ab tags:
##### Compile:
%% Cell type:code id:204eeceb-ec83-4230-b7f7-e75874938018 tags:
``` python
!cd tmp; mpicc pingpong.c -o pingpong
```
%% Cell type:markdown id:87af2510-d138-4a1f-8a6c-0d96830a27c6 tags:
##### Run:
%% Cell type:code id:cd8a2825-0677-4bc0-8ff9-09e6fc190360 tags:
``` python
!cd tmp; mpirun -np 2 ./pingpong
```
%% Cell type:markdown id:c7527f1b-a008-4f95-b14b-7c812172f372 tags:
#### Solution (please try to solve the exercise by yourself before looking at the solution)
%% Cell type:code id:9d483267-69ed-44ee-a2fc-57664ecb2524 tags:
``` python
%%writefile tmp/pingpong_solution.c
#include <stdio.h>
#include <mpi.h>
int main(int argc, char *argv[])
{
  int i, rank;
  float buffer[1];
  MPI_Status status;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  if (rank == 0)
  {
    printf("I am %i before send ping \n", rank);
    MPI_Send(buffer, 1, MPI_FLOAT, 1, 17, MPI_COMM_WORLD);
    MPI_Recv(buffer, 1, MPI_FLOAT, 1, 23, MPI_COMM_WORLD, &status);
    printf("I am %i after recv pong \n", rank);
  }
  else if (rank == 1)
  {
    MPI_Recv(buffer, 1, MPI_FLOAT, 0, 17, MPI_COMM_WORLD, &status);
    printf("I am %i after recv ping \n", rank);
    printf("I am %i before send pong \n", rank);
    MPI_Send(buffer, 1, MPI_FLOAT, 0, 23, MPI_COMM_WORLD);
  }
  MPI_Finalize();
}
```
%% Cell type:markdown id:3b280691-dbba-4814-8ef5-a90dafb02cad tags:
##### Compile the solution:
%% Cell type:code id:bfdfb4e2-fc01-43c0-bff0-3e822d13df10 tags:
``` python
!cd tmp; mpicc pingpong_solution.c -o pingpong_solution
```
%% Cell type:markdown id:8223cf17-10d5-4b05-9d21-dc491b926e80 tags:
##### Run the solution:
%% Cell type:code id:126abe04-69ec-43fb-a64e-75120f785d3a tags:
``` python
!cd tmp; mpirun -np 2 ./pingpong_solution
```
%% Cell type:markdown id:534c3735-a307-46ab-9669-7f58c18c807a tags:
#### Expected output:
%% Cell type:raw id:69cbb9bf-8640-4267-be90-c8c05ed4d214 tags:
I am 0 before send ping
I am 1 after recv ping
I am 1 before send pong
I am 0 after recv pong
%% Cell type:markdown id:e38ea714-e0e4-4cf3-a2af-4809c58923ea tags:
#### Unexpected output - but still correct - do you remember why this might happen?
%% Cell type:raw id:3b8bf18e-6c2b-4d27-bfc4-7626da786568 tags:
I am 0 before send ping
I am 0 after recv pong
I am 1 after recv ping
I am 1 before send pong
%% Cell type:markdown id:0e64b14c-b25c-4ffd-a713-4d6cda8c3d97 tags:
&nbsp;
%% Cell type:markdown id:dda10e76-e3ad-49d4-b354-30e04a4b643b tags:
## 3. timing
%% Cell type:markdown id:9f1d84f7-a171-44ea-aa26-2be7d93bd0d3 tags:
##### Repeat this in a loop and add timing calls:
- repeat this ping pong with a loop of length 50
- add timing calls before and after the loop
- only rank 0 shall print the transfer time of one message in microseconds, i.e., delta_time / (2*50) * 1e6
- uncomment the 3 // lines and add all the other pieces needed in pingpong-bench.c
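For example: if the timed loop of 50 ping pongs takes 44 microseconds in total, the transfer time of one message is 44 / (2 * 50) = 0.44 microseconds.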
%% Cell type:markdown id:4682e3cb-3fe4-49f6-a7b9-26049402de4f tags:
<details>
<summary markdown="span"><b>MPI_Wtime()</b></summary>
- **timing**
- returns a floating-point number of seconds, representing elapsed wallclock time since some time in the past
<br>&nbsp;<br>
- C binding
<br> double MPI_Wtime(void)
- *Usage: &nbsp; time = MPI_Wtime();*
<details>
<summary markdown="span"><b>Recommended reading (timing)</b></summary>
<br>[MPI 4.1 Section 9.6](https://www.mpi-forum.org/docs/mpi-4.1/mpi41-report.pdf#page=512) - Timers and Synchronization
- pages 470-471: MPI_Wtime, MPI_Wtick<br>
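<br>&nbsp;<br>
*A minimal timing sketch (beyond the exercise); MPI_Wtick() returns the resolution of MPI_Wtime in seconds:*

``` c
#include <stdio.h>
#include <mpi.h>
int main(int argc, char *argv[])
{
  double start, finish;
  MPI_Init(&argc, &argv);
  start = MPI_Wtime();
  /* ... code to be timed ... */
  finish = MPI_Wtime();
  printf("elapsed: %f s (timer resolution: %g s)\n",
         finish - start, MPI_Wtick());
  MPI_Finalize();
}
```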
%% Cell type:code id:77b59afb-39a3-49c9-849b-f9b2362a3486 tags:
``` python
%%writefile tmp/pingpong-bench.c
#include <stdio.h>
#include <mpi.h>
#define number_of_messages 50
int main(int argc, char *argv[])
{
  int i, rank;
  float buffer[1];
  // ??? start, finish, msg_transfer_time;
  MPI_Status status;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  if (rank == 0)
  {
    MPI_Send(buffer, 1, MPI_FLOAT, 1, 17, MPI_COMM_WORLD);
    MPI_Recv(buffer, 1, MPI_FLOAT, 1, 23, MPI_COMM_WORLD, &status);
  }
  else if (rank == 1)
  {
    MPI_Recv(buffer, 1, MPI_FLOAT, 0, 17, MPI_COMM_WORLD, &status);
    MPI_Send(buffer, 1, MPI_FLOAT, 0, 23, MPI_COMM_WORLD);
  }
  if (rank == 0)
  {
    // msg_transfer_time = ((finish - start) / (2 * number_of_messages)) * 1e6; // in microsec
    // printf("Time for one message: %f micro seconds.\n", msg_transfer_time);
  }
  MPI_Finalize();
}
```
%% Cell type:markdown id:c59816a2-3f30-4654-a593-033e86f04d35 tags:
##### Compile:
%% Cell type:code id:6b7867e7-20c0-4508-ac1f-b91a1ab01b3b tags:
``` python
!cd tmp; mpicc pingpong-bench.c -o pingpong-bench
```
%% Cell type:markdown id:6544ac43-fc1e-4e7a-995e-bee6bb53fe55 tags:
##### Run:
%% Cell type:code id:f1c99634-771f-4611-9093-3b63b4a54c33 tags:
``` python
!cd tmp; mpirun -np 2 ./pingpong-bench
```
%% Cell type:markdown id:dc77ac5f-5f9d-4898-809f-e9a9dfe943f4 tags:
#### Solution (please try to solve the exercise by yourself before looking at the solution)
%% Cell type:code id:80d66d09-debc-4ea2-8094-1334dfa8cfd2 tags:
``` python
%%writefile tmp/pingpong-bench_solution.c
#include <stdio.h>
#include <mpi.h>
#define number_of_messages 50
int main(int argc, char *argv[])
{
  int i, rank;
  float buffer[1];
  double start, finish, msg_transfer_time;
  MPI_Status status;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  start = MPI_Wtime();
  for (i = 1; i <= number_of_messages; i++)
  {
    if (rank == 0)
    {
      MPI_Send(buffer, 1, MPI_FLOAT, 1, 17, MPI_COMM_WORLD);
      MPI_Recv(buffer, 1, MPI_FLOAT, 1, 23, MPI_COMM_WORLD, &status);
    }
    else if (rank == 1)
    {
      MPI_Recv(buffer, 1, MPI_FLOAT, 0, 17, MPI_COMM_WORLD, &status);
      MPI_Send(buffer, 1, MPI_FLOAT, 0, 23, MPI_COMM_WORLD);
    }
  }
  finish = MPI_Wtime();
  if (rank == 0)
  {
    msg_transfer_time = ((finish - start) / (2 * number_of_messages)) * 1e6; // in microsec
    printf("Time for one message: %f micro seconds.\n", msg_transfer_time);
  }
  MPI_Finalize();
}
```
%% Cell type:markdown id:f5d5015c-bf2c-430a-a0c8-2fd9d51067e1 tags:
##### Compile the solution:
%% Cell type:code id:ec6ddfe1-d933-4a54-8981-1b8af84bf3eb tags:
``` python
!cd tmp; mpicc pingpong-bench_solution.c -o pingpong-bench_solution
```
%% Cell type:markdown id:c5297a2d-1912-4df2-8bbf-f885a56cdb48 tags:
##### Run the solution:
%% Cell type:code id:bf03d21b-e9aa-4611-aae3-9d5a891580e8 tags:
``` python
!cd tmp; mpirun -np 2 ./pingpong-bench_solution
```
%% Cell type:markdown id:cfe5e118-ea85-41e7-b147-6c86cad0df96 tags:
#### Expected output - What did you measure? Run it a couple of times to see run-to-run variations!
%% Cell type:raw id:d9c91732-ae33-48fd-8447-3df5be66c40c tags:
Time for one message: 0.440590 micro seconds.
%% Cell type:markdown id:b3be7b96-afc8-4303-9b5f-ebd0e23d489f tags:
&nbsp;
%% Cell type:markdown id:8430bfc5-c015-4d8b-851d-ac078efed5f4 tags:
## 4. warmup
%% Cell type:markdown id:1c6f3a2f-869d-4d63-b69d-b2071b6fc486 tags:
Don't forget to warm up: do one ping pong before starting the timed loop:
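The first message typically pays one-time costs (connection setup, buffer allocation, cold caches), so excluding it from the timing gives a cleaner latency estimate.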
%% Cell type:code id:316bf1bb-45a2-48b6-ae2e-8aa142a156ee tags:
``` python
%%writefile tmp/pingpong-bench1.c
#include <stdio.h>
#include <mpi.h>
#define number_of_messages 50
int main(int argc, char *argv[])
{
  int i, rank;
  float buffer[1];
  double start, finish, msg_transfer_time;
  MPI_Status status;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  start = MPI_Wtime();
  for (i = 1; i <= number_of_messages; i++)
  {
    if (rank == 0)
    {
      MPI_Send(buffer, 1, MPI_FLOAT, 1, 17, MPI_COMM_WORLD);
      MPI_Recv(buffer, 1, MPI_FLOAT, 1, 23, MPI_COMM_WORLD, &status);
    }
    else if (rank == 1)
    {
      MPI_Recv(buffer, 1, MPI_FLOAT, 0, 17, MPI_COMM_WORLD, &status);
      MPI_Send(buffer, 1, MPI_FLOAT, 0, 23, MPI_COMM_WORLD);
    }
  }
  finish = MPI_Wtime();
  if (rank == 0)
  {
    msg_transfer_time = ((finish - start) / (2 * number_of_messages)) * 1e6; // in microsec
    printf("Time for one message: %f micro seconds.\n", msg_transfer_time);
  }
  MPI_Finalize();
}
```
%% Cell type:markdown id:a7e2ce30-a697-4976-91e0-f09e37fb9ddb tags:
##### Compile:
%% Cell type:code id:3143449a-2416-48ee-b7b9-ce9f35a0c402 tags:
``` python
!cd tmp; mpicc pingpong-bench1.c -o pingpong-bench1
```
%% Cell type:markdown id:17d6afeb-c416-4bd8-9f3e-cc2315434776 tags:
##### Run:
%% Cell type:code id:64fbbd61-0768-4406-a5ec-b110bc2ab7ad tags:
``` python
!cd tmp; mpirun -np 2 ./pingpong-bench1
```
%% Cell type:markdown id:04914cc2-fdde-4b18-bacb-e1b080368fa1 tags:
#### Solution (please try to solve the exercise by yourself before looking at the solution)
%% Cell type:code id:0030f750-2786-4ee3-9698-b22700bdc9e7 tags:
``` python
%%writefile tmp/pingpong-bench1_solution.c
#include <stdio.h>
#include <mpi.h>
#define number_of_messages 50
int main(int argc, char *argv[])
{
  int i, rank;
  float buffer[1];
  double start, finish, msg_transfer_time;
  MPI_Status status;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  if (rank == 0)
  {
    MPI_Send(buffer, 1, MPI_FLOAT, 1, 17, MPI_COMM_WORLD);
    MPI_Recv(buffer, 1, MPI_FLOAT, 1, 23, MPI_COMM_WORLD, &status);
  }
  else if (rank == 1)
  {
    MPI_Recv(buffer, 1, MPI_FLOAT, 0, 17, MPI_COMM_WORLD, &status);
    MPI_Send(buffer, 1, MPI_FLOAT, 0, 23, MPI_COMM_WORLD);
  }
  start = MPI_Wtime();
  for (i = 1; i <= number_of_messages; i++)
  {
    if (rank == 0)
    {
      MPI_Send(buffer, 1, MPI_FLOAT, 1, 17, MPI_COMM_WORLD);
      MPI_Recv(buffer, 1, MPI_FLOAT, 1, 23, MPI_COMM_WORLD, &status);
    }
    else if (rank == 1)
    {
      MPI_Recv(buffer, 1, MPI_FLOAT, 0, 17, MPI_COMM_WORLD, &status);
      MPI_Send(buffer, 1, MPI_FLOAT, 0, 23, MPI_COMM_WORLD);
    }
  }
  finish = MPI_Wtime();
  if (rank == 0)
  {
    msg_transfer_time = ((finish - start) / (2 * number_of_messages)) * 1e6; // in microsec
    printf("Time for one message: %f micro seconds.\n", msg_transfer_time);
  }
  MPI_Finalize();
}
```
%% Cell type:markdown id:24cffd34-bc15-4953-a956-1876eae47b24 tags:
##### Compile the solution:
%% Cell type:code id:9e0f7292-79f1-4c40-81ea-eb8f64792480 tags:
``` python
!cd tmp; mpicc pingpong-bench1_solution.c -o pingpong-bench1_solution
```
%% Cell type:markdown id:6fe56122-96ff-41e4-b88d-8f78e2145024 tags:
##### Run the solution:
%% Cell type:code id:85ba029e-e31b-4571-be33-6b8b7580927f tags:
``` python
!cd tmp; mpirun -np 2 ./pingpong-bench1_solution
```
%% Cell type:markdown id:52981998-f44a-44f8-85a6-88d3b56fcc63 tags:
#### Expected output - What did you measure? Run it a couple of times to see run-to-run variations!
%% Cell type:raw id:9fa5677c-6b45-4ac7-afe5-eae3905030d7 tags:
Time for one message: 0.134900 micro seconds.
%% Cell type:markdown id:5c9010ae-081f-406a-8b0d-d95900384005 tags:
&nbsp;
%% Cell type:markdown id:e2fb16a7-5e88-4709-994b-2917ee452ade tags:
## 5. finish - who wins the race?
%% Cell type:markdown id:e4b9e455-a4f8-442a-bfc2-638393d64d22 tags:
Please do a couple of time measurements - run each version a couple of times and note down your fastest result for:
- MPI_Send - including the first ping pong in the time measurement (result of 3. timing)
- MPI_Send - excluding the first ping pong from the time measurement (result of 4. warmup)
<br>&nbsp;<br>
- MPI_Ssend - including the first ping pong in the time measurement (you'll have to edit/copy from above)
- MPI_Ssend - excluding the first ping pong from the time measurement (you'll have to edit/copy from above)

You can do these measurements on different systems and in different environments, e.g.:
- VSC JupyterHub using VSC-5 or VSC-4
- Submitting jobs to VSC-5 or VSC-4 and playing around with pinning (see previous *01_hello.ipynb*)
- put both processes on the same NUMA domain
- put the two processes on different NUMA domains but still on the same CPU/socket
- put the two processes on different CPUs/sockets on the same node
- put them on different nodes and both on CPU/socket 0
- put them on different nodes and both on CPU/socket 1
- put them on different nodes and one on CPU/socket 0 and the other on CPU/socket 1
- When submitting jobs you can also switch to another MPI library (e.g., Intel MPI) and do the same.
- Run the ping pong benchmark on your own laptop and/or on another HPC system you have access to.
**Record your results below. We would like to see who wins the race!**
<br>(Copy the cell below to record all your measurements on different systems and in different environments.)
%% Cell type:raw id:78af8285-280b-406d-8b3b-df5f96261867 tags:
First name: ________
Measurement on: ________
Programming language: C
time for 1 ping in microseconds with:    MPI_Send    MPI_Ssend
including first ping pong in timing      ________    ________
excluding first ping pong from timing    ________    ________
%% Cell type:markdown id:dae2770a-3e88-4d90-bd86-b25486636d31 tags:
## &nbsp;
%% Cell type:markdown id:6ec0d3a4-a768-404d-9b64-93f70fca28b3 tags:
<details>
<summary markdown="span"><b>Author, acknowledgment, copyright, and license for this notebook</b></summary>
- <b>Author:</b> Claudia Blaas-Schenner (VSC Research Center, TU Wien), 17 November 2024
- <b>Based on</b> the [MPI course developed by Rolf Rabenseifner, HLRS](https://www.hlrs.de/training/self-study-materials/mpi-course-material) that is under a quite restrictive copyright by Rolf Rabenseifner and HLRS. The copyrighted material (some images, some exercise descriptions, some code snippets) is used with permission. Some parts taken from the HLRS material are modified and the Jupyter Notebook is extended with own material of the Notebook authors.
- <b>License:</b> [CC BY-SA 4.0 (Attribution-ShareAlike)](https://creativecommons.org/licenses/by-sa/4.0/)
- <b>Contributing and error reporting:</b> Please send an email to: [training@vsc.ac.at](mailto:training@vsc.ac.at)
%% Cell type:markdown id:f2229195-0e73-4940-97d8-2467b4fb3b6a tags:
&nbsp;
%% Cell type:markdown id:7b8b3b7c-e15e-466d-b324-a5ce3fd18359 tags:
# MPI - MPI_Allreduce (ring)!
<details>
<summary markdown="span"><b>Jupyter notebook quick start (click to expand)</b></summary>
- \<Shift>+\<Return> --> runs a cell
- ! --> shell escape [! Linux command line]
- use a (above) and b (below) left of the [ ] to open new cells
- use m (Markdown) or r (Raw) left of the [ ] to change the cell mode
<details>
<summary markdown="span"><b>Click to expand</b></summary>
If you see a little triangle you can click it to expand further explanations.
%% Cell type:markdown id:f77b0a6b-3cd6-4b56-b1d1-cd480029188a tags:
<details>
<summary markdown="span"><b>Author, acknowledgment, copyright, and license for this notebook</b></summary>
- <b>Author:</b> Claudia Blaas-Schenner (VSC Research Center, TU Wien), 18 November 2024
- <b>Based on</b> the [MPI course developed by Rolf Rabenseifner, HLRS](https://www.hlrs.de/training/self-study-materials/mpi-course-material) that is under a quite restrictive copyright by Rolf Rabenseifner and HLRS. The copyrighted material (some images, some exercise descriptions, some code snippets) is used with permission. Some parts taken from the HLRS material are modified and the Jupyter Notebook is extended with own material of the Notebook authors.
- <b>License:</b> [CC BY-SA 4.0 (Attribution-ShareAlike)](https://creativecommons.org/licenses/by-sa/4.0/)
- <b>Contributing and error reporting:</b> Please send an email to: [training@vsc.ac.at](mailto:training@vsc.ac.at)
%% Cell type:markdown id:6b3ee1b9-4262-4c30-83d3-7fd51436e705 tags:
#### MPI Standard: [MPI: A Message-Passing Interface Standard Version 4.1 (PDF)](https://www.mpi-forum.org/docs/mpi-4.1/mpi41-report.pdf)
%% Cell type:markdown id:5d223967-4a35-407d-a2f7-bdbb8c23f2b9 tags:
#### You can edit the exercises directly in the cells.
&nbsp;
%% Cell type:markdown id:01ac1316-a516-477c-b5f0-ae32bbc0bf9e tags:
### Contents - explore collective communication:
%% Cell type:markdown id:8a6701a3-8e8a-4c96-81c5-0da837632879 tags:
- Note: in 03_ring we focused on halo communication, but used just one integer (the ranks) instead of a true halo.
<br> Now we step away from halo communication and do the previous dummy computation in the simplest way.<br>&nbsp;<br>
- 1. ring - MPI_Allreduce<br>
- 2. ring - MPI_Scan<br>
%% Cell type:markdown id:fbaa8ca6-6fdc-432c-950d-e8eac2d687c3 tags:
&nbsp;
%% Cell type:markdown id:d8383b74-f20b-4864-903a-ab7cd45e25c9 tags:
## 1. ring - MPI_Allreduce
%% Cell type:markdown id:03d9d0ce-79ef-4766-9d0a-beb0ff4c9ffc tags:
##### Rewrite the pass-around-the-ring program:
- use an MPI global reduction to compute the global sum of all ranks in the ring, and print it from all processes
- the pass-around-the-ring communication loop must be substituted by one call to the MPI collective reduction routine
- please look into the MPI standard to see the argument list of [MPI_Allreduce](https://www.mpi-forum.org/docs/mpi-4.1/mpi41-report.pdf#page=282)
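- C binding (MPI 4.1):
<br> int **MPI_Allreduce**(const void ***sendbuf**, void ***recvbuf**, int **count**, MPI_Datatype **datatype**, MPI_Op **op**, MPI_Comm **comm**)
- *For the global sum, MPI_SUM is the reduction operation (op).*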
%% Cell type:code id:adc73ff7-6447-45ab-9344-8ea2967b841d tags:
``` python
%%writefile tmp/ring_allreduce.c
#include <stdio.h>
#include <mpi.h>
int main (int argc, char *argv[])
{
  int rank, size;
  int snd_buf, rcv_buf;
  int right, left;
  int sum, i;
  MPI_Status status;
  MPI_Request request;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  // ----------------------------------------------
  // ------ please substitute whole algorithm -----
  right = (rank+1) % size;
  left  = (rank-1+size) % size;
  sum = 0;
  snd_buf = rank;
  for( i = 0; i < size; i++)
  {
    MPI_Issend(&snd_buf, 1, MPI_INT, right, 17, MPI_COMM_WORLD, &request);
    MPI_Recv  (&rcv_buf, 1, MPI_INT, left,  17, MPI_COMM_WORLD, &status);
    MPI_Wait(&request, &status);
    snd_buf = rcv_buf;
    sum += rcv_buf;
  }
  // ------ by one call to a collective routine ---
  // ------ input is rank, output is sum ----------
  // ----------------------------------------------
  printf ("PE%i:\tSum = %i\n", rank, sum);
  MPI_Finalize();
}
```
%% Cell type:markdown id:6e1c939f-db58-4c80-8871-aab667ada929 tags:
##### Compile:
%% Cell type:code id:5bf06703-8012-4291-ad59-71d83be99107 tags:
``` python
!cd tmp; mpicc ring_allreduce.c -o ring_allreduce
```
%% Cell type:markdown id:5112674f-c2d7-4516-b3c7-ed1121fc4ca4 tags:
##### Run:
%% Cell type:code id:8dd01859-ef72-40d8-a434-b6a499ef2a6c tags:
``` python
!cd tmp; mpirun -np 4 ./ring_allreduce
```
%% Cell type:markdown id:4e3f19cb-cd03-40f3-a7a4-47ad53c18a93 tags:
#### Play around with different numbers of MPI processes
%% Cell type:markdown id:68f76d2b-0516-45ab-bcd8-a5d9f717666e tags:
**Correct result = size * (size-1) / 2**
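E.g., with 4 MPI processes: 4 * 3 / 2 = 6.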
%% Cell type:markdown id:8ed5866b-a63a-409c-a388-fa7923b76cad tags:
**Run with different numbers of MPI processes:**<br>
E.g., in the VSC Jupyterhub, there are 4 physical cores available for the MPI course.<br>
If you want to run with more MPI processes you have to add the option:
**--oversubscribe**<br>
*Note: For performance runs you want to avoid oversubscribing, but it's okay for our little examples.*<br>
*Note: You cannot oversubscribe and pin at the same time.*
%% Cell type:code id:3940eeee-e665-4ded-b75a-5863f15ee222 tags:
``` python
!cd tmp; mpirun -np 12 --oversubscribe ./ring_allreduce
```
%% Cell type:markdown id:912708e2-cb48-47d4-bcb7-ab9864d1bd92 tags:
#### Solution (please try to solve the exercise by yourself before looking at the solution)
%% Cell type:code id:10a3ae35-acf5-4cf4-9c94-db94e353637a tags:
``` python
%%writefile tmp/ring_allreduce_solution.c
#include <stdio.h>
#include <mpi.h>
int main (int argc, char *argv[])
{
  int rank, size;
  int snd_buf, rcv_buf;
  int right, left;
  int sum, i;
  MPI_Status status;
  MPI_Request request;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Allreduce(&rank, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
  printf ("PE%i:\tSum = %i\n", rank, sum);
  MPI_Finalize();
}
```
%% Cell type:markdown id:34788895-5728-43df-8d90-12a4685d231b tags:
##### Compile the solution:
%% Cell type:code id:6f38f54b-4e24-4d21-bba2-37c322dce283 tags:
``` python
!cd tmp; mpicc ring_allreduce_solution.c -o ring_allreduce_solution
```
%% Cell type:markdown id:7566ecac-6878-45aa-b545-36fc0d7e97e3 tags:
##### Run the solution:
%% Cell type:code id:0af3970c-8691-4b43-ae81-3db977614638 tags:
``` python
!cd tmp; mpirun -np 4 ./ring_allreduce_solution
```
%% Cell type:markdown id:613f39de-a970-4121-981e-b8b13e537296 tags:
#### Expected output (4 MPI processes):
%% Cell type:raw id:28b4a2d7-32e7-4b8b-b579-c80eca0b56b4 tags:
PE0: Sum = 6
PE1: Sum = 6
PE2: Sum = 6
PE3: Sum = 6
%% Cell type:markdown id:2be9fcee-1be8-47ef-ad53-6ea650bb4365 tags:
&nbsp;
%% Cell type:markdown id:2e3ccf5a-5a3c-4d46-acf1-cf371adb8c92 tags:
## 2. ring - MPI_Scan
%% Cell type:markdown id:ef02270b-aa74-4387-976d-5124fedf084f tags:
##### Rewrite with MPI_Scan (partial sums) instead of MPI_Allreduce (global sum):
- please look into the MPI standard to see the argument list of [MPI_Scan](https://www.mpi-forum.org/docs/mpi-4.1/mpi41-report.pdf#page=289)
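- C binding (MPI 4.1):
<br> int **MPI_Scan**(const void ***sendbuf**, void ***recvbuf**, int **count**, MPI_Datatype **datatype**, MPI_Op **op**, MPI_Comm **comm**)
- *MPI_Scan is an inclusive scan: the result at rank i is the reduction over ranks 0, 1, ..., i.*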
%% Cell type:code id:c9258630-975c-41ca-aab4-4e15c76d1e42 tags:
``` python
%%writefile tmp/ring_scan.c
#include <stdio.h>
#include <mpi.h>
int main (int argc, char *argv[])
{
  int rank, size;
  int snd_buf, rcv_buf;
  int right, left;
  int sum, i;
  MPI_Status status;
  MPI_Request request;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Allreduce(&rank, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
  printf ("PE%i:\tSum = %i\n", rank, sum);
  MPI_Finalize();
}
```
%% Cell type:markdown id:187c78ba-9dc8-4e43-853a-21a6f4b001ab tags:
##### Compile:
%% Cell type:code id:204eeceb-ec83-4230-b7f7-e75874938018 tags:
``` python
!cd tmp; mpicc ring_scan.c -o ring_scan
```
%% Cell type:markdown id:87af2510-d138-4a1f-8a6c-0d96830a27c6 tags:
##### Run (and sort output):
%% Cell type:code id:cd8a2825-0677-4bc0-8ff9-09e6fc190360 tags:
``` python
!cd tmp; mpirun -np 4 ./ring_scan | sort
```
%% Cell type:markdown id:2d850084-4336-4b54-b0a3-c4b96042b156 tags:
#### Play around with different numbers of MPI processes
%% Cell type:code id:bc45973a-e8cb-472f-aa64-e8101f5e3918 tags:
``` python
!cd tmp; mpirun -np 12 --oversubscribe ./ring_scan | sort
```
%% Cell type:markdown id:c7527f1b-a008-4f95-b14b-7c812172f372 tags:
#### Solution (please try to solve the exercise by yourself before looking at the solution)
%% Cell type:code id:9d483267-69ed-44ee-a2fc-57664ecb2524 tags:
``` python
%%writefile tmp/ring_scan_solution.c
#include <stdio.h>
#include <mpi.h>
int main (int argc, char *argv[])
{
  int rank, size;
  int snd_buf, rcv_buf;
  int right, left;
  int sum, i;
  MPI_Status status;
  MPI_Request request;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Scan(&rank, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
  printf ("PE%i:\tSum = %i\n", rank, sum);
  MPI_Finalize();
}
```
%% Cell type:markdown id:3b280691-dbba-4814-8ef5-a90dafb02cad tags:
##### Compile the solution:
%% Cell type:code id:31a63e97-3b84-48ab-a301-92eab829c09b tags:
``` python
!cd tmp; mpicc ring_scan_solution.c -o ring_scan_solution
```
%% Cell type:markdown id:f4b34423-8112-468a-8462-bd68c47719d9 tags:
##### Run the solution (and sort output):
%% Cell type:code id:2dea2fdb-719d-450b-9cf3-17e19127c405 tags:
``` python
!cd tmp; mpirun -np 4 ./ring_scan_solution | sort
```
%% Cell type:markdown id:3d60b10b-fc3b-4e10-9132-876426735494 tags:
#### Expected output (4 MPI processes):
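(Rank i prints the inclusive prefix sum 0 + 1 + ... + i = i*(i+1)/2.)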
%% Cell type:raw id:c0efbbff-1e47-40d5-b79d-9c700d6a2071 tags:
PE0: Sum = 0
PE1: Sum = 1
PE2: Sum = 3
PE3: Sum = 6
%% Cell type:markdown id:42ba0a93-f2ad-467c-81fd-ff7c3bcb72de tags:
## &nbsp;
%% Cell type:markdown id:87674322-36e6-46c4-be63-ffef70c9cc30 tags:
<details>
<summary markdown="span"><b>Author, acknowledgment, copyright, and license for this notebook</b></summary>
- <b>Author:</b> Claudia Blaas-Schenner (VSC Research Center, TU Wien), 18 November 2024
- <b>Based on</b> the [MPI course developed by Rolf Rabenseifner, HLRS](https://www.hlrs.de/training/self-study-materials/mpi-course-material) that is under a quite restrictive copyright by Rolf Rabenseifner and HLRS. The copyrighted material (some images, some exercise descriptions, some code snippets) is used with permission. Some parts taken from the HLRS material are modified and the Jupyter Notebook is extended with own material of the Notebook authors.
- <b>License:</b> [CC BY-SA 4.0 (Attribution-ShareAlike)](https://creativecommons.org/licenses/by-sa/4.0/)
- <b>Contributing and error reporting:</b> Please send an email to: [training@vsc.ac.at](mailto:training@vsc.ac.at)
**Standalone C files for the 03_ring exercise (ring variants):**

**(a) Blocking MPI_Send/MPI_Recv skeleton - marked WRONG in the code: it can deadlock with a synchronous communication protocol:**

``` c
#include <stdio.h>
#include <mpi.h>
int main (int argc, char *argv[])
{
  int rank, size;
  int snd_buf, rcv_buf;
  int right, left;
  int sum, i;
  MPI_Status status;
  // _____
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  right = (rank+1) % size;
  left  = (rank-1+size) % size;
  sum = 0;
  snd_buf = rank;
  for( i = 0; i < size; i++)
  {
    MPI_Send(&snd_buf, 1, MPI_INT, right, 17, MPI_COMM_WORLD);
    MPI_Recv(&snd_buf, 1, MPI_INT, left,  17, MPI_COMM_WORLD, &status);
    // _____ WRONG program, it will deadlock with a synchronous communication protocol
    // _____
    sum += snd_buf;
  }
  printf ("PE%i:\tSum = %i\n", rank, sum);
  MPI_Finalize();
}
```
**(b) Nonblocking synchronous send (MPI_Issend + MPI_Recv + MPI_Wait):**

``` c
#include <stdio.h>
#include <mpi.h>
#define to_right 201
int main (int argc, char *argv[])
{
  int rank, size;
  int snd_buf, rcv_buf;
  int right, left;
  int sum, i;
  MPI_Status status;
  MPI_Request request;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  right = (rank+1) % size;
  left  = (rank-1+size) % size;
  sum = 0;
  snd_buf = rank;
  for( i = 0; i < size; i++)
  {
    MPI_Issend(&snd_buf, 1, MPI_INT, right, to_right, MPI_COMM_WORLD, &request);
    MPI_Recv  (&rcv_buf, 1, MPI_INT, left,  to_right, MPI_COMM_WORLD, &status);
    MPI_Wait(&request, &status);
    snd_buf = rcv_buf;
    sum += rcv_buf;
  }
  printf ("PE%i:\tSum = %i\n", rank, sum);
  MPI_Finalize();
}
```
**(c) MPI_Irecv + MPI_Issend + MPI_Waitall:**

``` c
#include <stdio.h>
#include <mpi.h>
#define to_right 201
int main (int argc, char *argv[])
{
  int rank, size;
  int snd_buf, rcv_buf;
  int right, left;
  int sum, i;
  MPI_Status arr_status[2];
  MPI_Request arr_request[2];
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  right = (rank+1) % size;
  left  = (rank-1+size) % size;
  sum = 0;
  snd_buf = rank;
  for( i = 0; i < size; i++)
  {
    MPI_Irecv (&rcv_buf, 1, MPI_INT, left,  to_right, MPI_COMM_WORLD, &arr_request[0]);
    MPI_Issend(&snd_buf, 1, MPI_INT, right, to_right, MPI_COMM_WORLD, &arr_request[1]);
    MPI_Waitall(2, arr_request, arr_status);
    snd_buf = rcv_buf;
    sum += rcv_buf;
  }
  printf ("PE%i:\tSum = %i\n", rank, sum);
  MPI_Finalize();
}
```
**(d) MPI_Irecv + blocking MPI_Ssend + MPI_Wait:**

``` c
#include <stdio.h>
#include <mpi.h>
#define to_right 201
int main (int argc, char *argv[])
{
  int rank, size;
  int snd_buf, rcv_buf;
  int right, left;
  int sum, i;
  MPI_Status status;
  MPI_Request request;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  right = (rank+1) % size;
  left  = (rank-1+size) % size;
  sum = 0;
  snd_buf = rank;
  for( i = 0; i < size; i++)
  {
    MPI_Irecv(&rcv_buf, 1, MPI_INT, left, to_right, MPI_COMM_WORLD, &request);
    MPI_Ssend(&snd_buf, 1, MPI_INT, right, to_right, MPI_COMM_WORLD);
    MPI_Wait(&request, &status);
    snd_buf = rcv_buf;
    sum += rcv_buf;
  }
  printf ("PE%i:\tSum = %i\n", rank, sum);
  MPI_Finalize();
}
```