Hands-on Git


In this hands-on session, you will collaborate on a project of your choice: Bash scripting, High-Performance Computing (HPC), or R programming. The session is designed to introduce practical collaboration techniques using GitHub.

You will learn how to:

  1. set up SSH access to GitHub using a public-private key pair for secure authentication,
  2. collaborate with direct write access to a shared repository,
  3. contribute via pull requests (PRs) when you do not have write access.

By the end of this session, you will be able to confidently participate in or manage collaborative projects on GitHub, with a deeper understanding of workflows that suit different access levels.

Date:
Thursday, June 12, 2025

Time:
9:30am–12:30pm (morning session)

Location:
SFU’s Big Data Hub Presentation Studio (ASB 10900)

Instructor:
Marie-Hélène Burle

Prerequisites:
This session is a follow-up to the introduction to Git.

Software:
We will provide access to one of our Linux systems. To use it, attendees will need a secure shell (SSH) client installed on their computer. On Windows, we recommend the free Home Edition of MobaXterm. On Mac and Linux computers, SSH is usually pre-installed (try typing ssh in a terminal to make sure it is there).


Course notes can be found here.

Workflow

We will begin with a 30–60 minute lecture covering collaboration on GitHub in two scenarios: (1) when you have direct write access to a repository, and (2) when you contribute via pull requests without write access. We will then move on to the hands-on part:

  1. Create GitHub accounts.
  2. Create a public-private SSH key pair and upload your public key to GitHub.
  3. Get into pairs by topic (Bash, HPC, R).
  4. One person in each team creates a GitHub repo and gives write access to the other person.
  5. Each team works on a series of exercises, splitting them between the two members and collaborating via a single GitHub repo.
  6. Each pair submits a pull request (PR) to the central Bash+HPC+R repository with their solutions.
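
Step 2 above can be sketched as follows. This is a minimal sketch rather than the course's official instructions: the function name make_github_key and the email address are hypothetical placeholders, it assumes OpenSSH's ssh-keygen is installed, and it sets an empty passphrase only to keep the demonstration non-interactive (in practice, choose a passphrase).

```shell
# Hypothetical helper: generate an Ed25519 key pair in the given directory
# and print the public key. Assumes OpenSSH's ssh-keygen is available.
make_github_key() {
    if [ "$#" -eq 0 ]; then
        echo "Usage: make_github_key <directory> [comment]" >&2
        return 1
    fi
    local dir="$1"
    local comment="${2:-you@example.com}"   # placeholder address
    local key="$dir/id_ed25519"
    # Never clobber a key that already exists.
    if [ -e "$key" ]; then
        echo "Refusing to overwrite existing key: $key" >&2
        return 1
    fi
    mkdir -p "$dir" && chmod 700 "$dir" || return 1
    # -N "" sets an empty passphrase (demo only); -q suppresses the banner.
    ssh-keygen -q -t ed25519 -C "$comment" -f "$key" -N "" || return 1
    # The .pub file is the half to paste into GitHub; keep the other private.
    cat "$key.pub"
}
```

For example, make_github_key ~/.ssh prints the public key to paste into GitHub's SSH key settings. Once the key is uploaded, ssh -T git@github.com is a quick way to test the connection.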

Bash exercises

Bash exercise 1

Write a function check() that complains when it does not receive any arguments.

Bash exercise 2

Write a function countfiles() that counts the files in each directory passed to it as an argument (you will need to loop through all arguments). Start the function with a check that prints the command syntax if no arguments were passed:

$ countfiles
No arguments given. Usage: countfiles dir1 dir2 ...
$ countfiles <dir1> <dir2>
dir1 has 10 files
dir2 has 5 files
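
The argument-handling pattern behind the first two exercises can be tried on a toy function. This is an illustration rather than a solution: listargs is a hypothetical name, and it only echoes its arguments instead of counting files.

```shell
# Hypothetical toy function: shows the "$#" check and the "$@" loop only.
listargs() {
    # $# holds the number of arguments passed to the function.
    if [ "$#" -eq 0 ]; then
        echo "No arguments given. Usage: listargs arg1 arg2 ..." >&2
        return 1
    fi
    # Quoting "$@" keeps arguments that contain spaces intact.
    local arg
    for arg in "$@"; do
        echo "got: $arg"
    done
}
```

Calling listargs a "b c" prints two lines (got: a, then got: b c), while calling it with no arguments prints the usage message and returns a non-zero status.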

Bash exercise 3

Write a function to swap two file names. Add a check that both files exist before renaming them.

Bash exercise 4

Write a function archive() to replace directories with their gzipped archives.

$ ls -F
chapter1/  chapter2/  data/  notes/
$ archive chapter* notes/
$ ls -F
chapter1.tar.gz  chapter2.tar.gz  data/  notes.tar.gz

If for any reason the function fails to create an archive, do not remove the original directory.

Bash exercise 5

Write the reverse function unarchive() that replaces a gzipped tarball with a directory.

Bash exercise 6

Imagine that the directory /project/def-sponsor00/shared/toyModel contains results from a numerical simulation. Write a command to copy every 10th file (starting from yB31_oneblock_00000.vti) from this directory to one of your own directories. Hint: create an alphabetically sorted list of files in that directory and then use awk’s NR variable.
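
The hint can be illustrated on a made-up list, independent of the simulation directory. In awk, NR is the number of the current input line, so a condition on NR selects lines at regular intervals. The file names below are invented for the demonstration, and the helper name every_nth is hypothetical.

```shell
# Hypothetical helper: print every n-th line of stdin, starting with line 1.
every_nth() {
    # (NR - 1) % n == 0 is true for lines 1, n+1, 2n+1, ...
    awk -v n="$1" '(NR - 1) % n == 0'
}

# Invented names frame_00.dat ... frame_09.dat, sorted, then every 3rd one.
printf 'frame_%02d.dat\n' 0 1 2 3 4 5 6 7 8 9 | sort | every_nth 3
# prints frame_00.dat, frame_03.dat, frame_06.dat, frame_09.dat (one per line)
```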

Bash exercise 7

As in the previous exercise, write a command to create a tar archive that includes every 20th file from the simulation directory /project/def-sponsor00/shared/toyModel. Is it possible to do this in one command? Why does tar remove the leading ‘/’ from file paths?

Bash exercise 8

Write a one-line command that finds the 5 largest files in the current directory (and all of its subdirectories) and prints only their names and sizes in human-readable format (indicating bytes, kB, MB, GB, …), in order of decreasing file size. Hint: use find, xargs, and awk.

Bash exercise 9

Write a loop to replace spaces with underscores in all file names in the current directory.

touch hello "first phrase" "second phrase" "good morning, everyone"
ls -l
ls *\ *
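
The string substitution at the heart of this exercise can be tried on plain variables before touching any files. This sketch assumes Bash (the ${var//pattern/replacement} expansion is not part of POSIX sh); the function name underscored is hypothetical, and it renames nothing.

```shell
# Hypothetical helper: print its argument with every space replaced by "_".
underscored() {
    local name="$1"
    # ${name// /_} replaces ALL spaces; ${name/ /_} would replace only the first.
    printf '%s\n' "${name// /_}"
}

underscored "good morning, everyone"   # prints good_morning,_everyone
```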

Bash exercise 10

Write a one-line command that will search for a string in all files in the current directory and all its subdirectories, and will hide errors (e.g. due to permissions).
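
Hiding errors usually comes down to redirecting standard error, which can be seen in isolation with a stand-in command. The function noisy below is hypothetical and merely imitates a tool that prints a result to standard output and an error message to standard error.

```shell
# Hypothetical stand-in for a command that prints both results and errors.
noisy() {
    echo "result line"                      # goes to stdout
    echo "noisy: Permission denied" >&2     # goes to stderr
}

# 2>/dev/null discards stderr only; stdout still reaches the terminal.
noisy 2>/dev/null   # prints only: result line
```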

Bash exercise 11

Using brace expansion ({...}), write a one-line command to create 365 empty Markdown files in the format YEAR-MON-DD.md, i.e. 2025-Jan-01.md, 2025-Jan-02.md, ..., 2025-Dec-31.md.
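
Brace expansion can be explored on a small case first. The sketch below assumes Bash 4 or later (zero-padded ranges such as {01..03} are not supported by the Bash 3.2 shipped with macOS); the prefixes are invented and no files are created.

```shell
# Lists and ranges can be combined; the leftmost brace varies slowest.
echo {a,b}-{01..03}
# prints: a-01 a-02 a-03 b-01 b-02 b-03

# Counting the generated words is a cheap way to check an expansion
# before passing it to a command such as touch.
words=( {Jan,Feb}-{01..31} )
echo "${#words[@]}"   # prints: 62
```

Note that the second expansion happily generates Feb-31, so every month gets the same number of days unless the months are grouped by length.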

HPC exercises

HPC exercise 1

Implement pi.c (from the HPC course) in Julia or R, run it via Slurm, and compare its timing to the C and Python versions.

HPC exercise 2

Using distributedPi.c as an example, write a C program with a ping-pong communication pattern between two MPI processes: rank 0 sends an integer to rank 1, and rank 1 sends it back to rank 0. Use the MPI_Send and MPI_Recv functions. Repeat this \(N=1000\) times and measure the average round-trip time. Try it with (1) two cores on one node and (2) two cores on two different nodes.

HPC exercise 3

Consider the following compact multi-threaded Julia code to compute a Julia set:

using ThreadsX, BenchmarkTools

function pixel(z)
    c = 0.355 + 0.355im
    z *= 1.2   # zoom out
    for i = 1:255
        z = z^2 + c
        if abs(z) >= 4
            return i
        end
    end
    return 255
end

const height, width = repeat([2_000],2)   # 2000^2 image

println("Computing Julia set ...")
point = zeros(Complex{Float32}, height, width);
stability = zeros(Int32, height, width);
for i in 1:height, j in 1:width
    point[i,j] = (2*(j-0.5)/width-1) + (2*(i-0.5)/height-1)im # rescale to -1:1 in the complex plane
end
stability = @btime ThreadsX.map(pixel, point);

Store this code as juliaSet.jl and write a Slurm script to run it as a shared-memory job. On the training cluster, run this code on 1 core, 2 cores, and so on up to the maximum number of cores per node. You will need to install the 3rd-party Julia libraries ThreadsX.jl and BenchmarkTools.jl.

HPC exercise 4

Write the function usage() described in the “Workflows” section in the HPC slides from Tuesday.

R exercises

R exercise 1

Consider the following code:

f1 <- function(n) {
  squares_sum <- 0
  for(i in 1:length(n)) {
    squares_sum <- squares_sum + n[i]^2
  }
  squares_sum
}

n <- 1:10000

f1(n)

Write a function f2() that is more efficient than f1() (make sure that both functions give you the same result and that f2() is indeed faster).

R exercise 2

Consider the following code:

f1 <- function(n) {
  cum_sum <- numeric(0)
  for (i in 1:length(n)) {
    cum_sum <- c(cum_sum, sum(n[1:i]))
  }
  cum_sum
}

n <- 1:10000

f1(n)

Write more efficient code that gives the same result.

R exercise 3

Consider the following code:

f1 <- function(n, threshold) {
  count <- 0
  for (i in 1:length(n)) {
    if (n[i] > threshold) {
      count <- count + 1
    }
  }
  count
}

set.seed(42)
n <- runif(100000, min = 0, max = 1000)
threshold <- 500

f1(n, threshold)

Write a function f2() that is more efficient than f1() (make sure that both functions give you the same result and that f2() is indeed faster).