My Spark Adventure
Overview
This project was inspired by conversations with Matei Zaharia while I was in his operating systems class at UC Berkeley. GitHub Project Repo.
Post Table of Contents
- Part 1: Setup — Ethernet, IPs, SSH
- Part 2: Docker Containers
- Part 3: Configuring Spark workers, coordinator, and executors
Goals
The concepts/tools I’m getting hands-on with in this project:
- Distributed systems
- SSH
- Docker
- Apache Spark (PySpark)
Filetree
├── README.md
├── compose.yml
├── conf
│   └── spark-env.sh
├── master
│   ├── Dockerfile
│   └── start-master.sh
├── scripts
│   ├── git-autopilot.sh
│   ├── pyspark.sh
│   └── start_spark_cluster.sh
├── spark
│   ├── bin
│   ├── include
│   ├── lib
│   ├── pyvenv.cfg
│   └── share
└── worker
    ├── Dockerfile
    └── start-worker.sh