More examples on GitHub
Python is now the most widely used language on Spark. PySpark has more than 5 million monthly downloads on PyPI, the Python Package Index. This release improves PySpark's functionality and usability, including a redesigned pandas UDF API with Python type hints, new pandas UDF types, and more Pythonic error handling.
Windows Subsystem for Linux (WSL) with Ubuntu 20.04 (for offline use on Windows 10)
Older native Windows with GOW. GOW allows the use of Linux commands on Windows.
Native Ubuntu or macOS (the best choice, but not part of this workshop)
Visual Studio Code: an open-source IDE for Windows and Linux
Git and a GitHub account: for collaborating and sharing source code
Java: Spark is written in Scala, a language that runs on the Java Virtual Machine (JVM)
Python: for using PySpark (the Spark API for Python)
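A quick way to see which of the tools listed above are already installed is to look them up on the PATH. The following is a minimal sketch, not part of the workshop material: the helper name `find_tools` is hypothetical, and the executable names `java`, `python`, `git`, and `code` are assumptions about how these tools are exposed on your system (on Ubuntu, for instance, Python is often installed as `python3`).

```python
import shutil

# Executable names are assumptions; adjust them for your system
# (for example, "python3" instead of "python" on Ubuntu).
TOOLS = ["java", "python", "git", "code"]


def find_tools(names):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools(TOOLS).items():
        print(f"{name}: {path or 'not found'}")
```

Any tool reported as "not found" either still needs to be installed or is installed under a different executable name.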
Complete the following steps to prepare PySpark on a Windows machine:
Download Gnu on Windows (GOW) from https://github.com/bmatzelle/gow/releases/download/v0.8.0/Gow-0.8.0.exe.
GOW allows the use of Linux commands on Windows. After installing GOW, use the following command to list the basic Linux commands it provides:
gow --list
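The same check can be scripted from Python, for example to confirm the installation before continuing with the PySpark setup. This is a sketch that assumes the GOW installer has put `gow` on the PATH. The helper name `gow_commands` is hypothetical; only the `gow --list` command itself comes from the step above.

```python
import shutil
import subprocess


def gow_commands():
    """Return the text printed by `gow --list`, or None if gow is not on PATH."""
    gow = shutil.which("gow")  # assumption: the installer added gow to PATH
    if gow is None:
        return None
    # check=False: return whatever gow printed, even on a non-zero exit status
    result = subprocess.run(
        [gow, "--list"], capture_output=True, text=True, check=False
    )
    return result.stdout


if __name__ == "__main__":
    listing = gow_commands()
    print(listing if listing is not None else "gow was not found on PATH")
```

If the function returns None, opening a new terminal after installation may help, since a PATH change is typically picked up only by newly started shells.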