Prerequisites
A system running Windows 10
A user account with administrator privileges (required to install software, modify file permissions, and modify system PATH)
Command Prompt or PowerShell (I prefer PowerShell)
A tool to extract .tar and .tgz files. I will use Gow (Gnu On Windows), which offers:
Ultra light: Small, light subset (about 18 MB) of very useful UNIX binaries that do not have decent installers (until now!).
Shell window from any directory: Adds a Windows Explorer shell window so that you can right-click on any directory and open a command (cmd.exe) window from that directory.
Simple install/remove: Easy to install and remove, all files contained in a single directory in a standard C:\Program Files path.
Included in PATH: All binaries are conveniently installed into the Windows PATH so they are accessible from a command-line window.
Stable binaries: All commands are stable and tested.
Download the Gow installer from https://github.com/bmatzelle/gow/releases/tag/v0.8.0 and run it.
Installing Apache Spark on Windows 10 may seem complicated to novice users, but this simple tutorial will have you up and running. If you already have Java 8 and Python 3 installed, you can skip the first two steps.
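Install Java 8
Apache Spark requires Java, and this tutorial uses Java 8 (check the Spark documentation for the Java versions your Spark release supports). You can check whether Java is already installed from the command line.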
Open PowerShell: click Start, type powershell, and click Windows PowerShell.
Type the following command at the prompt:
> java -version
If the output starts with a line such as openjdk version "1.8.0_292", Java 8 is installed (the 1.8 prefix means Java 8).
If you don’t have Java installed:
1. Open a browser window, navigate to https://cdn.azul.com/zulu/bin/zulu8.54.0.21-ca-jdk8.0.292-win_x64.msi (the Azul Zulu build of OpenJDK 8), and run the downloaded installer.
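Once Python is installed (next step), you can also script the Java check, which is handy when setting up several machines. The sketch below is a minimal example, not part of the original tutorial: it assumes Python 3.7 or later, relies on the fact that java -version prints its banner to stderr, and uses hypothetical helper names. Java 8 reports its version in the legacy "1.8.x" form, so the parser maps "1.8" to 8.

```python
import re
import subprocess
from typing import Optional

# Matches the quoted version in banners such as:
#   openjdk version "1.8.0_292"          (Java 8 uses the legacy "1.x" scheme)
#   openjdk version "11.0.11" 2021-04-20
_VERSION_RE = re.compile(r'version "(\d+)(?:\.(\d+))?')


def parse_java_major_version(version_output: str) -> Optional[int]:
    """Return the Java major version found in `java -version` output, or None."""
    match = _VERSION_RE.search(version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Java 8 and earlier report "1.<major>", e.g. "1.8.0_292" -> 8
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def installed_java_major_version() -> Optional[int]:
    """Run `java -version` and parse it; None if java is not on PATH."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return None
    # java writes the version banner to stderr, not stdout
    return parse_java_major_version(result.stderr + result.stdout)


if __name__ == "__main__":
    major = installed_java_major_version()
    print("Java not found on PATH" if major is None else f"Java major version: {major}")
```

Save it as, for example, check_java.py and run python check_java.py from PowerShell; with the Zulu 8 installer above it should report major version 8.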
Install Python
1. To install Python, navigate to https://www.python.org/ in your web browser.
2. Hover over the Downloads menu and click Python 3.8.3 (the latest version at the time of writing).
3. Once the download finishes, run the installer. On its first screen, select the Add Python 3.8 to PATH check box, then click Install Now.
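To confirm which interpreter PowerShell now picks up, you can run a tiny script with python. This is a minimal sketch: the 3.8 minimum simply mirrors the version installed above and is an assumption, not a documented Spark requirement, and python_is_supported is a hypothetical name.

```python
import sys
from typing import Sequence

# Assumption: 3.8 mirrors the version installed in this guide; check your
# Spark release's documentation for its actual minimum Python version.
MIN_VERSION = (3, 8)


def python_is_supported(version: Sequence[int] = tuple(sys.version_info),
                        minimum: Sequence[int] = MIN_VERSION) -> bool:
    """Compare the (major, minor) part of a version against a minimum."""
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    # sys.executable shows which python.exe was found on PATH
    print(sys.executable, sys.version.split()[0])
    print("OK" if python_is_supported()
          else f"Need Python {MIN_VERSION[0]}.{MIN_VERSION[1]}+")
```

If sys.executable points somewhere unexpected, another Python installation appears earlier on PATH.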
Download Apache Spark
1. Open a browser and navigate to https://spark.apache.org/downloads.html.
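2. Under the Download Apache Spark heading, there are two drop-down menus. In the first, select the current non-preview release; in the second, select a package prebuilt for Apache Hadoop. Then click the .tgz link to download the archive.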
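The package you pick here matters later, because winutils.exe must match the Hadoop version the package was built for, and that version is part of the archive name (spark-3.1.2-bin-hadoop3.2.tgz, for instance, is Spark 3.1.2 built for Hadoop 3.2). The sketch below reads both versions from the file name. It assumes the spark-<version>-bin-hadoop<version>[-suffix].tgz pattern used by recent releases; parse_spark_archive and SparkPackage are hypothetical names.

```python
import re
from pathlib import PureWindowsPath
from typing import NamedTuple, Optional


class SparkPackage(NamedTuple):
    spark_version: str   # e.g. "3.1.2"
    hadoop_version: str  # e.g. "3.2" or "3"


# Assumed naming pattern: spark-<spark>-bin-hadoop<hadoop>[-<suffix>].tgz
_ARCHIVE_RE = re.compile(
    r"^spark-(?P<spark>.+?)-bin-hadoop(?P<hadoop>\d+(?:\.\d+)*)(?:-.+)?\.tgz$"
)


def parse_spark_archive(filename: str) -> Optional[SparkPackage]:
    """Read the Spark and Hadoop versions from a downloaded archive's name."""
    # PureWindowsPath accepts both "\" and "/" separators on any OS
    name = PureWindowsPath(filename).name
    match = _ARCHIVE_RE.match(name)
    if match is None:
        return None  # e.g. a "without-hadoop" build or an unexpected name
    return SparkPackage(match.group("spark"), match.group("hadoop"))
```

For example, parse_spark_archive("spark-3.1.2-bin-hadoop3.2.tgz") returns SparkPackage(spark_version='3.1.2', hadoop_version='3.2').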
Install Apache Spark
Installing Apache Spark involves extracting the downloaded .tgz archive (for example with Gow's tar) to a folder of your choice, such as C:\spark.
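If you would rather not rely on Gow for this step, Python's standard tarfile module can do the same extraction. This swaps in a different tool from the one the tutorial uses, so treat it as a minimal sketch: extract_spark is a hypothetical helper, and C:\spark below is only an example destination. On Python versions that ship tarfile's extraction filters, it uses the "data" filter, which refuses archive entries that would land outside the destination folder.

```python
import tarfile
from pathlib import Path, PurePosixPath


def extract_spark(archive: str, destination: str) -> Path:
    """Extract a Spark .tgz archive into destination.

    Returns the archive's single top-level folder (the future SPARK_HOME)
    when there is exactly one, otherwise the destination folder itself.
    """
    dest = Path(destination)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        top_level = set()
        for member in tar.getmembers():
            # Member names inside a tar archive always use "/" separators
            parts = PurePosixPath(member.name).parts
            if parts:
                top_level.add(parts[0])
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: refuse unsafe members
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    if len(top_level) == 1:
        return dest / top_level.pop()
    return dest
```

For example, extract_spark(r"C:\Users\you\Downloads\spark-3.1.2-bin-hadoop3.2.tgz", r"C:\spark") would return C:\spark\spark-3.1.2-bin-hadoop3.2, which is the folder to use as SPARK_HOME later.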
Add winutils.exe
Download the winutils.exe file built for the Hadoop version your Spark package targets (the hadoop part of the archive name).
1. Navigate to https://github.com/cdarlint/winutils, open the folder for your Hadoop version and then its bin folder, locate winutils.exe, click it, and download it.
2. Copy winutils.exe into the bin subfolder of the Spark folder you extracted.
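Hadoop's Windows support looks for the file at %HADOOP_HOME%\bin\winutils.exe, so with winutils.exe in Spark's bin folder, HADOOP_HOME should point at that same Spark folder. A small Python check, sketched under that assumption (winutils_path and has_winutils are hypothetical names):

```python
import sys
from pathlib import Path


def winutils_path(hadoop_home: str) -> Path:
    """Where winutils.exe is expected: <HADOOP_HOME>\\bin\\winutils.exe."""
    return Path(hadoop_home) / "bin" / "winutils.exe"


def has_winutils(hadoop_home: str) -> bool:
    """True if winutils.exe is present in the bin folder under hadoop_home."""
    return winutils_path(hadoop_home).is_file()


if __name__ == "__main__":
    # Pass the Spark folder as the first argument, e.g. C:\spark\<spark-folder>
    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    print(winutils_path(folder), "found" if has_winutils(folder) else "missing")
```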
Configure environment variables
Configuring environment variables in Windows adds the Spark and Hadoop locations to your system PATH. It allows you to run the Spark shell directly from a command prompt window.
1. Click Start and type environment.
2. Select the result labeled Edit the system environment variables.
3. A System Properties dialog box appears. In the lower-right corner, click Environment Variables and then click New in the next window.
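4. Create the following three variables, one at a time, setting each value to the matching folder:
SPARK_HOME: the folder you extracted Spark to (the one that contains bin)
JAVA_HOME: the folder where the JDK was installed
HADOOP_HOME: the folder whose bin subfolder contains winutils.exe (in this guide, the same Spark folder)
5. In the same window, select the Path variable, click Edit, and use New to add %SPARK_HOME%\bin and %HADOOP_HOME%\bin.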
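After clicking OK to save the variables, open a new PowerShell window (windows that were already open keep the old environment) and sanity-check the setup. The Python sketch below is one way to do it. It assumes the JDK's java.exe lives in JAVA_HOME\bin, Spark's Windows launcher is bin\spark-shell.cmd, and winutils.exe sits in HADOOP_HOME\bin; check_spark_env and spark_shell_on_path are hypothetical names. shutil.which searches PATH and, on Windows, also tries the PATHEXT extensions, so spark-shell resolves to spark-shell.cmd.

```python
import os
import shutil
from pathlib import Path
from typing import List, Mapping, Optional

# File each variable's folder should contain, following this guide's layout
# (assumption: winutils.exe was copied into Spark's bin folder, so
# HADOOP_HOME may point at the same folder as SPARK_HOME).
EXPECTED_FILES = {
    "JAVA_HOME": Path("bin") / "java.exe",
    "SPARK_HOME": Path("bin") / "spark-shell.cmd",
    "HADOOP_HOME": Path("bin") / "winutils.exe",
}


def check_spark_env(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return a list of problems with the three variables; empty means OK."""
    problems = []
    for name, relative in EXPECTED_FILES.items():
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not (Path(value) / relative).is_file():
            problems.append(f"{name}={value} does not contain {relative}")
    return problems


def spark_shell_on_path(path: Optional[str] = None) -> Optional[str]:
    """Where `spark-shell` resolves via PATH (or the given path), else None."""
    return shutil.which("spark-shell", path=path)


if __name__ == "__main__":
    for problem in check_spark_env():
        print("Problem:", problem)
    print("spark-shell:", spark_shell_on_path() or "not found on PATH")
```

If the script reports no problems and finds spark-shell, typing spark-shell in that PowerShell window should start the Spark shell.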