
Spark on Windows 10

Prerequisites

A system running Windows 10
A user account with administrator privileges (required to install software, modify file permissions, and modify system PATH)
Command Prompt or PowerShell (I prefer PowerShell)
A tool to extract .tar files; I will use Gow (Gnu on Windows)

Features and Benefits of Gow

Ultra light: Small, light subset (about 18 MB) of very useful UNIX binaries that do not have decent installers (until now!).
Shell window from any directory: Adds a Windows Explorer shell window so that you can right-click on any directory and open a command (cmd.exe) window from that directory.
Simple install/remove: Easy to install and remove, all files contained in a single directory in a standard C:\Program Files path.
Included in PATH: All binaries are conveniently installed into the Windows PATH so they are accessible from a command-line window.
Stable binaries: All commands are stable and tested.

Download the latest release from https://github.com/bmatzelle/gow/releases/tag/v0.8.0 and install it.

Install Apache Spark on Windows

Installing Apache Spark on Windows 10 may seem complicated to novice users, but this simple tutorial will have you up and running. If you already have Java 8 and Python 3 installed, you can skip the first two steps.

Step 1: Install Java 8

Apache Spark requires Java 8. You can check whether Java is already installed from PowerShell.

Open PowerShell: click Start, type powershell, and click Windows PowerShell.

Type the following command at the prompt:

> java -version

If the command prints a version number beginning with 1.8, Java 8 is already installed.
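The same check can be scripted. The sketch below is only an illustration: the function names parse_java_major and installed_java_major are made up for this example, and it assumes the usual banner format in which Java 8 and older report themselves as 1.x.

```python
import re
import shutil
import subprocess


def parse_java_major(version_output):
    """Return the major Java version found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    # Java 8 and older report themselves as 1.x (for example 1.8.0_292).
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def installed_java_major():
    """Run `java -version` if java is on PATH and parse the result."""
    if shutil.which("java") is None:
        return None
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    # The JVM writes its version banner to stderr, not stdout.
    return parse_java_major(result.stderr + result.stdout)


if __name__ == "__main__":
    print("Java major version:", installed_java_major())
```

A result of None means that java was not found on the PATH or that the banner could not be parsed.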

If you don’t have Java installed:

1. Open a browser window, download the installer from https://cdn.azul.com/zulu/bin/zulu8.54.0.21-ca-jdk8.0.292-win_x64.msi, and run it.

Step 2: Install Python

1. To install Python, navigate to https://www.python.org/ in your web browser.

2. Mouse over the Download menu option and click Python 3.8.3, the latest version at the time of writing.

3. Once the download finishes, run the file.

Step 3: Download Apache Spark

1. Open a browser and navigate to https://spark.apache.org/downloads.html.

2. Under the Download Apache Spark heading, there are two drop-down menus: one for the Spark release and one for the package type. Select the current non-preview release, then download the file.

Step 4: Install Apache Spark

Installing Apache Spark involves extracting the downloaded file to the desired location.
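If you would rather not use a separate extraction tool, Python's standard tarfile module can unpack the download as well. This is a minimal sketch, not the only way to do it: extract_archive is a hypothetical helper name, and any paths you pass to it are your own choice. Only extract archives you trust, such as the official Spark download.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a .tar, .tar.gz or .tgz archive and return its top-level names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # "r:*" lets tarfile detect the compression on its own.
    with tarfile.open(archive_path, "r:*") as archive:
        top_level = sorted({member.name.split("/")[0] for member in archive.getmembers()})
        archive.extractall(dest)
    return top_level
```

The returned list normally contains a single entry: the name of the Spark folder created inside the destination directory.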

Step 5: Add winutils.exe File

Download the winutils.exe file that matches the Hadoop version of the Spark package you downloaded.

1. Navigate to https://github.com/cdarlint/winutils, open the bin folder for your Hadoop version, locate winutils.exe, and click it to download.

2. Put the file in the bin directory of your Spark installation.

Step 6: Configure Environment Variables

Configuring environment variables in Windows adds the Spark and Hadoop locations to your system PATH. It allows you to run the Spark shell directly from a command prompt window.

1. Click Start and type environment.

2. Select the result labeled Edit the system environment variables.

3. A System Properties dialog box appears. In the lower-right corner, click Environment Variables and then click New in the next window.

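4. Create the following variables, one at a time:

SPARK_HOME: the folder where you extracted Spark

JAVA_HOME: the folder where Java is installed

HADOOP_HOME: the folder whose bin directory contains winutils.exe (the Spark folder, if you followed Step 5)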
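Once the three variables are set, a short script can confirm that each one points at a real folder and that winutils.exe is where Hadoop expects it. This is a rough sketch under stated assumptions: check_spark_env is a hypothetical helper, and it only checks that the folders and the file exist, not that the versions match.

```python
import os
from pathlib import Path

REQUIRED_VARIABLES = ("SPARK_HOME", "JAVA_HOME", "HADOOP_HOME")


def check_spark_env(env=None):
    """Return a list of problems; an empty list means every check passed."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_VARIABLES:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not Path(value).is_dir():
            problems.append(f"{name} points to a missing directory: {value}")
    hadoop_home = env.get("HADOOP_HOME")
    if hadoop_home and Path(hadoop_home).is_dir():
        # On Windows, Hadoop looks for winutils.exe in the bin folder of HADOOP_HOME.
        if not (Path(hadoop_home) / "bin" / "winutils.exe").is_file():
            problems.append("winutils.exe not found in HADOOP_HOME\\bin")
    return problems


if __name__ == "__main__":
    for line in check_spark_env() or ["All checks passed"]:
        print(line)
```

Run it from a newly opened PowerShell window, because windows that were already open do not pick up changes to the environment variables.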
