Configure PyCharm CE to work with Apache Spark

This guide should help you set up PyCharm CE to work with Python 3 and Apache Spark (tested with version 2.1). First, create a new Pure Python PyCharm project. Then copy the content of https://github.com/apache/spark/blob/master/examples/src/main/python/wordcount.py into your project. Your IDE should complain at the line "from pyspark.sql import SparkSession" because it doesn’t know where pyspark.sql is, which […]
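The excerpt is cut off before it explains the fix, so the following is only a sketch of one common way to make pyspark importable from a plain Python interpreter, and not necessarily what the full guide does. It assumes SPARK_HOME points at an unpacked Spark distribution that ships its Python sources under python/ and a bundled py4j archive under python/lib/. The helper names spark_python_paths and add_spark_to_sys_path are hypothetical.

```python
import glob
import os
import sys


def spark_python_paths(spark_home):
    """Return the directory and archives that `import pyspark` needs on the path.

    Assumption: the Spark distribution keeps its Python package in
    <spark_home>/python and a py4j source zip in <spark_home>/python/lib.
    """
    python_dir = os.path.join(spark_home, "python")
    # The py4j version in the file name varies between Spark releases.
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    return [python_dir] + py4j_zips


def add_spark_to_sys_path(spark_home):
    """Prepend the Spark Python paths to sys.path, keeping their order."""
    for path in reversed(spark_python_paths(spark_home)):
        if path not in sys.path:
            sys.path.insert(0, path)


if __name__ == "__main__":
    spark_home = os.environ.get("SPARK_HOME")
    if spark_home:
        add_spark_to_sys_path(spark_home)
        print("Added to sys.path:", spark_python_paths(spark_home))
    else:
        print("SPARK_HOME is not set; nothing to do")
```

Inside PyCharm, the same two locations could instead be registered as content roots or interpreter paths in the project settings, so the script itself would not need to touch sys.path.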


How to: Install a Virtual Apache Hadoop Cluster with Vagrant and Cloudera Manager on a Mac

Feel free to skip some of the steps if you already have certain packages installed.

Get Cask:
brew install caskroom/cask/brew-cask

Get Vagrant & Vagrant plugins:
brew cask install virtualbox
brew cask install vagrant
brew cask install vagrant-manager
vagrant plugin install vagrant-hostmanager

Install Hadoop:
git clone [email protected]:richardhe-awin/vagrant-hadoop-cluster.git
cd vagrant-hadoop-cluster
vagrant up

Configure Cloudera Manager (mostly referenced from http://blog.cloudera.com/blog/2014/06/how-to-install-a-virtual-apache-hadoop-cluster-with-vagrant-and-cloudera-manager/) […]
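The advice to skip steps for packages you already have can be sketched as a small dry-run check. This is an illustration only, not part of the original post: it assumes that VirtualBox and Vagrant expose the VBoxManage and vagrant executables on PATH, and the names STEPS and pending_installs are hypothetical. It prints the install commands that still look necessary and runs nothing.

```python
import shutil

# (executable looked up on PATH, install command taken from the steps in the post)
# Assumption: a working install of each package puts this executable on PATH.
STEPS = [
    ("VBoxManage", "brew cask install virtualbox"),
    ("vagrant", "brew cask install vagrant"),
]


def pending_installs(steps, is_installed=lambda exe: shutil.which(exe) is not None):
    """Return the install commands whose executable was not found."""
    return [command for exe, command in steps if not is_installed(exe)]


if __name__ == "__main__":
    # Dry run: only list what would still need to be installed.
    for command in pending_installs(STEPS):
        print(command)
```

The is_installed parameter is there so the check can be swapped out, for example in a test or on a machine where the tools live outside PATH.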
