Installing Hadoop on Mac part 1

A step-by-step guide to get you up and running with Hadoop today! In Hadoop on Mac part 2 we walk through creating and compiling a Java Hadoop WordCount job from beginning to end, and automating the build with Maven pom.xml files.

Install HomeBrew
Install Hadoop
Configuring Hadoop
SSH Localhost
Starting and Stopping Hadoop
Good to know

Install HomeBrew

Download it from the website at http://brew.sh/ or simply paste the install script into the terminal:

$ ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Install Hadoop

$ brew install hadoop

Hadoop will be installed in the following directory:
/usr/local/Cellar/hadoop
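
Homebrew keeps each version in its own subdirectory of the Cellar, so you can check which version you got; the version number appears in all the paths below (this guide assumes 2.6.0):

$ ls /usr/local/Cellar/hadoop
2.6.0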

Configuring Hadoop

Edit hadoop-env.sh

The file can be located at /usr/local/Cellar/hadoop/2.6.0/libexec/etc/hadoop/hadoop-env.sh,
where 2.6.0 is the Hadoop version (adjust it throughout if brew installed a newer one).

Find the line with

export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"

and change it to

export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="
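
If you'd rather script the change than open an editor, a BSD-sed one-liner along these lines should work (a sketch; note the empty argument that the OS X sed requires after -i, and adjust the version in the path):

$ cd /usr/local/Cellar/hadoop/2.6.0/libexec/etc/hadoop
$ sed -i '' 's/-Djava.net.preferIPv4Stack=true/-Djava.net.preferIPv4Stack=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc=/' hadoop-env.sh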

Edit core-site.xml

The file can be located at /usr/local/Cellar/hadoop/2.6.0/libexec/etc/hadoop/core-site.xml.

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
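
Note that hadoop.tmp.dir points at a directory that does not exist yet. Running hdfs namenode -format (below) takes care of it, but you can also create it up front and sidestep the “storage directory does not exist” error covered in the Errors section:

$ mkdir -p /usr/local/Cellar/hadoop/hdfs/tmp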

Edit mapred-site.xml

The file can be located at /usr/local/Cellar/hadoop/2.6.0/libexec/etc/hadoop/mapred-site.xml and by default will be blank.

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9010</value>
  </property>
</configuration>

Edit hdfs-site.xml

The file can be located at /usr/local/Cellar/hadoop/2.6.0/libexec/etc/hadoop/hdfs-site.xml.

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
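
One malformed tag in any of these files will stop the namenode with a SAXParseException (see the comments below for an example), so it is worth validating the XML after editing; xmllint ships with OS X:

$ xmllint --noout /usr/local/Cellar/hadoop/2.6.0/libexec/etc/hadoop/{core,hdfs,mapred}-site.xml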

To simplify life, edit your ~/.profile using vim or your favorite editor and add the following two aliases. By default ~/.profile might not exist; just create it, and replace 2.6.0 with your installed version.

alias hstart="/usr/local/Cellar/hadoop/2.6.0/sbin/start-dfs.sh;/usr/local/Cellar/hadoop/2.6.0/sbin/start-yarn.sh"
alias hstop="/usr/local/Cellar/hadoop/2.6.0/sbin/stop-yarn.sh;/usr/local/Cellar/hadoop/2.6.0/sbin/stop-dfs.sh"

and execute

$ source ~/.profile

in the terminal to update.
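
A quick sanity check that the aliases were picked up (output shown for bash):

$ alias hstart
alias hstart='/usr/local/Cellar/hadoop/2.6.0/sbin/start-dfs.sh;/usr/local/Cellar/hadoop/2.6.0/sbin/start-yarn.sh'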

Before we can run Hadoop, we first need to format the HDFS filesystem using

$ hdfs namenode -format
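
With the default settings, a successful format creates the namenode's metadata under the hadoop.tmp.dir we set in core-site.xml, so a quick way to confirm it worked (the path follows from that config):

$ ls /usr/local/Cellar/hadoop/hdfs/tmp/dfs/name/current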

SSH Localhost

Nothing needs to be done here if you have already generated SSH keys. To verify, just check for the existence of the ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub files. If they are missing, the keys can be generated using

$ ssh-keygen -t rsa

Enable Remote Login
“System Preferences” -> “Sharing”. Check “Remote Login”.
Authorize SSH Keys
To allow your system to accept logins, we have to make it aware of the keys that will be used:

$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Let’s try to log in.

$ ssh localhost
Last login: Fri Mar  6 20:30:53 2015
$ exit

Running Hadoop

Now we can run Hadoop just by typing

$ hstart

and stopping using

$ hstop

Download Examples

To run examples, Hadoop needs to be started.

Hadoop Examples 1.2.1 (Old)
Hadoop Examples 2.6.0 (Current)

Test them out using:

$ hadoop jar <path to the examples jar> pi 10 100
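
The first argument after jar is the path to the examples jar you downloaded. A copy also ships inside the brew install itself, so the full command looks something like this (path assumed from the standard 2.6.0 layout; adjust the version):

$ hadoop jar /usr/local/Cellar/hadoop/2.6.0/libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 10 100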

Good to know

We can access the Hadoop web interfaces by connecting to:

NameNode (HDFS overview and file browser): http://localhost:50070
ResourceManager (YARN): http://localhost:8088
Specific Node Information: http://localhost:8042

(Note: Hadoop 2.x has no JobTracker; YARN's ResourceManager takes over that role.)

The jps command shows which Hadoop daemons are running:
$ jps 
7379 DataNode
7459 SecondaryNameNode
7316 NameNode
7636 NodeManager
7562 ResourceManager
7676 Jps

$ yarn    # resource management; more information than the web interface
$ mapred  # detailed information about jobs
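
Both are umbrella commands with many subcommands; two handy ones (standard Hadoop 2.x subcommands, shown as a sketch):

$ yarn node -list     # states of the cluster's NodeManagers
$ mapred job -list    # currently running MapReduce jobs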

The NameNode web interface at http://localhost:50070 also includes an HDFS file browser, which we can use to inspect any resulting output files.
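
The same can be done from the command line. For example, assuming a WordCount-style job wrote its results to an output directory under your HDFS home (the directory names here are hypothetical):

$ hdfs dfs -ls /user/$(whoami)/output
$ hdfs dfs -cat /user/$(whoami)/output/part-r-00000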

Errors

To resolve “WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable”, see this thread on Stackoverflow.com. The warning is cosmetic: Hadoop falls back to its built-in Java classes and keeps working.

Connection Refused after installing Hadoop

$ hdfs dfs -ls
15/03/06 20:13:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
ls: Call From spaceship.local/192.168.1.65 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:   http://wiki.apache.org/hadoop/ConnectionRefused

The start-up scripts such as start-all.sh do not provide specifics about why a startup failed, and some of the time they won't even tell you that a startup failed at all. To troubleshoot a service that isn't functioning, execute it manually:

$ hdfs namenode
15/03/06 20:18:31 WARN namenode.FSNamesystem: Encountered exception loading fsimage
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /usr/local/Cellar/hadoop/hdfs/tmp/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
15/03/06 20:18:31 FATAL namenode.NameNode: Failed to start namenode.

and the problem is that the storage directory from the message above does not exist yet. Formatting the namenode creates it:

$ hadoop namenode -format

To verify the problem is fixed run

$ hstart
$ hdfs dfs -ls /

If ‘hdfs dfs -ls’ gives you an error

ls: `.': No such file or directory

then we need to create the default directory structure Hadoop expects, i.e. /user/<output of whoami>/:

$ whoami
 spaceship
$ hdfs dfs -mkdir -p /user/spaceship 
 15/03/06 20:31:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
$ hdfs dfs -ls
 15/03/06 20:31:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
$ hdfs dfs -put book.txt
 15/03/06 20:32:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
$ hdfs dfs -ls 
 15/03/06 20:32:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
 Found 1 items
 -rw-r--r--   1 marekbejda supergroup      29578 2015-03-06 20:32 book.txt

JPS and Nothing Works…

It seems that certain builds of Java 1.8 (e.g. 1.8.0_40) are missing a critical package, which breaks YARN. Check your logs at

$ jps
 5935 Jps
$ vim /usr/local/Cellar/hadoop/2.6.0/libexec/logs/yarn-*
 2015-03-07 16:21:32,934 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain java.lang.NoClassDefFoundError: sun/management/ExtendedPlatformComponent
..
 2015-03-07 16:21:32,937 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
 2015-03-07 16:21:32,939 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:

http://mail.openjdk.java.net/pipermail/core-libs-dev/2014-November/029818.html

Either downgrade to Java 1.7 or use a 1.8 build that isn't affected; I'm currently running 1.8.0_20:

$ java -version
 java version "1.8.0_20"
 Java(TM) SE Runtime Environment (build 1.8.0_20-b26)
 Java HotSpot(TM) 64-Bit Server VM (build 25.20-b23, mixed mode)
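
On OS X, /usr/libexec/java_home lists the installed JDKs and lets you pin Hadoop to one of them, which makes the downgrade painless (a sketch; -v 1.7 only succeeds if a 1.7 JDK is actually installed, and JAVA_HOME is what hadoop-env.sh reads):

$ /usr/libexec/java_home -V                          # list all installed JDKs
$ export JAVA_HOME=$(/usr/libexec/java_home -v 1.7)  # pin Hadoop to Java 1.7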

I’ve Done Everything!! SSH still asks for a password!!!! OMGG!!!!

I ran across this problem today: all of a sudden, ssh localhost started requesting a password. I generated new keys and searched all day for an answer; thanks to this Apple thread, the fix is to tighten the permissions on your home directory and SSH files:

$ chmod go-w ~/                      # home directory must not be group/world-writable
$ chmod 700 ~/.ssh                   # only you may access ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys   # only you may read/write the authorized keys

137 thoughts on “Installing Hadoop on Mac part 1”

  1. Distributed Cache file not accessible; it is throwing the error below:
    java.lang.Exception: java.io.FileNotFoundException: file:/usr/local/Cellar/hadoop/hdfs/tmp/mapred/local/

    Could you please help me?

  2. “To simplify life edit your ~/.profile using vim or your favorite editor and add the following two commands. By default ~/.profile might not exist.”
    With all due respect, if “~/.profile” doesn’t exist then how can we edit it?

    1. That's interesting; typically brew install adds the Hadoop bin path to your PATH variable. But in your case I guess you'll have to do “export PATH=hadoop_bin_path:$PATH”,
      where hadoop_bin_path is something like
      /usr/local/Cellar/hadoop/2.7.1/bin
      so your terminal knows where hadoop has been installed.

      1. Hey Marek, thanks for all the help, but I got stuck again while generating the ssh key. It's asking me to enter the name of the file in which the keys can be stored [Enter file in which to save the key (/Users/tsemixgoat/.ssh/id_rsa): ]. What should I do?

  3. Really nice tutorial, thank you!
    It works until I try to format with $ hdfs namenode -format.
    I run OS X El Capitan; could that be the reason for the error? There seems to be an error with core-site.xml.

    core-site.xml contains following info:

    hadoop.tmp.dir
    /usr/local/Cellar/hadoop/hdfs/tmp
    A base for other temporary directories.

    fs.default.name
    hdfs://localhost:9000

    15/10/30 23:31:32 ERROR namenode.NameNode: Failed to start namenode.
    java.lang.RuntimeException: org.xml.sax.SAXParseException; systemId: file:/usr/local/Cellar/hadoop/2.7.1/libexec/etc/hadoop/core-site.xml; lineNumber: 30; columnNumber: 1; XML document structures must start and end within the same entity.

    15/10/30 23:31:32 FATAL conf.Configuration: error parsing conf core-site.xml
    org.xml.sax.SAXParseException; systemId: file:/usr/local/Cellar/hadoop/2.7.1/libexec/etc/hadoop/core-site.xml; lineNumber: 30; columnNumber: 1; XML document structures must start and end within the same entity.

    Caused by: org.xml.sax.SAXParseException; systemId: file:/usr/local/Cellar/hadoop/2.7.1/libexec/etc/hadoop/core-site.xml; lineNumber: 30; columnNumber: 1; XML document structures must start and end within the same entity.

  4. I realised my configuration mistake after posting the question: I did not close a property in core-site.xml with its closing tag.
    After the correction it runs better, but there is still an issue:
    after running $ hdfs namenode -format in the terminal I get a warning message, and there seems to be an issue with loading the native library for my platform.

    I run hadoop on OS X El Capitan. Can the new operating system cause this?

    15/10/31 00:02:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
    Formatting using clusterid: CID-300e836c-c481-4fa9-9104-4c87b790d739

  5. Trying to solve the namenode -format issue by following the suggestion on the forum:

    $ hadoop namenode -format
    To verify the problem is fixed run

    $ hstart
    $ hdfs dfs -ls /
    If ‘hdfs dfs -ls’ gives you a error

    ls: `.’: No such file or directory

    When I try to create a new directory, instead of finding book.txt I receive:

    put: `book.txt’: No such file or directory

    Not sure how could I proceed further.

    Nadias-MacBook-Pro:~ nadiastraton$ hdfs dfs -ls
    15/10/31 00:43:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
    ls: `.’: No such file or directory
    Nadias-MacBook-Pro:~ nadiastraton$ whoami
    nadiastraton
    Nadias-MacBook-Pro:~ nadiastraton$ hdfs dfs -mkdir -p /user/nadiastraton
    15/10/31 00:44:34 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
    Nadias-MacBook-Pro:~ nadiastraton$ hdfs dfs -ls
    15/10/31 00:44:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
    Nadias-MacBook-Pro:~ nadiastraton$ hdfs dfs -put book.txt
    15/10/31 00:45:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
    put: `book.txt’: No such file or directory
    Nadias-MacBook-Pro:~ nadiastraton$ hdfs dfs -ls
    15/10/31 00:45:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
    Nadias-MacBook-Pro:~ nadiastraton$ jps
    3873 DataNode
    3987 SecondaryNameNode
    2118 ResourceManager
    3784 NameNode
    2218 NodeManager
    4172 Jps
    Nadias-MacBook-Pro:~ nadiastraton$ java -version
    java version “1.8.0_31”
    Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
    Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)
    Nadias-MacBook-Pro:~ nadiastraton$ hdfs dfs -put book.txt
    15/10/31 00:48:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
    put: `book.txt’: No such file or directory

    1. Sorry, I am confused by “When I try to create a new directory, instead of finding book.txt I receive”. You need to create a user directory like it states, by running:
      $ whoami
      spaceship
      $ hdfs dfs -mkdir -p /user/spaceship

      1. I am following the tutorial below exactly, but it does not solve the issue.

        I still receive a warning that prevents hadoop from starting:

        WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
        Formatting using clusterid: CID-300e836c-c481-4fa9-9104-4c87b790d739

        Tutorial steps I followed to fix the warning:

        ‘hdfs dfs -ls’ gives you a error

        ls: `.’: No such file or directory
        then we need to create the default directory structure Hadoop expects (ie. /user/whoami_output/)

        $ whoami
        spaceship
        $ hdfs dfs -mkdir -p /user/spaceship
        15/03/06 20:31:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
        $ hdfs dfs -ls
        15/03/06 20:31:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
        $ hdfs dfs -put book.txt
        15/03/06 20:32:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
        $ hdfs dfs -ls
        15/03/06 20:32:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
        Found 1 items
        -rw-r–r– 1 marekbejda supergroup 29578 2015-03-06 20:32 book.txt

  6. Hi, when I run hstart it says no such file or directory. I am pretty new to this. What am I missing? Is it the step where I edit ~/.profile that I did wrongly? I just added the 2 commands into ~/.profile.

  7. Hi Marek, thanks for the tutorial. Similarly to Nadia, $ hdfs namenode -format returns the error “hdfs: command not found”. I am struggling to find a solution with Google. I am surprised not to see the hdfs executable in /usr/bin; is that normal? Did I miss something?

    1. Hello, in my experience Homebrew symlinks the appropriate binaries into /usr/local/bin, so make sure that path is in the PATH environment variable.

      echo $PATH
      > /opt/local/bin:/opt/local/sbin:/usr/local/bin

      1. Thanks for your reply. Indeed, I did not have the aforementioned path in my PATH. I have added it in .bash_profile. What should I do next? Delete and reinstall hadoop? Sorry for my naive questions, but I am kind of a newbie in this business. (Maybe I should also mention that I am working under El Capitan.)

      2. Just try “export PATH=/usr/local/bin:$PATH”; that should fix it. If not, try brew uninstall hadoop and brew install hadoop one more time.

      3. Hi Marek, thanks for the tips. It turned out that my problem was not related to the path but to the permissions on the /usr/local/ folder. I don't know why, but /usr/local was not initially writable. After changing the permissions and reinstalling hadoop, things seem OK. Just to note that the current version of hadoop is 2.7.1, so I had to change the aliases to make hstart/hstop work. I still have some weird issues with the id_rsa password. When I try to ssh localhost, a password window pops up, so I followed your suggestion of $ chmod go-w ~/. It solves the problem partially: when I use $ hstart the same window asking for a password pops up again. If I give my admin password it is not accepted. However, if I click cancel three times it redirects me to the password prompt, this time in the terminal… which accepts my admin password. It is kind of annoying, as when logging in/out via hstart/hstop I have to repeat this operation several times. Do you maybe have an idea how to solve this? Grégoire

  8. Thank you very much for this blog. I have installed successfully, but I have a little problem: when I run the wordcount I can see the result at localhost:50070, but there is no job shown at localhost:8088. Do you know how to solve it?

  9. This worked for me. Thanks. One would think installing hadoop on mac might be non-trivial. Turned out easier than on ubuntu. Perhaps because of this post!

  10. Thanks for the information.

    There are certain things which I need to mention here.

    With YARN, there is no JobTracker or TaskTracker, so setting the mapred.job.tracker parameter will be ignored by Hadoop.

  11. Firstly, thanks a lot for the tutorial. When I try to generate keys, which file do I save the keys to? Whatever file I give, the error says no file found.
    Please help. Thanks again!

    1. You shouldn't need to provide any file to generate the SSH keys; just press enter until it stops asking questions.

      Then it’ll save the keys in the default path ~/.ssh/id_rsa.pub

  12. I get this error. The error is with connection exception. Did anyone get this?
    MacBook-Pro-2:~ myname$ hdfs dfs -ls
    16/04/27 13:37:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
    ls: Call From dhcp-hol-3802.redrover.cornell.edu/10.145.14.218 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused

    Thank you!

  13. I just want to say a big thanks for this post. After struggling for hours with various tutorials on the net, I came across this page. Everything worked perfectly (I updated the references to Hadoop 2.7.3) as you provided it. Thank you!

    1. Hello Saul! It’s definitely a lot faster on my MacBook, it’s usually less than 2 seconds.
      I’m not sure what could be wrong unfortunately.

  14. In Sierra, I am not able to log in to localhost; I am getting the error below.

    AAAA:~ XXXX$ ssh localhost
    Connection to localhost closed by remote host.
    Connection to localhost closed.

    System Preferences -> Sharing -> Remote Login enabled, verified.

    Any idea?
