Every interviewer is different and their questions may vary. So, boost your confidence with these top Hadoop interview questions, demonstrate your enthusiasm for the work, and be prepared in advance.
$ hadoop fs -copyToLocal
$ hadoop fs -copyFromLocal
$ hadoop fs -put
Below are the main tasks of JobTracker:
Accept MapReduce jobs submitted by client applications.
Talk to the NameNode to determine the location of the data.
Locate TaskTracker nodes with available slots at or near the data.
Assign the work to the chosen TaskTracker nodes and monitor their progress.
Re-schedule a task on a different TaskTracker if it fails.
Following are the three configuration files in Hadoop:
core-site.xml
hdfs-site.xml
mapred-site.xml
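For instance, the HDFS replication factor is set in hdfs-site.xml. A minimal sketch (dfs.replication is the real property name; the value shown is just illustrative):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Replication factor applied to newly created files -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```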
NameNode - It is also known as the Master node. It maintains the file system tree and the metadata for all the files and directories present in the system. NameNode is a highly available server that manages the File System Namespace and controls access to files by clients. It records the metadata of all the files stored in the cluster, i.e. the location of blocks, the size of the files, hierarchy, permissions, etc.
NameNode is the master daemon that manages and maintains all the DataNodes (slave nodes).
There are two files associated with the metadata:
FsImage: It is the snapshot of the file system when Name Node is started.
EditLogs: It is the sequence of changes made to the file system after the Name Node is started.
Checkpoint node - The Checkpoint node is the new implementation of the Secondary NameNode. It creates periodic checkpoints of the file system metadata by merging the edits file with the fsimage file, and then uploads the new image back to the active NameNode.
Its directory structure is the same as the NameNode's, and it stores the latest checkpoint.
Backup Node - Backup Node is an extended checkpoint node that performs checkpointing and also supports online streaming of file system edits.
Its main role is to act as the dynamic backup for the File System Namespace (metadata) held by the primary NameNode of the Hadoop ecosystem.
The Backup node keeps an in-memory, up-to-date copy of the file system namespace which is always synchronized with the active NameNode state.
The Backup node does not need to download the fsimage and edits files from the active NameNode to create a checkpoint, as it already has an up-to-date state of the namespace in its own main memory. So, creating a checkpoint on the Backup node is just saving a copy of the file system metadata (namespace) from main memory to its local file system.
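The fsimage/edits/checkpoint relationship described above can be sketched with plain text files. This is a toy model only: the real fsimage and edits files are binary, and the paths and contents below are made up.

```shell
# Toy model of a checkpoint. fsimage holds the namespace snapshot taken at
# startup; edits holds the changes logged since then.
workdir=$(mktemp -d)
printf '/hadoop\n/hadoop/test\n' > "$workdir/fsimage"   # snapshot at startup
printf '/hadoop/sample\n'        > "$workdir/edits"     # change logged afterwards

# "Checkpointing" merges the edits into a new fsimage, then empties the edit log.
cat "$workdir/fsimage" "$workdir/edits" > "$workdir/fsimage.new"
mv "$workdir/fsimage.new" "$workdir/fsimage"
: > "$workdir/edits"

cat "$workdir/fsimage"   # now lists all three paths
```

After the merge, the fsimage alone describes the full namespace, which is why a node that already holds it in memory can checkpoint without downloading anything.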
ubuntu@ubuntu-VirtualBox:~$ hadoop distcp hdfs://namenodeA/apache_hadoop hdfs://namenodeB/Hadoop
The replication factor in HDFS can be modified/overwritten in two ways:
$ hadoop fs -setrep -w 2 /my/sample.xml
sample.xml is the filename whose replication factor will be set to 2
$ hadoop fs -setrep -w 6 /my/sample_dir
sample_dir is the name of the directory and all the files in this directory will have a replication factor set to 6.
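Keep in mind that raising the replication factor multiplies the raw disk consumed. A quick back-of-the-envelope sketch (the file size and factor below are illustrative):

```shell
# Raw disk consumed by a file = logical size x replication factor.
# Example: a 200 MB file stored with replication factor 2.
file_size_mb=200
replication=2
raw_mb=$((file_size_mb * replication))
echo "raw storage used: ${raw_mb} MB"   # prints "raw storage used: 400 MB"
```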
ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -touchz /hadoop/sample
ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -ls /hadoop
Found 2 items
-rw-r--r--   2 ubuntu supergroup          0 2018-11-08 00:57 /hadoop/sample
-rw-r--r--   2 ubuntu supergroup         16 2018-11-08 00:45 /hadoop/test
fsck is a utility to check the health of the file system and to find missing files, over-replicated blocks, under-replicated blocks, and corrupted blocks.
Command for finding the blocks for a file:
$ hadoop fsck <path> -files -blocks -racks
Hadoop Distributed File System (HDFS) is the primary storage system of Hadoop. HDFS stores very large files across a cluster of commodity hardware. It works on the principle of storing a small number of large files rather than a huge number of small files.
HDFS stores data reliably even in the case of hardware failure. It provides high throughput access to the application by accessing in parallel. Components of HDFS:
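The preference for few large files follows from the NameNode holding metadata for every file and block in memory. A rough sketch of the cost, using the commonly cited figure of roughly 150 bytes per namespace object (an estimate, not an exact number; the file sizes are illustrative):

```shell
# Rough NameNode memory cost: ~150 bytes per namespace object (file or block).
bytes_per_object=150

# One 1 GB file in 128 MB blocks: 1 file + 8 blocks = 9 objects.
large=$(( (1 + 1024 / 128) * bytes_per_object ))

# The same 1 GB as 10240 files of 100 KB each: every file occupies its own
# block, so 10240 files + 10240 blocks = 20480 objects.
small=$(( 10240 * 2 * bytes_per_object ))

echo "large-file metadata: ${large} bytes"   # prints "large-file metadata: 1350 bytes"
echo "small-file metadata: ${small} bytes"   # prints "small-file metadata: 3072000 bytes"
```

The same gigabyte of data costs the NameNode thousands of times more memory when stored as small files.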
Update the network addresses in the dfs.include and mapred.include files.
Run $ hadoop dfsadmin -refreshNodes and $ hadoop mradmin -refreshNodes.
Update the slaves file.
Start the DataNode and NodeManager on the added node.
By default, the HDFS block size is 64 MB in Hadoop 1.x and 128 MB in Hadoop 2.x and later.
The default replication factor is 3.
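Given a block size, the number of blocks a file occupies is the file size divided by the block size, rounded up. A quick sketch (the file size here is illustrative):

```shell
# Blocks for a file = ceil(file size / block size).
# Example: a 300 MB file with a 128 MB block size.
file_mb=300
block_mb=128
blocks=$(( (file_mb + block_mb - 1) / block_mb ))
echo "blocks: ${blocks}"   # prints "blocks: 3" (128 MB + 128 MB + 44 MB)
```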
Task Tracker: 50060
Job Tracker: 50030
It displays the Access Control Lists (ACLs) of files and directories. If a directory has a default ACL, then getfacl also displays the default ACL.
ubuntu@ubuntu-VirtualBox:~$ hdfs dfs -getfacl /hadoop
This exception means there is no communication between the DataNode and the NameNode due to any of the below reasons:
You can provide dfs.block.size on the command line:
hadoop fs -D dfs.block.size=<blocksizeinbytes> -cp /source /destination
hadoop fs -D dfs.block.size=<blocksizeinbytes> -put /source /destination
The below command is used to enter Safe Mode manually:
$ hdfs dfsadmin -safemode enter
Once Safe Mode has been entered manually, it must also be left manually.
The below command is used to leave Safe Mode manually:
$ hdfs dfsadmin -safemode leave
The two popular utilities or commands to find the HDFS space consumed are hdfs dfs -du and hdfs dfsadmin -report.
HDFS provides reliable storage by copying data to multiple nodes. The number of copies it creates is usually referred to as the replication factor which is greater than one.
Today, professionals are switching their careers to Big Data and Hadoop. The sudden demand in the market is creating huge job opportunities, with openings at many reputed companies around the world.
These Hadoop developer interview questions have been designed specially to familiarize you with the kinds of questions you might face during your interview, and they will help you crack the Hadoop interview and land your dream career as a Hadoop developer. These top Big Data Hadoop interview questions will boost your confidence and prepare you to answer your interviewer's questions in the best manner. These interview questions on Hadoop are suggested by experts.
Turn yourself into a Hadoop Developer. Live your dream career!