You are migrating a cluster from MapReduce version 1 (MRv1) to MapReduce version 2 (MRv2) on YARN. You want to maintain your MRv1 TaskTracker slot capacities when you migrate. What should you do?
Configure yarn.applicationmaster.resource.memory-mb and yarn.applicationmaster.resource.cpu-vcores so that ApplicationMaster container allocations match the capacity you require.
You don't need to configure or balance these properties in YARN, as YARN dynamically balances resource management capabilities on your cluster.
Configure mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum in yarn-site.xml to match your cluster's capacity set by the yarn-scheduler.minimum-allocation.
Configure yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores to match the capacity you require under YARN for each NodeManager.
Correct answer: D
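As a rough illustration of how MRv1 slot counts might be carried over to the two NodeManager properties named in the answer, here is a minimal Python sketch. The function name nodemanager_capacity, the per-slot memory figure, and the one-vcore-per-slot default are assumptions made for the example, not values from the question; real sizing also depends on memory reserved for the OS and other daemons.

```python
def nodemanager_capacity(map_slots, reduce_slots, mb_per_slot, vcores_per_slot=1):
    """Translate MRv1 slot counts into per-NodeManager YARN capacity.

    Assumption: every former slot becomes one container of mb_per_slot MB
    and vcores_per_slot virtual cores.
    """
    total_slots = map_slots + reduce_slots
    return {
        "yarn.nodemanager.resource.memory-mb": total_slots * mb_per_slot,
        "yarn.nodemanager.resource.cpu-vcores": total_slots * vcores_per_slot,
    }


# Hypothetical node: 8 map slots, 4 reduce slots, 2048 MB per slot.
capacity = nodemanager_capacity(8, 4, 2048)
for name, value in capacity.items():
    print(f"{name} = {value}")
```

The resulting values would then be set per NodeManager in yarn-site.xml.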
Question 2
You have a Hadoop cluster running HDFS, and a gateway machine external to the cluster from which clients submit jobs. What do you need to do in order to run Impala on the cluster and submit jobs from the command line of the gateway machine?
Install the impalad daemon, statestored daemon, and catalogd daemon on each machine in the cluster, and the impala shell on your gateway machine
Install the impalad daemon, the statestored daemon, the catalogd daemon, and the impala shell on your gateway machine
Install the impalad daemon and the impala shell on your gateway machine, and the statestored daemon and catalogd daemon on one of the nodes in the cluster
Install the impalad daemon on each machine in the cluster, the statestored daemon and catalogd daemon on one machine in the cluster, and the impala shell on your gateway machine
Install the impalad daemon, statestored daemon, and catalogd daemon on each machine in the cluster and on the gateway node
Correct answer: D
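The role placement described in the answer can be sketched as a small Python helper. The function name plan_impala_roles, the host names, and the choice of the first cluster host for the statestored and catalogd daemons are all assumptions for illustration; the helper only builds a plan and does not install anything.

```python
def plan_impala_roles(cluster_hosts, gateway_host, service_host=None):
    """Return a mapping of host name to the Impala roles placed on it.

    Assumption: statestored and catalogd share one cluster host, which
    defaults to the first entry in cluster_hosts.
    """
    if not cluster_hosts:
        raise ValueError("at least one cluster host is required")
    if gateway_host in cluster_hosts:
        raise ValueError("the gateway is expected to be external to the cluster")
    service_host = service_host or cluster_hosts[0]
    if service_host not in cluster_hosts:
        raise ValueError("service_host must be one of the cluster hosts")

    # impalad runs on every machine in the cluster.
    plan = {host: ["impalad"] for host in cluster_hosts}
    # statestored and catalogd run on a single cluster machine.
    plan[service_host] += ["statestored", "catalogd"]
    # The gateway only needs the client shell.
    plan[gateway_host] = ["impala-shell"]
    return plan


# Hypothetical host names.
for host, roles in plan_impala_roles(["node1", "node2", "node3"], "gateway").items():
    print(host, roles)
```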
Question 3
You observed that the number of spilled records from Map tasks far exceeds the number of map output records. Your child heap size is 1GB and your io.sort.mb value is set to 1000MB. How would you tune your io.sort.mb value to achieve maximum memory to disk I/O ratio?
For a 1GB child heap size, an io.sort.mb of 128 MB will always maximize memory to disk I/O.
Increase the io.sort.mb value to 1GB.
Decrease the io.sort.mb value to 0.
Tune the io.sort.mb value until you observe that the number of spilled records equals (or is as close as possible to) the number of map output records.
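The comparison between spilled records and map output records that the question describes can be expressed as a simple ratio. The following Python sketch is illustrative only: the function names, the counter values, and the 5% tolerance are assumptions, and the counts would in practice be read from the job's counters. A ratio of 1.0 means each map output record was written to disk once; a ratio well above 1.0 suggests records are being spilled and re-merged multiple times.

```python
def spill_ratio(spilled_records, map_output_records):
    """Ratio of spilled records to map output records for a map task."""
    if map_output_records <= 0:
        raise ValueError("map_output_records must be positive")
    return spilled_records / map_output_records


def needs_tuning(spilled_records, map_output_records, tolerance=0.05):
    """True when spills exceed map output by more than the tolerance.

    Assumption: a 5% margin above 1.0 is treated as close enough.
    """
    return spill_ratio(spilled_records, map_output_records) > 1.0 + tolerance


# Hypothetical counter values from two runs with different io.sort.mb settings.
for spilled, output in [(3_000_000, 1_000_000), (1_000_000, 1_000_000)]:
    print(spill_ratio(spilled, output), needs_tuning(spilled, output))
```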