The data engineering team manages Azure HDInsight clusters. The team spends a large amount of time creating and destroying clusters daily because most of the data pipeline process runs in minutes.
You need to implement a solution that deploys multiple HDInsight clusters with minimal effort.
What should you implement?
Azure Databricks
Azure Traffic Manager
Azure Resource Manager templates
Ambari web user interface
Correct answer: C
Explanation:
A Resource Manager template makes it easy to create the following resources for your application in a single, coordinated operation:HDInsight clusters and their dependent resources (such as the default storage account). Other resources (such as Azure SQL Database to use Apache Sqoop). In the template, you define the resources that are needed for the application. You also specify deployment parameters to input values for different environments. The template consists of JSON and expressions that you use to construct values for your deployment. References:https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-create-linux-clusters-arm-templates
A Resource Manager template makes it easy to create the following resources for your application in a single, coordinated operation:
HDInsight clusters and their dependent resources (such as the default storage account).
Other resources (such as Azure SQL Database to use Apache Sqoop).
In the template, you define the resources that are needed for the application. You also specify deployment parameters to input values for different environments. The template consists of JSON and expressions that you use to construct values for your deployment.
You are a data engineer implementing a lambda architecture on Microsoft Azure. You use an open-source big data solution to collect, process, and maintain data. The analytical data store performs poorly.
You must implement a solution that meets the following requirements:
Provide data warehousing
Reduce ongoing management activities
Deliver SQL query responses in less than one second
You need to create an HDInsight cluster to meet the requirements.
Which type of cluster should you create?
Interactive Query
Apache Hadoop
Apache HBase
Apache Spark
Correct answer: D
Explanation:
Lambda Architecture with Azure:Azure offers you a combination of following technologies to accelerate real-time big data analytics:Azure Cosmos DB, a globally distributed and multi-model database service. Apache Spark for Azure HDInsight, a processing framework that runs large-scale data analytics applications. Azure Cosmos DB change feed, which streams new data to the batch layer for HDInsight to process. The Spark to Azure Cosmos DB Connector Note: Lambda architecture is a data-processing architecture designed to handle massive quantities of data by taking advantage of both batch processing and stream processing methods, and minimizing the latency involved in querying big data.References:https://sqlwithmanoj.com/2018/02/16/what-is-lambda-architecture-and-what-azure-offers-with-its-new-cosmos-db/
Lambda Architecture with Azure:
Azure offers you a combination of following technologies to accelerate real-time big data analytics:
Azure Cosmos DB, a globally distributed and multi-model database service.
Apache Spark for Azure HDInsight, a processing framework that runs large-scale data analytics applications.
Azure Cosmos DB change feed, which streams new data to the batch layer for HDInsight to process.
The Spark to Azure Cosmos DB Connector
Note: Lambda architecture is a data-processing architecture designed to handle massive quantities of data by taking advantage of both batch processing and stream processing methods, and minimizing the latency involved in querying big data.
You develop data engineering solutions for a company. The company has on-premises Microsoft SQL Server databases at multiple locations.
The company must integrate data with Microsoft Power BI and Microsoft Azure Logic Apps. The solution must avoid single points of failure during connection and transfer to the cloud. The solution must also minimize latency.
You need to secure the transfer of data between on-premises databases and Microsoft Azure.
What should you do?
Install a standalone on-premises Azure data gateway at each location
Install an on-premises data gateway in personal mode at each location
Install an Azure on-premises data gateway at the primary location
Install an Azure on-premises data gateway as a cluster at each location
Correct answer: D
Explanation:
You can create high availability clusters of On-premises data gateway installations, to ensure your organization can access on-premises data resources used in Power BI reports and dashboards. Such clusters allow gateway administrators to group gateways to avoid single points of failure in accessing on-premises data resources. The Power BI service always uses the primary gateway in the cluster, unless it’s not available. In that case, the service switches to the next gateway in the cluster, and so on. References:https://docs.microsoft.com/en-us/power-bi/service-gateway-high-availability-clusters
You can create high availability clusters of On-premises data gateway installations, to ensure your organization can access on-premises data resources used in Power BI reports and dashboards. Such clusters allow gateway administrators to group gateways to avoid single points of failure in accessing on-premises data resources. The Power BI service always uses the primary gateway in the cluster, unless it’s not available. In that case, the service switches to the next gateway in the cluster, and so on.