What should you recommend as a batch processing solution for Health Interface?
Azure Data Factory is an ideal solution for batch processing, particularly when dealing with a variety of data sources and formats. It can ingest, prepare, and transform data on a large scale. Azure Data Factory integrates with the Azure Cosmos DB bulk executor library to offer high performance when writing to Azure Cosmos DB. This makes it well-suited to support a scalable batch processing solution, aligning with the requirements to efficiently add data from new hospitals.
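For context, the bulk write that ADF's Cosmos DB sink performs can be approximated in plain code. Below is a minimal, purely illustrative sketch using the azure-cosmos Python SDK; the account endpoint, key, database, and container names are placeholders, and ADF itself does this at scale internally via the bulk executor library rather than item by item.

```python
# Illustrative only: upserting a batch of hospital messages into Cosmos DB
# with the azure-cosmos Python SDK. ADF's Cosmos DB sink performs the
# equivalent at scale via the bulk executor library. All names are placeholders.
from azure.cosmos import CosmosClient

client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<account-key>")
container = client.get_database_client("healthinterface").get_container_client("messages")

batch = [
    {"id": "msg-001", "patientId": "p-42", "status": "admitted"},
    {"id": "msg-002", "patientId": "p-43", "status": "discharged"},
]

for item in batch:
    # upsert_item inserts the document, or replaces it if the id already exists
    container.upsert_item(item)
```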
Why would batch processing use Azure Stream Analytics and not Azure Databricks? This seems like the wrong answer; it should be D.
It's actually ADF per their own explanation; they marked it wrong. Databricks would also do, I guess; there's little that ADF can do that Databricks can't, if anything.
OK, ADF can use Copy Data from an on-premises source; Spark, which is what ADF Data Flows and Databricks run on, can't do that.
Technology choices for batch processing are: 1. Azure Synapse Analytics, 2. Azure HDInsight, 3. Azure Data Lake Analytics, 4. Azure Databricks. https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/batch-processing
ADF has Data Flows, so why isn't ADF listed as a batch processing option? Secondly, changing the units will scale ADF as well. Sending data from on-premises can't be done via Databricks; Databricks can act on the data once it is in Azure. ADF seems to be the option.
Most of the questions and discussions in DP-201 are so confusing. It's hard to tell which answer is correct unless you have subject knowledge.
ADF should be used for batch processing. The answer should be C.
If you check this link, https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/batch-processing, ADF is not an option; the answer is D (Azure Databricks).
Should be D.
Stream Analytics is a streaming solution, not a batch processing solution. Data Factory is an orchestration solution with data copy capabilities. I have no idea what the Azure Cycle option is. So Databricks is the only option here that qualifies as a batch processing solution.
It requires "Support a more scalable batch processing solution in Azure", so Databricks is the only autoscaling option. (https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/batch-processing)
The comment below the 'answer' suggests the answer should be ADF, not the highlighted answer B. But ADF is not really a batch processing solution, per the MS docs (as Luke97 clearly references).
"Reduce the amount of time it takes to add data" = Real-Time that means the answer is Azure Stream Analytics, so the answer is correct B
I don't think 'reduce the time it takes to add data' means make it real time... it just means speed it up! (The data load was getting slower, per the case study.) I think the answer should be Databricks.
The more reactions I read, the more confused I get. My two cents: in this case, the hospitals send the data in batches. That means not message by message, but as a file containing several messages or records. Most of the discussion here is about "batch processing", which is a different story to do with analysing big data stored in files. To me, batch processing is not the correct context for this case. What we need is to ingest files coming from the hospitals from time to time. Azure Data Factory seems right to me. The answer's comment also seems to point to this solution, so the answer itself might be a typo.
I also agree that it should be Databricks
If the input is Cosmos DB, it should be Data Factory, as Azure Stream Analytics only supports Event Hubs, IoT Hub, and Blob Storage as inputs. The provided explanation also cites a link, https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-cosmos-db, that uses Data Factory to connect to Cosmos DB.
ADF vs. Databricks: which would be ideal? As it's batch, I feel it should be Databricks.
With ADF you can add a notebook from ADB. With ADF you can do batch processing, and moreover both ADF Data Flows and ADB run on an underlying Apache Spark architecture. Performance-wise both are almost the same; the requirement can be achieved by either, but ADF involves less coding compared to ADB.
Azure Databricks
They mentioned that the Health Interface application receives data in batches (groups of messages sent as a batch from the existing C# application). If ADF is the answer, how is the solution expected to receive data (an HTTP source? JSON files on Blob Storage?) with varying schemas and perform bulk inserts into Cosmos DB? It has to be ADB receiving the messages/batches on a stream and ingesting them into Cosmos DB.
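For what it's worth, a minimal sketch of that Databricks ingestion, assuming JSON message batches landing in Blob Storage and the Azure Cosmos DB Spark connector installed on the cluster; the endpoint, key, database, container, and storage paths are all hypothetical:

```python
# Databricks (PySpark) sketch: ingest a batch of hospital message files into
# Cosmos DB. Assumes the azure-cosmos-spark connector is on the cluster;
# account and path names below are placeholders, not from the case study.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

cosmos_cfg = {
    "spark.cosmos.accountEndpoint": "https://<account>.documents.azure.com:443/",
    "spark.cosmos.accountKey": "<account-key>",
    "spark.cosmos.database": "healthinterface",
    "spark.cosmos.container": "messages",
}

# Read the JSON message files a hospital dropped into storage as one batch.
batch_df = spark.read.json("wasbs://batches@<storage>.blob.core.windows.net/hospital-a/")

# Bulk-write the whole batch to Cosmos DB through the Spark connector.
(batch_df.write
    .format("cosmos.oltp")
    .options(**cosmos_cfg)
    .mode("append")
    .save())
```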
The answer should be D: Databricks, purely because of the scalability factor. ADF can be used, but Databricks is better when it comes to scaling.
ADF can call a Databricks notebook in its pipeline.
Which product would provide the best performance?
Don't go by the word "batch". Read this: "Health Interface - ADatum has a critical application named Health Interface that receives hospital messages related to patient care and status updates." So Stream Analytics seems to be correct.
Correct Answer: B
Explanation/Reference:
Scenario: ADatum identifies the following requirements for the Health Interface application:
- Support a more scalable batch processing solution in Azure.
- Reduce the amount of time it takes to add data from new hospitals to Health Interface.
Data Factory integrates with the Azure Cosmos DB bulk executor library to provide the best performance when you write to Azure Cosmos DB.
Reference: https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-cosmos-db
Why is nobody choosing Azure Stream Analytics, given that the input to the processing solution is the messages generated by the website?
Databricks doesn't support C#; Stream Analytics is correct.
The C# application is deprecated and will be removed.
Clearly Azure Databricks: https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/batch-processing
ADF makes more sense here, as the requirement is to load the data from the branches into the target DB (most likely Cosmos DB). Databricks is more for big data analytics processing.
According to the given info, Cosmos DB is the appropriate storage solution for Health Interface. It's been stated that the messages are sent in batches; in that case, Stream Analytics is the best bet here, as it can stream messages directly to a Cosmos DB sink. The given answer is right.
Databricks is more for big data analytics. In this case batch processing is needed to load data into Cosmos DB. So ADF makes more sense.
I think the answer should be ADF. Even though it is not a batch processing solution per se, if you have a look at the documentation link, it also refers to ADF. However, Databricks would also sound plausible here in my opinion, since it is the only "real" designated batch processing solution. I would stick with ADF but also think Databricks would be plausible. Azure Stream Analytics just does not make any sense at all here.
The question says "more scalable batch processing", so if you refer to the link, only Azure Databricks from that list is scalable. So this should be the answer: https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/batch-processing
"Minimize the number of services required to perform data processing, development, scheduling, monitoring, and the operationalizing of pipelines." I would pick Data Factory as the answer
Disregard this; Databricks for batch processing
It shows B as the answer, but the description underneath implies C, since it talks about Data Factory and Cosmos DB. Data Factory is scalable.
Can I use ADF alone as the solution for both Health Insights and Health Interface?
Per https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/batch-processing, it seems to be Databricks.
Not sure if Databricks can access an on-premises data source. If yes, then no question: D. If not, then you have to use an ADF Copy Data activity to copy from on-premises to staging. But as different hospitals have different data formats, you then have to transform them into a common format. ADF can use a Mapping Data Flow or call a Databricks notebook to do that (but only on staged data already in Azure). Data Flows unfortunately are not autoscaling; you have to redefine how many cores you want to use. So I would call a Databricks notebook from ADF after the Copy Data step. The closest answer seems to be C: ADF.
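As a rough illustration of that notebook step, here is a sketch of normalizing differently-shaped hospital files into one common schema after the ADF copy; the storage paths, hospital names, and column mappings are made up for the example:

```python
# Databricks notebook sketch: normalize staged hospital files (each with its
# own schema) into a common shape before loading. All names are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Per-hospital mapping from source column names to the common schema.
COLUMN_MAP = {
    "hospital-a": {"pat_id": "patient_id", "msg": "message", "ts": "event_time"},
    "hospital-b": {"PatientId": "patient_id", "Body": "message", "Time": "event_time"},
}

frames = []
for hospital, mapping in COLUMN_MAP.items():
    df = spark.read.json(f"wasbs://staging@<storage>.blob.core.windows.net/{hospital}/")
    for source_col, target_col in mapping.items():
        df = df.withColumnRenamed(source_col, target_col)
    frames.append(
        df.select("patient_id", "message", "event_time")
          .withColumn("hospital", F.lit(hospital))
    )

# Union everything into a single, consistently-shaped DataFrame.
combined = frames[0]
for df in frames[1:]:
    combined = combined.unionByName(df)

combined.write.mode("overwrite").parquet(
    "wasbs://staging@<storage>.blob.core.windows.net/common/"
)
```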
D. Azure Databricks
I would choose ADF. https://devblogs.microsoft.com/cosmosdb/migrating-relational-data-into-cosmos-db-using-azure-data-factory-and-azure-databricks/
For batch processing, it's Databricks.