You need to copy millions of sensitive patient records from a relational database to BigQuery. The total size of the database is 10 TB. You need to design a solution that is secure and time-efficient. What should you do?
To securely and efficiently transfer 10 TB of sensitive patient records, the Transfer Appliance is the best option. The Transfer Appliance allows for secure physical transfer of data, bypassing potential bandwidth limitations and network congestion that could occur with large data transfers over the internet. While Avro compression reduces the file size, the total data remains substantial, and the Transfer Appliance ensures that the data is transferred in a secure and timely manner without relying on network speeds.
You are transferring sensitive patient information, so C & D are ruled out. The choice comes down to A & B, and here it gets tricky. See how to choose Transfer Appliance: https://cloud.google.com/transfer-appliance/docs/2.0/overview Without knowing the bandwidth, it is not possible to determine whether the upload can be completed within 7 days, as recommended by Google. So the safest and most performant way is to use Transfer Appliance. Therefore my choice is B.
https://cloud.google.com/solutions/migration-to-google-cloud-transferring-your-large-datasets The table shows for 1Gbps, it takes 30 hrs for 10 TB. Generally, corporate internet speeds are over 1Gbps. I'm inclined to pick A
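The bandwidth arithmetic behind that estimate can be checked with a minimal sketch. The function name `transfer_hours` and the `efficiency` parameter are hypothetical, introduced only for illustration; the ideal-link figure comes out lower than the 30 hours quoted from the table, and an assumed efficiency of about 0.74 happens to reproduce it (that factor is a guess, not something stated in Google's documentation).

```python
def transfer_hours(size_tb: float, bandwidth_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move size_tb terabytes over a link of bandwidth_gbps.

    efficiency is an assumed fraction of the nominal bandwidth that is
    actually usable (1.0 means an ideal, fully dedicated link).
    """
    if bandwidth_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("bandwidth must be positive and efficiency in (0, 1]")
    bits = size_tb * 1e12 * 8                      # decimal terabytes to bits
    seconds = bits / (bandwidth_gbps * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # 10 TB at a few common link speeds, ideal link.
    for gbps in (0.1, 1.0, 10.0):
        print(f"{gbps:>5} Gbps: {transfer_hours(10, gbps):.1f} h")
    # Same transfer at 1 Gbps with an assumed 74% usable bandwidth.
    print(f"1 Gbps at 74%: {transfer_hours(10, 1.0, 0.74):.1f} h")
```

The point of the sketch is that the answer depends almost entirely on the bandwidth you can actually dedicate to the transfer, which the question does not state.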
SAY MY NAME! You need to transfer sensitive patient information; you shouldn't do that over a public ISP.
If you transfer 10 TB over the wire, your network will be tied up for the entire transfer time. This isn't something a company would be happy to swallow.
The answer is B. gsutil has a limit of 1 TB according to Google documentation; if the data is more than 1 TB, we have to use Transfer Appliance.
The answer is clearly seen here: https://cloud.google.com/architecture/migration-to-google-cloud-transferring-your-large-datasets#transfer-options
B is the right answer.
The answer should be B. A is also correct, but it has its own limit: it allows only 5 TB of data to be uploaded to Cloud Storage at a time. https://cloud.google.com/storage/quotas I will go with B.
5 TB is "for individual objects". Create smaller Avro files.
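As a rough illustration of the "smaller files" point, here is a minimal sketch of how many export files a given target size implies. The 5 TB per-object figure is taken from this thread, and the helper `shard_count` and the 1 GB target size are hypothetical choices for the example, not values from Google's documentation.

```python
TB = 10**12
GB = 10**9

# Per-object limit as quoted in this thread (assumption for the example).
MAX_OBJECT_BYTES = 5 * TB


def shard_count(total_bytes: int, target_shard_bytes: int) -> int:
    """Number of files needed if each file holds at most target_shard_bytes."""
    if not 0 < target_shard_bytes <= MAX_OBJECT_BYTES:
        raise ValueError("target shard size must be positive and within the object limit")
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return -(-total_bytes // target_shard_bytes)


if __name__ == "__main__":
    print(shard_count(10 * TB, 1 * GB))   # files needed for 10 TB in 1 GB pieces
    print(shard_count(10 * TB, 5 * TB))   # fewest files allowed by the limit
```

Many smaller files also lend themselves to parallel upload, so the per-object limit is not a real obstacle to option A.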
Avro compression can reduce the file size to a tenth.
A is the answer. The question states the following facts: the total size of the database is 10 TB, and the solution needs to be secure and time-efficient. Total size of database: it will be significantly reduced by Avro file compression (up to 90% compression). Secure transfer: even though we are dealing with sensitive data, data is encrypted in transit when using `gsutil cp` to upload it to GCS. https://cloud.google.com/storage/docs/gsutil/addlhelp/SecurityandPrivacyConsiderations#transport-layer-security Time-efficient: gsutil could upload 10 TB of data in 30 hours (or 1 TB, if it is Avro-compressed first, in 3 hours).
10 TB is nothing. With a single 10 Gbps interconnect you could transfer the data in 3 hours, and even at 1 Gbps without an interconnect you could transfer it in one weekend. With Transfer Appliance, it takes 25 days to get the appliance and then another 25 days while you wait for the data to be available, which is not "time-efficient" at all. I go with A instead of B.
I got the 25 days + 25 days from here: https://cloud.google.com/transfer-appliance/docs/4.0/overview#transfer-speeds
Transfer Appliance is not as time-efficient when you have enough bandwidth. https://cloud.google.com/architecture/migration-to-google-cloud-transferring-your-large-datasets#transfer_appliance_for_larger_transfers
Bro, hear my point once and then do your own research... Option A. Don't just look at the 10 TB; the data is being compressed into Avro, which compresses it by 90%-92%, so in the end we have 1 TB or even less to transfer. Now tell me, why would I use Transfer Appliance? Transfer Appliance comes in 40 TB and 300 TB categories. Why go offline, which will take 7 days or more before your data is online? If you use gsutil, even without dedicated bandwidth, 1 TB at 100 MB/s will be fully online within a day, because Avro has already reduced the data from 10 TB to 1 TB. So gsutil is the best. The question doesn't mention cost-effectiveness, but look at the time too.
Transfer Appliance will take more time than gsutil. Also, the question does not say whether the organization is located near a Google data center.
There is no "cost effective" in the question. If this is not a clear case for the appliance, then what is?
A will take a crazy amount of time if the organization doesn't have a dedicated link.
I will go with "A" because of the transit time to ship the Transfer Appliance to Google, which also depends on the organization's location. gsutil works anywhere internet is available.
Transfer Appliance has an expected turnaround time of 20 days. https://cloud.google.com/architecture/migration-to-google-cloud-transferring-your-large-datasets#expected%20turnaround:~:text=The%20expected%20turnaround%20time%20for%20a%20network%20appliance%20to%20be%20shipped%2C%20loaded%20with%20your%20data%2C%20shipped%20back%2C%20and%20rehydrated%20on%20Google%20Cloud%20is%2020%20days. The best answer would be A. If gsutil can use 100 Mbps, the transfer would take 12 days, which is more time-efficient than B. This is a reasonable assumption. https://cloud.google.com/static/architecture/images/big-data-transfer-how-to-get-started-transfer-size-and-speed.png
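The comparison being made here can be written down as a minimal sketch. The 20-day appliance turnaround is the figure quoted in this comment; the function names `online_days` and `faster_option` are hypothetical, and the online estimate assumes an ideal link with no protocol overhead, which is why it comes out somewhat lower than the 12 days read off Google's chart.

```python
def online_days(size_tb: float, bandwidth_mbps: float) -> float:
    """Days to upload size_tb terabytes over an ideal link of bandwidth_mbps."""
    if bandwidth_mbps <= 0:
        raise ValueError("bandwidth must be positive")
    seconds = size_tb * 1e12 * 8 / (bandwidth_mbps * 1e6)
    return seconds / 86400


def faster_option(size_tb: float, bandwidth_mbps: float,
                  appliance_days: float = 20.0) -> tuple[str, float]:
    """Return the quicker route and its estimated duration in days.

    appliance_days defaults to the turnaround quoted in the thread (assumption).
    """
    days = online_days(size_tb, bandwidth_mbps)
    if days < appliance_days:
        return ("online", days)
    return ("appliance", appliance_days)


if __name__ == "__main__":
    for mbps in (10, 100, 1000):
        option, days = faster_option(10, mbps)
        print(f"{mbps:>5} Mbps -> {option} ({days:.1f} days)")
```

Under these assumptions the appliance only wins for 10 TB when the usable bandwidth is well below 100 Mbps.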
As per Google's recommendation, for transfers above 1 TB from on-premises, from Google Cloud, or from other cloud storage such as S3, we need to use Storage Transfer Service.
Option B combines security, efficiency, and ease of use, making it a suitable choice for transferring sensitive patient records to BigQuery.
Given the sensitivity of the patient records and the large size of the data, using Google's Transfer Appliance is a secure and efficient method. The Transfer Appliance is a hardware solution provided by Google for transferring large amounts of data. It enables you to securely transfer data without exposing it over the internet.
IMO "A" is the most suitable option since the transfer appliance could take 25 days to get the appliance and then 25 days to ship it back and have the data available. https://cloud.google.com/transfer-appliance/docs/4.0/overview#transfer-speeds
To securely transfer the data, and looking at the size of the data, B is the correct option.
While Option A is feasible and could work depending on the specific requirements and security measures implemented, Option D (exporting as Avro, using Storage Transfer Service, and then loading into BigQuery) generally offers a more secure, efficient, and managed approach for transferring sensitive patient records into BigQuery from a relational database. Avro files uploaded to GCS will need to be secured. While GCS itself offers security features like IAM policies and access controls, using a public URL (as suggested in Option A) introduces additional security concerns.