There are many factors to consider when migrating data from on premises to the cloud, including speed, efficiency, network bandwidth, and cost. A common challenge many organizations face is choosing the right utility to copy large amounts of data from on premises to an Amazon S3 bucket.

I often see cases in which customers start with a free data transfer utility, or an AWS Snow Family device, to get their data into S3. Sometimes, those same customers then use AWS DataSync to capture ongoing incremental changes. In this type of scenario, where data is first copied to S3 using one tool and incremental updates are applied using DataSync, there are a few questions to consider. How will DataSync respond when copying data to an S3 bucket that contains files that were written by a different data transfer utility? Will DataSync recognize that the existing files match the on-premises files? Will a second copy of the data be created in S3, or will the data need to be retransmitted?

To avoid additional time, costs, and bandwidth consumption, it is important to fully understand exactly how DataSync identifies "changed" data. DataSync uses object metadata to identify incremental changes. If the data was transferred using a utility other than DataSync, this metadata will not be present. In that case, DataSync will need to perform additional operations to properly transfer incremental changes to S3. Depending upon which storage class was used for the initial transfer, this could result in unexpected costs.
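To make that point concrete, here is a minimal sketch, assuming a boto3 environment with read access to the destination bucket, of inspecting an object that already exists in S3. The bucket and key names are placeholders, and the specific metadata keys that DataSync writes are not listed here; the call simply shows whatever storage class and user metadata the object carries.

```python
import boto3

# Minimal sketch: inspect an existing S3 object's storage class and user
# metadata before pointing a DataSync task at the bucket.
# The bucket and key below are placeholders for illustration.
s3 = boto3.client("s3")

response = s3.head_object(
    Bucket="example-destination-bucket",
    Key="shares/share1/file-0001.dat",
)

# HeadObject omits StorageClass for objects stored in S3 Standard.
storage_class = response.get("StorageClass", "STANDARD")

# DataSync records file attributes (ownership, permissions, timestamps)
# as S3 user metadata on each object it writes; objects written by other
# utilities generally show an empty dictionary here.
print(f"Storage class: {storage_class}")
print(f"User metadata: {response.get('Metadata', {})}")
```

An object that was uploaded by another utility will generally show an empty metadata dictionary, which is exactly the situation the scenarios below explore.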
In this post, I dive deep into copying on-premises data to an S3 bucket by exploring the following three distinct scenarios, each of which will produce a unique result:

1. Using DataSync to perform the initial copy and all incremental changes.
2. Using DataSync to synchronize data that was written by a utility other than DataSync in which the storage class is: S3 Standard, S3 Intelligent-Tiering (Frequent Access or Infrequent Access tiers), S3 Standard-IA, or S3 One Zone-IA.
3. Using DataSync to synchronize data that was written by a utility other than DataSync in which the storage class is: S3 Glacier, S3 Glacier Deep Archive, or S3 Intelligent-Tiering (Archive Access or Deep Archive Access tiers).

After reviewing the detailed results of each scenario, you will be better prepared to decide how to use DataSync to efficiently migrate and synchronize your data to Amazon S3 without unexpected charges.

Solution overview

DataSync is an online data transfer service that simplifies, automates, and accelerates moving data between on-premises storage systems and AWS Storage services, in addition to between AWS Storage services. You can use DataSync to migrate active datasets to AWS, archive data to free up on-premises storage capacity, replicate data to AWS for business continuity, or transfer data to the cloud for analysis and processing.

Let's take a moment to review how DataSync can be used to transfer data from on premises to AWS storage services. A DataSync agent is a virtual machine (VM) that is used to read or write data from on-premises storage systems. The agent communicates with the DataSync service in the AWS Cloud, which performs the actual reading and writing of the data to AWS Storage services. A DataSync task consists of a pair of locations that data will be transferred between. When you create a task, you define both a source and destination location. For detailed information on configuring DataSync, please visit the user guide.
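To illustrate the task-and-locations model, the following sketch pairs an NFS source with an S3 destination using boto3. The agent ARN, hostname, bucket, IAM role, and option values are placeholders for illustration, not the exact configuration used in my tests.

```python
import boto3

# Minimal sketch of the DataSync task-and-locations model with boto3.
# All ARNs, hostnames, and names below are placeholders for illustration.
datasync = boto3.client("datasync")

# Source location: an NFS export on an on-premises server, reached
# through the DataSync agent VM.
source = datasync.create_location_nfs(
    ServerHostname="nfs-server.example.internal",
    Subdirectory="/exports/share1",
    OnPremConfig={
        "AgentArns": ["arn:aws:datasync:us-east-1:111122223333:agent/agent-0example"]
    },
)

# Destination location: an S3 bucket, accessed through an IAM role that
# grants DataSync permission to read and write the bucket.
destination = datasync.create_location_s3(
    S3BucketArn="arn:aws:s3:::example-destination-bucket",
    Subdirectory="/share1",
    S3StorageClass="STANDARD",
    S3Config={"BucketAccessRoleArn": "arn:aws:iam::111122223333:role/DataSyncS3Access"},
)

# The task pairs the two locations; Options control verification,
# overwrite behavior, and which files are transferred on each run.
task = datasync.create_task(
    SourceLocationArn=source["LocationArn"],
    DestinationLocationArn=destination["LocationArn"],
    Name="scenario-1-initial-and-incremental",
    Options={
        "VerifyMode": "ONLY_FILES_TRANSFERRED",
        "OverwriteMode": "ALWAYS",
        "TransferMode": "CHANGED",
    },
)

print(task["TaskArn"])
```

Each run is then started with start_task_execution; the CHANGED transfer mode is what causes DataSync to compare the source against the destination and copy only new or modified data on subsequent runs.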
The following diagram walks through the architecture for my testing environment. The source locations are a set of Network File System (NFS) shares, hosted by an on-premises Linux server. The target location is an S3 bucket with versioning enabled. Three DataSync tasks have been configured, one for each scenario, using the following task options: