EXPEDIA GROUP TECHNOLOGY — SOFTWARE
How to Upload Large Files to AWS S3
Using Amazon's CLI to reliably upload up to 5 terabytes
In a single operation, you can upload up to 5 GB into an AWS S3 object. The size of an object in S3 can range from a minimum of 0 bytes to a maximum of 5 terabytes, so if you are looking to upload an object larger than 5 GB, you need to either use multipart upload or split the file into logical chunks of up to 5 GB and upload them manually as regular uploads. I will explore both options.
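A quick way to confirm which path you need is to check the file size first; the example file used throughout this post is about 7.3 GB, well above the single-operation limit (a minimal check on a UNIX-like system):
$ ls -l test.csv.gz
-rw-r--r--@ 1 user1 staff 7827069512 Aug 26 16:20 test.csv.gz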
Multipart upload
Performing a multipart upload requires a process of splitting the file into smaller files, uploading them using the CLI, and verifying them. The file manipulations are demonstrated on a UNIX-like system.
1. Before you upload a file using the multipart upload process, you need to calculate its base64 MD5 checksum value:
$ openssl md5 -binary test.csv.gz | base64
a3VKS0RazAmJUCO8ST90pQ==
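If openssl is not available, the same base64 value can be produced from the hex digest instead (a sketch assuming the md5sum and xxd utilities are installed):
$ md5sum test.csv.gz | cut -d' ' -f1 | xxd -r -p | base64
a3VKS0RazAmJUCO8ST90pQ==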
2. Split the file into smaller files using the split command:
Syntax
split [-b byte_count[k|m]] [-l line_count] [file [name]]

Options:
-b  Create smaller files byte_count bytes in length.
    `k' = kilobyte pieces
    `m' = megabyte pieces
-l  Create smaller files line_count lines in length.
Splitting the file into 4GB blocks:
$ split -b 4096m test.csv.gz test.csv.gz.part-
$ ls -l test*
-rw-r--r--@ 1 user1 staff 7827069512 Aug 26 16:20 test.csv.gz
-rw-r--r-- 1 user1 staff 4294967296 Aug 26 16:36 test.csv.gz.part-aa
-rw-r--r-- 1 user1 staff 3532102216 Aug 26 16:36 test.csv.gz.part-ab
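Before uploading, you can sanity-check the split: the shell expands the glob in name order (part-aa, then part-ab), so concatenating the pieces should reproduce the checksum from step 1:
$ cat test.csv.gz.part-* | openssl md5 -binary | base64
a3VKS0RazAmJUCO8ST90pQ==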
3. Now, initiate the multipart upload using the create-multipart-upload command. If the checksum that Amazon S3 calculates during the upload doesn't match the value that you entered, Amazon S3 won't store the object. Instead, you receive an error message in response. This step generates an upload ID, which is used to upload each part of the file in the next steps:
$ aws s3api create-multipart-upload \
--bucket bucket1 \
--key temp/user1/test.csv.gz \
--metadata md5=a3VKS0RazAmJUCO8ST90pQ== \
--profile dev
{
"AbortDate": "2020-09-03T00:00:00+00:00",
"AbortRuleId": "deleteAfter7Days",
"Bucket": "bucket1",
"Primal": "temp/user1/examination.csv.gz",
"UploadId": "qk9UO8...HXc4ce.Vb"
}
Explanation of the options:
--bucket
bucket name
--key
object name (can include the path of the object if you want to upload to a specific path)
--metadata
Base64 MD5 value generated in step 1
--profile
CLI credentials profile name, if you have multiple profiles
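If you lose track of an upload ID, or decide to abandon an attempt, two related s3api commands are useful (a sketch reusing the bucket, key, and profile from the example above; note that unfinished multipart uploads keep accruing storage until they are aborted):
$ aws s3api list-multipart-uploads --bucket bucket1 --profile dev
$ aws s3api abort-multipart-upload \
--bucket bucket1 \
--key temp/user1/test.csv.gz \
--upload-id qk9UO8...HXc4ce.Vb \
--profile dev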
4. Next, upload the first smaller file from step 2 using the upload-part command. This step will generate an ETag, which is used in later steps:
$ aws s3api upload-part \
--bucket bucket1 \
--key temp/user1/test.csv.gz \
--part-number 1 \
--body test.csv.gz.part-aa \
--upload-id qk9UO8...HXc4ce.Vb \
--profile dev
{
"ETag": "\"55acfb877ace294f978c5182cfe357a7\""
}
Where:
--part-number
file part number
--body
file name of the part being uploaded
--upload-id
upload ID generated in step 3
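If you are scripting these steps, the ETag can be captured directly rather than copied from the JSON response (a sketch using the CLI's --query and --output options):
$ etag1=$(aws s3api upload-part \
--bucket bucket1 \
--key temp/user1/test.csv.gz \
--part-number 1 \
--body test.csv.gz.part-aa \
--upload-id qk9UO8...HXc4ce.Vb \
--profile dev \
--query ETag --output text)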
5. Upload the second and final part using the same upload-part command with --part-number 2 and the second part's filename:
$ aws s3api upload-part \
--bucket bucket1 \
--key temp/user1/test.csv.gz \
--part-number 2 \
--body test.csv.gz.part-ab \
--upload-id qk9UO8...HXc4ce.Vb \
--profile dev
{
"ETag": "\"931ec3e8903cb7d43f97f175cf75b53f\""
}
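With only two parts it is easy to run the commands by hand; for many parts, a small shell loop keeps the part numbers and filenames in sync (a sketch reusing the same bucket, key, upload ID, and profile as above):
n=1
for f in test.csv.gz.part-*; do
  aws s3api upload-part \
    --bucket bucket1 \
    --key temp/user1/test.csv.gz \
    --part-number "$n" \
    --body "$f" \
    --upload-id qk9UO8...HXc4ce.Vb \
    --profile dev
  n=$((n+1))
done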
6. To make sure all the parts have been uploaded successfully, you can use the list-parts command, which lists all the parts that have been uploaded so far:
$ aws s3api list-parts \
--bucket bucket1 \
--key temp/user1/test.csv.gz \
--upload-id qk9UO8...HXc4ce.Vb \
--profile dev
{
"Parts": [
{
"PartNumber": 1,
"LastModified": "2020-08-26T22:02:06+00:00",
"ETag": "\"55acfb877ace294f978c5182cfe357a7\"",
"Size": 4294967296
},
{
"PartNumber": ii,
"LastModified": "2020-08-26T22:23:thirteen+00:00",
"ETag": "\"931ec3e8903cb7d43f97f175cf75b53f\"",
"Size": 3532102216
}
], "Initiator": {
"ID": "arn:aws:sts::575835809734:assumed-role/dev/user1",
"DisplayName": "dev/user1"
}, "Possessor": {
"DisplayName": "aws-account-00183",
"ID": "6fe75e...e04936"
}, "StorageClass": "STANDARD"
}
7. Next, create a JSON file containing the ETags of all the parts:
$ cat partfiles.json
{
"Parts" : [
{
"PartNumber" : 1,
"ETag" : "55acfb877ace294f978c5182cfe357a7"
},
{
"PartNumber" : 2,
"ETag" : "931ec3e8903cb7d43f97f175cf75b53f"
}
]
}
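Writing this file by hand works fine for two parts; it can also be generated straight from the list-parts output (a sketch using the CLI's --query option; S3 also accepts the ETags with their surrounding quotes):
$ aws s3api list-parts \
--bucket bucket1 \
--key temp/user1/test.csv.gz \
--upload-id qk9UO8...HXc4ce.Vb \
--profile dev \
--output json \
--query '{Parts: Parts[*].{PartNumber: PartNumber, ETag: ETag}}' > partfiles.json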
8. Finally, complete the upload process using the complete-multipart-upload command as below:
$ aws s3api complete-multipart-upload \
--multipart-upload file://partfiles.json \
--bucket bucket1 \
--key temp/user1/test.csv.gz \
--upload-id qk9UO8...HXc4ce.Vb \
--profile dev
{
"Expiration": "expiry-date=\"Fri, 27 Aug 2021 00:00:00 GMT\", rule-id=\"deleteafter365days\"",
"VersionId": "TsD.L4ywE3OXRoGUFBenX7YgmuR54tY5",
"Location": "https://bucket1.s3.us-east-1.amazonaws.com/temp%2Fuser1%2Ftest.csv.gz",
"Bucket": "bucket1",
"Key": "temp/user1/test.csv.gz",
"ETag": "\"af58d6683d424931c3fd1e3b6c13f99e-2\""
}
9. Now our file object is uploaded into S3.
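To double-check the result, head-object reports the final object size and the md5 metadata attached in step 3; note that the ETag of a multipart object (the value ending in -2 above) is not an MD5 of the whole file, so the metadata value is the easiest way to compare checksums:
$ aws s3api head-object \
--bucket bucket1 \
--key temp/user1/test.csv.gz \
--profile dev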
For the multipart upload core specifications (minimum and maximum part sizes, maximum number of parts, and so on), see the Multipart upload overview in the AWS documentation.
Finally, multipart upload is a useful way to store the file as a single object in S3 instead of uploading it as multiple objects (each less than 5 GB).
Split and upload
The multipart upload process requires you to have special permissions, which is sometimes time-consuming to obtain in many organizations. You can split the file manually and do a regular upload of each part as well.
Here are the steps:
1. Unzip the file if it is a zip file.
2. Split the file based on the number of lines in each file. If it is a CSV file, you can use parallel --header to copy the header to each split file. I am splitting here after every 2M records:
$ cat test.csv \
| parallel --header : --pipe -N2000000 'cat >file_{#}.csv'
3. Zip the files back using the gzip <filename> command and upload each file manually as a regular upload, as sketched below.
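A minimal sketch of that final step, assuming step 2 produced file_1.csv, file_2.csv, and so on, and that the destination bucket and path match the earlier examples (each gzipped piece is well under 5 GB, so a regular upload works):
for f in file_*.csv; do gzip "$f"; done
for f in file_*.csv.gz; do
  aws s3 cp "$f" s3://bucket1/temp/user1/"$f" --profile dev
done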
http://lifeatexpediagroup.com
Source: https://medium.com/expedia-group-tech/how-to-upload-large-files-to-aws-s3-200549da5ec1