In Part 3, the final post in our series, we tackle uploading files from your Ruby web app to S3 as a background process using the Shrine gem, where the file is referenced only by a remote URL. Based on our experience as a top Ruby on Rails development company, expect detailed instructions and code inside!
Welcome to the 3rd and final chapter of our series, Store Your Files on S3 Using the Ruby Shrine Gem.
- In Part 1 - Setup & Configuration of this series on Amazon S3 integration with Ruby, we learned how to set up Amazon S3 with Shrine and get ready for uploading.
- In Part 2 - Direct File Uploads, we covered uploading a file directly from our web app.
- Today's tutorial tackles the trickier case of uploading files from a remote URL.
Ruby application: uploading files to Amazon S3 from a remote URL
At first glance, the `remote_url` Shrine plugin looks like the best choice for uploading from a remote URL, since it allows you to attach files directly from a remote location. The plugin also validates incorrect URLs and unreachable remote files, but it has one drawback that makes it unfit for our purposes: files are always downloaded in the foreground, immediately after assigning the URL to a model's field (`attachment.file_remote_url = "http://example.com/example.pdf"`). Unfortunately, the backgrounding plugin cannot move that download into a background job.
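For reference, this is a minimal sketch of what the `remote_url` approach looks like (the `max_size` limit here is an arbitrary value for illustration):

```ruby
# app/uploaders/attachment.rb
class AttachmentUploader < Shrine
  # max_size caps the size of downloaded files; 20 MB is an arbitrary choice
  plugin :remote_url, max_size: 20 * 1024 * 1024
end

# assigning a URL downloads the file right away, inside the request cycle
attachment.file_remote_url = "http://example.com/example.pdf"
```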
However, there is an alternative to the remote_url plugin: an external gem, shrine-url. It provides an additional storage class for Shrine, `Shrine::Storage::Url`. This storage allows us to treat a remote location as a cache location, so the remote file doesn't have to be fetched before the record is saved to the database.
Grab the gem:
```ruby
# Gemfile
gem 'shrine-url'
```
With this storage, the uploaded file data simply points at the remote location:

```ruby
{
  id: 'http://example.com/example.pdf',
  storage: 'cache',
  metadata: { ... }
}
```
We register it as an additional cache storage in the Shrine initializer:

```ruby
# config/initializers/shrine.rb
# ...
Shrine.storages[:cache_url] = Shrine::Storage::Url.new
```
In effect, we provide an additional store besides the default `cache` and `store`.
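For context, the complete storage configuration might look roughly like this (a sketch - the `s3_options` values and ENV keys are placeholders, configure them as described in Part 1):

```ruby
# config/initializers/shrine.rb
require 'shrine/storage/s3'
require 'shrine/storage/url'

# placeholder credentials - use the setup from Part 1
s3_options = {
  bucket:            ENV['S3_BUCKET'],
  region:            ENV['S3_REGION'],
  access_key_id:     ENV['S3_ACCESS_KEY_ID'],
  secret_access_key: ENV['S3_SECRET_ACCESS_KEY'],
}

Shrine.storages = {
  cache:     Shrine::Storage::S3.new(prefix: 'cache', **s3_options),
  store:     Shrine::Storage::S3.new(prefix: 'store', **s3_options),
  cache_url: Shrine::Storage::Url.new,
}
```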
To keep the convention of the remote_url plugin, we assign remote files using a `file_remote_url=` setter. It takes the URL as its parameter, generates file data like in the following example, and then assigns it using the model's file setter.
```ruby
# app/models/attachment.rb
def file_remote_url=(url)
  return if url.blank?
  @file_remote_url = url
  # dynamically select the cache_url storage for this attacher
  file_attacher(cache: :cache_url)
  # assign raw file data that points at the remote location
  self.file = JSON.dump(
    id: url,
    storage: :cache_url,
    metadata: { filename: File.basename(URI(url).path) }
  )
rescue URI::InvalidURIError, Down::Error
  file_attacher.errors << "invalid URL"
end
```
We also dynamically select a store by passing the `cache` value to `file_attacher` - which we explain in the How can we dynamically select storage? section.
Since we need to re-validate data that comes from the client, we reach for the restore_cached_data plugin. Now, when we assign a new value to a file, Shrine will automatically re-extract and re-validate the file's metadata:
```ruby
# app/uploaders/attachment.rb
class AttachmentUploader < Shrine
  # ...
  plugin :restore_cached_data
end
```
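To illustrate (a hypothetical console session): metadata supplied by the client is discarded, and Shrine extracts it again from the actual file:

```ruby
attachment.file_attacher(cache: :cache_url)
# the client claims the file is 1 byte...
attachment.file = JSON.dump(
  id: 'http://example.com/example.pdf',
  storage: 'cache_url',
  metadata: { size: 1 }
)

# ...but restore_cached_data has re-extracted metadata from the file itself
attachment.file.metadata['size'] # => 1024
```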
How can we dynamically select storage?
The biggest problem with integrating the `Shrine::Storage::Url` storage was dynamically selecting the cache storage, depending on the type of file (remote or physical) being uploaded.
After researching the possibilities offered by Shrine, we concluded that the default_storage and dynamic_storage plugins it provides would not be good enough for our purposes. The problem is that `Shrine::Attacher` (which the attachment methods delegate to) is instantiated before the attachment is assigned, so we cannot conditionally select storage based on the assigned value. We submitted the issue on Shrine's GitHub tracker, and Janko Marohnić, the author, was very forthcoming with a workaround and a permanent fix.
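To make the limitation concrete, here is roughly what the naive attempt would look like (a sketch only - it does not work, for the reason described above):

```ruby
class AttachmentUploader < Shrine
  # The lambda is evaluated when the attacher is instantiated, which
  # happens before any value is assigned - so the record cannot yet
  # tell us whether a remote URL or a physical file is coming.
  plugin :default_storage, cache: lambda { |record, name|
    record.file_remote_url ? :cache_url : :cache
  }
end
```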
The most appropriate solution was added to Shrine recently, but had not been released at the time of writing. We decided to use it anyway, so a slight modification to the Gemfile was required:
```ruby
# Gemfile
gem 'shrine', git: 'https://github.com/janko-m/shrine.git', ref: 'd8b763f'
```
With this version, the desired cache storage can be passed straight to `file_attacher`. For example, when re-creating an attachment from previously generated file data:

```ruby
parsed_data = JSON.parse(file_data)
cache_type = parsed_data['storage']
# instantiate the attacher with the cache storage matching the data
attachment.file_attacher(cache: cache_type)
attachment.file_remote_url = parsed_data['id']
```
After the assignment, the cached file data references the remote URL and the `cache_url` storage:

```ruby
{
  id: 'http://example.com/example.pdf',
  storage: 'cache_url',
  metadata: { ... }
}
```
Let's see the whole flow in action in the console:

```ruby
attachment = Attachment.new
attachment.file_remote_url = "http://example.com/file.pdf"
attachment.file
# => #<AttachmentUploader::UploadedFile:0x007f8f05bee740 @data={"id"=>"http://example.com/file.pdf", "storage"=>"cache_url", "metadata"=>{"filename"=>"file.pdf", "size"=>1024, "mime_type"=>"application/pdf"}}>
attachment.file.storage
# => #<Attachments::Storage::Url:0x007f8f0c586e50 @downloader=Down::NetHttp>
attachment.save
# => true
attachment.reload.file
# => #<AttachmentUploader::UploadedFile:0x007f8f0d74dcd8 @data={"id"=>"504e412892b9e869136abe645e15e3c8", "storage"=>"store", "metadata"=>{"filename"=>"file.pdf", "size"=>1024, "mime_type"=>"application/pdf"}}>
attachment.file.storage
# => #<Attachments::Storage::S3WithRemoteable:0x007f80d9847978
#    @bucket=#<Aws::S3::Bucket:0x007f80d5ca5b68 @client=#<Aws::S3::Client>, @data=nil, @name="ironin-bucket">,
#    @client=#<Aws::S3::Client>,
#    @host=nil,
#    @multipart_threshold={:upload=>15728640, :copy=>104857600},
#    @prefix="store",
#    @upload_options={}>
```

After saving, the file is fetched and promoted to the S3 store storage in the background.
For comparison, here is the same flow with a regular (physical) file upload, where the cached data references the default `cache` storage:

```ruby
file_data = {
  "id": "488c026ab9b0b36bdbb3d60963556b97",
  "storage": "cache",
  "metadata": {
    "filename": "file.pdf",
    "size": 1024,
    "mime_type": "application/pdf"
  }
}
attachment.file = file_data.to_json
attachment.file
# => #<AttachmentUploader::UploadedFile:0x007f80db3f9590
#    @data={"id"=>"488c026ab9b0b36bdbb3d60963556b97", "storage"=>"cache", "metadata"=>{"filename"=>"file.pdf", "size"=>1024, "mime_type"=>"application/pdf"}}>
attachment.file.storage
# => #<Attachments::Storage::S3WithRemoteable:0x007f80d98d2730
#    @bucket=#<Aws::S3::Bucket:0x007f80d9847b58 @client=#<Aws::S3::Client>, @data=nil, @name="ironin-bucket">,
#    @client=#<Aws::S3::Client>,
#    @host=nil,
#    @multipart_threshold={:upload=>15728640, :copy=>104857600},
#    @prefix="cache",
#    @upload_options={}>
attachment.save
# => true
attachment.reload.file
# => #<AttachmentUploader::UploadedFile:0x007f80db3f9590
#    @data={"id"=>"533d026ab9b0b36bdbb3d60963556b98", "storage"=>"store", "metadata"=>{"filename"=>"file.pdf", "size"=>1024, "mime_type"=>"application/pdf"}}>
attachment.file.storage
# => #<Attachments::Storage::S3WithRemoteable:0x018f80d8737978
#    @bucket=#<Aws::S3::Bucket:0x007f80d5ca5b68 @client=#<Aws::S3::Client>, @data=nil, @name="ironin-bucket">,
#    @client=#<Aws::S3::Client>,
#    @host=nil,
#    @multipart_threshold={:upload=>15728640, :copy=>104857600},
#    @prefix="store",
#    @upload_options={}>
```
It should be mentioned that, to clear cached files, we use the object lifecycle mechanism offered by AWS. In our case, all files from the cache directory are permanently deleted seven days after their creation date.
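Such a rule can be configured in the S3 console or scripted with the aws-sdk-s3 gem. A minimal sketch (the rule ID is a placeholder, and `ironin-bucket` is the bucket from the examples above):

```ruby
require 'aws-sdk-s3'

client = Aws::S3::Client.new

client.put_bucket_lifecycle_configuration(
  bucket: 'ironin-bucket',
  lifecycle_configuration: {
    rules: [{
      id:         'expire-cached-uploads',   # placeholder rule name
      status:     'Enabled',
      filter:     { prefix: 'cache/' },      # apply only to cached files
      expiration: { days: 7 }                # delete 7 days after creation
    }]
  }
)
```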
Need help managing or building your Ruby web app? Need Ruby development with Amazon S3, or any other web app development that involves integrating with it? At iRonin, we have a team of expert Ruby developers ready to augment your team and provide efficient solutions with a fresh, experienced set of eyes on your project. Call us today to learn more about how we can help spur your web development efforts.