
Store Your Files on S3 Using the Ruby Shrine Gem

In this third and final post of our series, we tackle uploading files to S3 from your Ruby web app as a background process using the Shrine gem, with the files referenced only by a remote URL. Detailed instructions and code inside!

Welcome to the 3rd and final chapter of our series, Store Your Files on S3 Using the Ruby Shrine Gem. In Part 1 - Setup & Configuration, we learned how to set up Amazon S3 with Shrine and get ready for uploading, and in Part 2 - Direct File Uploads, we covered uploading a file directly from our web app. Today's tutorial tackles the trickier case of uploading files from a remote URL.

Ruby application: uploading files to Amazon S3 from a remote URL

At first glance, you might think the `remote_url` Shrine plugin would be the best choice for uploading from a remote URL, since it allows you to attach files directly from a remote location. The plugin also provides validation for incorrect URLs and unreachable remote files, but it has one drawback that makes it unfit for our purposes: files are always downloaded in the foreground, immediately after the URL is assigned to a model's field (`attachment.file_remote_url = "http://example.com/example.pdf"`). Unfortunately, the `backgrounding` plugin cannot move this download into a background process.
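For reference, here's a minimal sketch of the `remote_url` approach we ruled out (the uploader name matches our later examples; the `max_size` limit is an illustrative value, not from our setup):

```ruby
# app/uploaders/attachment.rb
class AttachmentUploader < Shrine
  # remote_url requires a max_size cap; 20 MB here is illustrative
  plugin :remote_url, max_size: 20 * 1024 * 1024
end

attachment = Attachment.new
# The download happens right here, in the foreground,
# blocking the request until the file is fetched
attachment.file_remote_url = "http://example.com/example.pdf"
```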

However, there is an alternative to the `remote_url` plugin: an external gem, `shrine-url` (https://github.com/janko-m/shrine-url). It provides an additional storage class for Shrine, `Shrine::Storage::Url`. This storage allows us to treat a remote location as a cache location, so the remote file doesn't have to be fetched before the record is saved in the database.

Grab the gem:

```ruby
# Gemfile
gem 'shrine-url'
```

The main idea of this gem is to store file data like this:

```ruby
{
  id: 'http://example.com/example.pdf',
  storage: 'cache',
  metadata: { ... }
}
```

The `Shrine::Storage::Url` class should be used as the storage class for the remote file, so a small modification in the Shrine configuration is needed:

```ruby
# config/initializers/shrine.rb
# ...
Shrine.storages[:cache_url] = Shrine::Storage::Url.new
```

We're simply providing an additional store besides the default `cache` and `store`.
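Put together with the S3 stores from Part 1, the whole configuration might look roughly like this (a sketch; the bucket settings and environment variable names are placeholders standing in for whatever you configured in Part 1):

```ruby
# config/initializers/shrine.rb
require "shrine"
require "shrine/storage/s3"
require "shrine/storage/url"

# Placeholder S3 options -- use whatever you configured in Part 1
s3_options = {
  bucket:            ENV["S3_BUCKET"],
  region:            ENV["S3_REGION"],
  access_key_id:     ENV["S3_ACCESS_KEY_ID"],
  secret_access_key: ENV["S3_SECRET_ACCESS_KEY"]
}

Shrine.storages = {
  cache:     Shrine::Storage::S3.new(prefix: "cache", **s3_options), # temporary uploads
  store:     Shrine::Storage::S3.new(prefix: "store", **s3_options), # permanent storage
  cache_url: Shrine::Storage::Url.new                                # remote URLs as "cache"
}
```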

To keep the convention of the `remote_url` plugin, we assign remote files using a custom `file_remote_url=` setter. It takes the URL as its parameter, generates file data like in the following example, and then assigns it using the `file` setter on the model.

```ruby
def file_remote_url=(url)
  return if url.blank?
  @file_remote_url = url
  # Use the Url storage as the cache for this attachment
  file_attacher(cache: :cache_url)
  # Assign the remote file by its URL, in Shrine's uploaded-file JSON format
  self.file = JSON.dump(
    id: url,
    storage: :cache_url,
    metadata: { filename: File.basename(URI(url).path) }
  )
rescue URI::InvalidURIError, Down::Error
  file_attacher.errors << "invalid URL"
end
```

We also dynamically select a store by passing the `cache` option to `file_attacher`, which we explain in the "How can we dynamically select storage?" section below.

Since we need to re-validate data that comes from the client, we reach for the `restore_cached_data` plugin. Now, when we assign a new value to `file`, Shrine will automatically re-extract and re-validate the file's metadata:

```ruby
# app/uploaders/attachment.rb
class AttachmentUploader < Shrine
  # ...
  plugin :restore_cached_data
end
```
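The re-validation is what makes this safe: a client could claim any size or MIME type in the cached-file JSON. For illustration, here's what pairing it with Shrine's `validation_helpers` plugin could look like (the specific limits here are hypothetical, not from our app):

```ruby
# app/uploaders/attachment.rb
class AttachmentUploader < Shrine
  plugin :validation_helpers
  plugin :restore_cached_data

  Attacher.validate do
    # These checks run against metadata Shrine re-extracted from the
    # actual file, not against whatever the client submitted
    validate_max_size 10 * 1024 * 1024 # hypothetical 10 MB limit
    validate_mime_type_inclusion %w[application/pdf]
  end
end
```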

How can we dynamically select storage?

The biggest problem with integrating the `Shrine::Storage::Url` storage was dynamically selecting the cache storage class, depending on what type of file (remote file or physical file) is being uploaded.

After doing some research into all the possibilities offered by Shrine, we concluded that, for this problem, the `default_storage` (http://shrinerb.com/rdoc/classes/Shrine/Plugins/DefaultStorage.html) and `dynamic_storage` (http://shrinerb.com/rdoc/classes/Shrine/Plugins/DynamicStorage.html) plugins would not be good enough for our purposes. The problem is that `Shrine::Attacher` (which the attachment methods delegate to) is instantiated *before* the attachment is assigned, so we cannot conditionally select the storage based on the assigned value. We submitted the issue to the Shrine GitHub repository, and Janko Marohnić, the author, was very forthcoming with first a workaround and then a permanent fix.
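To illustrate the limitation: `default_storage` lets you compute the cache per record, but its block runs when the attacher is built, before any value is assigned, so there's nothing to branch on (a sketch, not code from our app):

```ruby
class AttachmentUploader < Shrine
  # Evaluated when the attacher is instantiated -- the incoming value
  # (URL vs. physical file) hasn't been assigned yet, so we can't
  # return :cache_url conditionally based on it
  plugin :default_storage, cache: ->(record, name) { :cache }
end
```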

The most appropriate solution was added to Shrine recently but has not been released yet. We decided to use it anyway, so a small modification to the Gemfile was required:

```ruby
# Gemfile
gem 'shrine', git: 'https://github.com/janko-m/shrine.git', ref: 'd8b763f'
```

The above will use a not-yet-released version of the Shrine gem, pinned to the `d8b763f` commit. The latest stable version at the time of writing (which doesn't include support for the solution we used) is `2.8.0`.

Now file attaching should look like so:

```ruby
parsed_data = JSON.parse(file_data)
cache_type  = parsed_data['storage']
attachment.file_attacher(cache: cache_type)
attachment.file_remote_url = parsed_data['id']
```

File data will be stored like this:

```ruby
{
  id: 'http://example.com/example.pdf',
  storage: 'cache_url',
  metadata: { ... }
}
```

And adding additional storage in the Shrine configuration is done like this:

```ruby
# config/initializers/shrine.rb
# ...
Shrine.storages[:cache_url] = Shrine::Storage::Url.new
```

Examples of use

Remote URL

```ruby
attachment = Attachment.new
attachment.file_remote_url = "http://example.com/file.pdf"
attachment.file
# => #<AttachmentUploader::UploadedFile:0x007f8f05bee740 @data={"id"=>"http://example.com/file.pdf", "storage"=>"cache_url", "metadata"=>{"filename"=>"file.pdf", "size"=>1024, "mime_type"=>"application/pdf"}}>
attachment.file.storage
# => #<Attachments::Storage::Url:0x007f8f0c586e50 @downloader=Down::NetHttp>
attachment.save
# => true
attachment.reload.file
# => #<AttachmentUploader::UploadedFile:0x007f8f0d74dcd8 @data={"id"=>"504e412892b9e869136abe645e15e3c8", "storage"=>"store", "metadata"=>{"filename"=>"file.pdf", "size"=>1024, "mime_type"=>"application/pdf"}}>
attachment.file.storage
# => #<Attachments::Storage::S3WithRemoteable:0x007f80d9847978
#     @bucket=#<Aws::S3::Bucket:0x007f80d5ca5b68 @client=#<Aws::S3::Client>, @data=nil, @name="ironin-bucket">,
#     @client=#<Aws::S3::Client>,
#     @host=nil,
#     @multipart_threshold={:upload=>15728640, :copy=>104857600},
#     @prefix="store",
#     @upload_options={}>
```

Direct upload

```ruby
file_data = {
  "id": "488c026ab9b0b36bdbb3d60963556b97",
  "storage": "cache",
  "metadata": {
    "filename": "file.pdf",
    "size": 1024,
    "mime_type": "application/pdf"
  }
}
attachment.file = file_data.to_json
attachment.file
# => #<AttachmentUploader::UploadedFile:0x007f80db3f9590 @data={"id"=>"488c026ab9b0b36bdbb3d60963556b97", "storage"=>"cache", "metadata"=>{"filename"=>"file.pdf", "size"=>1024, "mime_type"=>"application/pdf"}}>
attachment.file.storage
# => #<Attachments::Storage::S3WithRemoteable:0x007f80d98d2730
#     @bucket=#<Aws::S3::Bucket:0x007f80d9847b58 @client=#<Aws::S3::Client>, @data=nil, @name="ironin-bucket">,
#     @client=#<Aws::S3::Client>,
#     @host=nil,
#     @multipart_threshold={:upload=>15728640, :copy=>104857600},
#     @prefix="cache",
#     @upload_options={}>
attachment.save
# => true
attachment.reload.file
# => #<AttachmentUploader::UploadedFile:0x007f80db3f9590 @data={"id"=>"533d026ab9b0b36bdbb3d60963556b98", "storage"=>"store", "metadata"=>{"filename"=>"file.pdf", "size"=>1024, "mime_type"=>"application/pdf"}}>
attachment.file.storage
# => #<Attachments::Storage::S3WithRemoteable:0x018f80d8737978
#     @bucket=#<Aws::S3::Bucket:0x007f80d5ca5b68 @client=#<Aws::S3::Client>, @data=nil, @name="ironin-bucket">,
#     @client=#<Aws::S3::Client>,
#     @host=nil,
#     @multipart_threshold={:upload=>15728640, :copy=>104857600},
#     @prefix="store",
#     @upload_options={}>
```

It should be mentioned that to clear cached files we used the mechanism offered by AWS to manage an object's lifecycle (http://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-lifecycle.html). In our case, all files from the cache directory are permanently deleted seven days after their creation date.
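We set our rule up through the S3 console, but as a sketch, the equivalent could be created from Ruby with the AWS SDK (the bucket name and rule ID below are placeholders):

```ruby
require "aws-sdk-s3"

s3 = Aws::S3::Client.new # assumes credentials come from the environment

# Permanently delete everything under the "cache/" prefix
# seven days after each object's creation
s3.put_bucket_lifecycle_configuration(
  bucket: "ironin-bucket", # placeholder bucket name
  lifecycle_configuration: {
    rules: [
      {
        id:         "expire-cached-uploads", # placeholder rule name
        status:     "Enabled",
        filter:     { prefix: "cache/" },
        expiration: { days: 7 }
      }
    ]
  }
)
```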

Need help managing or building your Ruby web app? Need Ruby development with Amazon S3, or any other web app development that involves integrating with Amazon S3? At iRonin, we have a team of expert Ruby developers on hand, ready to augment your team and bring a fresh, experienced set of eyes to your project. Call us today to find out more about how we can help spur on your web development efforts.
