Web UI Manipulation with Ruby for testing and beyond

iRonin IT Team•2018-05-25

Web UI manipulation, whether for testing, data extraction, or web scraping, can be a complicated process. Websites that rely heavily on JavaScript or on unusual iframe structures are hard to scrape with plain HTTP requests. Capybara, paired with the Poltergeist driver, makes retrieving data from such websites much simpler. In this tutorial, our experts at iRonin, a Ruby on Rails development company, explain web scraping with Ruby and Capybara. We will show in detail how to reuse Capybara cookies in RestClient gem requests for smooth data extraction.

What exactly is Capybara?

Capybara is a test automation tool for web applications. It simulates user scenarios, is commonly used in behavior-driven development, and exercises the application through the browser from a real user's perspective.

Capybara is written in Ruby, and its API mirrors how users interact with an application, which also makes it convenient for data extraction. It can drive various types of browsers, allowing you to execute tests through a simple and clean interface.

Apart from testing your web application, you can submit forms, fill in fields, execute JavaScript code, and so on. In other words, with Capybara you can act like a real user and exercise a site in different scenarios.

Capybara serves many purposes. For this tutorial, our web harvesting experts focus on one of them: data extraction. Search engines like Google already crawl web pages automatically on a massive scale. Today, we will learn how to use Capybara cookies to get the authorization details required for quick and hassle-free data extraction.

Using Capybara Cookies in RestClient Requests

In the following tutorial, we will go through the process of data extraction using Capybara cookies.

We will start by creating a new Capybara object and making a request for a sample page. Get started with the following:

require 'capybara'
require 'capybara/poltergeist'
page = Capybara::Session.new(:poltergeist)
page.visit("http://somesamplesite.com")

At this point, you will need to sign in to a user account. On our sample site, the sign-in form is submitted by JavaScript code, which may also generate security tokens that are not available in the page source.

page.fill_in 'user[email]', with: 'email'
page.fill_in 'user[password]', with: 'password'
page.click_button 'Sign in'

The above code generally works well. Capybara simulates the user's behavior in the browser and submits the sign-in form even when JavaScript code is responsible for the form submission or for adding the security token.

Once the form is submitted, the session is signed in to the user account and is ready for data extraction.

Let's take an example. Suppose you want to scrape a list of a user's blog posts from a website. This is what you will need to do in Capybara:

require 'nokogiri'

page.visit("http://somesamplesite.com/account/posts")
html = Nokogiri::HTML(page.body)
# The selectors below assume the sample site's markup; adjust them to the real page.
posts = html.css("#blog_posts p").map do |blog_paragraph|
  {
    # at_css returns nil when nothing matches, so &. avoids a NoMethodError
    title: blog_paragraph.at_css("h2")&.text,
    content: blog_paragraph.css("#content p").text
  }
end
posts

This code may be slower than you expect, because the whole page has to load before the results are available. Note the use of Nokogiri, a useful library for parsing HTML documents.

It is also important to remember that JavaScript code can freely manipulate the page's content, which makes scraping harder. In that case, you should find an endpoint that serves the data and access it before the JavaScript code formats it. You can find such endpoints with the Dev Tools in your browser.

Let's take an example to understand this better.

Let's say that you found an endpoint with a URL: http://somesamplesite.com/users/posts.json

This endpoint is protected, which means that you have to be signed in to access the user data. The authorization credentials are stored in the browser cookies, and here's how you can use Capybara to get those cookies:

# page.driver.cookies maps each cookie name to a Capybara::Poltergeist::Cookie
# object whose attributes mirror the cookie's attributes, so iterating over it
# yields [name, cookie] pairs.
cookies = {}
page.driver.cookies.each do |name, cookie|
  cookies[name] = cookie.value
end
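To see the shape of that data without launching a browser, here is a minimal sketch that uses a plain Struct as a stand-in for the driver's cookie objects. The `FakeCookie` struct and the cookie names and values are made up for illustration; the sketch assumes only that the driver returns a hash of cookie names mapped to objects that respond to `value`, as described above.

```ruby
# Stand-in for the driver's cookie object; it models only the attributes used here.
FakeCookie = Struct.new(:name, :value, :domain)

# Hypothetical cookies, shaped like a hash of name => cookie object.
driver_cookies = {
  '_session_id' => FakeCookie.new('_session_id', 'abc123', 'somesamplesite.com'),
  'locale'      => FakeCookie.new('locale', 'en', 'somesamplesite.com')
}

# Keep only the cookie values, which is all the later request needs.
cookies = driver_cookies.transform_values(&:value)

cookies.each { |name, value| puts "#{name}=#{value}" }
```

Other cookie attributes, such as the domain, are dropped here because they are not needed for the request that follows.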

Be sure to collect all the cookies, because it is usually hard to tell which of them the website uses for authorization.

These cookies let you authenticate your requests so that you can retrieve the blog posts' data.
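On the wire, a hash of cookies ends up as a single Cookie request header. The following is a minimal sketch of that step using only Ruby's standard library (Net::HTTP instead of RestClient). The `cookie_header` helper and the cookie names and values are hypothetical, and the request is only built, never sent.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: serialize a name => value hash into a Cookie header value.
# Values are passed through unchanged, assuming they are already encoded as the
# browser stored them.
def cookie_header(cookies)
  cookies.map { |name, value| "#{name}=#{value}" }.join('; ')
end

# Made-up cookies standing in for the ones collected from the browser session.
cookies = { '_session_id' => 'abc123', 'remember_token' => 'xyz789' }

uri = URI('http://somesamplesite.com/users/posts.json')
request = Net::HTTP::Get.new(uri)
request['Cookie'] = cookie_header(cookies)

puts request['Cookie']
```

Sending every collected cookie keeps the request as close as possible to what the signed-in browser would send.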

Once we have the cookies, we can use the following code to request the blog posts:

require 'rest-client'
require 'json'

response = RestClient.get("http://somesamplesite.com/users/posts.json", {cookies: cookies})
json_posts = JSON.parse(response.body)
# Keep only the fields we care about from each post
posts = json_posts.map do |post|
  { title: post['title'], content: post['content'] }
end
posts
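The parsing step can be tried without a network connection. This sketch assumes the endpoint returns a JSON array of objects with `title` and `content` keys; the sample payload and the `extract_posts` helper are hypothetical, since the real response format is not shown here.

```ruby
require 'json'

# Hypothetical payload standing in for response.body.
sample_body = <<~JSON
  [
    {"id": 1, "title": "First post", "content": "Hello"},
    {"id": 2, "title": "Second post", "content": "World"}
  ]
JSON

# Parse the body and keep only the title and content of each post.
# fetch raises KeyError if a post lacks an expected key, so a change in
# the response format fails loudly instead of silently producing nils.
def extract_posts(body)
  JSON.parse(body).map do |post|
    { title: post.fetch('title'), content: post.fetch('content') }
  end
end

posts = extract_posts(sample_body)
posts.each { |post| puts "#{post[:title]}: #{post[:content]}" }
```

Using `fetch` rather than `[]` is a design choice: it trades tolerance for early detection of a changed payload.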

And that is it!

The above steps let you retrieve the blog post data directly from the endpoint.

Compared with parsing the rendered page, this approach is fast and easy to test. In addition, it does not depend on the website's UI, so layout changes will not break it.

Final thoughts

Web UI manipulation with Capybara is simple and saves time. It copes well with JavaScript-heavy websites, and once you are signed in, you can retrieve data straight from endpoints, so the extraction is unaffected by JavaScript code that manipulates the page content.

Ruby is a powerful programming language that plays a key role in many web development tasks, including web data extraction. Ruby libraries are distributed as 'gems', Capybara among them, which provide ready-made functionality to speed up the development process. We specialize in programming languages like Ruby and technologies like Capybara to help your enterprise extract the large amounts of data that data analysis depends on. If you require any assistance with web and software development, contact us right away. Our software experts would love to help you with your projects.
