If you need an HTML or XML parser for your latest Rails application, Nokogiri may be the tool for you.
Allow me to introduce a real-life example where I found it quite useful.
For a site I manage, Universeindie.com, I wanted to collect a daily list of free ebooks for my users who were using Amazon’s KDP Select free book promotion feature. All I had to work with was an ASIN, which is the unique identifier for the Amazon ebook. Unfortunately, Amazon’s API for fetching ebook information did not return the free book price as I’d hoped. So, I figured, why not just programmatically browse to a particular book’s Amazon product page and check whether the book was free?
I inspected a few freebie ebook pages on Amazon and discovered a pattern. If the text in the .priceLarge span was $0, the book was a freebie. Simple enough!
Nokogiri to the rescue!
First, I simply had to add the nokogiri gem to my Gemfile and run bundle.
After that, a simple rake task using the following code pulled back whether or not a specific book was free:
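# Assumes book is an instance of a Book class with an asin attribute
require 'nokogiri'
require 'open-uri'
page = Nokogiri::HTML(URI.open("http://www.amazon.com/dp/#{book.asin}"))
# at_css returns the first matching node, or nil if the page has no such span
price = page.at_css('.priceLarge')&.text&.strip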
The text of that span is a string like “$0”, from which I could then trim the “$”, convert to a number, and perform the necessary checks for price ranges, including the coveted 0 price 🙂
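As a rough sketch of that last step, here is one way the trimming and conversion could look in plain Ruby. The helper names parse_price_cents and free_price? are my own inventions for illustration, not part of Nokogiri or the original rake task, and I am assuming the price text looks like “$0”, “$0.00” or “$1,299.50”. One thing to watch for: calling to_i directly on something like “0.99” gives 0, which would wrongly flag a 99-cent book as free, so this sketch works in whole cents instead.

```ruby
# Hypothetical helper: converts price text such as "$2.99" into integer cents.
# Returns nil when the text is missing or is not a plain dollar amount.
def parse_price_cents(text)
  return nil if text.nil?

  # Drop the dollar sign and thousands separators, then surrounding whitespace.
  cleaned = text.delete('$,').strip
  return nil unless cleaned.match?(/\A\d+(\.\d+)?\z/)

  # String#to_r parses the decimal exactly, avoiding float rounding surprises.
  (cleaned.to_r * 100).round
end

# Hypothetical helper: true only when the text parses to exactly zero.
def free_price?(text)
  parse_price_cents(text) == 0
end
```

With this in place, free_price?(price) could be called on the scraped text, and a missing span or an unexpected string such as “Unavailable” simply counts as not free rather than raising an error.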
I hope this simple real-life application of Nokogiri inspires you to use it for your HTML parsing needs!