I took a quick gig that required scraping Facebook pages for keywords and collecting all the page URLs. Having never worked with Facebook’s Open Graph, I thought it might be a little challenging, but I was pleasantly surprised at how easy it turned out to be.
I used Facebook’s Graph API Explorer to get a list of all the pages matching the keyword “yoga”.
Two things to note here:
1) To search pages you will need an app token. To get one, click the Graph API Explorer button and select an app. If you don’t have an app registered to your name, you will need to register one first.
2) Take the app token code and add
Then you can hit submit and you’ll get the results back in what looks like JSON but has HTML in it. What I did next was copy all of that and save it as an HTML file (I’m using Nokogiri and wanted to make this easy). Since I wanted the id numbers, and these were linked to the Facebook pages, I needed to scrape my document for the href attributes of its links. The problem is that the URLs pointed at graph.facebook.com/idnumber, so I needed to remove “graph.” from the strings. Here is the code I used to accomplish all of this and export the links to a CSV file.
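require 'nokogiri'
require 'csv'

# Parse the saved Graph API Explorer output.
doc = File.open("fbpages.html") { |f| Nokogiri::HTML(f) }

# Collect every href value and strip "graph." so it points at facebook.com.
fblinks = doc.search('//a/@href')
fbarray = fblinks.map { |href| href.to_s.sub('graph.', '') }

# Append one row per link (index, URL); "ab" keeps rows from earlier opens.
fbarray.each_with_index do |link, i|
  CSV.open("fbpages.csv", "ab") do |csv|
    csv << [i, link]
  end
end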
If you aren’t familiar with Nokogiri, it is a Ruby library for parsing and scraping HTML, similar to BeautifulSoup for Python. The script searches for the href attribute of every link and collects the values it finds in an array. Then I needed to iterate over this array and remove ‘graph.’ from each link. Ruby’s fantastic .sub('graph.', '') accomplishes this easily. The next thing I needed to do was export each one to a CSV file on its own row.
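The link clean-up step can be tried on its own, without Nokogiri or a saved page. This is only a small sketch: the links below are made-up examples in the graph.facebook.com/idnumber shape described above, and the last one is a contrived case to show that .sub replaces just the first match while .gsub replaces every match.

```ruby
# Hypothetical sample links (the ids are invented for illustration).
links = [
  "https://graph.facebook.com/1234567890",
  "https://graph.facebook.com/9876543210"
]

# String#sub removes only the first "graph.", which is the host prefix here.
page_urls = links.map { |link| link.sub("graph.", "") }
puts page_urls

# Contrived link that contains "graph." twice, to compare sub with gsub.
tricky = "https://graph.facebook.com/graph.yoga"
first_only = tricky.sub("graph.", "")   # keeps the second "graph."
every_match = tricky.gsub("graph.", "") # removes both
puts first_only
puts every_match
```

Because only the leading “graph.” should go, .sub is the safer choice of the two here.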
Remember to pass “ab” as the mode to CSV.open. The script reopens the file for every link, and append mode adds each new entry as its own row instead of overwriting what is already there.
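To see why the mode matters, here is a rough sketch that writes the same rows twice into a temporary directory, once with “ab” and once with “wb”. The helper name write_rows, the file names and the sample links are all made up for this illustration; only CSV.open, CSV.read and Dir.mktmpdir come from Ruby’s standard library.

```ruby
require "csv"
require "tmpdir"

# Hypothetical helper: reopen the file once per row, like the script above.
def write_rows(path, mode, rows)
  rows.each_with_index do |row, i|
    CSV.open(path, mode) { |csv| csv << [i, row] }
  end
end

rows = ["https://facebook.com/111", "https://facebook.com/222"] # made-up links

appended_rows, truncated_rows = Dir.mktmpdir do |dir|
  appended = File.join(dir, "appended.csv")
  truncated = File.join(dir, "truncated.csv")

  write_rows(appended, "ab", rows)  # append mode keeps earlier rows
  write_rows(truncated, "wb", rows) # write mode truncates on every open

  [CSV.read(appended), CSV.read(truncated)]
end

puts "ab kept #{appended_rows.length} rows"
puts "wb kept #{truncated_rows.length} rows"
```

With “wb” each open starts from an empty file, so only the last link survives. Opening the file once and writing every row inside a single block would avoid the problem altogether, but that is a different structure from the script in this post.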
This is fairly simple stuff, but it might be useful for anyone who is looking to do some research using Facebook’s Open Graph!