15th ~ 16th OCT 2015 MADRID, SPAIN #BDS15
THE 4th EDITION OF BIG DATA IN Oct 2015 WAS A RESOUNDING SUCCESS.
Thursday 15th
from 17:00 to 17:45
Room 25
Technical
In this talk, Alex will introduce Frontera, a new open source framework. Frontera is a crawl frontier framework: it tells your web crawler what to crawl and when.
It is essentially the brain of your web crawler. Frontera lets you build real-time, large-scale, distributed web crawlers. Offering:
Along with a description of the framework, Alex will share the technical problems he faced while developing it and demonstrate how to build a distributed crawler using Scrapy, Apache Kafka and HBase. The talk is organized as a fun and engaging story.
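To make the idea of a crawl frontier concrete, here is a minimal in-memory sketch in Python. It is not Frontera's API: the class name ToyFrontier, its methods add and next_batch, and the example URLs and scores are all hypothetical and chosen only for illustration. It shows the two decisions the abstract describes: what to crawl (each URL is scheduled once) and when (higher-scored URLs come out first).

```python
import heapq
from itertools import count


class ToyFrontier:
    """Hypothetical in-memory crawl frontier: decides which URL is fetched next."""

    def __init__(self):
        self._heap = []        # entries are (-score, sequence, url)
        self._seen = set()     # URLs already scheduled, so none is queued twice
        self._order = count()  # tie-breaker that preserves insertion order

    def add(self, url, score=0.0):
        """Schedule a URL once; higher scores are handed out earlier."""
        if url in self._seen:
            return False
        self._seen.add(url)
        # heapq is a min-heap, so the score is negated to pop the best first.
        heapq.heappush(self._heap, (-score, next(self._order), url))
        return True

    def next_batch(self, max_requests):
        """Return up to max_requests URLs, best score first."""
        batch = []
        while self._heap and len(batch) < max_requests:
            _, _, url = heapq.heappop(self._heap)
            batch.append(url)
        return batch


frontier = ToyFrontier()
frontier.add("http://example.com/", score=1.0)
frontier.add("http://example.com/about", score=0.2)
frontier.add("http://example.com/news", score=0.8)
frontier.add("http://example.com/", score=5.0)  # duplicate, ignored
print(frontier.next_batch(2))
```

A distributed frontier of the kind the talk covers would replace the in-process heap and set with shared storage and a message bus; this sketch only illustrates the scheduling role.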
Scrapinghub, Data Scientist