Let's see whether the performance is acceptable.
Using an old spider on high-end hardware, I could process (read) 200 pages per second.
I'm looking forward to testing whether I can improve this using the pi-cluster.
The biggest challenges were:
- Ensuring the parallel spiders do not try to fetch the same URL.
- Shared caching to increase the speed.
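To make those two challenges a bit more concrete, here is a minimal sketch in Python. It is not the spider I actually ran. The names `UrlFrontier`, `crawl` and `fetch` are made up for illustration, and it uses threads in a single process as a stand-in for the cluster nodes. On a real cluster the "seen" set and the cache would have to live in some store that all nodes can reach, which this sketch does not attempt.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class UrlFrontier:
    """Hypothetical helper: tracks which URLs have already been claimed."""

    def __init__(self):
        self._seen = set()
        self._lock = threading.Lock()

    def claim(self, url):
        """Return True only for the first caller that claims this URL."""
        with self._lock:
            if url in self._seen:
                return False
            self._seen.add(url)
            return True


def crawl(urls, fetch, workers=4):
    """Fetch each distinct URL once and return a shared url -> body cache.

    `fetch` is an assumed callable (url -> page body) supplied by the caller.
    """
    frontier = UrlFrontier()
    cache = {}
    cache_lock = threading.Lock()

    def work(url):
        # Skip URLs another spider has already claimed.
        if not frontier.claim(url):
            return
        body = fetch(url)
        # Store the result where every spider can read it.
        with cache_lock:
            cache[url] = body

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Consume the iterator so exceptions from workers surface here.
        list(pool.map(work, urls))
    return cache
```

The check and the insert in `claim` happen under one lock, so two spiders can never both win the same URL. That is the part that matters; whatever replaces the in-memory set on a cluster would need the same "check and mark in one step" behaviour.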
After a fun weekend with lots of data, the bottleneck (as expected) remains the network connection. The speed stays the same; maybe I can still improve the caching next time!