Good old Arts & Letters Daily: a lovingly curated museum of press sources of all kinds, dating back to 1998. It resembles one of those kiosks on the Avenida Paulista that sell everything fit to print, from everywhere, and may also keep a few Romeo y Julietas in a drawer somewhere for those who appreciate the legendary thighs of mulato women.
The site makes a good basis for a meaningful and useful Web crawl, and that is what I have used it for. I obtained a sample of its network neighborhood with
mkdir ALD
cd ALD
wget -rH http://aldaily.com
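and then turned the result into a list readable by the Web crawler WIRE. Because wget saves each host it visits in a directory of its own, listing those directories gives the sites in the neighborhood: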
cd ..
ls ALD > ALD.list
A few simple regular expressions then put this file into the format WIRE requires.
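The regular-expression step is not spelled out here, so what follows is only a minimal sketch of what it might look like. It assumes that WIRE's seeder wants one plain http://host/ URL per line (worth checking against the WIRE documentation), and the function name hosts_to_seeds and the temporary file ALD.tmp are hypothetical, chosen for illustration.

```shell
#!/bin/sh
# Sketch: turn the host directory names saved by `wget -rH` into seed URLs.
# ASSUMPTION: the WIRE seeder accepts "http://host/" URLs, one per line.
# hosts_to_seeds is a hypothetical helper, not part of WIRE.

hosts_to_seeds() {
  # Keep only lines that look like host names (optionally with :port),
  # drop duplicates, then wrap each one as http://<host>/
  grep -E '^[A-Za-z0-9.-]+(:[0-9]+)?$' \
    | sort -u \
    | sed -e 's|^|http://|' -e 's|$|/|'
}

# Usage, from the directory containing ALD.list (ALD.tmp is a scratch file):
#   hosts_to_seeds < ALD.list > ALD.tmp && mv ALD.tmp ALD.list
```

The grep filter throws away anything in the listing that is not a bare host name before sed wraps each name in a URL, so the seed file keeps the name ALD.list used in the next step.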
This list is then used to seed the Web crawl:
wire-bot-seeder --start ALD.list
The bot then proceeds to catalog the literary and intellectual wealth of the site and its neighbors. A second run on the same topic can be started with
wire-info-extractor --seeds > next.seeds
Here is a pretty picture of the resulting graph, showing its central point of reference amid partitions into various thematic groups. I am putting myself through the Coursera course on social network analysis, which leans on Gephi rather than yEd, the diagramming tool I normally use even though it crashes a lot.
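Gephi computes centrality measures on its own, but the simplest one, in-degree (how many distinct sites link to a given site), is easy to sketch in the shell as a rough way to find the central point of reference. This assumes a hypothetical edges.txt holding one whitespace-separated "source target" pair per line; neither that file nor the function name top_in_degree comes from WIRE or Gephi.

```shell
#!/bin/sh
# Sketch: rank nodes by in-degree, the number of distinct nodes linking to them.
# ASSUMPTION: the edge list arrives on stdin as "source target" pairs,
# one per line. top_in_degree is a hypothetical helper.

top_in_degree() {
  # $1: how many nodes to print (default 10)
  n=${1:-10}
  # Skip malformed lines and self-links, count each distinct edge once,
  # then print "in-degree node" for every node that has incoming links.
  awk 'NF >= 2 && $1 != $2 && !seen[$1 " " $2]++ { indeg[$2]++ }
       END { for (v in indeg) print indeg[v], v }' \
    | sort -k1,1nr -k2,2 \
    | head -n "$n"
}

# Usage (edges.txt is a hypothetical file name):
#   top_in_degree 5 < edges.txt
```

Counting each distinct edge once keeps a site that links to the hub from many pages from inflating the hub's score; the thematic partitions in the picture are a separate computation left to Gephi.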
Filed under: Brazil