December 2004

conversation engine: the next step:

Since integrating Feedster results into the conversation engine, I've stopped coding for a bit, and while doing other things I've been thinking about how to make it more scalable: covering more weblogs and not wasting resources on pages with no meaning (read: making it more useful). In short, how to solve the problems I mentioned in that entry.

The crucial problem is that Feedster provides only part of the picture. Scott Rafer (Feedster CEO) mentioned in the comments that I could use the Feedster links output, which provides a list of the references to a particular weblog. This doesn't quite do what I need, however. The reason is simple: Feedster indexes RSS feeds, not entire sites, so if someone provides summary feeds, Feedster will not be able to find links between weblogs, even if they exist. Because many, many weblogs provide summary feeds, the only way to get the links between entries is to get the actual contents of the HTML page. But.
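The point about summary feeds is easy to see in a small example: a summary item may carry only a teaser, while the entry's HTML page contains the actual links. Below is a minimal sketch in Python (standard library only) of extracting, from an entry page, the links that point to other sites. The class and function names and the sample URLs are hypothetical, and this is not necessarily how the conversation engine parses pages.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Relative links like "/archive/" become absolute URLs
                    self.links.append(urljoin(self.base_url, value))


def outbound_links(page_url, html):
    """Return the distinct http(s) links in html that point to other hosts."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    collector.close()
    own_host = urlparse(page_url).netloc
    result = []
    for link in collector.links:
        parts = urlparse(link)
        is_web = parts.scheme in ("http", "https")
        # Skip links back into the same site and duplicates
        if is_web and parts.netloc != own_host and link not in result:
            result.append(link)
    return result


# Hypothetical sample data: a feed teaser versus the full entry page
summary = "<p>Some thoughts on conversation tracking...</p>"
page = (
    '<p>As <a href="http://b.example/2004/12/reply">Don notes</a>, '
    'see also <a href="/archive/">my archive</a>.</p>'
)
print(outbound_links("http://a.example/2004/12/post", summary))  # -> []
print(outbound_links("http://a.example/2004/12/post", page))
# -> ['http://b.example/2004/12/reply']
```

The teaser yields no links at all, while the page yields the cross-site reference, which is why the HTML itself has to be fetched.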

But what I can do is use Feedster as the source for the list of pages to index. Right now I am indexing everything on a given website. This has two drawbacks. First, I am forced to download, store, and analyze waaay more content than I need (which accounts for the small number of sites the bot is crawling at the moment), particularly when weblogs point to other parts of a site, including wikis, dynamic apps, etc. Second, it slows down the processing of conversations, which depends on walking the link graph between two sites. This is a problem now, but if I move in the direction of adding multiple-participant conversations (as Don suggests in a comment to my previous conversation engine post, linked above), it will become even more important.
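To make the cost of crawling whole sites concrete, here is a hedged sketch of one way to walk the link graph between two sites. It assumes (my assumption, not a description of the engine's actual algorithm) that the graph maps each crawled page URL to the URLs it links to, and it treats a "conversation" as a connected group of pages on the two sites that includes pages from both. The function name and sample URLs are hypothetical. Every extra page crawled, such as a wiki page, becomes another node to walk.

```python
from collections import deque
from urllib.parse import urlparse


def host(url):
    """Return the host part of a URL, e.g. 'a.example'."""
    return urlparse(url).netloc


def conversations(graph, site_a, site_b):
    """Group pages on site_a and site_b into linked threads.

    graph maps each page URL to the list of URLs it links to. Links are
    treated as undirected, and only threads that include pages from both
    sites are returned.
    """
    sites = {site_a, site_b}
    # Undirected adjacency list, keeping only links between pages on the two sites
    neighbours = {}
    for src, targets in graph.items():
        for dst in targets:
            if host(src) in sites and host(dst) in sites and src != dst:
                neighbours.setdefault(src, set()).add(dst)
                neighbours.setdefault(dst, set()).add(src)

    visited = set()
    threads = []
    for start in sorted(neighbours):
        if start in visited:
            continue
        # Breadth-first search collects every page reachable from start
        component = []
        queue = deque([start])
        visited.add(start)
        while queue:
            page = queue.popleft()
            component.append(page)
            for nxt in sorted(neighbours[page]):
                if nxt not in visited:
                    visited.add(nxt)
                    queue.append(nxt)
        # Keep only threads in which both sites take part
        if {host(p) for p in component} == sites:
            threads.append(sorted(component))
    return threads


# Hypothetical crawl results
graph = {
    "http://a.example/2004/12/post": ["http://b.example/2004/12/reply"],
    "http://b.example/2004/12/reply": [
        "http://a.example/2004/12/post",
        "http://b.example/wiki/Home",
    ],
    "http://a.example/2004/11/older": ["http://c.example/elsewhere"],
}
print(conversations(graph, "a.example", "b.example"))
```

In this example the wiki page ends up in the thread alongside the two entries, which is the kind of noise (and extra work) that restricting the crawl to actual weblog entries would avoid.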

So.

The next step, then, is to use Feedster as the data source for the entries of a given weblog, then download and process the page at each entry's permalink, then analyze those pages and combine the results with the Feedster information.
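A rough sketch of that plan in Python: read the list of entries from a feed, fetch only each entry's permalink rather than the whole site, and keep the feed's summary next to what the page analysis finds. It assumes the entry list arrives as a plain RSS 2.0 document (Feedster's actual output format may differ), and `fetch` and `analyze` are hypothetical hooks the caller supplies, for example an HTTP client and a link extractor. The sample feed and pages are made up.

```python
import re
import xml.etree.ElementTree as ET


def entries_from_feed(rss_xml):
    """Yield (title, permalink, summary) for each <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        yield (
            item.findtext("title", default=""),
            item.findtext("link", default=""),
            item.findtext("description", default=""),
        )


def index_weblog(rss_xml, fetch, analyze):
    """Fetch only the permalinks listed in the feed, not the whole site.

    fetch(url) -> html and analyze(url, html) -> list of links are
    hypothetical hooks supplied by the caller.
    """
    results = []
    for title, permalink, summary in entries_from_feed(rss_xml):
        if not permalink:
            continue  # an item without a permalink gives nothing to fetch
        html = fetch(permalink)
        results.append({
            "title": title,
            "permalink": permalink,
            "summary": summary,                  # what the feed provides
            "links": analyze(permalink, html),   # what only the page provides
        })
    return results


# Hypothetical sample data standing in for a feed and the fetched pages
feed = """<rss version="2.0"><channel><title>Example</title>
<item><title>First</title><link>http://a.example/1</link><description>Short teaser</description></item>
<item><title>Second</title><link>http://a.example/2</link><description>Another teaser</description></item>
</channel></rss>"""
pages = {
    "http://a.example/1": '<p><a href="http://b.example/1">reply</a></p>',
    "http://a.example/2": "<p>no links here</p>",
}


def simple_analyze(url, html):
    """Crude href extraction, good enough for this sample only."""
    return re.findall(r'href="([^"]+)"', html)


result = index_weblog(feed, pages.get, simple_analyze)
for entry in result:
    print(entry["permalink"], entry["links"])
```

Passing the hooks in keeps the crawl policy (only feed entries) separate from how pages are downloaded and parsed.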

Stay tuned! More in the next few days.

Musicians Sing Different Tune on File Sharing:
Aside from the few who speak out publicly, musicians typically sit out the debate over file sharing. Now, a new survey has found that most artists don't view unauthorized swapping as a threat to their livelihood. -washingtonpost.com