Counting Subscribers
Yesterday, I sent Tim Bray an email following up on his post about gathering subscriber metrics from aggregator usage.
I've attached a copy below to share my thoughts and open them up for discussion. I believe the whole area still needs some collective thought to arrive at a solution that meets the initial need and scales to future identity-oriented requirements.
Tim,
Just wanted to add a few comments to your conversation regarding the resolution of unique subscribers. First, I'm in total agreement with Brent Simmons with respect to using the User-Agent header to carry a reference to the subscriber. As this field is configured to be present in the access log for most standard web servers, this immediately provides an advantage in terms of mining logs for subscriber-centric data. It certainly wouldn't take much work to enhance existing web log analyzers such as Webalizer or Analog to generate additional metrics on RSS usage, e.g.
- Total unique subscribers
- Average poll interval
- New subscribers (assuming that a cache of unique hash codes is maintained)
- Feed subscriber 'churn'
Using parameters or other HTTP headers is certainly disadvantageous in comparison, as additional effort would be required to capture such data. It would also be possible to leverage this unique ID field for other purposes, to the benefit of both the content producer and the subscriber:
- Poll throttling - at a more granular level than the IP address (useful for scenarios with multiple aggregators / newsreaders behind a proxy)
- Content specialization - tailoring the content of the RSS feed based upon a particular subscriber's ID
I could imagine a trusted scenario where the subscriber could register at the content provider's site, and therefore enable the possibility of configurable profiles. One standard RSS feed could then be customized based upon the specific desires of an individual subscriber. This could also enable a consent-based, bidirectional flow of information between the subscriber and the content producer. A simple example of this is where a user's newsreader is polling an RSS feed too aggressively. This may just be due to a misconfiguration, and the ability to reconcile an incoming request to a particular subscriber would be useful.
One thing to consider is whether, in the long term, there would be additional requirements that would drive the need for authentication. There are various possible solutions that would complement the described identification scenario, including HTTP authentication (basic / digest) and SSL client-side certificates. It would also be interesting to consider the implications of initiatives such as the Liberty Alliance, since there might be a longer-term requirement that drives closer integration with such projects. Certainly the whole idea of certified identity would be useful in the realm of trackbacks, comments, etc., where the current model is (maybe too generously) based upon trust.
Anyway, plenty of food for thought!
Best regards,
Jason
Any thoughts?
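
As a rough illustration of the log-mining idea in the email, here is a minimal Python sketch that derives two of the suggested metrics (total unique subscribers and average poll interval) from access log lines in the Combined Log Format. Everything specific in it is an assumption made for the example rather than an existing convention: the `subscriber-id=<token>` marker inside the User-Agent string, the `ExampleReader` agent name, the `/index.rss` feed path, and the function name `subscriber_metrics`.

```python
import re
from collections import defaultdict
from datetime import datetime

# Combined Log Format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"$'
)

# Hypothetical convention (an assumption for this sketch): the aggregator
# embeds "subscriber-id=<token>" somewhere in its User-Agent string.
SUBSCRIBER_ID = re.compile(r"subscriber-id=(?P<id>[A-Za-z0-9]+)")


def subscriber_metrics(lines, feed_path="/index.rss"):
    """Count unique subscribers and average the gaps between their polls."""
    polls = defaultdict(list)  # subscriber ID -> list of poll timestamps
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # not a Combined Log Format line
        request_parts = match.group("request").split()
        if len(request_parts) < 2 or request_parts[1] != feed_path:
            continue  # request was not for the feed
        sid = SUBSCRIBER_ID.search(match.group("agent"))
        if not sid:
            continue  # agent carries no subscriber ID
        when = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        polls[sid.group("id")].append(when)

    # Gaps between consecutive polls, computed per subscriber.
    intervals = []
    for times in polls.values():
        times.sort()
        intervals.extend(
            (later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])
        )

    average = sum(intervals) / len(intervals) if intervals else None
    return {
        "unique_subscribers": len(polls),
        "average_poll_interval_seconds": average,
    }


# Two subscribers share one IP address (e.g. behind a proxy) but remain
# distinguishable through the ID carried in the User-Agent.
SAMPLE = [
    '10.0.0.1 - - [10/Oct/2003:13:00:00 +0000] "GET /index.rss HTTP/1.1" 200 512 "-" "ExampleReader/1.0 (subscriber-id=abc123)"',
    '10.0.0.1 - - [10/Oct/2003:14:00:00 +0000] "GET /index.rss HTTP/1.1" 304 - "-" "ExampleReader/1.0 (subscriber-id=abc123)"',
    '10.0.0.1 - - [10/Oct/2003:13:30:00 +0000] "GET /index.rss HTTP/1.1" 200 512 "-" "ExampleReader/1.0 (subscriber-id=def456)"',
    '10.0.0.2 - - [10/Oct/2003:13:45:00 +0000] "GET /about.html HTTP/1.1" 200 900 "-" "Mozilla/4.0"',
]

if __name__ == "__main__":
    print(subscriber_metrics(SAMPLE))
```

Because the ID travels in a field that most servers already log, nothing beyond the existing access log is needed. The same per-subscriber grouping could presumably serve as a starting point for the "new subscribers" and "churn" figures by comparing the set of IDs seen in one period against the next.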