Brainfood and Mozilla’s Open Innovation Team Kick Off Text Classification Open Source Experiment
FilterBubbler is a WebExtension that turns your browser into a laboratory for distributed, collaborative text analysis. What does it do and how does it work? FilterBubbler lets you collaboratively “tag” pages with descriptive labels and then analyze any page you visit to see how similar it is to pages you have already classified.
Curious about how much you visit work-related web sites and how often you argue about whether Batman could beat up Spider-Man? Interested in where your favorite mainstream sites got their ideas before those ideas got so mainstream? You’ll be able to use a filter bubble to keep an eye on your personal information space.
You could classify content by rating (“G”, “PG”, “PG-13”, “R”), by category (“current events”, “fishing”), or even by how much you trust the source (“trustworthy”, “urban legend”). The system doesn’t impose any bias of its own and it doesn’t limit the number of tags you apply. Once you build up a set of classifications, you can visit any page and the system will show you which classification is the closest statistical match.
We are building an easy-to-install WordPress plug-in that allows groups to collaborate on classifying pages. Once you have a good set of content built up, you can select a classifier and share the entire configuration as a recipe for others to use. Initially we will provide the same kind of Bayesian classifier that most spam detection systems use, but FilterBubbler recipes will allow authors to plug in their own classifier implementations.
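To make the “closest statistical match” idea concrete, here is a minimal sketch of a multinomial naive Bayes classifier with add-one smoothing, the general technique behind many spam filters. It is an illustration only, not FilterBubbler’s actual implementation: the names `TagModel`, `train` and `classify`, the tokenizer and the sample texts are all assumptions made for this example.

```typescript
// Hypothetical sketch: none of these names come from the FilterBubbler code base.
type Counts = { [word: string]: number };

interface TagModel {
  docs: number;       // number of pages tagged with this label
  words: Counts;      // how often each word appeared under this label
  totalWords: number; // total word occurrences under this label
}

type Corpus = { [tag: string]: TagModel };

const hasOwn = Object.prototype.hasOwnProperty;

// Lowercase the text and split it into alphabetic tokens.
function tokenize(text: string): string[] {
  const matches = text.toLowerCase().match(/[a-z]+/g);
  return matches === null ? [] : matches.slice();
}

// Look up a count, treating unseen words as zero.
function countOf(counts: Counts, word: string): number {
  return hasOwn.call(counts, word) ? counts[word] : 0;
}

// Record one tagged page in the corpus.
function train(corpus: Corpus, tag: string, text: string): void {
  if (!hasOwn.call(corpus, tag)) {
    corpus[tag] = { docs: 0, words: {}, totalWords: 0 };
  }
  const model = corpus[tag];
  model.docs += 1;
  for (const word of tokenize(text)) {
    model.words[word] = countOf(model.words, word) + 1;
    model.totalWords += 1;
  }
}

// Return the tag with the highest log-posterior, or null for an empty corpus.
function classify(corpus: Corpus, text: string): string | null {
  const tags = Object.keys(corpus);
  const vocabulary: Counts = {};
  let totalDocs = 0;
  for (const tag of tags) {
    totalDocs += corpus[tag].docs;
    for (const word of Object.keys(corpus[tag].words)) {
      vocabulary[word] = 1;
    }
  }
  const vocabSize = Object.keys(vocabulary).length;
  const words = tokenize(text);

  let best: string | null = null;
  let bestScore = -Infinity;
  for (const tag of tags) {
    const model = corpus[tag];
    // Log prior: the share of tagged pages that carry this label.
    let score = Math.log(model.docs / totalDocs);
    for (const word of words) {
      // Add-one (Laplace) smoothing keeps unseen words from zeroing the score.
      score += Math.log(
        (countOf(model.words, word) + 1) / (model.totalWords + vocabSize)
      );
    }
    if (score > bestScore) {
      bestScore = score;
      best = tag;
    }
  }
  return best;
}

// Usage: tag two pages, then ask which label a new page resembles most.
const corpus: Corpus = {};
train(corpus, "fishing", "trout bait river rod reel lake trout");
train(corpus, "current events", "election senate vote policy debate");
const guess = classify(corpus, "the best bait for lake trout");
```

With only these two training pages, `guess` comes out as “fishing”, because the words “bait”, “lake” and “trout” were seen under that tag. Working in log space avoids numeric underflow when a page contains many words.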
FilterBubbler has two major goals. We want to show the amazing capabilities that the WebExtension API lets you add to browsers, and we want to provide a platform that helps users experiment with and explore collaborative text classification. FilterBubbler can serve as a jumping-off point for all kinds of specific applications built on top of its techniques. Ratings systems, content suggestion, fact checking and many other areas of interest can all use the classifiers and corpora that the FilterBubbler community will generate.
How you can help
There are lots of ways to help with this project. First and foremost, the project is about content. Once we have the WordPress plug-in ready, one of the easiest ways to help will be to install the plug-in and make your classifications public. We also need help with programming, text analysis theory and testing:
- Help with the WordPress plug-in: Initially we will be building a WordPress plug-in that allows you to store and manage your corpora and classifier recipes in WordPress. We think there are many additional features worth implementing, such as inheriting a corpus and overriding certain values, pushing changes between corpora and recipes, automatically classifying posts in WordPress and so on.
- Host a FilterBubbler repository: If you are part of a group or organization that can benefit from collaborative text classification then we need your help! Select “public” and “publish to repository” in the InfoBubble WordPress plug-in and other people will be able to use your corpora and classifier recipes.
- Blog about FilterBubbler: This project thrives on community participation so if you can write an article about the tool itself or the way your group has put it to work then we would love to hear about it and reference your work on our blog.
- WebStorage-friendly text classifiers: Many existing JavaScript text analysis tools have strong opinions about the nature of their secondary storage, and very few target the WebStorage API that WebExtensions must use. We will be porting text processing libraries to use the WebStorage API, starting with the naive Bayesian classifier, but we would love to see community contributions of other classifiers that use the same storage strategy.
- WebExtension platform testing: Initial development is being done using Firefox on Linux, but there are many other platform combinations available that can run WebExtensions. If you have Chrome on macOS or Edge on Windows, please try out the extension and file issues on GitHub if things don’t work for you.
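As a rough illustration of the “WebStorage-friendly” item above, one possible storage strategy is to keep the classifier’s state as plain JSON-serializable data and read and write it through a tiny key-value interface. The sketch below is an assumption-laden example, not the project’s design: `KeyValueStore`, `MemoryStore`, `ClassifierState`, `recordPage` and the key name are all hypothetical. The `getItem`/`setItem` shape mirrors the browser’s `Storage` interface; an extension using the asynchronous `storage.local` API would need an asynchronous variant of the same idea.

```typescript
// Hypothetical sketch: none of these names come from the FilterBubbler code base.

// The only two calls the classifier needs from its storage layer.
// The shape mirrors getItem/setItem on the browser's Storage interface.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// In-memory stand-in so the sketch runs outside a browser.
class MemoryStore implements KeyValueStore {
  private data: { [key: string]: string } = {};
  getItem(key: string): string | null {
    return Object.prototype.hasOwnProperty.call(this.data, key)
      ? this.data[key]
      : null;
  }
  setItem(key: string, value: string): void {
    this.data[key] = value;
  }
}

// Classifier state is plain data, so it survives a JSON round trip.
interface ClassifierState {
  tagDocs: { [tag: string]: number };
  wordCounts: { [tag: string]: { [word: string]: number } };
}

const STATE_KEY = "classifier-state"; // assumed key name

// Read the saved state, or start empty if nothing has been stored yet.
function loadState(store: KeyValueStore): ClassifierState {
  const raw = store.getItem(STATE_KEY);
  if (raw === null) {
    return { tagDocs: {}, wordCounts: {} };
  }
  return JSON.parse(raw) as ClassifierState;
}

// Serialize the whole state under a single key.
function saveState(store: KeyValueStore, state: ClassifierState): void {
  store.setItem(STATE_KEY, JSON.stringify(state));
}

// Load, update and persist the counts for one tagged page.
function recordPage(store: KeyValueStore, tag: string, words: string[]): void {
  const has = Object.prototype.hasOwnProperty;
  const state = loadState(store);
  state.tagDocs[tag] = (has.call(state.tagDocs, tag) ? state.tagDocs[tag] : 0) + 1;
  if (!has.call(state.wordCounts, tag)) {
    state.wordCounts[tag] = {};
  }
  const counts = state.wordCounts[tag];
  for (const word of words) {
    counts[word] = (has.call(counts, word) ? counts[word] : 0) + 1;
  }
  saveState(store, state);
}

// Usage: two pages tagged "fishing", persisted between calls.
const store = new MemoryStore();
recordPage(store, "fishing", ["trout", "bait", "trout"]);
recordPage(store, "fishing", ["lake"]);
const saved = loadState(store);
```

Because the classifier only sees the two-method interface, any classifier written this way could share the same storage strategy, and the backing store can be swapped without touching the counting logic.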
Getting Involved
User mailing list: This group is for FilterBubbler users and discussion of classification schemes, analysis recipe designs and other related items of interest. This is also a great place for beginning users to ask “getting started” kinds of questions.
https://groups.google.com/a/filterbubbler.org/d/forum/filterbubbler-user
Developer mailing list: This group is for discussing the development of the FilterBubbler WebExtension, WordPress plug-in and related text analysis tools.
https://groups.google.com/a/filterbubbler.org/d/forum/filterbubbler-dev
Git repositories: The Git repositories for the web extension, WordPress plug-in and other tools are hosted on GitHub.
https://github.com/filterbubbler
We look forward to working with you!
The FilterBubbler Team