Installing and using the Web Extension from a local build
To get started with FilterBubbler you first need to install the extension.
To download the extension follow this link: http://filterbubbler.org/releases/filterbubbler-1.0.1-an+fx.xpi
If you are building FilterBubbler from the Git repo, load the resulting extension into Firefox using the “about:debugging” page.
On that page, check the “Enable add-on debugging” box so that you can debug your changes, then click the “Load Temporary Add-on” button, which opens a file dialog.
In the file dialog, navigate to your checkout of FilterBubbler and select the manifest.json file. This will add the extension to your browser. You should then see the extension in the list of Add-ons, and see the new FilterBubbler button in the Firefox toolbar.
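Loading fails if manifest.json is not valid JSON or lacks a required key. You can check for both problems before opening the file dialog with the small Python sketch below. It is not part of FilterBubbler. It parses manifest.json from a directory you pass in and confirms the three keys MDN lists as mandatory for every WebExtension: manifest_version, name and version. The function name check_manifest is hypothetical.

```python
import json
from pathlib import Path

# Keys every WebExtension manifest must define (per MDN's manifest.json docs).
REQUIRED_KEYS = ("manifest_version", "name", "version")


def check_manifest(checkout_dir: str) -> dict:
    """Load <checkout_dir>/manifest.json and verify the mandatory keys exist.

    Returns the parsed manifest. Raises FileNotFoundError if the file is
    missing, json.JSONDecodeError if it is not valid JSON, and ValueError
    if a mandatory key is absent.
    """
    manifest_path = Path(checkout_dir) / "manifest.json"
    with manifest_path.open(encoding="utf-8") as f:
        manifest = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in manifest]
    if missing:
        raise ValueError(f"manifest.json is missing required keys: {missing}")
    return manifest
```

For example, `check_manifest("path/to/your/checkout")` returns the parsed manifest, or raises an error naming the problem.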
When you select the button you should see the FilterBubbler pop-up with its four tabs: matches, corpora, recipes and settings.
Now you are ready to get started making corpora and recipes!
Creating a new corpus
Now that you have the web extension installed, you can build your first corpus. That may make you wonder: what is a corpus? A corpus (plural: corpora) is a term used in linguistics and data science for a collection of documents. In FilterBubbler, a corpus is a set of labels, called classifications, together with the web pages you have applied them to. These labels can be anything you want: you could classify pages into “Sports” and “Politics”, or “R” and “PG-13”. The labeled pages are then used to drive a classifier recipe. The recipe combines your classifications with automated actions based on how closely new pages match the content of each group of URLs.
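To make the idea concrete, here is one way a corpus could be modeled, as a Python sketch. It stores a name plus a mapping from each classification label to the set of URLs tagged with it. The Corpus class and its method names are hypothetical illustrations, not FilterBubbler's actual storage format. Because URLs are tracked per label, this sketch lets the same page carry more than one classification.

```python
from dataclasses import dataclass, field


@dataclass
class Corpus:
    """A named set of classifications (labels), each holding the URLs tagged with it."""
    name: str
    classifications: dict[str, set[str]] = field(default_factory=dict)

    def add_classification(self, label: str) -> None:
        # Re-adding an existing label leaves its URLs untouched.
        self.classifications.setdefault(label, set())

    def classify(self, url: str, label: str) -> None:
        # Tag a URL with an existing label (like ticking its checkbox).
        if label not in self.classifications:
            raise KeyError(f"unknown classification: {label}")
        self.classifications[label].add(url)

    def labels_for(self, url: str) -> list[str]:
        # All labels this URL is tagged with, in the order the labels were created.
        return [label for label, urls in self.classifications.items() if url in urls]


ratings = Corpus("Ratings")
ratings.add_classification("R")
ratings.add_classification("PG-13")
ratings.classify("https://example.com/a", "R")
print(ratings.labels_for("https://example.com/a"))  # ['R']
```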
Starting a corpus
To create a new corpus, open the FilterBubbler pop-up and select the Corpora tab.
Then click the “Add corpus” menu entry.
This lets you enter a name for your corpus. When you press Enter, the new corpus is created.
You can now add classifications to your corpus by selecting the “Add classification” menu item.
Type a name for the classification and press Enter to create it. You can create as many classifications as you want.
Once you have created all your classifications, you can start applying them to URLs. Navigate to any page and click the checkbox for the appropriate classification.
You can classify as many URLs as you want. The more URLs you classify, the more flexible and accurate your system will be at identifying content. Once you have added all the URLs you want, you can move on to creating a recipe.
Creating a recipe
A recipe connects your corpus to the browser. The actions a recipe can take are currently very limited, but we will provide a pluggable architecture for adding new functionality. A recipe has four components: a source, a classifier, a sink and a corpus. You’ve already learned about the corpus, so let’s talk about the rest.
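The four-part structure can be sketched in Python as a recipe object with a run method. It pulls URLs from its source, scores each one against the corpus with its classifier, and passes the result to its sink. The Recipe class and the type aliases are hypothetical. So is the toy classifier in the usage example, which only checks whether a URL is already tagged. A real classifier compares page content, as the classifier paragraph explains.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical shapes for the recipe components.
CorpusData = dict[str, set[str]]                             # label -> URLs tagged with it
Source = Callable[[], Iterable[str]]                         # produces URLs to classify
Classifier = Callable[[str, CorpusData], dict[str, float]]   # (url, corpus) -> score per label
Sink = Callable[[str, dict[str, float]], None]               # acts on one result


@dataclass
class Recipe:
    source: Source
    classifier: Classifier
    sink: Sink
    corpus: CorpusData

    def run(self) -> None:
        # Pull each URL from the source, score it against the corpus,
        # and hand the scores to the sink.
        for url in self.source():
            scores = self.classifier(url, self.corpus)
            self.sink(url, scores)


# Usage with toy components. This classifier only checks whether the URL
# was already tagged; it does not look at page content.
results = []
recipe = Recipe(
    source=lambda: ["https://example.com/match"],
    classifier=lambda url, corpus: {label: float(url in urls) for label, urls in corpus.items()},
    sink=lambda url, scores: results.append((url, scores)),
    corpus={"Sports": {"https://example.com/match"}, "Politics": set()},
)
recipe.run()
print(results)  # [('https://example.com/match', {'Sports': 1.0, 'Politics': 0.0})]
```

Keeping the components as plain callables means any of them can be swapped without touching the others. That is the kind of pluggability the recipe design aims for.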
Sources represent a stream of URLs that need classification. We currently provide only one source: the page loaded in the browser when you activate the FilterBubbler extension. A source can be any stream of URLs, so other possibilities include your browsing history, incoming posts from your Facebook friends, or anything else you can fetch using the WebExtensions API.
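A source is just a stream of URLs, so a Python generator is a natural way to sketch one. The first function below mirrors the single source FilterBubbler provides. The second is a hypothetical history-based source. In a real extension the history entries would come from the WebExtensions history API; here they are passed in as a plain list. The deduplication, the http(s)-only filter and the limit parameter are assumptions made for illustration.

```python
from typing import Iterable, Iterator


def current_page_source(active_tab_url: str) -> Iterator[str]:
    """Yield only the page that was loaded when the extension was activated."""
    yield active_tab_url


def history_source(history: Iterable[str], limit: int = 100) -> Iterator[str]:
    """Hypothetical source: up to `limit` distinct http(s) URLs from browsing history."""
    seen: set[str] = set()
    for url in history:
        if len(seen) >= limit:
            break
        # Skip internal pages (about:, moz-extension:, ...) and repeats.
        if not url.startswith(("http://", "https://")) or url in seen:
            continue
        seen.add(url)
        yield url
```

For example, `list(history_source(["about:blank", "https://a.example/", "https://a.example/"]))` yields just `["https://a.example/"]`.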
The classifier examines the URLs that come in from the source and uses an algorithm to judge how similar each page’s content is to the pages in each classification in the corpus. Currently we provide only a Naive Bayes classifier, which is similar to the algorithm used for spam filtering. Instead of just deciding between ham and spam, the classifier matches against all the classifications in your corpus.
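FilterBubbler's classifier is described only as Naive Bayesian, so the following is a generic multinomial Naive Bayes sketch in Python, not the extension's code. It assumes each page's text has already been fetched. It uses a simple lowercase word tokenizer and add-one (Laplace) smoothing, and it ignores words never seen during training. Training on a labeled page updates that classification's word counts. Scoring adds the log prior to the log likelihood of each word, and the highest-scoring classification wins.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; a real implementation would strip HTML first.
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.doc_counts = Counter()   # label -> number of training pages
        self.word_counts = {}         # label -> Counter of word frequencies
        self.vocab = set()            # every word seen in training

    def train(self, label: str, text: str) -> None:
        words = tokenize(text)
        self.doc_counts[label] += 1
        self.word_counts.setdefault(label, Counter()).update(words)
        self.vocab.update(words)

    def scores(self, text: str) -> dict[str, float]:
        """Log score of each label (log prior + log likelihood of the words)."""
        words = [w for w in tokenize(text) if w in self.vocab]  # drop unseen words
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        result = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            log_p = math.log(self.doc_counts[label] / total_docs)  # prior
            for w in words:
                # Add-one smoothing keeps a word missing from this label from zeroing it out.
                log_p += math.log((counts[w] + 1) / (total_words + vocab_size))
            result[label] = log_p
        return result

    def classify(self, text: str) -> str:
        s = self.scores(text)
        return max(s, key=s.get)


nb = NaiveBayes()
nb.train("Sports", "the team won the football match")
nb.train("Sports", "a late goal in the match")
nb.train("Politics", "the senate passed the budget vote")
print(nb.classify("football goal"))  # Sports
```

Working in log space avoids the floating-point underflow that multiplying many small probabilities would cause on long pages.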
A sink receives the output of the classifier and can take action on it. The only action we currently support is the display of the classification in the “classifications” tab of the extension.
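A sink only needs to accept the classifier's output. The function below is a hypothetical display sink in Python. It assumes the classifier reports a log score per classification, since Naive Bayes is usually computed in log space. It converts the scores to percentages with the log-sum-exp trick and returns a ranked summary string. The function name and output format are illustrative only.

```python
import math


def format_classification(url: str, log_scores: dict[str, float]) -> str:
    """Turn per-label log scores into a ranked summary whose percentages sum to 100."""
    if not log_scores:
        return f"{url}: no classifications"
    top = max(log_scores.values())
    # Subtract the maximum before exponentiating so very negative scores don't underflow to 0.
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    total = sum(weights.values())
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    parts = [f"{label} {100 * w / total:.1f}%" for label, w in ranked]
    return f"{url}: " + ", ".join(parts)


print(format_classification("https://e.example/", {"A": math.log(3), "B": math.log(1)}))
# https://e.example/: A 75.0%, B 25.0%
```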
Using the recipe
Once you have configured your recipe, you can start visiting other URLs and see what your classifier makes of them.
Working with remote servers
FilterBubbler lets you upload your corpora and recipes to remote servers. Our initial server implementation is a WordPress plug-in, which makes it as easy as possible to set up your own server: just install the plug-in and you’ll be ready to go. Once you have set up a server, you can connect the extension to it from the Settings tab.
Click “Add server” and enter the URL of your server in the input field.
Once you have configured a server, you can select it when you use the “Upload corpus” or “Upload recipe” functions.
When you click the “Upload recipe” entry, you will be asked to select the server you want to upload to.
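Uploading a corpus requires serializing it first. In this Python sketch, each set of URLs becomes a sorted JSON list, because sets are not JSON-serializable. The payload is then wrapped in a POST request built with the standard urllib.request module. Both the payload layout and the /corpora endpoint are hypothetical. This article does not describe the WordPress plug-in's actual API, so check your server before relying on either.

```python
import json
import urllib.request


def corpus_to_json(name: str, classifications: dict[str, set[str]]) -> str:
    """Serialize a corpus; URL sets become sorted lists so output is deterministic."""
    payload = {
        "name": name,
        "classifications": {label: sorted(urls) for label, urls in classifications.items()},
    }
    return json.dumps(payload, sort_keys=True)


def build_upload_request(server_url: str, body: str) -> urllib.request.Request:
    # HYPOTHETICAL endpoint: the real plug-in's route is not documented here.
    endpoint = server_url.rstrip("/") + "/corpora"
    return urllib.request.Request(
        endpoint,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the resulting request to `urllib.request.urlopen` would send it. The sketch stops short of that so it can run without a server.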
This completes our tour of the FilterBubbler system. We hope you enjoy this extension and find it useful for analyzing your own browsing habits. We invite you to sign up on the community mailing list so that you will be informed when we deploy new sources, sinks and classifiers. Thank you!