A StumbleUpon-style Wikipedia explorer

Lightning Rails app #2

December 16, 2021 · Felipe Vogel

Occasionally I get the urge to read Wikipedia. Not on a particular topic, just… whatever is out there. But how can a person explore Wikipedia? Here are some existing approaches:

What I really wanted, though, was a way to get personalized recommendations of articles, like StumbleUpon but for Wikipedia. That way I could avoid some of the hit-or-miss results of browsing random articles, and I wouldn’t have to wade through topic lists either.

So I’ve built Wiki Stumble, a little app that suggests Wikipedia articles based on categories the user selects and on the user’s reactions (thumbs up or down) to previous recommendations.

This is my second “lightning app” this month, so named because I’m making them in the spare hours of a day or two each. In the end I’ll choose one or two to continue expanding while I learn better Rails testing skills. These lightning apps are simple and intentionally leave out a lot of features, but I’m still trying to do something new in each one.

New things I did in this app

The technical challenge

Unlike my first lightning app, which was nothing more than a simple interface to an API, this second app posed a real challenge: how to retrieve Wikipedia articles based on category preferences? You’d think it would be straightforward, just a matter of an API call to get a random article within a set of given categories. But it is not so simple, for two reasons.

First, the Wikipedia API has endpoints for retrieving a specific page and a random page, but nothing in between. The “random page in category” tool partly fills the gap, but it is slower than the API. There is an API endpoint for getting an article’s categories (e.g. https://en.wikipedia.org/w/api.php?format=json&action=query&prop=categories&titles=Chartwell&clshow=!hidden&cllimit=100), which might have been useful, except for the other problem…
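For reference, here is a small Python sketch (not Wiki Stumble’s actual Rails code) of building that category query and reading its result. The function names are my own, and the sample response is hand-written in the shape the Action API returns for `format=json` (pages keyed by page ID, each with a `categories` list of `ns`/`title` entries); the page ID and category names in it are placeholders, not a real API reply.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def categories_query_url(title: str, limit: int = 100) -> str:
    """Build an Action API URL listing an article's non-hidden categories."""
    params = {
        "format": "json",
        "action": "query",
        "prop": "categories",
        "titles": title,
        "clshow": "!hidden",  # exclude hidden (mostly maintenance) categories
        "cllimit": limit,     # maximum number of categories to return
    }
    # safe="!" keeps "!hidden" readable instead of encoding it as "%21hidden"
    return f"{API_URL}?{urlencode(params, safe='!')}"


def extract_categories(response: dict) -> list[str]:
    """Pull category names (without the "Category:" prefix) out of a response."""
    names = []
    for page in response.get("query", {}).get("pages", {}).values():
        for cat in page.get("categories", []):
            names.append(cat["title"].removeprefix("Category:"))
    return names


# Hand-written response in the same shape as the API's output;
# the page ID and category names are placeholders, not real data.
sample = {
    "query": {
        "pages": {
            "12345": {
                "title": "Chartwell",
                "categories": [
                    {"ns": 14, "title": "Category:Placeholder A"},
                    {"ns": 14, "title": "Category:Placeholder B"},
                ],
            }
        }
    }
}

print(categories_query_url("Chartwell"))
print(extract_categories(sample))  # ['Placeholder A', 'Placeholder B']
```

Fetching the list is the easy part; the trouble is what the list contains.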

Wikipedia’s categories are a mess. An article’s own categories are usually very specific and therefore useless for this purpose, and the only way to find more general categories for an article is to traverse the category graph upward. But since the graph is not a tree or even a DAG (directed acyclic graph), you would need a complicated and time-consuming algorithm to do this. Here’s someone’s valiant attempt. No, thanks.
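To see why the upward traversal gets hairy, here is a minimal breadth-first sketch over a made-up category graph (the letters stand in for category names; this is not Wikipedia data, and `ancestors` is a hypothetical helper). Because the real graph can contain cycles, the walk has to remember which categories it has already visited, and because a category can have several parents, it also needs a depth limit to keep from fanning out indefinitely.

```python
from collections import deque


def ancestors(parents: dict[str, list[str]], start: str, max_depth: int) -> dict[str, int]:
    """Breadth-first walk up parent-category links.

    Returns each reachable ancestor with its shortest distance from `start`.
    The `seen` dict is what stops a cycle from looping forever, and
    `max_depth` caps how far the walk fans out.
    """
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        depth = seen[node]
        if depth == max_depth:
            continue  # don't expand beyond the depth limit
        for parent in parents.get(node, []):
            if parent not in seen:
                seen[parent] = depth + 1
                queue.append(parent)
    del seen[start]
    return seen


# Toy graph with a cycle (C -> D -> E -> C) and two paths to D.
toy = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": ["E"],
    "E": ["C"],
}

print(ancestors(toy, "A", max_depth=10))  # {'B': 1, 'C': 1, 'D': 2, 'E': 3}
print(ancestors(toy, "A", max_depth=2))   # {'B': 1, 'C': 1, 'D': 2}
```

Note what the sketch leaves out: it only lists ancestors. Deciding which of the many ancestors counts as a “general” category is the genuinely hard part, and nothing here solves it.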

The surprising answer to this conundrum was not far away: just below that same Stack Overflow answer, in fact. Evidently, Wikimedia uses machine learning (AI) to analyze each article and guess at its general categories, using a taxonomy different from Wikipedia’s own categories. These category predictions are conveniently accessible via an API, and they are a crucial part of how Wiki Stumble works. When the user presses “Next article”, here’s what the app does:

  1. Get several random articles from the Wikipedia REST API. The exact number of articles requested varies depending on how soon the app finds a good match. (Also, if the user has chosen to see only Good or Featured articles, the app takes the extra step of getting article URLs from the “random page in category” tool.)
  2. Get these articles’ category predictions via the ORES API.
  3. Choose the best match by comparing each article’s predicted categories to the user’s category preferences, which the user has expressed by choosing some starter categories, by giving feedback (thumbs up or down) on earlier recommendations, or both.
  4. Show the best-matching article to the user.
  5. If the user gives the article a thumbs up or down, adjust the user’s category preferences by adding +1 or -1 to each of the article’s predicted categories.
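The scoring in steps 3 and 5 can be sketched roughly as follows. This is a minimal Python illustration under my own assumptions, not Wiki Stumble’s actual Rails code: preferences are a plain dict of category to integer score, each starter category begins at +1, an article’s score is the sum of the user’s scores for its predicted categories, and the category names and article titles are placeholders rather than real ORES labels. All function names are hypothetical.

```python
def starter_preferences(categories: list[str]) -> dict[str, int]:
    """Assumption: each starter category begins with a score of +1."""
    return {cat: 1 for cat in categories}


def score(predicted: list[str], prefs: dict[str, int]) -> int:
    """Sum the user's scores for an article's predicted categories (unknown = 0)."""
    return sum(prefs.get(cat, 0) for cat in predicted)


def best_match(candidates: dict[str, list[str]], prefs: dict[str, int]) -> str:
    """Step 3: pick the title whose predicted categories score highest.

    On a tie, max() keeps the first candidate it saw.
    """
    return max(candidates, key=lambda title: score(candidates[title], prefs))


def record_feedback(prefs: dict[str, int], predicted: list[str], thumbs_up: bool) -> None:
    """Step 5: add +1 or -1 to each of the article's predicted categories."""
    delta = 1 if thumbs_up else -1
    for cat in predicted:
        prefs[cat] = prefs.get(cat, 0) + delta


prefs = starter_preferences(["History", "Literature"])
candidates = {  # placeholder titles and predictions
    "Article A": ["Biology"],
    "Article B": ["History", "Literature"],
    "Article C": ["History", "Sports"],
}
print(best_match(candidates, prefs))  # Article B (score 2)

record_feedback(prefs, ["History", "Sports"], thumbs_up=False)
print(prefs)  # {'History': 0, 'Literature': 1, 'Sports': -1}
```

A simple additive score like this has the advantage that starter categories and feedback live in the same dict, so the two ways of expressing preferences combine without any special handling.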

This is a lot of back-and-forth across several APIs, but the app is still reasonably fast, and certainly much quicker at surfacing something interesting than the “random page” links, with their frequent duds.
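One way to keep the number of round trips down, in the spirit of step 1’s varying batch size, is to request a small batch first and only ask for more when nothing in it scores well enough. The sketch below is an assumption, not the app’s actual logic: `fetch_batch(n)` is a hypothetical stand-in for the REST API and ORES calls (replaced here by an offline stub), and the batch sizes and `good_enough` threshold are invented for illustration.

```python
from __future__ import annotations

from itertools import islice
from typing import Callable, Iterable


def find_article(
    fetch_batch: Callable[[int], dict[str, list[str]]],
    prefs: dict[str, int],
    good_enough: int = 2,
    batch_sizes: tuple[int, ...] = (5, 10, 20),
) -> tuple[str | None, int]:
    """Request growing batches until some article scores >= good_enough.

    Returns the best title seen and how many articles were requested in total.
    """
    best_title, best_score = None, float("-inf")
    requested = 0
    for size in batch_sizes:
        batch = fetch_batch(size)  # stands in for the REST API + ORES calls
        requested += size
        for title, predicted in batch.items():
            s = sum(prefs.get(cat, 0) for cat in predicted)
            if s > best_score:
                best_title, best_score = title, s
        if best_score >= good_enough:
            break  # good match found; skip the larger batches
    return best_title, requested


def stub_fetcher(pool: Iterable[tuple[str, list[str]]]) -> Callable[[int], dict[str, list[str]]]:
    """Offline stand-in: hands out (title, predicted categories) pairs in order."""
    it = iter(pool)
    return lambda n: dict(islice(it, n))


prefs = {"History": 1, "Literature": 1}

# A good match sits in the first batch, so only 5 articles are requested.
early = [("Article 0", ["Sports"]), ("Article 1", ["History", "Literature"])] + [
    (f"Filler {i}", []) for i in range(20)
]
print(find_article(stub_fetcher(early), prefs))  # ('Article 1', 5)

# No strong match at first, so the loop escalates to a second batch.
late = [(f"Article {i}", ["Sports"]) for i in range(5)] + [
    ("Article 5", ["History"]),
    ("Article 6", ["History", "Literature"]),
]
print(find_article(stub_fetcher(late), prefs))  # ('Article 6', 15)
```

The trade-off is between latency and quality: a small first batch wastes few API calls when a good match turns up quickly, while the fallback still returns the best article seen if no batch clears the threshold.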

The verdict: will I continue work on this app?

Yes, I will. I abandoned my first lightning app because after the proof-of-concept stage I didn’t have a strong sense of where to take it next, or what value it would provide beyond pure entertainment, which other similar sites already offer. Plus, even that value came not from my app but from the GPT-3 API, for which my app was merely a convenient interface, and in the long run I would have had to pay to use that API. This time, though, I’m solving a problem in a way that no one else has (as far as I know), the app provides clear value in improved access to knowledge, and it does all this using Wikipedia’s free API.

Also, this time I see more clearly which features this minimal proof of concept needs next. Just to name a few:

But first, on to the third (and final) lightning app!
