The main subject of this site is a text research application I call, for now, Deep Probe.
The purpose of Deep Probe is similar to that of a typical search engine, but it works differently. You enter a query, but instead of a list of places that seem relevant, you get a list of the main recurring themes.
For example, if you ask Deep Probe about particular headphones, you may see as the main themes:
- many mentions of how bass-heavy they are
- complaints that they fit one particular head shape or size well, but are uncomfortable or slip off for people who are built differently
- an ongoing flamewar about the design, which according to some is “stuck in the 1990s” (which may be a good thing, depending on your tastes)
Or, querying about a book:
- a cluster of stories about devouring the book through the night, not being able to put it down etc.
- people having strong opinions on the moody protagonist who likes to complain regularly
- praise for another character who is discovering herself
- various takes on the apparently excessively fancy language
Keep in mind that these are not random bits from the first blog you found on Google, or the top three reviews on Amazon. These are the big currents in the flow of what people say on the Internet (or at least in the corpus that Deep Probe is using). You get actual sentences or fragments gathered from many sources. You can see at a glance what is said most often, and investigate further if you wish.
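As a rough illustration of the idea only, and not a description of how Deep Probe actually works, here is a toy Python sketch that groups sentences into "themes" by word overlap and ranks the themes by how many sentences they contain. Everything in it is an assumption made for the example: the tiny stopword list, the Jaccard similarity measure, the 0.2 threshold, and the names `content_words`, `jaccard` and `group_into_themes`.

```python
import re

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {
    "the", "a", "an", "is", "are", "it", "this", "that", "and", "of", "to",
    "on", "in", "too", "very", "for", "my", "i", "they", "them", "but",
    "with", "so", "was",
}


def content_words(sentence):
    """Lowercase the sentence and return its set of non-stopword words."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return {w for w in words if w not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity of two sets: shared items divided by all items."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_into_themes(sentences, threshold=0.2):
    """Greedily attach each sentence to the most similar existing theme,
    or start a new theme if nothing reaches the threshold.
    Returns the themes sorted by size, largest first."""
    themes = []  # each theme: {"words": set of words, "sentences": list of str}
    for sentence in sentences:
        words = content_words(sentence)
        best, best_score = None, threshold
        for theme in themes:
            score = jaccard(words, theme["words"])
            if score >= best_score:
                best, best_score = theme, score
        if best is None:
            themes.append({"words": set(words), "sentences": [sentence]})
        else:
            best["words"] |= words
            best["sentences"].append(sentence)
    themes.sort(key=lambda t: len(t["sentences"]), reverse=True)
    return themes


if __name__ == "__main__":
    # Made-up sentences standing in for a corpus about some headphones.
    sample = [
        "The bass is heavy",
        "Way too much bass, really heavy",
        "They keep falling off my head",
        "Heavy bass drowns out vocals",
        "Falling off my head when I run",
    ]
    for theme in group_into_themes(sample):
        print(len(theme["sentences"]), "sentences:", theme["sentences"])
```

A real system would need far more than word overlap (different wordings of the same opinion share few words), but the sketch shows the shape of the output: groups of actual sentences, ordered by how often something is said.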
I have been thinking about some kind of “text digesting machine” for a long time now. What has fascinated me most in linguistics and language processing has always revolved around understanding texts (and also assembling things in systematic ways).
While there are many solutions that do a similar job, text summarizers for example, they don’t provide the broad analysis I am talking about. And I think we should be able to build better tools with what is already known in the science of language and computation.
As of mid-September 2019, I finally have an embarrassingly hacky way of performing theme analysis with some code run from the command line (or from inside Emacs, to be precise). Although very primitive, this is an important beginning. It shows that the overall thinking makes sense technologically, and that it now needs to be developed much further and packaged in some kind of sane interface.
So here I will document my journey onwards – while hopefully noting interesting developments in linguistics and natural language processing.