Truffle Pig: Knwl.js Finds Data Snippets Automatically
What if important information, such as times, locations, email addresses, phone numbers, links and other data snippets, is hidden in plain text? Mining these valuable fragments used to require a lot of manual work. Not anymore. The JavaScript library Knwl.js can automatically find this information, filter it and make it available for further use. With some creativity, very flexible solutions are possible. Usage is not complicated, so let's give it a spin.
Knwl.js: Plugins for The Recognition of Different Content
To kick things off, Knwl.js needs to be included in the HTML head. Afterwards, you can search any text passage for particular content. To do that, the text is passed to the method KnwlInstance.init(), either directly or as a variable. Next, you need to decide on a plugin that searches the text for certain patterns. One of the plugins is date, which looks for, well, date information.
KnwlInstance.init("Today is December 23rd 2015.");
var output = KnwlInstance.get("date");
In this example, the plugin date is accessed via KnwlInstance.get(). It digs through the previously supplied string, searches for date information and returns all results in JSON format.
var output = [
  {
    "year": 2015,
    "month": 12,
    "day": 23,
    "preview": "Today is December 23rd 2015.",
    "found": 2
  }
]
The returned JSON contains different values depending on the plugin. When searching for a date, the year, month and day are returned as separate fields. Additionally, all plugins return the sentence in which the respective value was found via preview. The found field tells you at which position in the text the value was found.
When more than one match is found, Knwl.js returns each as an individual JSON object.
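As a minimal sketch of how such a result could be processed further, the snippet below loops over an array shaped like the output shown above. The sample data is hand-written to mirror that example, and the helper names pad2 and formatDate are hypothetical; nothing here calls Knwl.js itself.

```javascript
// Sample data shaped like the "date" result shown above.
// Assumption: hand-written for illustration, not produced by Knwl.js.
var output = [
  {
    year: 2015,
    month: 12,
    day: 23,
    preview: "Today is December 23rd 2015.",
    found: 2
  }
];

// Hypothetical helper: left-pad a number to two digits.
function pad2(n) {
  return n < 10 ? "0" + n : String(n);
}

// Hypothetical helper: turn one result object into a YYYY-MM-DD string.
function formatDate(result) {
  return result.year + "-" + pad2(result.month) + "-" + pad2(result.day);
}

// Build one summary line per result object in the array.
var summaries = output.map(function (result) {
  return formatDate(result) + ' (from: "' + result.preview + '")';
});

summaries.forEach(function (line) {
  console.log(line);
});
```

Because every match arrives as its own object, the same loop works unchanged whether the text contains one date or several.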
Date, Time and Location Information Only in English
Knwl.js only recognises date and time information when it is written in English. At least for now, other languages are not supported. The same applies to the place plugin, which recognises country names in texts.
var output = [
  {
    "place": "Germany",
    "preview": "This is Germany.",
    "found": 2
  }
]
Recognising phone numbers in different languages poses a similarly difficult problem. Here, only the English spelling is supported.
Links and Email Addresses Possible in any Language
Although only the English language is supported, it is still possible to use Knwl.js on texts in other languages, at least for links and email addresses.
var output = [
  {
    "link": "http://www.drweb.de/",
    "preview": "At the German site http://www.drweb.de/ you can find daily news.",
    "found": 1
  }
]
When searching for links, it is important that the respective protocol ("http://", "https://" or "ftp://") is given. Email addresses are also recognised reliably.
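To illustrate why the protocol matters, here is a small sketch using a simplified regular expression. This pattern is an assumption made for illustration and is not taken from Knwl.js; the names LINK_PATTERN and findLinks are hypothetical. It only shows how a matcher that keys on the protocol would miss a bare "www." address.

```javascript
// Assumption: a simplified pattern for illustration only, not the matching
// logic used by Knwl.js. It requires http://, https:// or ftp:// up front.
var LINK_PATTERN = /\b(?:https?|ftp):\/\/[^\s<>"]+/gi;

// Hypothetical helper: return all protocol-prefixed links found in the text.
function findLinks(text) {
  return text.match(LINK_PATTERN) || [];
}

// The link carries a protocol, so the pattern can pick it up.
var withProtocol = findLinks(
  "At the German site http://www.drweb.de/ you can find daily news."
);

// No protocol here, so this protocol-based pattern finds nothing.
var withoutProtocol = findLinks(
  "At the German site www.drweb.de you can find daily news."
);

console.log(withProtocol);
console.log(withoutProtocol);
```

If your source texts often contain addresses without a protocol, one option is to normalise them before handing the text to the library.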
Natural Language Processing/Parsing interests me a lot and this seems like an awesome library so thanks to Denis. Here’s a direct link to the GitHub: https://github.com/loadfive/Knwl.js