Semantic Web Analysis of Big Data in Arabic, French, and English
Project Information
Maintenance status: Early Development
Development Status: Under Active Development
Project Versions: 2.0
Project in development since: February 2011
Description
USC’s Annenberg Innovation Lab has hired R-Shief Project Lead Laila Shereen Sakr as a research assistant to work with IBM’s LanguageWare tools and extend their semantic content analysis to Arabic and French.
This project, led by USC's Annenberg Innovation Lab, is producing a set of semantic analyses of the 2011 Arab Spring. It is creating a data model that will provide lexical analyses for Arabic-only sites, English-only sites, and Arabic-English sites by applying language-specific sets of rules and dictionaries. This will enable the Annenberg Innovation Lab and R-Shief to perform significant semantic content analysis of Arabic-language material on the web.
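As a rough illustration of how content might be sorted into Arabic-only, Latin-script-only, and mixed groups before language-specific rules and dictionaries are applied, the sketch below counts letters by Unicode script. It is not the LanguageWare API or the project's actual pipeline; the function names `script_of` and `classify` and the 0.9 threshold are assumptions made purely for this example. Script detection alone also cannot tell English from French, since both use Latin letters.

```python
import unicodedata


def script_of(ch):
    """Return 'arabic', 'latin', or None for a single character."""
    if not ch.isalpha():
        return None  # digits, punctuation, diacritics, and spaces are ignored
    name = unicodedata.name(ch, "")
    if name.startswith("ARABIC"):
        return "arabic"
    if name.startswith("LATIN"):
        return "latin"
    return None


def classify(text, threshold=0.9):
    """Label text as 'arabic', 'latin', 'mixed', or 'unknown'.

    threshold is a hypothetical cut-off: the share of letters that must
    belong to one script for the text to count as single-script.
    """
    counts = {"arabic": 0, "latin": 0}
    for ch in text:
        script = script_of(ch)
        if script is not None:
            counts[script] += 1
    total = counts["arabic"] + counts["latin"]
    if total == 0:
        return "unknown"
    if counts["arabic"] / total >= threshold:
        return "arabic"
    if counts["latin"] / total >= threshold:
        return "latin"
    return "mixed"


if __name__ == "__main__":
    for sample in ["مصر", "Egypt", "ثورة Egypt", "liberté"]:
        print(sample, classify(sample))
```

The label returned by `classify` could then be used to choose which set of rules and dictionaries to apply to a given page or tweet.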
Using R-Shief’s Twitter hashtag archive, Sakr has determined that the majority of tweets are in Arabic. Unless Arabic content can be pre-processed in its native language, sentiment and semantic analyses cannot be performed properly; users would be limited to running the usual content analyses on not-so-usual material. Our collaborative sentiment and semantic analyses will be a valuable contribution to the scholars, artists, and journalists using R-Shief's Twitter data. The end result will be a series of visualizations and an extensive semantic analysis of the rapidly unfolding events in the Middle East.