According to Alexa (an Amazon analytics company), R-Shief's top regional traffic comes from Palestine, Jordan, Egypt, Saudi Arabia, and Iran.
R-Shief Labs, LLC is a virtual lab that collects and analyzes content from the Internet using analytics generated by swarm computing. We provide real-time analysis of opinion about breaking issues in the Arab world. Using aggregate data from Twitter and the Web, R-Shief can dissect how people in Egypt are reacting to the latest changes to the constitutional process, how Libyans perceive the presence of NATO forces, how Bahrainis perceive the presence of the Saudi military, and how pro-regime supporters in Syria are using social media platforms. The organization's goal is to provide tools and services for innovative research, publication, and cultural production for a global networked audience.
Our tools have been aggregating an archive of content from the Internet in Arabic and English since 2008. As the revolutions in North Africa and the Middle East unfolded, R-Shief's technology was immediately put to use at full capacity. Today, using swarm intelligence within a cloud computing infrastructure, R-Shief Labs provides one of the most comprehensive publicly accessible repositories on the Arab Revolutions of 2011. As of March 2012, R-Shief's Twitter harvesting tool, Twitterminer, has collected around 2.6 billion tweets (collected at about 3,600 tweets per second), and the Web Aggregator has gathered petabytes of data from Facebook, blogs, and other sites.
R-Shief Labs, LLC, a Los Angeles-based company, licenses its tools from its sister company, R-Shief, Inc., a non-profit entity represented by the Palo Alto-based law firm Fish & Richardson, LLP, which provides in-kind patenting services. Initially designed for Arabic and English audiences by media artist and critic VJ Um Amel, the site has grown to address multilingual audiences with petabytes of social media data. R-Shief Labs currently receives cloud computing and hosting support from Open Source Solutions, LLC.
Fig. 2 This data visualization was created by VJ Um Amel and featured in Science on September 30, 2011.
Miller, Greg. "Social Scientists Wade into the Twitter Stream." Science, Vol. 333, September 30, 2011, pp. 1814-1815.
"One of the largest repositories of Arabic-language tweets is a database started by Laila Shereen Sakr, an Egyptian-born graduate student in cinematic arts at the University of Southern California in Los Angeles. Shereen Sakr says the project originally sprang from an activist impulse to make sure the voices of Arabic speakers were heard. But she's grown increasingly interested in the research potential. She's found intriguing spikes in certain hashtags, the terms used to flag a topic on Twitter, preceding the fall of Zawiya and Tripoli in Libya, for example. Shereen Sakr hopes R-Shief will become a hub for researchers. 'I would love for people in other disciplines to take this data and make something of it,' she says."
U.S. Assistant Secretary Rosemary Gottemoeller, "From the Manhattan Project to the Cloud: Arms Control in the Information Age," address at Stanford University, October 27, 2011.
"Laila Shereen Sakr, a PhD Candidate at the University of Southern California, followed the Arab Spring closely, creating a massive database of Arabic-language tweets. Instead of selecting terms herself and searching the database, Sakr let a computer program aggregate data and identify patterns. While aggregating tweets from Libya, her program identified spikes in certain hashtags or selected key words. These word spikes became a sort of pulse, an early warning identifying the fall of the town of Zawiya. A short while later, similar word spikes reappeared, allowing Sakr to identify the impending fall of Tripoli. She was accurate to within a few hours."
R-SHIEF explores the world of digitally born data using two innovative methods: Swarm Computing + Cultural Analytics
Swarm intelligence is an innovative distributed-intelligence paradigm for solving optimization problems that originally took its inspiration from biological examples of swarming, flocking, and herding in vertebrates. Historically, the notion of finding useful patterns in data has been given a variety of names, including data mining, knowledge discovery, and information extraction. Data mining is an analytic process designed to explore large amounts of data in search of consistent patterns and/or systematic relationships between variables, and then to validate the findings by applying the detected patterns to new subsets of data. To achieve this, data mining uses computational techniques from statistics, machine learning, and pattern recognition.
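Swarm intelligence is described above only in general terms, and the document does not say which swarm algorithm R-Shief uses. To make the paradigm concrete, here is a minimal sketch of particle swarm optimization (PSO), one well-known swarm-intelligence method, minimizing a toy function. The choice of PSO, the objective `sphere`, and all parameter values are assumptions for illustration, not R-Shief's system.

```python
import random

def sphere(x):
    """Toy objective (an assumption for illustration): sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)

def pso(objective, dim, n_particles=30, iters=200, bounds=(-5.0, 5.0),
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal global-best particle swarm optimization (illustrative parameters)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its own best position; the swarm shares one global best.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # New velocity = inertia + pull toward personal best + pull toward swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

best, best_val = pso(sphere, dim=2)
print(best, best_val)
```

No particle is told where the minimum is; good positions spread through the swarm only because each particle is pulled toward the best point any member has found, which is the flocking intuition expressed as an algorithm.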
Data mining and swarm intelligence may seem to have few properties in common. However, recent studies suggest that they can be used together for several real-world data mining problems, especially when other methods would be too expensive or difficult to implement. Our approach to data mining social media lets the computer program aggregate data and identify patterns in real time. Our swarm computing system gets smarter as it works by building its own lexicon.
Cultural Analytics is a new approach for studying culture that is produced at large, computational scales and authored by many people. Alongside various ethnographic and film studies methodologies, cultural analytics provides new methods and intuitive visual techniques to address both new and existing research questions that currently drive the humanities. While others focus on trends and numbers, we focus on telling you what it all means. We are building tools that translate what people are saying in non-Western languages.
Our main goal is to make real breakthroughs in understanding. What are people saying and doing online? A digital humanities project, R-Shief explores digitally born texts, conversations, and information using both human and computer analytics.
Delivering Large-Scale Analytics in Real-Time
Our computer-written algorithms enable us to analyze data sets with millions of rows and columns in real time. This is a very different process, at a very different speed, from working with algorithms written by human beings. Our method is to have expert human reviewers examine these algorithmically generated data sets and then comment on and attribute the computer-generated analytics.
Semantic Analysis in Arabic
As far as we know, we are the only company producing semantic social media analysis in non-Western languages. In June we will begin a project to build an open-source crowdsourcing platform for Arabic speakers of various dialects to add attribution tags to R-Shief's Arabic-language content.
Dataset is 99% of Twitter's Public API by Hashtag
Whereas many social media analytics companies pay for access to Twitter's pipe and to analyzed data, R-Shief's tools have mapped a way around Twitter's many rules and restrictions, so that we collect about 99% of what Twitter's public API provides for the hashtags we track. We do not pay any private company. It is important to note that R-Shief is not focused only on building tools for a trendy platform like Twitter; it is simply that our Twitter analytics have received the most attention.
Data Exploration
We are not just looking at numbers; we are not merely data mining facts. Rather, we are exploring a world where information is born digital. This kind of data exploration is a form of cultural analytics, rooted firmly in the digital humanities. R-Shief seeks to offer the lab and its tools for critical and interventionist work in the world of "data" on the Middle East, that is, what human beings produce online about the Middle East. Outcomes of our research so far have ranged from written publications to real-time data to 3D interactive environments.
R-SHIEF TOOLS + SERVICES
*R-SHIEF Tools and Services are developed by R-Shief, Inc. and have been filed for provisional patents.
Twitter Live Graphs
These live Twitter graphs filter through more than 1,000 hashtags while aggregating about 3,600 tweets per second. At the top of the page, a filter lets users query R-Shief's Twitter archive by date range, hashtag, and keywords, at day, hour, or minute granularity.
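The filter described above can be sketched as a query that keeps tweets inside a date range that carry a given hashtag (and, optionally, a keyword), then counts them per day, hour, or minute. This is a minimal in-memory sketch under stated assumptions: the record fields `created_at`, `text`, and `hashtags`, the function `tweet_counts`, and the sample records are all hypothetical, and a production archive would run the same logic as a database query.

```python
from collections import Counter
from datetime import datetime

# strftime patterns that truncate a timestamp to the chosen granularity
GRANULARITY = {
    "day": "%Y-%m-%d",
    "hour": "%Y-%m-%d %H:00",
    "minute": "%Y-%m-%d %H:%M",
}

def tweet_counts(tweets, hashtag, start, end, granularity="hour", keyword=None):
    """Count tweets carrying `hashtag` within [start, end), bucketed by granularity.

    Hypothetical record schema: {"created_at": datetime, "text": str, "hashtags": [str]}.
    """
    fmt = GRANULARITY[granularity]
    tag = hashtag.lstrip("#").lower()          # "#Drones" and "drones" match alike
    counts = Counter()
    for t in tweets:
        if not (start <= t["created_at"] < end):
            continue
        if tag not in {h.lstrip("#").lower() for h in t["hashtags"]}:
            continue
        if keyword and keyword.lower() not in t["text"].lower():
            continue
        counts[t["created_at"].strftime(fmt)] += 1
    return dict(sorted(counts.items()))        # buckets in chronological order

# Made-up sample records, for illustration only.
sample = [
    {"created_at": datetime(2011, 11, 20, 21, 5), "text": "sample one #drones", "hashtags": ["drones"]},
    {"created_at": datetime(2011, 11, 20, 21, 40), "text": "sample two #Drones", "hashtags": ["Drones"]},
    {"created_at": datetime(2011, 11, 20, 22, 10), "text": "news item #drones", "hashtags": ["drones"]},
    {"created_at": datetime(2011, 11, 20, 22, 15), "text": "other topic #Egypt", "hashtags": ["Egypt"]},
]
day_start, day_end = datetime(2011, 11, 20), datetime(2011, 11, 21)
print(tweet_counts(sample, "#drones", day_start, day_end, "hour"))
# {'2011-11-20 21:00': 2, '2011-11-20 22:00': 1}
```

Bucketing by a formatted timestamp string keeps the granularity switch to a single lookup; switching `"hour"` to `"minute"` or `"day"` changes only the key each tweet falls into.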
Fig. 2 This 3D real-time graph is updated every minute and reflects top trending hashtags in R-Shief's archive over the past 24 hours.
Our analyses are derived from a real-time language analytics system that detects the language of tweets character by character and organizes the content by day, hour, and minute. These language analytics give you character count, word count, tweet count, the top 50 words with their occurrence counts, hashtag counts, retweet counts, @tag counts, and sentiment.
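The counting metrics listed above can be sketched in a few lines. In this sketch, language detection is approximated by classifying the Unicode script of each character (Arabic blocks versus ASCII Latin letters) and taking a majority vote per tweet. That is an assumption, and a crude one, since script is not language: Persian and Urdu also use Arabic script. The function names, the `RT @` retweet test, and the tokenization rule are likewise illustrative, not R-Shief's implementation; sentiment is left to a separate step.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"[#@]?\w+")  # plain words, #hashtags, @mentions (\w is Unicode-aware)

def char_script(ch):
    """Rough per-character script guess from Unicode code point ranges (an assumption)."""
    cp = ord(ch)
    if 0x0600 <= cp <= 0x06FF or 0x0750 <= cp <= 0x077F:  # Arabic and Arabic Supplement
        return "arabic"
    if ch.isascii() and ch.isalpha():
        return "latin"
    return None  # digits, punctuation, emoji, other scripts

def detect_script(text):
    """Majority script over the characters of one tweet."""
    votes = Counter(s for s in map(char_script, text) if s)
    return votes.most_common(1)[0][0] if votes else "unknown"

def tweet_metrics(tweets, top_n=50):
    """Aggregate counting metrics over a list of tweet texts."""
    words, hashtags, mentions, scripts = Counter(), Counter(), Counter(), Counter()
    chars = retweets = 0
    for text in tweets:
        chars += len(text)
        scripts[detect_script(text)] += 1
        if text.startswith("RT @"):  # classic manual-retweet convention (assumed test)
            retweets += 1
        for tok in TOKEN_RE.findall(text):
            bucket = hashtags if tok[0] == "#" else mentions if tok[0] == "@" else words
            bucket[tok.lower()] += 1
    return {
        "tweet_count": len(tweets),
        "char_count": chars,
        "word_count": sum(words.values()),
        "top_words": words.most_common(top_n),
        "hashtag_counts": dict(hashtags),
        "mention_counts": dict(mentions),
        "retweet_count": retweets,
        "scripts": dict(scripts),
    }

sample = ["RT @user1: Hello #Egypt", "مرحبا #مصر", "hello world"]
print(tweet_metrics(sample))
```

Because the regex `\w` matches Arabic letters, the same tokenizer handles both languages; only the script classifier needs to know about code point ranges.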
Fig. 3 Real-time word cloud analysis on NATO discussions of #drones in Pakistan in November and December 2011.
Fig. 4 Real-time tweet count of #drones in Pakistan in November and December 2011.
Sentiment analysis aims to determine the attitude of a speaker or writer with respect to some topic, or the overall contextual polarity of a document. The attitude may be a judgment or evaluation, an affective state, or an intended emotional communication. The rise of social media such as blogs and social networks has fueled interest in sentiment analysis. With the proliferation of reviews, ratings, recommendations, and other forms of online expression, online opinion has turned into a kind of virtual currency for businesses looking to market their products, identify new opportunities, and manage their reputations. One step R-SHIEF has taken toward this aim is building a lexicon of words and languages and assigning each entry a numerical value. The database below is a preview of our Lexicon of Words, Languages, and their corresponding Alphabets, which we use to achieve semantic analysis.
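A lexicon-based scorer is the simplest form of the approach described above: each lexicon entry carries a language and a numerical value, and a text's score is the sum of the values of the words it contains. The entries, the -3 to +3 scale, and the function `lexicon_sentiment` below are hypothetical placeholders, not R-SHIEF's lexicon.

```python
import re

# Hypothetical toy lexicon: token -> (language, score on an assumed -3..+3 scale).
# Entries and values are illustrative only.
LEXICON = {
    "good": ("en", 2),
    "great": ("en", 3),
    "bad": ("en", -2),
    "terrible": ("en", -3),
    "جيد": ("ar", 2),    # "good"
    "سيء": ("ar", -2),   # "bad"
}

TOKEN_RE = re.compile(r"\w+")  # Unicode-aware, so Arabic words tokenize too

def lexicon_sentiment(text, lexicon=LEXICON):
    """Sum the scores of lexicon words found in `text` and map the total to a label.

    Returns (total score, label, number of lexicon hits).
    """
    scores = [lexicon[tok][1] for tok in TOKEN_RE.findall(text.lower()) if tok in lexicon]
    total = sum(scores)
    label = "positive" if total > 0 else "negative" if total < 0 else "neutral"
    return total, label, len(scores)

print(lexicon_sentiment("A great day, not bad"))  # (1, 'positive', 2)
print(lexicon_sentiment("يوم سيء"))               # (-2, 'negative', 1)
```

This naive sum ignores negation (the word "bad" in "not bad" still contributes a negative score) and Arabic clitics such as the prefixed article ال, which would stop "الجيد" from matching "جيد". Handling these is where a curated, human-reviewed lexicon earns its keep.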
Fig. 5 This 3D real-time graph is updated every minute and reflects top trending hashtags in R-Shief's archive over the past 24 hours.
In addition to counting and analyzing the data structure from Twitter, R-SHIEF also analyzes the semantic nature of the words in the tweets themselves. Each tweet row has a corresponding array of attributes such as language(s), good 1-5, bad 1-5, acronym, meme, first name, last name, place, thing, etc. Using swarm computing and a crowdsourcing platform, R-SHIEF can generate these attributes from the tweet data for any time scope, and it can actively maintain a set for each of the last 10 minutes, the last hour, the last 6 hours, the last 12 hours, the last 24 hours, and the entire archive.
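Maintaining attribute tallies for the last 10 minutes, hour, 6, 12, and 24 hours, and all time can be sketched with one queue per finite window: each new tweet's attributes are added to every window, and entries older than a window's span are popped and subtracted. The class `RollingAttributeCounts` and the `key=value` attribute strings are hypothetical, and the sketch assumes tweets arrive in timestamp order.

```python
from collections import Counter, deque
from datetime import datetime, timedelta

# Window spans named in the text; None means "never expires" (the entire archive).
WINDOWS = {
    "10m": timedelta(minutes=10),
    "1h": timedelta(hours=1),
    "6h": timedelta(hours=6),
    "12h": timedelta(hours=12),
    "24h": timedelta(hours=24),
    "all": None,
}

class RollingAttributeCounts:
    """Attribute tallies over several trailing time windows (hypothetical design).

    Assumes events are added in non-decreasing timestamp order.
    """

    def __init__(self, windows=WINDOWS):
        self.windows = windows
        # Only finite windows need to remember events so they can be expired later.
        self.events = {name: deque() for name, span in windows.items() if span is not None}
        self.counts = {name: Counter() for name in windows}

    def add(self, ts, attributes):
        for name in self.windows:
            self.counts[name].update(attributes)
            if name in self.events:
                self.events[name].append((ts, attributes))
        self._expire(ts)

    def _expire(self, now):
        for name, queue in self.events.items():
            cutoff = now - self.windows[name]
            while queue and queue[0][0] <= cutoff:
                _, attributes = queue.popleft()
                self.counts[name].subtract(attributes)

    def snapshot(self, now):
        self._expire(now)
        # Unary + returns a copy of each Counter with zero counts dropped.
        return {name: +counter for name, counter in self.counts.items()}

r = RollingAttributeCounts()
t0 = datetime(2011, 11, 1, 12, 0)
r.add(t0, ["lang=ar", "place=Cairo"])
r.add(t0 + timedelta(minutes=30), ["lang=en"])
snap = r.snapshot(t0 + timedelta(minutes=45))
print(snap["10m"], snap["1h"])
```

Each event is added once and removed at most once per window, so keeping all six windows current costs a constant amount of work per tweet rather than a rescan of the archive.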
Fig. 6 Screenshot of R-Shief PHP/MySQL tables.
Facebook scraping is the process of automatically collecting information from the platform. It is an actively developing field that shares a common goal with the Semantic Web vision, an ambitious initiative that still requires breakthroughs in text processing, semantic understanding, artificial intelligence, and human-computer interaction. R-SHIEF currently collects and analyzes the following metrics on selected Facebook public pages: the start date (when each page was first set up), the evolution of the number of followers, a measure of the degree of activity on the page, and a compilation of the most prominent of these pages (e.g., the top 25 by number of followers).
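Once page snapshots have been collected, the metrics listed above (start date, follower evolution, activity, top 25 by followers) reduce to simple aggregation. The sketch below deliberately leaves out collection itself and does not model any Facebook API; the `PageRecord` fields, the posts-per-day activity measure, and the sample pages are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class PageRecord:
    """One tracked public page (hypothetical schema)."""
    name: str
    created: date            # when the page was first set up
    follower_history: list   # [(date, followers), ...], oldest first
    posts_in_period: int     # posts observed during tracking
    tracked_days: int

def summarize_pages(pages, top_n=25):
    """Per-page metrics, ranked by latest follower count; keeps the top `top_n`."""
    rows = []
    for p in pages:
        first, latest = p.follower_history[0][1], p.follower_history[-1][1]
        rows.append({
            "name": p.name,
            "created": p.created.isoformat(),
            "followers": latest,
            "follower_growth": latest - first,                    # evolution over the period
            "posts_per_day": p.posts_in_period / p.tracked_days,  # assumed activity measure
        })
    rows.sort(key=lambda r: r["followers"], reverse=True)
    return rows[:top_n]

# Made-up pages, for illustration only.
pages = [
    PageRecord("page_a", date(2010, 6, 1), [(date(2011, 1, 1), 100), (date(2011, 2, 1), 250)], 60, 30),
    PageRecord("page_b", date(2009, 3, 1), [(date(2011, 1, 1), 500), (date(2011, 2, 1), 450)], 15, 30),
]
for row in summarize_pages(pages):
    print(row)
```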
Fig. 7 Screenshot of data harvested from the Facebook public page "We Are All Khaled Said."
Network analysis is a method for studying communication and socio-technical networks within a formal organization. It is a quantitative descriptive technique for creating statistical and graphical models of the people, tasks, groups, knowledge, and resources of organizational systems. It is based on social network theory and, more specifically, on dynamic network analysis, which starts from a set of actors (such as individuals or organizations) and the dyadic ties between them (such as relationships, connections, or interactions). R-SHIEF's approach to network analysis draws on various data visualization and digital humanities techniques.
Fig. 8 Topic modeling finds highly collocated words and arranges them into groups. The attached list shows the 100 "topics" that make up the Egyptian Twitter traffic. The actual topics contain more words; the list includes only the top 20 for each. One of the issues with topic modeling is choosing the right number of topics, and 100 may be too many or too few for 600,000 tweets. We can also choose to eliminate certain terms, such as #Egypt. Note also that each tweet is made up of several topics, so we can track how different topics rise to prominence over time.
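The caption does not name the topic model used. Latent Dirichlet allocation (LDA) is a standard model that fits its description (each tweet is a mixture of topics, and the number of topics `k` must be chosen up front), so the following is a minimal collapsed Gibbs sampling sketch of LDA under that assumption. The tokenizer, the stop-term handling (for example dropping `#egypt`), the hyperparameters `alpha` and `beta`, and the sample tweets are illustrative; a real run over 600,000 tweets would use an optimized library.

```python
import random
import re
from collections import Counter

TOKEN_RE = re.compile(r"[#@]?\w+")

def tokenize(text, stop=frozenset()):
    """Lowercase tokens, dropping any term in `stop` (e.g. {"#egypt"})."""
    return [tok for tok in TOKEN_RE.findall(text.lower()) if tok not in stop]

def lda_gibbs(docs, k, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA. Returns (doc-topic counts, topic-word counts)."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    ndk = [[0] * k for _ in docs]          # tokens in doc d assigned to topic t
    nkw = [Counter() for _ in range(k)]    # occurrences of word w assigned to topic t
    nk = [0] * k                           # total tokens assigned to topic t
    z = []                                 # current topic of every token
    for d, doc in enumerate(docs):         # random initial assignment
        z.append([])
        for w in doc:
            t = rng.randrange(k)
            z[d].append(t)
            ndk[d][t] += 1
            nkw[t][w] += 1
            nk[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]                # take this token out of the counts
                ndk[d][t] -= 1
                nkw[t][w] -= 1
                nk[t] -= 1
                # p(topic j | everything else) is proportional to
                # (n_dj + alpha) * (n_jw + beta) / (n_j + V * beta)
                weights = [
                    (ndk[d][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + vocab_size * beta)
                    for j in range(k)
                ]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t                # put it back under the sampled topic
                ndk[d][t] += 1
                nkw[t][w] += 1
                nk[t] += 1
    return ndk, nkw

def top_words(nkw, n=20):
    """Top-n words per topic (the caption's lists show the top 20)."""
    return [[w for w, c in topic.most_common(n) if c > 0] for topic in nkw]

# Made-up sample tweets, for illustration only.
sample = ["#Egypt protest square crowd", "square crowd march #Egypt",
          "vote ballot election", "election ballot count"]
docs = [tokenize(t, stop={"#egypt"}) for t in sample]
ndk, nkw = lda_gibbs(docs, k=2, iters=50)
print(top_words(nkw, n=5))
print(ndk)  # per-tweet topic mixture, as raw token counts
```

The rows of `ndk` are what make the caption's last point possible: each tweet carries a count per topic, so summing those rows over time windows shows which topics rise and fall.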
Fig. 9 This is a bipartite graph, meaning that it shows two types of nodes: tweets and Twitter users. A user is connected to every tweet they have sent and to every tweet directed at them (through the @username convention). Each image is the ego network of a particular user, showing the tweets they have sent, any users connected to those tweets, and any tweets those users have sent. Tweets are color-coded by time: earlier tweets are lighter and later tweets are darker (relative to the dataset of 600,000 tweets with #Egypt). Green tweets are in Arabic and blue ones are in English.
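The graph construction in this caption can be sketched as two steps: link each tweet node to its author and to every @-mentioned user, then assemble a user's ego network from the tweets they sent, the users attached to those tweets, and those users' own tweets. The `(tweet_id, author, text)` record format, the function names, and the sample tweets are hypothetical.

```python
import re
from collections import defaultdict

MENTION_RE = re.compile(r"@(\w+)")

def build_bipartite(tweets):
    """tweets: iterable of (tweet_id, author, text) (hypothetical format).

    Returns (adj, sent_by): adj links each tweet node to its author and to every
    @-mentioned user (and back); sent_by maps a user name to the tweets they authored.
    """
    adj = defaultdict(set)
    sent_by = defaultdict(set)
    for tweet_id, author, text in tweets:
        tweet_node = ("tweet", tweet_id)
        targets = {author.lower()} | {m.lower() for m in MENTION_RE.findall(text)}
        for name in targets:
            user_node = ("user", name)
            adj[tweet_node].add(user_node)
            adj[user_node].add(tweet_node)
        sent_by[author.lower()].add(tweet_node)
    return adj, sent_by

def ego_network(adj, sent_by, user):
    """Nodes in `user`'s ego network as the caption defines it: the tweets the user
    sent, the users connected to those tweets, and the tweets those users sent."""
    user = user.lower()
    own = sent_by.get(user, set())
    neighbours = {n for t in own for n in adj[t]}  # includes the ego user
    their_tweets = {t for _, name in neighbours for t in sent_by.get(name, set())}
    return {("user", user)} | own | neighbours | their_tweets

# Made-up tweets, for illustration only.
tweets = [
    (1, "alice", "hello @bob"),
    (2, "bob", "reply @alice"),
    (3, "carol", "unrelated"),
    (4, "bob", "another"),
    (5, "dave", "@bob hi"),
]
adj, sent_by = build_bipartite(tweets)
print(sorted(ego_network(adj, sent_by, "alice")))
```

Keeping users and tweets as distinct node types, tagged in each tuple, is what makes the graph bipartite: every edge joins a user to a tweet, never a user to a user. Coloring by time or language would be an attribute lookup on the tweet nodes at drawing time.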