Commit 47a1d755 authored by Alberto Pascual, committed by J. Fernando Sánchez

docs improved

parent 532aba3c
=============
Configuration
=============
This configuration of the orchestrator is based on the SOMEDI project. In this example we are going to track the Restaurantes Lateral brand on social media.
We are going to describe this example in different incremental phases.
I. Use GSICrawler to get tweets and Facebook posts from official accounts
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This phase gets tweets and Facebook posts from official accounts and prints the results.

Below is the relevant part of the task, located in `somedi-usecase/workflow.py`.
.. sourcecode:: python

    class ScrapyTask(GSICrawlerScraper):
        query = luigi.Parameter()
        id = luigi.Parameter()
        number = luigi.Parameter()
        source = luigi.Parameter()
        host = 'http://gsicrawler:5000/api/v1'

        def output(self):
            return luigi.LocalTarget(path='/tmp/_scrapy-%s.json' % self.id)
As shown in the code, we select our GSICrawler demo service as the endpoint; the remaining parameters are given on the command line.
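For reference, the same task can also be triggered from Python with luigi's programmatic API instead of the command line. This is only a sketch: it assumes the workflow module is importable under the name `workflow` (the actual package path on disk may differ) and that the services are running:

.. sourcecode:: python

    # Hypothetical programmatic invocation; 'workflow' stands in for the
    # actual import path of somedi-usecase/workflow.py.
    import luigi
    from workflow import ScrapyTask

    luigi.build(
        [ScrapyTask(query='rest_lateral', id='1', number='10', source='twitter')],
        local_scheduler=True,
    )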
Run the orchestrator's workflow to retrieve the 10 latest tweets:
.. sourcecode:: bash

    $ docker-compose exec orchestrator python -m luigi --module somedi-usecase.workflow ScrapyTask --query rest_lateral --number 10 --source twitter --id 1
Now run the orchestrator's workflow to retrieve the 10 latest Facebook posts. The query must be the official account name on Facebook, without the @:

.. sourcecode:: bash

    $ docker-compose exec orchestrator python -m luigi --module somedi-usecase.workflow ScrapyTask --query restauranteslateral --number 10 --source facebook --id 2
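Each run leaves its results in the file returned by the task's `output()` method. A quick sketch for inspecting them, assuming the task writes a JSON list (the exact structure depends on the GSICrawler version):

.. sourcecode:: python

    import json

    # The path follows the pattern in ScrapyTask.output() for --id 1.
    with open('/tmp/_scrapy-1.json') as f:
        items = json.load(f)

    print(len(items))   # number of collected posts, assuming a JSON list
    print(items[0])     # first collected tweet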
II. Analyse collected tweets and Facebook posts with Senpy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This phase improves on the previous one by adding analysis with Senpy.

Below is the relevant part of the task, located in `somedi-usecase/workflow.py`.
.. sourcecode:: python

    class AnalysisTask(SenpyAnalysis):
        query = luigi.Parameter()
        id = luigi.Parameter()
        number = luigi.Parameter()
        source = luigi.Parameter()
        host = 'http://senpy:5000/api/'
        algorithm = luigi.Parameter()
        lang = luigi.Parameter()

        def requires(self):
            # Keyword arguments avoid mixing up the parameter order.
            return ScrapyTask(id=self.id, query=self.query,
                              number=self.number, source=self.source)

        def output(self):
            return luigi.LocalTarget(path='/tmp/analysed%s.json' % self.id)
As shown in the code, we select our Senpy service as the endpoint; the remaining parameters are given on the command line.
You must select which Senpy algorithm and language to use for the analysis.
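If you want to try an algorithm before wiring it into the workflow, you can call the Senpy service directly. A minimal sketch, assuming the demo is exposed on http://localhost:8000 and accepts the `algo`, `i` and `lang` query parameters (check your Senpy version's API documentation):

.. sourcecode:: python

    import requests  # assumed to be installed

    resp = requests.get(
        'http://localhost:8000/api/',
        params={'algo': 'sentiment140',
                'i': 'La comida estaba genial',
                'lang': 'es'},
    )
    # Senpy returns JSON-LD with the sentiment annotations.
    print(resp.json())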
Run the orchestrator's workflow again, using the sentiment140 plugin in Spanish:
.. sourcecode:: bash

    $ docker-compose exec orchestrator python -m luigi --module somedi-usecase.workflow AnalysisTask --query restauranteslateral --number 10 --source facebook --algorithm sentiment140 --lang es --id 3
    $ docker-compose exec orchestrator python -m luigi --module somedi-usecase.workflow AnalysisTask --query rest_lateral --number 10 --source twitter --algorithm sentiment140 --lang es --id 4
III. Store collected and analysed tweets on Fuseki and Elasticsearch
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This phase improves on the previous one by adding a persistence layer to store the results.

Below is the relevant part of the task, located in `somedi-usecase/workflow.py`.
.. sourcecode:: python

    class FusekiTask(CopyToFuseki):
        id = luigi.Parameter()
        query = luigi.Parameter()
        number = luigi.Parameter()
        source = luigi.Parameter()
        algorithm = luigi.Parameter()
        lang = luigi.Parameter()
        host = 'fuseki'
        port = 3030

        def requires(self):
            return AnalysisTask(id=self.id, query=self.query, number=self.number,
                                source=self.source, algorithm=self.algorithm,
                                lang=self.lang)

        def output(self):
            return luigi.LocalTarget(path='/tmp/_n3-%s.json' % self.id)


    class ElasticsearchTask(CopyToIndex):
        id = luigi.Parameter()
        query = luigi.Parameter()
        number = luigi.Parameter()
        source = luigi.Parameter()
        algorithm = luigi.Parameter()
        lang = luigi.Parameter()
        index = 'somedi'
        doc_type = 'lateral'
        host = 'elasticsearch'
        port = 9200
        timeout = 100

        def requires(self):
            return AnalysisTask(id=self.id, query=self.query, number=self.number,
                                source=self.source, algorithm=self.algorithm,
                                lang=self.lang)


    class StoreTask(luigi.Task):
        id = luigi.Parameter()
        query = luigi.Parameter()
        number = luigi.Parameter()
        source = luigi.Parameter()
        algorithm = luigi.Parameter()
        lang = luigi.Parameter()

        def requires(self):
            # Fan out to both persistence backends with the full parameter set.
            yield FusekiTask(id=self.id, query=self.query, number=self.number,
                             source=self.source, algorithm=self.algorithm,
                             lang=self.lang)
            yield ElasticsearchTask(id=self.id, query=self.query, number=self.number,
                                    source=self.source, algorithm=self.algorithm,
                                    lang=self.lang)
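Since `StoreTask` only aggregates its requirements and writes no output of its own, luigi's built-in `WrapperTask` could express the same pattern. A sketch of that alternative, not what the project's code actually does:

.. sourcecode:: python

    # Sketch: luigi.WrapperTask reports itself complete once all of its
    # required tasks are complete, so no output() method is needed.
    class StoreTask(luigi.WrapperTask):
        id = luigi.Parameter()
        query = luigi.Parameter()
        number = luigi.Parameter()
        source = luigi.Parameter()
        algorithm = luigi.Parameter()
        lang = luigi.Parameter()

        def requires(self):
            yield FusekiTask(id=self.id, query=self.query, number=self.number,
                             source=self.source, algorithm=self.algorithm,
                             lang=self.lang)
            yield ElasticsearchTask(id=self.id, query=self.query, number=self.number,
                                    source=self.source, algorithm=self.algorithm,
                                    lang=self.lang)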
Run the orchestrator's workflow again:

.. sourcecode:: bash

    $ docker-compose exec orchestrator python -m luigi --module somedi-usecase.workflow StoreTask --query restauranteslateral --number 10 --source facebook --algorithm sentiment140 --lang es --id 5
    $ docker-compose exec orchestrator python -m luigi --module somedi-usecase.workflow StoreTask --query rest_lateral --number 10 --source twitter --algorithm sentiment140 --lang es --id 6
Your data is now available in Elasticsearch and Fuseki.
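A quick sketch to verify the stored data from the host machine, assuming the default ports are published by `docker-compose`; the Fuseki dataset name used here is an assumption, so adjust it to your setup:

.. sourcecode:: python

    import requests  # assumed to be installed

    # Elasticsearch: search the 'somedi' index defined in ElasticsearchTask.
    resp = requests.get('http://localhost:9200/somedi/_search',
                        params={'q': 'lateral', 'size': 5})
    for hit in resp.json()['hits']['hits']:
        print(hit['_source'])  # each analysed post as stored

    # Fuseki: run a SPARQL query ('somedi' as dataset name is hypothetical).
    resp = requests.post(
        'http://localhost:3030/somedi/query',
        data={'query': 'SELECT * WHERE { ?s ?p ?o } LIMIT 5'},
        headers={'Accept': 'application/sparql-results+json'},
    )
    print(resp.json())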
IV. Show stored data in a Sefarad dashboard
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Open a web browser and navigate to the Sefarad environment at http://localhost:8080. This interactive dashboard shows the tweets and Facebook posts collected and analysed in the previous phases. We can distinguish between posts created by the official account and replies.
V. Use GSICrawler to track direct competitors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This phase tracks other restaurant chains. In this example we will track 100 Montaditos. We modify our orchestrator's workflow parameters and run it again:

.. sourcecode:: bash

    $ docker-compose exec orchestrator python -m luigi --module somedi-usecase.workflow StoreTask --query 100MontaditosSpain --number 10 --source facebook --algorithm sentiment140 --lang es --id 7
    $ docker-compose exec orchestrator python -m luigi --module somedi-usecase.workflow StoreTask --query 100montaditos --number 10 --source twitter --algorithm sentiment140 --lang es --id 8
The Sefarad dashboard is now updated with newly analysed data about 100 Montaditos.
Welcome to Soneti's documentation!
==================================

**Soneti** is a toolkit for **analyzing social media**, such as social networks (e.g. Twitter, Facebook, ...), blogs, YouTube, newspapers, app stores, etc.

It obtains data from different sources and enriches it by performing different types of automatic analysis. Finally, it allows us to visualize the results in interactive dashboards.

.. figure:: figures/soneti.png
   :alt: Soneti overview
Now the images are ready to run. This installation offers a basic version of each service:

.. figure:: figures/quickstart.png
   :alt: Soneti demo overview

* **GSICrawler**: This ingestion service demo has CNN, The New York Times, ElMundo, Facebook and Twitter as possible sources. This service is available at http://localhost:5000.
* **Senpy**: This analysis service demo has sentiment140 as a sentiment analysis plugin and EmoRand as an emotion analysis plugin. This service is available at http://localhost:8000/.
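To confirm both demo services are reachable before moving on, a minimal sketch (service URLs as listed above):

.. sourcecode:: python

    import requests  # assumed to be installed

    for name, url in [('GSICrawler', 'http://localhost:5000'),
                      ('Senpy', 'http://localhost:8000/')]:
        try:
            status = requests.get(url, timeout=5).status_code
            print('%s is up (HTTP %s)' % (name, status))
        except requests.ConnectionError:
            print('%s is not reachable' % name)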
In this documentation we are going to show some uses of the Soneti toolkit.

SOMEDI: Social Media and Digital Interaction Intelligence
---------------------------------------------------------

This project uses the Soneti toolkit to analyse social media data about different companies. It also uses Artificial Intelligence, Data Analysis and Machine Learning to extract this information and use it to better serve and engage users and audiences.

SOMEDI tries to solve the challenge of efficiently generating and utilising intelligence from social media and digital interaction data.

The first step of this project is to get data from social networks, such as Twitter and Facebook, talking about a certain brand, using GSICrawler. Secondly, we enrich this data using Senpy: an entity detector identifies what is being talked about, so that we can later determine whether the opinion about that entity is positive or negative. The last step is to visualize the results in a Sefarad dashboard that summarizes opinions and trends.

The full walkthrough of this use case is given in the Configuration section above.

TRIVALENT: Terrorism pReventIon Via rAdicaLisation countEr-NarraTive
--------------------------------------------------------------------

This project uses the Soneti toolkit to detect potentially radical messages on social media.

The GSICrawler service is in charge of extracting information from several web sources in the news and social media categories. Currently, the available newspapers are CNN News, The New York Times and Al Jazeera. Additionally, it is also possible to extract information from PDF sources such as Dabiq magazine, which was the official Daesh propaganda magazine for years.

Furthermore, Senpy plugins provide added-value services for data analysis tasks, easing their implementation thanks to the Senpy architecture. Each plugin has an input and a semantically annotated output, useful for linked data processes. The TRIVALENT project uses two Senpy plugins:

* **Translator plugin + COGITO plugin**: This analysis takes as input a GSICrawler document written in a source language (e.g. Arabic), translates it into a target language (e.g. English) and extracts information such as people, places and organizations mentioned in it, following linked data annotation principles.