Commits (11)
[submodule "readthedocs/gsicrawler"]
path = readthedocs/gsicrawler
url = http://lab.cluster.gsi.dit.upm.es/gsicrawler/gsicrawler.git
[submodule "readthedocs/senpy"]
path = readthedocs/senpy
url = http://lab.cluster.gsi.dit.upm.es/senpy/senpy.git
[submodule "readthedocs/orchestrator"]
path = readthedocs/orchestrator
url = http://lab.cluster.gsi.dit.upm.es/social/orchestrator.git
[submodule "gsicrawler"]
path = gsicrawler
url = http://lab.cluster.gsi.dit.upm.es/gsicrawler/gsicrawler.git
[submodule "senpy"]
path = senpy
url = http://lab.cluster.gsi.dit.upm.es/senpy/senpy.git
[submodule "orchestrator"]
path = orchestrator
url = http://lab.cluster.gsi.dit.upm.es/social/orchestrator.git
......@@ -96,6 +96,13 @@ pygments_style = 'sphinx'
#
html_theme = 'alabaster'
html_theme_options = {
'logo': 'soneti_logo.png',
'github_user': 'gsi-upm',
'github_repo': 'soneti',
'github_banner': True,
}
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
......
=============
Configuration
=============
This configuration of the orchestrator is based on the SOMEDI project. In this example we are going to track the Restaurantes Lateral brand on social media.
We are going to describe this example in incremental phases.
I. Use GSICrawler to get tweets and Facebook posts from official accounts
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This phase retrieves tweets and Facebook posts from the brand's official accounts and prints the results.
Below is the relevant part of the task, located in `somedi-usecase/workflow.py`:
.. sourcecode:: python

    class ScrapyTask(GSICrawlerScraper):
        query = luigi.Parameter()
        id = luigi.Parameter()
        number = luigi.Parameter()
        source = luigi.Parameter()
        host = 'http://gsicrawler:5000/api/v1'

        def output(self):
            return luigi.LocalTarget(path='/tmp/_scrapy-%s.json' % self.id)
As shown in the code, the task uses our GSICrawler demo service as its endpoint; the remaining parameters are supplied on the command line.
Run the orchestrator's workflow to retrieve the 10 latest tweets:

.. sourcecode:: bash

    $ docker-compose exec orchestrator python -m luigi --module somedi-usecase.workflow ScrapyTask --query rest_lateral --number 10 --source twitter --id 1

Now run the orchestrator's workflow to retrieve the 10 latest Facebook posts; the query must be the official account name on Facebook, without the @:

.. sourcecode:: bash

    $ docker-compose exec orchestrator python -m luigi --module somedi-usecase.workflow ScrapyTask --query restauranteslateral --number 10 --source facebook --id 2

Below you can see the services involved in this phase:

.. figure:: figures/tutorial/tutorial1.png
    :alt: Tutorial phase 1
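Under the hood, a `GSICrawlerScraper` task simply issues an HTTP request to the GSICrawler API and writes the JSON response to its output target. The sketch below only illustrates how such a request URL could be assembled from the task parameters; the `/tasks` path and parameter names are assumptions for illustration, not taken from the orchestrator source:

```python
from urllib.parse import urlencode

def build_scraper_url(host, query, number, source):
    # Hypothetical sketch: endpoint path and parameter names are assumptions
    params = urlencode({"query": query, "number": number, "source": source})
    return "%s/tasks?%s" % (host, params)

# Mirrors the ScrapyTask invocation above
url = build_scraper_url("http://gsicrawler:5000/api/v1", "rest_lateral", 10, "twitter")
```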
II. Analyse collected tweets and Facebook posts with Senpy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This phase improves on the previous one by adding analysis with Senpy.
Below is the relevant part of the task, located in `somedi-usecase/workflow.py`:
.. sourcecode:: python

    class AnalysisTask(SenpyAnalysis):
        query = luigi.Parameter()
        id = luigi.Parameter()
        number = luigi.Parameter()
        source = luigi.Parameter()
        host = 'http://senpy:5000/api/'
        algorithm = luigi.Parameter()
        lang = luigi.Parameter()

        def requires(self):
            return ScrapyTask(query=self.query, id=self.id,
                              number=self.number, source=self.source)

        def output(self):
            return luigi.LocalTarget(path='/tmp/analysed%s.json' % self.id)
As shown in the code, the task uses our Senpy service as its endpoint; the remaining parameters are supplied on the command line.
You must select which Senpy algorithm and language to use for the analysis.
Run the orchestrator's workflow again, using the sentiment140 plugin in Spanish:
.. sourcecode:: bash

    $ docker-compose exec orchestrator python -m luigi --module somedi-usecase.workflow AnalysisTask --query restauranteslateral --number 10 --source facebook --algorithm sentiment140 --lang es --id 3

.. sourcecode:: bash

    $ docker-compose exec orchestrator python -m luigi --module somedi-usecase.workflow AnalysisTask --query rest_lateral --number 10 --source twitter --algorithm sentiment140 --lang es --id 4
Below you can see the services involved in this phase:
.. figure:: figures/tutorial/tutorial2.png
    :alt: Tutorial phase 2
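The `SenpyAnalysis` base class sends each collected text to the Senpy endpoint with the chosen algorithm and language. The following sketch shows how such a request URL could be built; the parameter names (`algo`, `i`, `lang`) follow Senpy's public API, but treat them as an assumption here:

```python
from urllib.parse import urlencode

def build_senpy_url(host, text, algorithm, lang):
    # Parameter names follow Senpy's API; assumed, not taken from the orchestrator source
    params = urlencode({"algo": algorithm, "i": text, "lang": lang})
    return host + "?" + params

# One request per collected tweet or post
url = build_senpy_url("http://senpy:5000/api/", "great food", "sentiment140", "es")
```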
III. Store collected and analysed tweets on Fuseki and Elasticsearch
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This phase improves on the previous one by adding a persistence layer to store the results.
Below is the relevant part of the task, located in `somedi-usecase/workflow.py`:
.. sourcecode:: python

    class FusekiTask(CopyToFuseki):
        id = luigi.Parameter()
        query = luigi.Parameter()
        number = luigi.Parameter()
        source = luigi.Parameter()
        algorithm = luigi.Parameter()
        lang = luigi.Parameter()
        host = 'fuseki'
        port = 3030

        def requires(self):
            return AnalysisTask(id=self.id, query=self.query, number=self.number,
                                source=self.source, algorithm=self.algorithm,
                                lang=self.lang)

        def output(self):
            return luigi.LocalTarget(path='/tmp/_n3-%s.json' % self.id)

    class ElasticsearchTask(CopyToIndex):
        id = luigi.Parameter()
        query = luigi.Parameter()
        number = luigi.Parameter()
        source = luigi.Parameter()
        algorithm = luigi.Parameter()
        lang = luigi.Parameter()
        index = 'somedi'
        doc_type = 'lateral'
        host = 'elasticsearch'
        port = 9200
        timeout = 100

        def requires(self):
            return AnalysisTask(id=self.id, query=self.query, number=self.number,
                                source=self.source, algorithm=self.algorithm,
                                lang=self.lang)

    class StoreTask(luigi.Task):
        id = luigi.Parameter()
        query = luigi.Parameter()
        number = luigi.Parameter()
        source = luigi.Parameter()
        algorithm = luigi.Parameter()
        lang = luigi.Parameter()

        def requires(self):
            yield FusekiTask(id=self.id, query=self.query, number=self.number,
                             source=self.source, algorithm=self.algorithm,
                             lang=self.lang)
            yield ElasticsearchTask(id=self.id, query=self.query, number=self.number,
                                    source=self.source, algorithm=self.algorithm,
                                    lang=self.lang)
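`StoreTask` does no work of its own; it only fans out to the two storage tasks, and Luigi runs every requirement before the task that requires it, skipping tasks whose output already exists. The dependency resolution can be sketched in plain Python (an illustration of Luigi's execution model, not Luigi itself):

```python
# Toy stand-ins for the luigi tasks above; no luigi required.
ORDER = []

class Task:
    def requires(self):
        return []
    def run(self):
        pass

class FusekiTask(Task):
    def run(self):
        ORDER.append("fuseki")

class ElasticsearchTask(Task):
    def run(self):
        ORDER.append("elasticsearch")

class StoreTask(Task):
    def requires(self):
        yield FusekiTask()
        yield ElasticsearchTask()
    def run(self):
        ORDER.append("store")

def execute(task):
    """Run all requirements first, then the task itself."""
    for dep in task.requires():
        execute(dep)
    task.run()

execute(StoreTask())
# ORDER is now ['fuseki', 'elasticsearch', 'store']
```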
Run the orchestrator's workflow again:
.. sourcecode:: bash

    $ docker-compose exec orchestrator python -m luigi --module somedi-usecase.workflow StoreTask --query restauranteslateral --number 10 --source facebook --algorithm sentiment140 --lang es --id 5
    $ docker-compose exec orchestrator python -m luigi --module somedi-usecase.workflow StoreTask --query rest_lateral --number 10 --source twitter --algorithm sentiment140 --lang es --id 6
Below you can see the services involved in this phase:
.. figure:: figures/tutorial/tutorial3.png
    :alt: Tutorial phase 3
Your data is now available in Elasticsearch and Fuseki.
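Once stored, the documents can be queried directly from Elasticsearch (index `somedi`, as set in ElasticsearchTask). The snippet below builds the kind of aggregation request body the dashboard sends; the field names come from the dashboard code, but the exact document layout may differ:

```python
def sentiment_aggregation_query(size=500):
    """Request body aggregating stored documents by sentiment polarity."""
    return {
        "size": size,
        "sort": {"schema:datePublished": {"order": "desc"}},
        "aggs": {
            "sentiment": {
                "terms": {"field": "sentiments.marl:hasPolarity.keyword"}
            }
        },
    }
```

Such a body can be POSTed to `http://localhost:9200/somedi/_search` to inspect the stored results by hand.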
IV. Show stored data in a Sefarad dashboard
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Open a web browser and navigate to the Sefarad environment at http://localhost:8080. This interactive dashboard shows the tweets and Facebook posts collected and analysed in the previous phases. We can distinguish between posts created by the official account and replies.
V. Use GSICrawler to track direct competitors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This phase tracks other restaurant chains. In this example we will track 100 Montaditos. We modify our orchestrator's workflow parameters and run it again:
.. sourcecode:: bash

    $ docker-compose exec orchestrator python -m luigi --module somedi-usecase.workflow StoreTask --query 100MontaditosSpain --number 10 --source facebook --algorithm sentiment140 --lang es --id 7
    $ docker-compose exec orchestrator python -m luigi --module somedi-usecase.workflow StoreTask --query 100montaditos --number 10 --source twitter --algorithm sentiment140 --lang es --id 8
The Sefarad dashboard is now updated with newly analysed data about 100 Montaditos.
\ No newline at end of file
docs/figures/soneti.png (image updated: 32.6 KB to 44.2 KB)
......@@ -6,9 +6,9 @@
Welcome to Soneti's documentation!
==================================
**Soneti** is a toolkit for **analyzing social media**, such as social networks (e.g. Twitter, Facebook), blogs, YouTube, newspapers, app stores, etc.
It obtains data from different sources, enriches the obtained data by performing different types of automatic analysis, and finally lets us visualize the results in interactive dashboards.
Soneti provides services to obtain data from different web services, performs different kinds of analysis to enrich the extracted data, and visualizes the analysis results in interactive dashboards.
.. figure:: figures/soneti.png
:alt: Soneti overview
......@@ -18,8 +18,9 @@ It obtains data from different sources, in addition it enriches this obtained da
:caption: Contents:
what-is-soneti
installation
usecases
installation
configuration
conventions
.. uses-soneti
......
......@@ -19,6 +19,10 @@ Now images are ready to run:
This installation offers a basic version of each service:
.. figure:: figures/quickstart.png
:alt: Soneti demo overview
* **GSICrawler:** This ingestion service demo has CNN, New York Times, ElMundo, Facebook and Twitter as possible sources. It is available at http://localhost:5000
* **Senpy**: This analysis service demo has sentiment140 as a sentiment analysis plugin and EmoRand as an emotion analysis plugin. It is available at http://localhost:8000/
......
......@@ -7,166 +7,47 @@ In this documentation we are going to show some uses of Soneti toolkit.
SOMEDI: Social Media and Digital Interaction Intelligence
---------------------------------------------------------
This project uses the Soneti toolkit to analyse social media data about different brands.

.. figure:: logos/somedi.png
    :alt: SOMEDI logo
    :align: center

SOMEDI tries to solve the challenge of efficiently generating and utilising intelligence from social media and digital interaction data.
In this project the GSICrawler service collects data about a certain brand from social networks such as Twitter and Facebook. Next, we enrich this data using Senpy: an entity detector identifies what is being talked about, so that we can later determine whether the opinion about that entity is positive or negative. The last step is to visualize the results in a Sefarad dashboard that summarises opinions and trends.
This project provides the opportunity to extract and use this information to better serve and engage users and audiences.

.. figure:: figures/somedi1.png
    :alt: SOMEDI demo
    :align: center

.. figure:: figures/somedi2.png
    :alt: SOMEDI demo
    :align: center

TRIVALENT: Terrorism pReventIon Via rAdicaLisation countEr-NarraTive
--------------------------------------------------------------------
.. figure:: logos/trivalent.jpg
    :alt: TRIVALENT logo
    :width: 200px
    :align: center

This project uses the Soneti toolkit to detect potentially radical messages on social media.
The GSICrawler service is in charge of extracting information from several web sources in the news and social media categories. Currently, the available newspapers are CNN News, The New York Times and AlJazeera. Additionally, it is also possible to extract information from PDF sources such as Dabiq magazine, which was the official Daesh propaganda magazine for years.
Furthermore, Senpy plugins provide added-value services for data analysis tasks, easing their implementation thanks to the Senpy architecture. Each plugin has an input and a semantically annotated output useful for linked data processes. The TRIVALENT project uses two Senpy plugins:

* **Translator plugin + COGITO plugin**: this analysis takes as input a GSICrawler document written in a source language (e.g. Arabic), translates it into a target language (e.g. English) and extracts information such as the people, places and organizations mentioned in it, following linked data annotation principles.

.. figure:: figures/crawler1.png
    :alt: TRIVALENT demo
    :align: center

.. figure:: figures/crawler2.png
    :alt: TRIVALENT demo
    :align: center
\ No newline at end of file
orchestrator @ bff06cc1
Subproject commit bff06cc179745c434e63d06f7a7650a2aa4b6fa1
docker-compose exec orchestrator python -m luigi --module somedi-usecase.workflow StoreTask --query restauranteslateral --number 10 --source facebook --algorithm sentiment140 --lang es --id 31 --doc-type lateral
docker-compose exec orchestrator python -m luigi --module somedi-usecase.workflow StoreTask --query "@rest_lateral" --number 50 --source twitter --algorithm sentiment140 --lang es --id 32 --doc-type lateral
docker-compose exec orchestrator python -m luigi --module somedi-usecase.workflow StoreTask --query 100MontaditosSpain --number 10 --source facebook --algorithm sentiment140 --lang es --id 33 --doc-type montaditos
docker-compose exec orchestrator python -m luigi --module somedi-usecase.workflow StoreTask --query "@100montaditos" --number 50 --source twitter --algorithm sentiment140 --lang es --id 34 --doc-type montaditos
docker-compose exec orchestrator python -m luigi --module somedi-usecase.workflow StoreTask --query "restaurantes lateral" --number 50 --source tripadvisor --algorithm sentiment140 --lang es --id 35 --doc-type lateral
docker-compose exec orchestrator python -m luigi --module somedi-usecase.workflow StoreTask --query "100 montaditos" --number 50 --source tripadvisor --algorithm sentiment140 --lang es --id 36 --doc-type montaditos
docker-compose exec orchestrator python -m luigi --module somedi-usecase.workflow StoreTask --query "Cervecería La Sureña" --number 50 --source tripadvisor --algorithm sentiment140 --lang es --id 37 --doc-type surena
docker-compose exec orchestrator python -m luigi --module somedi-usecase.workflow StoreTask --query "Lizarran" --number 50 --source tripadvisor --algorithm sentiment140 --lang es --id 38 --doc-type lizarran
docker-compose exec orchestrator python -m luigi --module somedi-usecase.workflow StoreTask --query "cerveceria.la.surena" --number 10 --source facebook --algorithm sentiment140 --lang es --id 39 --doc-type surena
docker-compose exec orchestrator python -m luigi --module somedi-usecase.workflow StoreTask --query "@La__Surena" --number 50 --source twitter --algorithm sentiment140 --lang es --id 310 --doc-type surena
docker-compose exec orchestrator python -m luigi --module somedi-usecase.workflow StoreTask --query "LizarranBar" --number 10 --source facebook --algorithm sentiment140 --lang es --id 311 --doc-type lizarran
docker-compose exec orchestrator python -m luigi --module somedi-usecase.workflow StoreTask --query "@lizarranbar" --number 50 --source twitter --algorithm sentiment140 --lang es --id 312 --doc-type lizarran
......@@ -40,7 +40,9 @@
"leaflet": "^1.0.3",
"Leaflet.EasyButton": "^2.2.0",
"leaflet.markercluster": "^1.0.5",
"jqcloud2": "jqcloud2#2.0.3"
"jqcloud2": "jqcloud2#2.0.3",
"paper-checkbox": "paper-checkbox#1.1.1",
"paper-listbox": "paper-listbox#*"
},
"resolutions": {
"webcomponentsjs": "0.7.24",
......
......@@ -109,7 +109,10 @@
color: #080;
}
iron-pages {
padding: 48px 62px 10px;
padding: 48px 62px 10px;
max-width: 1600px;
margin: 0 auto;
}
#barsentiment > paper-material{
height: 261px;
......
......@@ -11,6 +11,8 @@
<link rel="import" href="/bower_components/iron-pages/iron-pages.html">
<link rel="import" href="/bower_components/paper-item/paper-item.html">
<link rel="import" href="/bower_components/paper-menu/paper-menu.html">
<link rel="import" href="/bower_components/paper-dropdown-menu/paper-dropdown-menu.html">
<link rel="import" href="/bower_components/paper-checkbox/paper-checkbox.html">
<link rel="import" href="/elements/entities-chart/entities-chart.html">
<link rel="import" href="/elements/news-chart/news-chart.html">
<link rel="import" href="/elements/wordcloud/wordcloud.html">
......@@ -32,8 +34,8 @@
<img style="width: 30%; margin:0 auto;"src="/images/somedi.png">
</div>
<paper-tabs selected="{{selected}}">
<paper-tab>Dashboard</paper-tab>
<!-- <paper-tab>Sparql Editor</paper-tab> -->
<paper-tab>Brand Overview</paper-tab>
<paper-tab>Competitors Comparison</paper-tab>
<paper-tab>About</paper-tab>
</paper-tabs>
......@@ -55,8 +57,6 @@
title="Tweets"
data="{{data}}">
</number-chart>
</div>
<div class="col-md-6">
<number-chart
data="{{data}}"
stylebg="bg-facebook"
......@@ -67,6 +67,29 @@
icon="/images/facebook-white.png"
subtitle="Total elements">
</number-chart>
<number-chart
data="{{data}}"
stylebg="bg-tripadvisor"
object="Tripadvisor"
aggKey="schema:author"
title="Tripadvisor reviews"
filters="{{filters}}"
icon="/images/facebook-white.png"
subtitle="Total elements">
</number-chart>
</div>
<div class="col-md-6">
<google-chart
field="schema:author"
data="{{data}}"
id='barsentiment'
extra-id='1'
type='pie'
filters="{{filters}}"
icon='social:people'
options='{"title": "Sources"}'
cols='[{"label": "Sentiment", "type": "string"},{"label": "Count", "type": "number"}]'>
</google-chart>
</div>
</div>
<br>
......@@ -81,21 +104,11 @@
filters="{{filters}}"
icon='social:mood'
options='{"title": "Sentiments"}'
cols='[{"label": "Sentiment", "type": "string"},{"label": "Count", "type": "number"}]'
cols='[{"label": "Sentiment", "type": "string"},{"label": "Count", "type": "number"}]'>
</google-chart>
</div>
<div class="col-md-6">
<google-chart
field="schema:author"
data="{{data}}"
id='barsentiment'
extra-id='1'
type='pie'
filters="{{filters}}"
icon='social:people'
options='{"title": "Sources"}'
cols='[{"label": "Sentiment", "type": "string"},{"label": "Count", "type": "number"}]'
</google-chart>
</div>
</div>
<br>
......@@ -178,12 +191,28 @@
</div>
<!--
<div>
<yasgui-ui
endpoint="http:///localhost:3030/default/query"
queries="{{queries}}"
</yasgui-ui>
<yasgui-ui
endpoint="http:///${FUSEKI_ENDPOINT_EXTERNAL}/default/query"
queries="{{queries}}"
</yasgui-ui>
</div>
-->
<div>
<paper-dropdown-menu label="Competitors">
<paper-listbox slot="dropdown-content" class="dropdown-content">
<paper-item><paper-checkbox>100 Montaditos</paper-checkbox></paper-item>
<paper-item>brontosaurus</paper-item>
<paper-item>carcharodontosaurus</paper-item>
<paper-item>diplodocus</paper-item>
</paper-listbox>
</paper-dropdown-menu>
</div>
<div>
This demo shows Soneti functionalities in the context of the SOMEDI project. The demo presented is the result of ingesting the crawled online data and of the subsequent analysis and semantic augmentation of that data. The resulting information is stored in both Elasticsearch and Fuseki databases, which allows the user to consult the data in a variety of forms, as shown.
</div>
......@@ -258,7 +287,7 @@
_clientChanged: function() {
//console.log("ClientChanged");
ready = true;
this._query();
this._querylateral();
},
getPeople: function(data){
var people = []
......@@ -324,7 +353,7 @@
}
else {
this.push('filters', {terms: {'schema:articleBody': [this.query]}});
this._ESsearch()
this._query()
}
},
_filtersChange: function() {
......@@ -332,137 +361,15 @@
console.log("default search fired")
console.log("filters: "+this.filters);
console.log("query: "+this.query);
this._query()
this._querylateral()
}
else {
console.log(this.query);
console.log(this.filters);
this._query()
this._querylateral()
}
},
_ESsearch: function() {
console.log("_ESsearch")
var that = this;
//console.log("Ready?: ", ready);
if(ready){
this.client.search({
// undocumented params are appended to the query string
index: "somedi",
body: {
size: 500,
query: {
multi_match:{
query: this.query,
fields: ['schema:author', 'schema:headline', 'marl:hasPolarity', 'schema:about', 'entities.nif:anchorOf']
}
},
sort:{'schema:datePublished':{order: "asc"}},
aggs: {
type: {
terms: {
field: "@type.keyword",
order: {
_count: "desc"
}
}
},
'schema:author': {
terms: {
field: "schema:author.keyword",
order: {
_count: "desc"
}
}
},
'schema:creator': {
terms: {
field: "schema:creator.keyword",
order: {
_count: "desc"
}
}
},
'comments.data.from.name': {
terms: {
field: "comments.data.from.name.keyword",
order: {
_count: "desc"
}
}
},
'entities.rdfs:subClassOf': {
terms: {
field: "entities.rdfs:subClassOf.keyword",
size: 20,
order: {
_count: "desc"
}
}
},
'schema:search': {
terms: {
field: "schema:search.keyword",
size: 20,
order: {
_count: "desc"
}
}
},
'topics.rdfs:subClassOf': {
terms: {
field: "topics.rdfs:subClassOf.keyword",
size: 20,
order: {
_count: "desc"
}
}
},
'sentiments.marl:hasPolarity': {
terms: {
field: "sentiments.marl:hasPolarity.keyword",
size: 20,
order: {
_count: "desc"
}
}
},
sentiment: {
terms: {
field: "sentiments.marl:hasPolarity.keyword",
order: {
_count: "desc"
}
}
},
emotion: {
terms: {
field: "emotions.onyx:hasEmotion.onyx:hasEmotionCategory.keyword",
order: {
_count: "desc"
}
}
},
'schema:datePublished': {
date_histogram : {
field : "schema:datePublished",
format: "dd-MM-yyyy",
interval : "month"
}
}
}
}
}).then(function (resp) {
var myids = []
resp.hits.hits.forEach(function(entry){myids.push(entry._id)})
that.ids = myids;
//console.log(that.ids)
that.data = resp;
//console.log(that.data);
});
}
},
_query: function() {
_querylateral: function() {
//console.log("_query")
var that = this;
//console.log("Ready?: ", ready);
......@@ -470,6 +377,7 @@
this.client.search({
// undocumented params are appended to the query string
index: "somedi",
type: "lateral",
body: {
size: 500,
query: {
......@@ -477,7 +385,7 @@
must: this.filters,
}
},
sort:{'schema:datePublished':{order: "asc"}},
sort:{'schema:datePublished':{order: "desc"}},
aggs: {
type: {
terms: {
......
<link rel="import" href="/bower_components/polymer/polymer.html">
<link rel="import" href="/bower_components/iron-icons/iron-icons.html">
<link rel="import" href="/bower_components/paper-checkbox/paper-checkbox.html">
<link rel="import" href="/bower_components/paper-item/paper-item.html">
<dom-module id="competitors-selector">
<template>
<paper-material elevation="1">
<div class="row">
<div class="col-md-2"><paper-item><strong>Select Competitors:</strong></paper-item></div>
<div class="col-md-2"><paper-item><paper-checkbox id="montaditos" on-change="_selectionChange">100 Montaditos</paper-checkbox></paper-item></div>
<div class="col-md-2"><paper-item><paper-checkbox id="surena" on-change="_selectionChange">La Sureña</paper-checkbox></paper-item></div>
<div class="col-md-2"><paper-item><paper-checkbox id="lizarran" on-change="_selectionChange">Lizarrán</paper-checkbox></paper-item></div>
</div>
</paper-material>
</template>
<script>
Polymer({
is: 'competitors-selector',
properties: {
competitors: {
type: Array,
notify: true,
value: function() { return []; }
}
},
_selectionChange: function(event){
var that = this;
//console.log(event.target.id,event.target.active);
if (event.target.active) this.push('competitors', event.target.id)
else if (!event.target.active){
var i = this.competitors.indexOf(event.target.id);
if(i != -1) {
this.splice('competitors', i, 1);
}
}
//console.log(this.competitors)
}
});
</script>
</dom-module>
......@@ -545,12 +545,12 @@ Data can be provided in one of three ways:
datos[fechacompleta]['propio'] = datos[fechacompleta]['propio'] + 1
} else datos[fechacompleta]['propio'] = 1
}
else if(entry._source['schema:creator'] != ('rest_lateral' || 'restauranteslateral')){
else if(['rest_lateral', 'restauranteslateral', '100montaditos', '100MontaditosSpain'].indexOf(entry._source['schema:creator']) == -1){
if (datos[fechacompleta].hasOwnProperty('otro')){
datos[fechacompleta]['otro'] = datos[fechacompleta]['otro'] + 1
} else datos[fechacompleta]['otro'] = 1
}
if(entry._source['schema:author'] == 'facebook'){
if(entry._source['schema:creator'] == 'restauranteslateral'){
if (datos[fechacompleta].hasOwnProperty('otro')){
//console.log( entry._source['comments']['data'].length)
datos[fechacompleta]['otro'] = datos[fechacompleta]['otro'] + entry._source['comments']['data'].length
......@@ -570,12 +570,12 @@ Data can be provided in one of three ways:
datos[fechacompleta]['propio'] = datos[fechacompleta]['propio'] + 1
} else datos[fechacompleta]['propio'] = 1
}
else if(entry._source['schema:creator'] != ('rest_lateral' || 'restauranteslateral')){
else if(['rest_lateral', 'restauranteslateral', '100montaditos', '100MontaditosSpain'].indexOf(entry._source['schema:creator']) == -1){
if (datos[fechacompleta].hasOwnProperty('otro')){
datos[fechacompleta]['otro'] = datos[fechacompleta]['otro'] + 1
} else datos[fechacompleta]['otro'] = 1
}
if(entry._source['schema:author'] == 'facebook'){
if(entry._source['schema:creator'] == 'restauranteslateral'){
if (datos[fechacompleta].hasOwnProperty('otro')){
//console.log( entry._source['comments']['data'].length)
datos[fechacompleta]['otro'] = datos[fechacompleta]['otro'] + entry._source['comments']['data'].length
......
......@@ -160,7 +160,7 @@
if (that.type == 'official'){
if((entry._source['schema:creator'] == 'rest_lateral') || (entry._source['schema:creator'] == 'restauranteslateral')) results.push(entry._source);
} else {
if((entry._source['schema:creator'] != 'rest_lateral') && (entry._source['schema:creator'] != 'restauranteslateral')) results.push(entry._source);
if((entry._source['schema:creator'] != 'rest_lateral') && (entry._source['schema:creator'] != 'restauranteslateral') && (entry._source['schema:creator'] != '100montaditos') && (entry._source['schema:creator'] != '100MontaditosSpain')) results.push(entry._source);
}
});
//console.log(results)
......@@ -206,6 +206,8 @@
checkSource: function(source) {
if(source.indexOf("twitter") > -1 || source.indexOf("Twitter") > -1 )
return "/images/twitter.png"
if(source.indexOf("tripadvisor") > -1 || source.indexOf("Tripadvisor") > -1 )
return "/images/tripadvisor.png"
else
return "/images/facebook.png"
},
......
......@@ -14,11 +14,14 @@
background-color: #00c0ef !important;
color: white;
}
.bg-green {
color: white;
background-color: #00a65a !important;
}
.bg-tripadvisor {
color: white;
background-color: #00a680 !important;
}
.bg-facebook{
color: white;
background-color: #4467B1 !important;
......
......@@ -77,7 +77,7 @@
},
addfilter: function(){
//console.log("filter add "+this.object)
var object = this.object;
var object = this.object.toLowerCase();
var aggkey = this.aggkey;
//console.log(aggkey)
this.push('filters', {terms: { 'schema:author' : [object]}})
......
......@@ -84,7 +84,7 @@ Polymer({
var words = []
hits.forEach(function(entry) {
//console.log(entry)
if (entry.key != 'restauranteslateral' && entry.key != 'rest_lateral'){
if (entry.key != 'restauranteslateral' && entry.key != 'rest_lateral' && entry.key != '100montaditos' && entry.key != '100MontaditosSpain'){
words.push({'text': entry.key, 'weight' : entry.doc_count})
}
});
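The chained inequality checks above grow by one clause for every excluded account. A hedged Python sketch of the same word-cloud filtering with an explicit blocklist (account names are the ones listed in the diff):

```python
# Accounts excluded from the word cloud, as hardcoded in the diff above.
BLOCKLIST = {"restauranteslateral", "rest_lateral",
             "100montaditos", "100MontaditosSpain"}

def cloud_words(hits):
    """Turn aggregation buckets into word-cloud entries, skipping own brands."""
    return [{"text": h["key"], "weight": h["doc_count"]}
            for h in hits if h["key"] not in BLOCKLIST]
```

A set-based blocklist keeps the exclusion list in one place instead of repeating it inside the condition.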
@@ -2,7 +2,7 @@
# Copy bower dependencies when using -v $PWD/:/usr/src/app
envsubst < /usr/src/app/dashboard-somedi.env.html > /usr/src/app/dashboard-somedi.html || exit 1;
#envsubst < /usr/src/app/dashboard-somedi.env.html > /usr/src/app/dashboard-somedi.html || exit 1;
if [ -f /.dockerenv ]; then
# ---
# apiVersion: v1
# kind: ConfigMap
# metadata:
# name: somedi-crawler
# data:
# ES_ENDPOINT: "sefarad-es"
# ES_PORT: "9200"
# TWITTER_CONSUMER_KEY: "aXyLS0LMM69OBfpslDNe4oZxL"
# TWITTER_CONSUMER_SECRET: "nWJIjYoyWT3vm282CDiEGzIQ1ZlSTF2IwPFnoTbUXih95u2fY8"
# TWITTER_ACCESS_TOKEN: "377869454-OB1GQt1ycK5EGrdgHYoHdmJ8WbgTciezCKHXjzH2"
# TWITTER_ACCESS_TOKEN_SECRET: "qP5kXmi85SFvasv7RdECVwYc0gpch19mMJkISTcvjKc4x"
# FUSEKI_ENDPOINT: "sefarad-fuseki"
# FUSEKI_PORT: "3030"
# API_KEY_MEANING_CLOUD: "02079c6e4e97d85f3fa483cd42180042"
# ---
# apiVersion: extensions/v1beta1
# kind: Deployment
# metadata:
# name: somedi-crawler
# spec:
# replicas: 1
# template:
# metadata:
# labels:
# role: somedi-luigi
# app: somedi-luigi
# spec:
# containers:
# - name: gsicrawler
# image: registry.cluster.gsi.dit.upm.es/sefarad/dashboard-somedi/gsicrawler:v1.0.9
# imagePullPolicy: Always
# resources:
# limits:
# memory: "512Mi"
# cpu: "200m"
# ports:
# - name: luigiweb
# containerPort: 8082
# envFrom:
# - configMapRef:
# name: somedi-crawler
# - name: senpy
# image: gsiupm/sentiment-meaningcloud:0.1.7-python3.5
# imagePullPolicy: Always
# resources:
# limits:
# memory: "512Mi"
# cpu: "200m"
# ports:
# - name: senpy
# containerPort: 5000
# envFrom:
# - configMapRef:
# name: somedi-crawler
# ---
apiVersion: v1
kind: ConfigMap
metadata:
name: somedi-web-unstable
data:
ES_ENDPOINT_EXTERNAL: "sefarad-elasticsearch.cluster.gsi.dit.upm.es"
FUSEKI_ENDPOINT_EXTERNAL: "sefarad-fuseki.cluster.gsi.dit.upm.es"
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: somedi-web-unstable
spec:
replicas: 1
template:
metadata:
labels:
role: somedi-web-unstable
        app: somedi-web-unstable
spec:
containers:
- name: somedi
image: registry.cluster.gsi.dit.upm.es/sefarad/dashboard-somedi/web:unstable
imagePullPolicy: Always
resources:
limits:
memory: "512Mi"
cpu: "500m"
ports:
- name: web
containerPort: 8080
envFrom:
- configMapRef:
name: somedi-web-unstable
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
name: somedi-web-unstable
annotations:
ingress.kubernetes.io/rewrite-target: /
spec:
rules:
# - host: somedi.cluster.gsi.dit.upm.es
# http:
# paths:
# - path: /
# backend:
# serviceName: somedi-web
# servicePort: 8080
- host: unstable.somedi.cluster.gsi.dit.upm.es
http:
paths:
- path: /
backend:
serviceName: somedi-web-unstable
servicePort: 8080
---
apiVersion: v1
kind: Service
metadata:
name: somedi-web-unstable
spec:
type: ClusterIP
ports:
- port: 8080
protocol: TCP
selector:
role: somedi-web-unstable
import luigi
from luigi.contrib.esindex import CopyToIndex
from orchestrator.SenpyAnalysis import SenpyAnalysis
from orchestrator.GSICrawlerScraper import GSICrawlerScraper
from orchestrator.CopyToFuseki import CopyToFuseki
from soneti_tasks.SenpyAnalysis import SenpyAnalysis
from soneti_tasks.GSICrawlerScraper import GSICrawlerScraper
from soneti_tasks.CopyToFuseki import CopyToFuseki
class ScrapyTask(GSICrawlerScraper):
@@ -59,7 +59,7 @@ class ElasticsearchTask(CopyToIndex):
algorithm = luigi.Parameter()
lang = luigi.Parameter()
index = 'somedi'
doc_type = 'lateral'
doc_type = luigi.Parameter()
host = 'elasticsearch'
port = 9200
timeout = 100
@@ -75,7 +75,9 @@ class StoreTask(luigi.Task):
source = luigi.Parameter()
algorithm = luigi.Parameter()
lang = luigi.Parameter()
doc_type = luigi.Parameter()
def requires(self):
yield FusekiTask(self.id, self.query, self.number, self.source, self.algorithm,self.lang)
yield ElasticsearchTask(self.id, self.query, self.number, self.source, self.algorithm,self.lang)
\ No newline at end of file
yield ElasticsearchTask(self.id, self.query, self.number, self.source, self.algorithm,self.lang,self.doc_type)
\ No newline at end of file
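The change above threads `doc_type` from `StoreTask` down to `ElasticsearchTask` instead of hardcoding `'lateral'`. A minimal, dependency-free sketch of that parameter propagation (class names mirror the diff, but luigi itself is stubbed out here, so this is an illustration rather than the project's actual task code):

```python
# Sketch of the doc_type propagation introduced above, without luigi.
class ElasticsearchTask:
    index = "somedi"

    def __init__(self, id, query, number, source, algorithm, lang, doc_type):
        # doc_type is now a constructor parameter instead of the
        # hardcoded class attribute 'lateral'
        self.doc_type = doc_type

class StoreTask:
    def __init__(self, id, query, number, source, algorithm, lang, doc_type):
        self.id, self.query, self.number = id, query, number
        self.source, self.algorithm, self.lang = source, algorithm, lang
        self.doc_type = doc_type

    def requires(self):
        # StoreTask forwards its own doc_type to the indexing task
        yield ElasticsearchTask(self.id, self.query, self.number,
                                self.source, self.algorithm, self.lang,
                                self.doc_type)
```

This lets one workflow index documents under different Elasticsearch types (e.g. one per tracked brand) without editing the task class.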
#!/bin/sh
curl -XPUT http://localhost:9200/somedi-unstable/_mapping/texts?update_all_types -d '
{
"properties": {
"schema:articleBody": {
"type": "text",
"fielddata": true
}
}
}'
curl -XPUT http://localhost:9200/somedi -d '
{
"settings": {
"analysis": {
"filter": {
"my_stop": {
"type": "stop",
"stopwords": "_english_"
}
}
}
}
}'
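The `curl` calls above send hand-written JSON bodies. A short Python sketch that builds the same `fielddata` mapping programmatically; the endpoint URL is the one from the script, and actually sending the request assumes a reachable Elasticsearch, so only the payload is constructed and shown here:

```python
import json

# Mapping body from the script above: enable fielddata on schema:articleBody
# so the dashboard can aggregate on the analyzed text field.
mapping = {
    "properties": {
        "schema:articleBody": {
            "type": "text",
            "fielddata": True,
        }
    }
}

body = json.dumps(mapping)
# To apply it (endpoint taken from the script; requires a running cluster):
#   PUT http://localhost:9200/somedi-unstable/_mapping/texts?update_all_types
#   with Content-Type: application/json and `body` as the request payload
```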