=============
Configuration
=============

This configuration for the orchestrator is based on the SOMEDI project. In this example we are going to track the Restaurantes Lateral brand on social media.

We will describe this example in several incremental phases.

I. Use GSICrawler to get tweets and Facebook posts from official accounts
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This phase gets tweets and Facebook posts from official accounts and prints the results.

Below is the relevant part of the task, located in `somedi-usecase/workflow.py`.

.. sourcecode:: python 

    class ScrapyTask(GSICrawlerScraper):
        query = luigi.Parameter()
        id = luigi.Parameter()
        number = luigi.Parameter()
        source = luigi.Parameter()
        host = 'http://gsicrawler:5000/api/v1'

        def output(self):
            return luigi.LocalTarget(path='/tmp/_scrapy-%s.json' % self.id)

As shown in the code, we select our GSICrawler demo service as the endpoint; the remaining parameters are given on the command line.

Run the orchestrator's workflow to retrieve the 10 latest tweets:

.. sourcecode:: bash 

    $ docker-compose exec orchestrator python -m luigi --module somedi-usecase.workflow ScrapyTask --query rest_lateral --number 10 --source twitter --id 1

Now run the orchestrator's workflow to retrieve the 10 latest Facebook posts; the query must be the official account name on Facebook without the @:

.. sourcecode:: bash 

    $ docker-compose exec orchestrator python -m luigi --module somedi-usecase.workflow ScrapyTask --query restauranteslateral --number 10 --source facebook --id 2
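
The tutorial repeats this command pattern with different parameters throughout. As a convenience, a small helper (hypothetical, not part of the project) can generate these command lines from a task name and a parameter dict:

```python
def luigi_command(task, **params):
    """Build one of this tutorial's luigi CLI invocations.

    Hypothetical helper: the command shape mirrors the examples shown
    in this document, nothing more.
    """
    parts = ["docker-compose", "exec", "orchestrator",
             "python", "-m", "luigi",
             "--module", "somedi-usecase.workflow", task]
    for name, value in params.items():
        parts += ["--%s" % name, str(value)]
    return " ".join(parts)

print(luigi_command("ScrapyTask", query="rest_lateral",
                    number=10, source="twitter", id=1))
```

Keyword-argument order is preserved (Python 3.7+), so the generated command matches the examples above verbatim.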

Below you can see the services involved in this phase:

.. figure:: figures/tutorial/tutorial1.png
   :alt: Tutorial phase 1


II. Analyse collected tweets and Facebook posts with Senpy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This phase improves on the previous one by adding analysis with Senpy.

Below is the relevant part of the task, located in `somedi-usecase/workflow.py`.

.. sourcecode:: python 

    class AnalysisTask(SenpyAnalysis):

        query = luigi.Parameter()
        id = luigi.Parameter()
        number = luigi.Parameter()
        source = luigi.Parameter()
        host = 'http://senpy:5000/api/'
        algorithm = luigi.Parameter()
        lang = luigi.Parameter()

        def requires(self):
            # Keyword arguments avoid mis-binding: ScrapyTask declares
            # query before id, so positional (id, query, ...) would swap them.
            return ScrapyTask(query=self.query, id=self.id,
                              number=self.number, source=self.source)

        def output(self):
            return luigi.LocalTarget(path='/tmp/analysed%s.json' % self.id)

As shown in the code, we select our Senpy service as the endpoint; the remaining parameters are given on the command line.

You must select which Senpy algorithm and language to use in the analysis.

Run the orchestrator's workflow again, using the sentiment140 plugin in Spanish:

.. sourcecode:: bash 

    $ docker-compose exec orchestrator python -m luigi --module somedi-usecase.workflow AnalysisTask --query restauranteslateral --number 10 --source facebook --algorithm sentiment140 --lang es --id 3

.. sourcecode:: bash 

    $ docker-compose exec orchestrator python -m luigi --module somedi-usecase.workflow AnalysisTask --query rest_lateral --number 10 --source twitter --algorithm sentiment140 --lang es --id 4
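
Once an analysis run finishes, its output target (e.g. `/tmp/analysed3.json`) holds the Senpy response. A sketch for pulling out the sentiment polarities, assuming Senpy's usual JSON-LD layout (`entries` containing `sentiments` with a `marl:hasPolarity` field) — verify against the actual output of your Senpy version:

```python
import json

def load_polarities(raw):
    """Collect 'marl:hasPolarity' values from a Senpy-style JSON response.

    The entries/sentiments layout assumed here is a sketch of Senpy's
    JSON-LD output; check your Senpy version's real response.
    """
    doc = json.loads(raw)
    polarities = []
    for entry in doc.get("entries", []):
        for sentiment in entry.get("sentiments", []):
            if "marl:hasPolarity" in sentiment:
                polarities.append(sentiment["marl:hasPolarity"])
    return polarities

# Minimal hand-made response for illustration:
sample = '{"entries": [{"sentiments": [{"marl:hasPolarity": "marl:Positive"}]}]}'
print(load_polarities(sample))  # ['marl:Positive']
```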

Below you can see the services involved in this phase:

.. figure:: figures/tutorial/tutorial2.png
   :alt: Tutorial phase 2

III. Store collected and analysed tweets on Fuseki and Elasticsearch
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This phase improves on the previous one by adding a persistence layer to store the results.

Below is the relevant part of the task, located in `somedi-usecase/workflow.py`.

.. sourcecode:: python 

    class FusekiTask(CopyToFuseki):
        
        id = luigi.Parameter()
        query = luigi.Parameter()
        number = luigi.Parameter()
        source = luigi.Parameter()
        algorithm = luigi.Parameter()
        lang = luigi.Parameter()
        host = 'fuseki'
        port = 3030

        def requires(self):
            return AnalysisTask(id=self.id, query=self.query,
                                number=self.number, source=self.source,
                                algorithm=self.algorithm, lang=self.lang)
            
        def output(self):
            return luigi.LocalTarget(path='/tmp/_n3-%s.json' % self.id)

    class ElasticsearchTask(CopyToIndex):
        
        id = luigi.Parameter()
        query = luigi.Parameter()
        number = luigi.Parameter()
        source = luigi.Parameter()
        algorithm = luigi.Parameter()
        lang = luigi.Parameter()
        index = 'somedi'
        doc_type = 'lateral'
        host = 'elasticsearch'
        port = 9200
        timeout = 100

        def requires(self):
            return AnalysisTask(id=self.id, query=self.query,
                                number=self.number, source=self.source,
                                algorithm=self.algorithm, lang=self.lang)

    class StoreTask(luigi.Task):

        id = luigi.Parameter()
        query = luigi.Parameter()
        number = luigi.Parameter()
        source = luigi.Parameter()
        algorithm = luigi.Parameter()
        lang = luigi.Parameter()

        def requires(self):
            yield FusekiTask(id=self.id, query=self.query,
                             number=self.number, source=self.source,
                             algorithm=self.algorithm, lang=self.lang)
            yield ElasticsearchTask(id=self.id, query=self.query,
                                    number=self.number, source=self.source,
                                    algorithm=self.algorithm, lang=self.lang)

Run the orchestrator's workflow again:

.. sourcecode:: bash 
    
    $ docker-compose exec orchestrator python -m luigi --module somedi-usecase.workflow StoreTask --query restauranteslateral --number 10 --source facebook --algorithm sentiment140 --lang es --id 5

    $ docker-compose exec orchestrator python -m luigi --module somedi-usecase.workflow StoreTask --query rest_lateral --number 10 --source twitter --algorithm sentiment140 --lang es --id 6
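
To verify the stored documents, Elasticsearch's URI search can be queried against the `somedi` index. A minimal sketch, assuming the `elasticsearch` service's port 9200 is published on localhost (an assumption — adjust to your deployment):

```python
from urllib.parse import urlencode

def search_url(index, doc_type, size=10, host="localhost", port=9200):
    """Build an Elasticsearch URI-search URL listing stored documents.

    host/port assume the docker-compose service is reachable locally
    (an assumption -- adjust to your deployment).
    """
    qs = urlencode({"size": size})
    return "http://%s:%d/%s/%s/_search?%s" % (host, port, index, doc_type, qs)

url = search_url("somedi", "lateral")
print(url)  # http://localhost:9200/somedi/lateral/_search?size=10

# To actually run the check against a live service:
#   from urllib.request import urlopen
#   print(urlopen(url).read())
```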

Below you can see the services involved in this phase:

.. figure:: figures/tutorial/tutorial3.png
   :alt: Tutorial phase 3

Now your data is available on Elasticsearch and Fuseki.
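
The Fuseki side can be checked in the same spirit: Fuseki answers SPARQL queries over HTTP. This sketch only builds the query URL; the dataset name (`somedi` here) and the published port 3030 are assumptions — check your Fuseki configuration:

```python
from urllib.parse import urlencode

# Simple SPARQL query counting all stored triples.
COUNT_TRIPLES = "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }"

def fuseki_query_url(dataset, host="localhost", port=3030):
    """Build the query-endpoint URL for a Fuseki dataset.

    Dataset name, host, and port are assumptions about the deployment.
    """
    qs = urlencode({"query": COUNT_TRIPLES})
    return "http://%s:%d/%s/query?%s" % (host, port, dataset, qs)

print(fuseki_query_url("somedi"))
```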

IV. Show stored data in a Sefarad dashboard
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Open a web browser and navigate to the Sefarad environment at http://localhost:8080. This interactive dashboard shows the tweets and Facebook posts collected and analysed in the previous phases. We can distinguish between posts created by the official account and replies.

V. Use GSICrawler to track direct competitors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This phase tracks other restaurant chains. In this example we will track 100 Montaditos. We modify our orchestrator's workflow parameters and run it again:

.. sourcecode:: bash 
    
    $ docker-compose exec orchestrator python -m luigi --module somedi-usecase.workflow StoreTask --query 100MontaditosSpain --number 10 --source facebook --algorithm sentiment140 --lang es --id 7

    $ docker-compose exec orchestrator python -m luigi --module somedi-usecase.workflow StoreTask --query 100montaditos --number 10 --source twitter --algorithm sentiment140 --lang es --id 8

The Sefarad dashboard is now updated with the newly analysed data about 100 Montaditos.