This repository contains a demo for the SOMEDI project. The demo tracks the Restaurantes Lateral brand on social media.

Installation
============

Clone this repository and switch to the `lateral-demo` branch:

    git clone -b lateral-demo https://lab.gsi.upm.es/sefarad/dashboard-somedi.git
    cd dashboard-somedi

Build and run the docker images:

	docker-compose up --build

NOTE: This command may require sudo.

Loading backup data
===================

If you already have a list of analyzed posts that you want to import into Elasticsearch, you can do so with the `loadBackup.py` script.
For convenience, this script is included in the gsicrawler container.
You just need to copy your backup file (a JSON Lines file named `backup.jsons`) into the gsicrawler folder and run this command:

```
docker-compose exec orchestrator python loadBackup.py
```
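
Since the backup is a JSON Lines file, it can be worth checking that every line parses before running the import. The following is a minimal sketch and not part of `loadBackup.py`; the function name `count_valid_records` is hypothetical, and it assumes that each post is stored as one JSON object per line.

```python
import json


def count_valid_records(path):
    """Return the number of records in a JSON Lines file.

    Raises ValueError naming the first line that is not a JSON object.
    Assumption: one JSON object per line (not confirmed by loadBackup.py).
    """
    count = 0
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as error:
                raise ValueError(f"line {number}: invalid JSON ({error.msg})") from error
            if not isinstance(record, dict):
                raise ValueError(f"line {number}: expected a JSON object")
            count += 1
    return count
```

For example, `count_valid_records("backup.jsons")` returns the number of posts in the file, or raises an error that points at the first bad line.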


Services
========

This demo includes the following services:

* Luigi orchestrator (orchestrator). This service provides both the task executor and a web interface to check the status of your workflows.
The tasks are executed periodically according to the period in `tasks.py:Main`.
By default, the period is 24h.
The web interface shows the status of the tasks, and it is available on http://localhost:8082
* GSICrawler: a service to get data from different sources (e.g. Twitter, Facebook). It is available on http://localhost:5000 and, for flower, on http://localhost:5555
* Elasticsearch: the official Elasticsearch image. It is available on localhost:19200
* Senpy, used for sentiment and semantic analysis. It is mapped to http://localhost:5000/
* Somedi dashboard (sefarad), a website developed with sefarad. It displays the data stored in Elasticsearch.
It is available on http://localhost:8080.
* Redis: a dependency of GSICrawler.
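
Once the containers are running, a quick way to see which of the services listed above are reachable is to try a TCP connection to each published port. This is a minimal sketch that uses only the Python standard library; the labels in `SERVICE_PORTS` and the helper names are hypothetical, and the port numbers are copied from the list above.

```python
import socket

# Host ports copied from the service list; the labels are only for display.
# Senpy is listed on port 5000 as well, so it shares the gsicrawler entry.
SERVICE_PORTS = {
    "orchestrator (Luigi web interface)": 8082,
    "gsicrawler": 5000,
    "gsicrawler (flower)": 5555,
    "elasticsearch": 19200,
    "dashboard (sefarad)": 8080,
}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(host="localhost"):
    """Map each service label to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in SERVICE_PORTS.items()}


if __name__ == "__main__":
    for name, is_up in report().items():
        print(f"{name}: {'up' if is_up else 'down'}")
```

An open port only shows that something is listening, not that the service has finished starting up.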

The docker-compose definition adds all these services to the same network, so they can communicate with each other using the service name, without exposing external ports.
The endpoints used in each service (e.g. the elasticsearch endpoint in the gsicrawler service) are configurable through environment variables.
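
As an illustration of that pattern, a service could read its endpoint from the environment and fall back to the name of another service on the shared network. This is a minimal sketch: the variable name `ES_ENDPOINT`, the service name `elasticsearch`, and the helper `service_endpoint` are assumptions made for the example, so check `docker-compose.yml` for the names the services actually use. Port 9200 is the default Elasticsearch HTTP port.

```python
import os


def service_endpoint(variable, default, environ=None):
    """Return the endpoint stored in an environment variable, or a default."""
    environ = os.environ if environ is None else environ
    value = environ.get(variable, "").strip()
    # Drop a trailing slash so callers can append paths consistently.
    return value.rstrip("/") if value else default


# Hypothetical variable name and default; inside the compose network the
# service name resolves to the container, so no host port is involved.
ES_ENDPOINT = service_endpoint("ES_ENDPOINT", "http://elasticsearch:9200")
```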


Troubleshooting
===============

Elasticsearch may crash on startup and complain about `vm.max_map_count`.
The following command solves it temporarily, until the next reboot:

```
sudo sysctl -w vm.max_map_count=262144 
```

If you want to make this permanent, add `vm.max_map_count=262144` to your `/etc/sysctl.conf`.
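
To check whether the current value is already high enough, you can read it from procfs. This is a minimal sketch for Linux hosts; the helper names are hypothetical, and the threshold is the value used in the sysctl command above.

```python
from pathlib import Path

REQUIRED_MAX_MAP_COUNT = 262144  # value used in the sysctl command above


def read_max_map_count(path="/proc/sys/vm/max_map_count"):
    """Read the current vm.max_map_count from procfs (Linux only)."""
    return int(Path(path).read_text().strip())


def is_max_map_count_sufficient(path="/proc/sys/vm/max_map_count"):
    """Return True if the current value meets the required minimum."""
    return read_max_map_count(path) >= REQUIRED_MAX_MAP_COUNT
```

Running `sysctl vm.max_map_count` in a shell prints the same value.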