This repository contains a demo for the SOMEDI project. The demo tracks the Restaurantes Lateral brand on social media.

Installation
============

Clone this repository, checking out the `lateral-demo` branch, and change into the new directory:

    git clone -b lateral-demo https://lab.gsi.upm.es/sefarad/dashboard-somedi.git
    cd dashboard-somedi

Build the Docker images and start the containers:

	docker-compose up --build

NOTE: This command may require `sudo`.

Loading backup data
===================

If you already have a list of analyzed posts that you want to import into Elasticsearch, you can do so with the `loadBackup.py` script.
For convenience, this script is included in the gsicrawler container.
Copy your backup file (a JSON Lines file named `lateralbackup.jsons`) into the gsicrawler folder and run this command:

```
docker-compose exec gsicrawler python loadBackup.py
```
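
Since the backup is expected to be a JSON Lines file (one JSON document per line), it can help to check that every line parses before importing it. The following is a minimal sketch, not part of this repository: the function name `check_jsonlines` is hypothetical, and it only verifies that each line is valid JSON, not that the records have the fields `loadBackup.py` expects.

```python
import json
from pathlib import Path


def check_jsonlines(path):
    """Return (number of valid records, line numbers that failed to parse)."""
    valid, bad_lines = 0, []
    with Path(path).open(encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines between records
            try:
                json.loads(line)
                valid += 1
            except json.JSONDecodeError:
                bad_lines.append(number)
    return valid, bad_lines
```

For example, `check_jsonlines("gsicrawler/lateralbackup.jsons")` (the path is an assumption based on the instructions above) returns the number of parseable records together with the line numbers that need fixing.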


Services
========

This demo includes four services:

* Luigi orchestrator (gsicrawler). This service provides both the task executor and a web interface for checking the status of your workflows.
The tasks run periodically, at the interval set in `crontasks.py`.
The default interval is 24 hours.
The web interface is available at http://localhost:8082.
* Elasticsearch, using the official Elasticsearch image. It is available at localhost:19200.
* Senpy, used for sentiment and semantic analysis. It is mapped to http://localhost:5000/.
* Somedi dashboard (sefarad), a website developed with sefarad that displays the data stored in Elasticsearch.
It is available at http://localhost:8080.
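
Once the containers are up, a quick way to see which of the host ports listed above are accepting connections is a plain TCP probe. This is a rough sketch that assumes the default port mappings from the list; the labels and function names are illustrative, and an open port does not guarantee that the service behind it has finished starting.

```python
import socket

# Host ports taken from the service list above.
SERVICE_PORTS = {
    "gsicrawler (Luigi web interface)": 8082,
    "elasticsearch": 19200,
    "senpy": 5000,
    "somedi dashboard": 8080,
}


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host="localhost", ports=SERVICE_PORTS):
    """Map each service label to whether its port accepts connections."""
    return {name: port_is_open(host, port) for name, port in ports.items()}
```
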

The docker-compose definition attaches all these services to the same network, so they can reach each other by service name without going through the ports published on the host.
The endpoints used by each service (e.g. the Elasticsearch endpoint in the gsicrawler service) are configurable through environment variables.
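
As an illustration of this pattern, a service can read its endpoint from an environment variable and fall back to a default when the variable is unset. This is only a sketch: the helper `resolve_endpoint`, the variable name `ES_ENDPOINT` and the default `elasticsearch:9200` are assumptions made for the example; the variable names the services actually read are the ones set in the docker-compose definition.

```python
import os


def resolve_endpoint(variable, default, environ=None):
    """Return the endpoint stored in `variable`, or `default` when it is unset or empty."""
    environ = os.environ if environ is None else environ
    return environ.get(variable) or default


# Hypothetical usage: the variable name and the default are assumptions,
# not values taken from this repository.
es_endpoint = resolve_endpoint("ES_ENDPOINT", "elasticsearch:9200")
```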


Troubleshooting
===============

Elasticsearch may crash on startup and complain that `vm.max_map_count` is too low.
The following command fixes this temporarily, until the next reboot:

```
sudo sysctl -w vm.max_map_count=262144
```

To make the change permanent, add the line `vm.max_map_count=262144` to your `/etc/sysctl.conf`.
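
To confirm that the setting is in effect before starting the containers, the current value can be read from `/proc/sys/vm/max_map_count` on Linux. The snippet below is a small sketch; the function name `max_map_count_ok` is hypothetical, and the threshold is the value used in the command above.

```python
from pathlib import Path

REQUIRED_MAX_MAP_COUNT = 262144  # value used in the sysctl command above


def max_map_count_ok(proc_file="/proc/sys/vm/max_map_count",
                     required=REQUIRED_MAX_MAP_COUNT):
    """Return True when the kernel setting is at least `required`."""
    current = int(Path(proc_file).read_text().strip())
    return current >= required
```

Calling `max_map_count_ok()` on the Docker host returns `False` when the value still needs to be raised.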