This repository contains a demo for the SOMEDI project. In this demo we are going to track the Restaurantes Lateral brand on social media.

Installation
============

Clone this repository and switch to the lateral-demo branch:

```
git clone https://lab.gsi.upm.es/social/somedi-lateral.git
cd somedi-lateral
git checkout lateral-demo
```

Build and run the docker images:

```
docker-compose up --build
```

NOTE: This command may require sudo.

Loading backup data
===================

If you already have a list of analyzed posts that you want to import into elasticsearch, you can do so with the loadBackup.py script. For convenience, this script is included in the gsicrawler container. Just copy your backup file (a jsonlines file named `backup.jsons`) into the orchestrator folder and run:

```
docker-compose exec orchestrator python loadBackup.py
```

Services
========

This demo includes the following services:

* Luigi orchestrator (orchestrator). This service provides both the task executor and a web interface to check the status of your workflows. The tasks are executed periodically, according to the period set in `tasks.py:Main` (24h by default). The web interface shows the status of the tasks and is available on http://localhost:8082.
* GSICrawler: a service to get data from different sources (e.g. Twitter, Facebook). Available on http://localhost:5000 and http://localhost:5555 (flower).
* Elasticsearch: the official elasticsearch image. Available on localhost:19200.
* Senpy: used for sentiment and semantic analysis. Mapped to http://localhost:5000/.
* Somedi dashboard (sefarad): a website developed with Sefarad. It displays the data stored in elasticsearch. Available on http://localhost:8080.
* Redis: a dependency of GSICrawler.

The docker-compose definition adds all these services to the same network, so they can communicate with each other using the service name, without exposing external ports. The endpoints used in each service (e.g.
the elasticsearch endpoint in the gsicrawler service) are configurable through environment variables.

Troubleshooting
===============

Elasticsearch may crash on startup and complain about vm.max_map_count. The following command will fix it temporarily, until the next boot:

```
sudo sysctl -w vm.max_map_count=262144
```

To make the change permanent, set the value in your `/etc/sysctl.conf`.
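For example, assuming a standard sysctl setup where `/etc/sysctl.conf` is read at boot, the setting can be persisted like this:

```shell
# Append the setting to /etc/sysctl.conf so it survives reboots
# (sketch; some distributions prefer a file under /etc/sysctl.d/ instead).
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf

# Reload the settings from the file without rebooting:
sudo sysctl -p
```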
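As a side note on the backup format described in the "Loading backup data" section: `backup.jsons` is a jsonlines file, i.e. one JSON document per line. The sketch below is not the actual loadBackup.py code, just a minimal, hypothetical helper that can be used to sanity-check a backup file before importing it:

```python
import json


def read_backup(path):
    """Read a jsonlines backup file: one JSON document per line.

    Hypothetical helper for illustration only; it does not reproduce
    loadBackup.py, it just parses the backup format described above.
    """
    docs = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines between documents
                docs.append(json.loads(line))
    return docs
```

Each returned dict could then be indexed into elasticsearch, for instance via its bulk API.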