Commit b2f47971 authored by J. Fernando Sánchez's avatar J. Fernando Sánchez

Added pipeline and demo

parent 97cead6b
*.html~
.*
*~
bower_components
[submodule "elements/number-chart"]
path = elements/number-chart
url = ssh://git@lab.cluster.gsi.dit.upm.es:2200/sefarad/number-chart.git
FROM node:7.10.0
ENV NODE_PATH=/tmp/node_modules APP_NAME=dashboard-tourpedia
# Install dependencies first to use cache
RUN npm install -g http-server bower
ADD bower.json /usr/src/bower.json
RUN cd /usr/src && \
bower install --allow-root
ADD . /usr/src/app
WORKDIR /usr/src/app/
CMD ["/usr/src/app/init.sh"]
Sefarad dashboard for the Tourpedia data.
## Usage
This web component accepts the following parameters:
```html
<dashboard-tourpedia
  client="<!-- elasticsearch client -->">
</dashboard-tourpedia>
```
See the `dashboard-tourpedia.html` and `demo/index.html` for more information.
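A fuller example pairs the component with an `<elastic-client>` element, as the demo does (the host below is the one used in `demo/index.html`):

```html
<template is="dom-bind">
  <!-- elastic-client creates the Elasticsearch client bound to {{client}} -->
  <elastic-client
      config='{"host": "http://localhost:9200"}'
      client="{{client}}">
  </elastic-client>
  <!-- the same client is then handed to the dashboard -->
  <dashboard-tourpedia client="{{client}}"></dashboard-tourpedia>
</template>
```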
## Installation
This web component is available in bower.
```bash
$ bower install dashboard-tourpedia
```
This command will install the component inside the `bower_components` folder.
## Development
Requirements:
* Docker
* Docker-compose
The docker-compose file can be used both to test the component and to develop it.
Simply run:
```bash
docker-compose up
```
and go to http://127.0.0.1:8080/demo/index.html
The docker-compose file mounts the current directory in the docker container, so every change you make to files locally will be reflected immediately in the browser.
If you add new dependencies to the component (through the `bower.json` file), you need to either run `bower install` within the container or recreate the image, like so:
```bash
docker-compose up --build
```
Or:
```bash
docker-compose exec sefarad bower install
```
Note that the component assumes all dependencies are added in `../`.
This is the structure the component will find when installed as a dependency with bower.
To mimic that structure, the `init.sh` script automatically links the bower package.
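The expected layout can be mimicked with plain symlinks; a minimal sketch of what the linking step achieves (all paths hypothetical, only `bower_components` is meaningful):

```shell
# Mimic the layout bower creates when the component is installed as a
# dependency: the component sits inside bower_components, next to the
# packages it imports from ../ (paths hypothetical).
mkdir -p /tmp/layout-demo/bower_components/polymer
mkdir -p /tmp/layout-demo/dashboard-tourpedia
# Link the component into bower_components, as init.sh does via `bower link`.
ln -sfn /tmp/layout-demo/dashboard-tourpedia \
        /tmp/layout-demo/bower_components/dashboard-tourpedia
ls /tmp/layout-demo/bower_components
```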
    "material-search": "material-search#*",
    "leaflet-maps": "leaflet-maps#*",
    "paper-tabs": "paper-tabs#*",
    "yasgui-polymer": "yasgui-polymer#*",
    "reviews-table": "reviews-table#*",
    "number-chart": "number-chart#^1.1.1",
    "google-apis": "GoogleWebComponents/google-apis#^1.0.0",
    "google-chart-elasticsearch": "google-chart-elasticsearch#^1.1.0",
    "iron-icons": "PolymerElements/iron-icons#^1.0.0",
    "paper-icon-button": "PolymerElements/paper-icon-button#^1.0.0",
    "polymer": "Polymer/polymer#^1.1.0",
    "jquery": "jquery#^2.1.4",
    "elasticsearchjs-import": "DigElements/elasticsearchjs-import#~1.0.0",
    "elasticjs-import": "DigElements/elasticjs-import#~1.0.1",
    "webcomponentsjs": "< 1.0.0"
  },
  "resolutions": {
    "webcomponentsjs": "0.7.24"
  },
  "license": "Apache-2.0",
  "homepage": "https://lab.cluster.gsi.dit.upm.es/sefarad/dashboard-tourpedia"
}
<link rel="import" href="../bower_components/polymer/polymer.html">
<link rel="import" href="../bower_components/elastic-client/elastic-client.html">
<link rel="import" href="../bower_components/dashboard-tourpedia/dashboard-tourpedia.html">
<html>
<head>
<script src="../bower_components/webcomponentsjs/webcomponents-lite.js"></script>
<link rel="import" href="imports.html">
</head>
<body>
<template is="dom-bind">
<elastic-client
config='{"host": "http://localhost:9200"}'
client="{{client}}"
cluster-status="{{myStatus}}">
</elastic-client>
<dashboard-tourpedia
client="{{client}}"></dashboard-tourpedia>
<!-- <button id="databutton" onclick="changedata()">Click to change data</button> -->
<script>
var datas =[
{"hits": {
"total": 20000
},
"aggregations": {
"category": {
"buckets": [
{"key": "myObject", "doc_count": 3000},
{"key": "otherObject", "doc_count": 1000}
]
}
}
},
{"hits": {
"total": 30000
},
"aggregations": {
"category": {
"buckets": [
{"key": "myObject", "doc_count": 1000},
{"key": "otherObject", "doc_count": 4000}
]
}
}
}];
var numdata = 0;
// var nc1 = document.getElementById('demo-chart1');
// var nc2 = document.getElementById('demo-chart2');
// nc1.data = nc2.data = datas[0];
// function changedata(){
//   numdata += 1;
//   nc1.data = nc2.data = datas[numdata%2];
// }
</script>
</template>
</body>
</html>
version: '2'
services:
  sefarad:
    build: .
    ports:
      - "8080:8080"
    volumes:
      - .:/usr/src/app
    networks:
      - sefarad-network
    depends_on:
      - elasticsearch
  elasticsearch:
    image: elasticsearch
    ports:
      - "9200:9200"
      - "9300:9300"
    volumes:
      - ./elasticsearch/nodes:/usr/share/elasticsearch/data/nodes
      - ./elasticsearch/config:/usr/share/elasticsearch/config
    networks:
      - sefarad-network
  luigi:
    build:
      context: luigi/
    volumes:
      - ./luigi:/usr/src/app
    networks:
      - sefarad-network
networks:
  sefarad-network:
    driver: bridge
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
# Before you set out to tweak and tune the configuration, make sure you
# understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please see the documentation for further information on configuration options:
# <https://www.elastic.co/guide/en/elasticsearch/reference/5.0/settings.html>
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: sefarad
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
#node.name: node-1
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#path.data: /path/to/data
#
# Path to log files:
#
#path.logs: /path/to/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
#network.host: 192.168.0.1
#
# Set a custom port for HTTP:
#
#http.port: 9200
#
# For more information, see the documentation at:
# <https://www.elastic.co/guide/en/elasticsearch/reference/5.0/modules-network.html>
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.zen.ping.unicast.hosts: ["host1", "host2"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of nodes / 2 + 1):
#
#discovery.zen.minimum_master_nodes: 3
#
# For more information, see the documentation at:
# <https://www.elastic.co/guide/en/elasticsearch/reference/5.0/modules-discovery-zen.html>
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, see the documentation at:
# <https://www.elastic.co/guide/en/elasticsearch/reference/5.0/modules-gateway.html>
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
http.host: 0.0.0.0
http.cors.enabled : true
http.cors.allow-origin : "*"
http.cors.allow-methods: OPTIONS, HEAD, GET, POST, PUT, DELETE
http.cors.allow-headers: X-Requested-With, X-Auth-Token, Content-Type, Content-Length
## JVM configuration
################################################################
## IMPORTANT: JVM heap size
################################################################
##
## You should always set the min and max JVM heap
## size to the same value. For example, to set
## the heap to 4 GB, set:
##
## -Xms4g
## -Xmx4g
##
## See https://www.elastic.co/guide/en/elasticsearch/reference/current/heap-size.html
## for more information
##
################################################################
# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space
-Xms2g
-Xmx2g
################################################################
## Expert settings
################################################################
##
## All settings below this section are considered
## expert settings. Don't tamper with them unless
## you understand what you are doing
##
################################################################
## GC configuration
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
## optimizations
# disable calls to System#gc
-XX:+DisableExplicitGC
# pre-touch memory pages used by the JVM during initialization
-XX:+AlwaysPreTouch
## basic
# force the server VM
-server
# set to headless, just in case
-Djava.awt.headless=true
# ensure UTF-8 encoding by default (e.g. filenames)
-Dfile.encoding=UTF-8
# use our provided JNA always versus the system one
-Djna.nosys=true
# flags to keep Netty from being unsafe
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
# log4j 2
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true
-Dlog4j.skipJansi=true
## heap dumps
# generate a heap dump when an allocation from the Java heap fails
# heap dumps are created in the working directory of the JVM
-XX:+HeapDumpOnOutOfMemoryError
# specify an alternative path for heap dumps
# ensure the directory exists and has sufficient space
#-XX:HeapDumpPath=${heap.dump.path}
## GC logging
#-XX:+PrintGCDetails
#-XX:+PrintGCTimeStamps
#-XX:+PrintGCDateStamps
#-XX:+PrintClassHistogram
#-XX:+PrintTenuringDistribution
#-XX:+PrintGCApplicationStoppedTime
# log GC status to a file with time stamps
# ensure the directory exists
#-Xloggc:${loggc}
# Elasticsearch 5.0.0 will throw an exception on unquoted field names in JSON.
# If documents were already indexed with unquoted fields in a previous version
# of Elasticsearch, some operations may throw errors.
#
# WARNING: This option will be removed in Elasticsearch 6.0.0 and is provided
# only for migration purposes.
#-Delasticsearch.json.allow_unquoted_field_names=true
status = error
# log action execution errors for easier debugging
logger.action.name = org.elasticsearch.action
logger.action.level = debug
appender.console.type = Console
appender.console.name = console
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = [%d{ISO8601}][%-5p][%-25c{1.}] %marker%m%n
appender.rolling.type = RollingFile
appender.rolling.name = rolling
appender.rolling.fileName = ${sys:es.logs}.log
appender.rolling.layout.type = PatternLayout
appender.rolling.layout.pattern = [%d{ISO8601}][%-5p][%-25c{1.}] %marker%.10000m%n
appender.rolling.filePattern = ${sys:es.logs}-%d{yyyy-MM-dd}.log
appender.rolling.policies.type = Policies
appender.rolling.policies.time.type = TimeBasedTriggeringPolicy
appender.rolling.policies.time.interval = 1
appender.rolling.policies.time.modulate = true
rootLogger.level = info
rootLogger.appenderRef.console.ref = console
rootLogger.appenderRef.rolling.ref = rolling
appender.deprecation_rolling.type = RollingFile
appender.deprecation_rolling.name = deprecation_rolling
appender.deprecation_rolling.fileName = ${sys:es.logs}_deprecation.log
appender.deprecation_rolling.layout.type = PatternLayout
appender.deprecation_rolling.layout.pattern = [%d{ISO8601}][%-5p][%-25c{1.}] %marker%.10000m%n
appender.deprecation_rolling.filePattern = ${sys:es.logs}_deprecation-%i.log.gz
appender.deprecation_rolling.policies.type = Policies
appender.deprecation_rolling.policies.size.type = SizeBasedTriggeringPolicy
appender.deprecation_rolling.policies.size.size = 1GB
appender.deprecation_rolling.strategy.type = DefaultRolloverStrategy
appender.deprecation_rolling.strategy.max = 4
logger.deprecation.name = org.elasticsearch.deprecation
logger.deprecation.level = warn
logger.deprecation.appenderRef.deprecation_rolling.ref = deprecation_rolling
logger.deprecation.additivity = false
appender.index_search_slowlog_rolling.type = RollingFile
appender.index_search_slowlog_rolling.name = index_search_slowlog_rolling
appender.index_search_slowlog_rolling.fileName = ${sys:es.logs}_index_search_slowlog.log
appender.index_search_slowlog_rolling.layout.type = PatternLayout
appender.index_search_slowlog_rolling.layout.pattern = [%d{ISO8601}][%-5p][%-25c] %marker%.10000m%n
appender.index_search_slowlog_rolling.filePattern = ${sys:es.logs}_index_search_slowlog-%d{yyyy-MM-dd}.log
appender.index_search_slowlog_rolling.policies.type = Policies
appender.index_search_slowlog_rolling.policies.time.type = TimeBasedTriggeringPolicy
appender.index_search_slowlog_rolling.policies.time.interval = 1
appender.index_search_slowlog_rolling.policies.time.modulate = true
logger.index_search_slowlog_rolling.name = index.search.slowlog
logger.index_search_slowlog_rolling.level = trace
logger.index_search_slowlog_rolling.appenderRef.index_search_slowlog_rolling.ref = index_search_slowlog_rolling
logger.index_search_slowlog_rolling.additivity = false
appender.index_indexing_slowlog_rolling.type = RollingFile
appender.index_indexing_slowlog_rolling.name = index_indexing_slowlog_rolling
appender.index_indexing_slowlog_rolling.fileName = ${sys:es.logs}_index_indexing_slowlog.log
appender.index_indexing_slowlog_rolling.layout.type = PatternLayout
appender.index_indexing_slowlog_rolling.layout.pattern = [%d{ISO8601}][%-5p][%-25c] %marker%.10000m%n
appender.index_indexing_slowlog_rolling.filePattern = ${sys:es.logs}_index_indexing_slowlog-%d{yyyy-MM-dd}.log
appender.index_indexing_slowlog_rolling.policies.type = Policies
appender.index_indexing_slowlog_rolling.policies.time.type = TimeBasedTriggeringPolicy
appender.index_indexing_slowlog_rolling.policies.time.interval = 1
appender.index_indexing_slowlog_rolling.policies.time.modulate = true
logger.index_indexing_slowlog.name = index.indexing.slowlog.index
logger.index_indexing_slowlog.level = trace
logger.index_indexing_slowlog.appenderRef.index_indexing_slowlog_rolling.ref = index_indexing_slowlog_rolling
logger.index_indexing_slowlog.additivity = false
#!/bin/sh
curl -XPUT http://localhost:9200/tourpedia/_mapping/places -d '
{
"properties": {
"location": {
"type": "text",
"fielddata": true
},
"category": {
"type": "text",
"fielddata": true
}
}
}'
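With `fielddata` enabled, terms aggregations over `category` become possible. A sketch of such a request body (index and field names taken from the mapping above; the curl line is left commented since it needs a running server):

```shell
# Terms aggregation over the "category" field; "fielddata": true in the
# mapping above is what allows aggregating on a "text" field in ES 5.x.
QUERY='{"size": 0, "aggs": {"category": {"terms": {"field": "category"}}}}'
# curl -XPOST http://localhost:9200/tourpedia/_search -d "$QUERY"
echo "$QUERY"
```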
#!/bin/sh
# Copy bower dependencies when using -v $PWD/:/usr/src/app
if [ -f /.dockerenv ]; then
cp -a /usr/src/bower_components /usr/src/app/;
fi
bower link --allow-root
bower link $APP_NAME --allow-root
http-server .
FROM python:3
RUN pip install luigi elasticsearch rdflib requests
WORKDIR /usr/src/app
ADD . /usr/src/app
ENTRYPOINT ["python","-m","luigi"]
# -*- coding: utf-8 -*-
#
# Copyright 2012-2015 Spotify AB
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import datetime
import json
import random

import requests
from rdflib import Graph, plugin
from rdflib.serializer import Serializer

import luigi
from luigi.contrib.esindex import CopyToIndex


class FetchDataTask(luigi.Task):
    """
    Reads a local JSON file and writes its elements, one JSON object per
    line, into the task's output target.
    """
    #: the date parameter.
    #date = luigi.DateParameter(default=datetime.date.today())
    #field = str(random.randint(0,10000)) + datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")

    #: the name of the input JSON file.
    filename = luigi.Parameter()

    def run(self):
        """
        Writes data in JSON format into the task's output target.
        The data objects have the following attributes:

        * `_id` is the default Elasticsearch id field,
        * `text`: the text,
        * `date`: the day when the data was created.
        """
        #today = datetime.date.today()
        file = self.filename
        with self.output().open('w') as output:
            with open(file) as infile:
                j = json.load(infile)
                for i in j:
                    i["_id"] = i["id"]
                    output.write(json.dumps(i))
                    print(i)
                    output.write('\n')

    def output(self):
        """
        Returns the target output for this task.
        In this case, a successful execution of this task will create a file on the local filesystem.

        :return: the target output for this task.
        :rtype: object (:py:class:`luigi.target.Target`)
        """
        return luigi.LocalTarget(path='/tmp/_docs-%s.json' % self.filename)


class Elasticsearch(CopyToIndex):
    """
    This task loads JSON data contained in a :py:class:`luigi.target.Target` into an ElasticSearch index.

    This task's input will be the target returned by :py:meth:`~.FetchDataTask.output`.

    This class uses :py:meth:`luigi.contrib.esindex.CopyToIndex.run`.

    After running this task you can run:

    .. code-block:: console

        $ curl "localhost:9200/example_index/_search?pretty"

    to see the indexed documents.

    To see the update log, run

    .. code-block:: console

        $ curl "localhost:9200/update_log/_search?q=target_index:example_index&pretty"

    To cleanup both indexes run:

    .. code-block:: console

        $ curl -XDELETE "localhost:9200/example_index"
        $ curl -XDELETE "localhost:9200/update_log/_query?q=target_index:example_index"
    """
    #: date task parameter (default = today)
    date = luigi.DateParameter(default=datetime.date.today())
    #: the name of the input JSON file.
    filename = luigi.Parameter()
    #: the name of the index in ElasticSearch to be updated.
    index = luigi.Parameter()
    #: the name of the document type.
    doc_type = luigi.Parameter()
    #: the host running the ElasticSearch service.
    host = 'elasticsearch'
    #: the port used by the ElasticSearch service.
    port = 9200

    def requires(self):
        """
        This task's dependencies:

        * :py:class:`~.FetchDataTask`

        :return: object (:py:class:`luigi.task.Task`)
        """
        return FetchDataTask(self.filename)


if __name__ == "__main__":
    #luigi.run(['--task', 'Elasticsearch'])
    luigi.run()
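The transformation `FetchDataTask.run` performs can be sketched in isolation (sample data illustrative, helper name hypothetical):

```python
import json

def to_json_lines(items):
    """Serialize items as newline-delimited JSON, copying each object's
    "id" into the "_id" field that Elasticsearch uses."""
    lines = []
    for item in items:
        item["_id"] = item["id"]
        lines.append(json.dumps(item))
    return "\n".join(lines) + "\n"

docs = to_json_lines([{"id": 1, "text": "hello"}, {"id": 2, "text": "world"}])
```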
# -*- coding: utf-8 -*-
#
# Copyright 2012-2015 Spotify AB
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#