Commit 9f6a6f5e authored by J. Fernando Sánchez

Loads of changes!

* Added conversion plugins (API might change!)
* Added conversion to the analysis pipeline
* Changed behaviour of --default-plugins (it adds conversion plugins regardless)
* Added emotionModel [sic] and emotionConversion models

//TODO add conversion tests
//TODO add conversion to docs
parent 3cea7534
......@@ -7,6 +7,9 @@ variables:
DOCKER_DRIVER: overlay
DOCKERFILE: Dockerfile
before_script:
- sh version.sh > senpy/VERSION
stages:
- test
- images
......
from python:3.4-slim
from python:3.4
RUN mkdir /cache/
ENV PIP_CACHE_DIR=/cache/
WORKDIR /usr/src/app
ADD requirements.txt /usr/src/app/
RUN pip install --use-wheel -r requirements.txt
ADD . /usr/src/app/
RUN pip install --use-wheel .
RUN pip install .
VOLUME /data/
......@@ -13,6 +16,6 @@ RUN mkdir /senpy-plugins/
WORKDIR /senpy-plugins/
ONBUILD ADD . /senpy-plugins/
ONBUILD RUN python -m senpy -f /senpy-plugins
ONBUILD RUN python -m senpy --only-install -f /senpy-plugins
ENTRYPOINT ["python", "-m", "senpy", "-f", "/senpy-plugins/", "--host", "0.0.0.0"]
\ No newline at end of file
from python:3.5
FROM python:3.5
RUN mkdir /cache/
ENV PIP_CACHE_DIR=/cache/
......
......@@ -2,13 +2,19 @@ PYVERSIONS=3.5 3.4 2.7
PYMAIN=$(firstword $(PYVERSIONS))
NAME=senpy
REPO=gsiupm
VERSION=$(shell git describe --tags)
TARNAME=$(NAME)-$(subst -,.,$(VERSION)).tar.gz
VERSION=$(shell ./version.sh)
TARNAME=$(NAME)-$(VERSION).tar.gz
IMAGENAME=$(REPO)/$(NAME):$(VERSION)
TEST_COMMAND=gitlab-runner exec docker --cache-dir=/tmp/gitlabrunner --docker-volumes /tmp/gitlabrunner:/tmp/gitlabrunner --env CI_PROJECT_NAME=$(NAME)
all: build run
FORCE:
version: FORCE
@echo $(VERSION) > $(NAME)/VERSION
@echo $(NAME) $(VERSION)
yapf:
yapf -i -r senpy
yapf -i -r tests
......@@ -36,7 +42,13 @@ quick_test: $(addprefix test-,$(PYMAIN))
test: $(addprefix test-,$(PYVERSIONS))
debug-%:
(docker start $(NAME)-debug && docker attach $(NAME)-debug) || docker run -w /usr/src/app/ -v $$PWD:/usr/src/app --entrypoint=/bin/bash -ti --name $(NAME)-debug '$(IMAGENAME)-python$* pip install -r test-requirements.txt'
@docker start $(NAME)-debug || (\
$(MAKE) build-$*; \
docker run -d -w /usr/src/app/ -v $$PWD:/usr/src/app --entrypoint=/bin/bash -p 5000:5000 -ti --name $(NAME)-debug '$(IMAGENAME)-python$*'; \
docker exec -ti $(NAME)-debug pip install -r test-requirements.txt; \
)\
docker attach $(NAME)-debug
debug: debug-$(PYMAIN)
......@@ -77,7 +89,9 @@ pip_upload:
pip_test: $(addprefix pip_test-,$(PYVERSIONS))
run: build
docker run --rm -p 5000:5000 -ti '$(IMAGENAME)-python$(PYMAIN)'
run-%: build-%
docker run --rm -p 5000:5000 -ti '$(IMAGENAME)-python$(PYMAIN)' --default-plugins
run: run-$(PYMAIN)
.PHONY: test test-% build-% build test pip_test run yapf dev
......@@ -53,7 +53,8 @@
"@id": "http://micro.blog/status1#char=16,77",
"nif:beginIndex": 16,
"nif:endIndex": 77,
"nif:anchorOf": "put your Windows Phone on your newest #open technology program"
"nif:anchorOf": "put your Windows Phone on your newest #open technology program",
"prov:wasGeneratedBy": "me:SgAnalysis1"
}
],
"sentiments": [
......
......@@ -24,7 +24,8 @@
"@id": "http://micro.blog/status1#char=16,77",
"nif:beginIndex": 16,
"nif:endIndex": 77,
"nif:anchorOf": "put your Windows Phone on your newest #open technology program"
"nif:anchorOf": "put your Windows Phone on your newest #open technology program",
"prov:wasGeneratedBy": "me:SgAnalysis1"
}
],
"sentiments": [
......
......@@ -2,46 +2,127 @@ Developing new plugins
----------------------
Each plugin represents a different analysis process. There are two types of files that senpy needs in order to load a plugin:
Plugins Interface
=================
- Definition file, with the ".senpy" extension.
- Code file, a Python file.
This separation will allow us to deploy plugins that use the same code but employ different parameters.
For instance, one could use the same classifier and processing in several plugins, but train with different datasets.
This scenario is particularly useful for evaluation purposes.
The only limitation is that the name of each plugin needs to be unique.
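For illustration, two definition files could point to the same module and differ only in their parameters. The file names and the ``dataset`` attribute below are hypothetical, not part of the senpy API:

.. code:: yaml

    # classifier_movies.senpy
    name: classifier_movies
    module: my_classifier       # hypothetical shared code file
    dataset: movie_reviews      # hypothetical plugin-specific variable

.. code:: yaml

    # classifier_tweets.senpy
    name: classifier_tweets
    module: my_classifier       # same code, different training data
    dataset: tweets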
Plugins Definitions
===================
The definition file can be written in JSON or YAML, where the data representation consists of attribute-value pairs.
The principal attributes are:
The definition file contains all the attributes of the plugin, and can be written in YAML or JSON.
The most important attributes are:
* name: plugin name used in senpy to call the plugin.
* module: indicates the module that will be loaded
* **name**: unique name that senpy will use internally to identify the plugin.
* **module**: indicates the module that contains the plugin code, which will be automatically loaded by senpy.
* **version**
* extra_params: used to specify parameters accepted by the plugin that are not already part of the senpy API. These parameters may be required, and may have aliases. For instance:
.. code:: python
.. code:: yaml
extra_params:
hello_param:
aliases: # required
- hello_param
- hello
required: true
default: Hi you
values:
- Hi you
- Hello y'all
- Howdy
Parameter validation will fail if a required parameter without a default has not been provided, or if the definition includes a set of values and the provided one does not match one of them.
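As a rough sketch of what this looks like from the API side (assuming senpy is running locally on the default port and a plugin declaring the ``hello_param`` above is selected; the plugin name ``myplugin`` is made up):

.. code:: bash

    # Accepted: "Howdy" is one of the listed values
    curl "http://localhost:5000/api/?algo=myplugin&input=some%20text&hello=Howdy"

    # Rejected with a "Missing or invalid parameters" error:
    # "Bonjour" is not among the accepted values for hello_param
    curl "http://localhost:5000/api/?algo=myplugin&input=some%20text&hello=Bonjour"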
A complete example:
.. code:: yaml
name: <Name of the plugin>
module: <Python file>
version: 0.1
And the JSON equivalent:
.. code:: json
{
"name" : "senpyPlugin",
"module" : "{python code file}"
"name": "<Name of the plugin>",
"module": "<Python file>",
"version": "0.1"
}
.. code:: python
name: senpyPlugin
module: {python code file}
Plugins Code
=================
============
The basic methods in a plugin are:
* __init__
* activate: used to load memory-hungry resources
* deactivate: used to free up resources
* analyse: called for every user request. It takes in the parameters supplied by a user and should return a senpy Response.
* analyse_entry: called for every user request. It takes in the parameters supplied by a user and should yield one or more ``Entry`` objects.
Plugins are loaded asynchronously, so it does not matter if the activate method takes a long time: the plugin is only marked as activated once that method has finished executing.
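For reference, a bare-bones plugin class might look like the following sketch (the resource handling is illustrative; the next section shows a complete working example):

.. code:: python

    from senpy.plugins import SenpyPlugin


    class SkeletonPlugin(SenpyPlugin):
        def activate(self):
            # Load memory-hungry resources here (models, lexicons, ...).
            # The plugin is only marked as activated once this returns.
            self._resources = {}

        def deactivate(self):
            # Free whatever was loaded in activate.
            self._resources = None

        def analyse_entry(self, entry, params):
            # Inspect or modify the entry, then yield it
            # (or yield several entries).
            yield entry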
Example plugin
==============
In this section, we will implement a basic sentiment analysis plugin.
To determine the polarity of each entry, the plugin will compare the length of the string to a threshold.
This threshold will be included in the definition file.
The definition file would look like this:
.. code:: yaml
name: helloworld
module: helloworld
version: 0.0
threshold: 10
Now, in a file named ``helloworld.py``:
.. code:: python
#!/bin/env python
#helloworld.py
from senpy.plugins import SenpyPlugin
from senpy.models import Sentiment
class HelloWorld(SenpyPlugin):
def analyse_entry(self, entry, params):
'''Tag each entry as positive or negative depending on its length'''
sentiment = Sentiment()
if len(entry.text) < self.threshold:
sentiment['marl:hasPolarity'] = 'marl:Positive'
else:
sentiment['marl:hasPolarity'] = 'marl:Negative'
entry.sentiments.append(sentiment)
yield entry
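To try it out, you could place the definition above in a file such as ``helloworld.senpy`` (the file name is an example; the ".senpy" extension is what matters) in the same folder as ``helloworld.py``, and point senpy to that folder:

.. code:: bash

    python -m senpy -f .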
F.A.Q.
======
Why does the analyse function yield instead of return?
??????????????????????????????????????????????????????
This is so that plugins may add new entries to the response or filter some of them.
For instance, a `context detection` plugin may add a new entry for each context in the original entry.
On the other hand, a conversion plugin may leave out entries that do not contain relevant information.
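As a hypothetical sketch of the first case (``split_into_contexts`` is a made-up helper, not part of senpy):

.. code:: python

    from senpy.plugins import SenpyPlugin
    from senpy.models import Entry


    class ContextSplitter(SenpyPlugin):
        def analyse_entry(self, entry, params):
            # One entry in, potentially several entries out:
            for context in self.split_into_contexts(entry.text):
                yield Entry(text=context)
            # A filtering plugin would instead decide, per entry,
            # whether to yield it at all.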
If I'm using a classifier, where should I train it?
???????????????????????????????????????????????????
......@@ -78,17 +159,17 @@ This example ilustrate how to implement the Sentiment140 service as a plugin in
.. code:: python
class Sentiment140Plugin(SentimentPlugin):
def analyse(self, **params):
def analyse_entry(self, entry, params):
text = entry.text
lang = params.get("language", "auto")
res = requests.post("http://www.sentiment140.com/api/bulkClassifyJson",
json.dumps({"language": lang,
"data": [{"text": params["input"]}]
"data": [{"text": text}]
}
)
)
p = params.get("prefix", None)
response = Results(prefix=p)
polarity_value = self.maxPolarityValue*int(res.json()["data"][0]
["polarity"]) * 0.25
polarity = "marl:Neutral"
......@@ -98,18 +179,13 @@ This example ilustrate how to implement the Sentiment140 service as a plugin in
elif polarity_value < neutral_value:
polarity = "marl:Negative"
entry = Entry(id="Entry0",
nif__isString=params["input"])
sentiment = Sentiment(id="Sentiment0",
prefix=p,
marl__hasPolarity=polarity,
marl__polarityValue=polarity_value)
sentiment.prov__wasGeneratedBy = self.id
entry.sentiments = []
entry.sentiments.append(sentiment)
entry.language = lang
response.entries.append(entry)
return response
yield entry
Where can I define extra parameters to be introduced in the request to my plugin?
......@@ -143,9 +219,9 @@ The extraction of this parameter is used in the analyse method of the Plugin int
Where can I set up variables for using them in my plugin?
?????????????????????????????????????????????????????????
You can add these variables in the definition file with the extracture of attribute-value pair.
You can add these variables in the definition file with the structure of attribute-value pairs.
Once you have added your variables, the next step is to extract them into the plugin. The plugin's __init__ method has a parameter called `info` where you can extract the values of the variables. This info parameter has the structure of a python dictionary.
Every field added to the definition file is available to the plugin instance.
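For instance, given the ``helloworld`` definition shown earlier (which sets ``threshold: 10``), that value can be read as an attribute of the plugin instance:

.. code:: python

    from senpy.plugins import SenpyPlugin


    class HelloWorld(SenpyPlugin):
        def analyse_entry(self, entry, params):
            print(self.threshold)  # -> 10, taken straight from the definition file
            yield entry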
Can I activate a DEBUG mode for my plugin?
???????????????????????????????????????????
......@@ -154,7 +230,15 @@ You can activate DEBUG mode from the command-line tool using the -d option.
.. code:: bash
python -m senpy -d
senpy -d
Additionally, with the ``--pdb`` option you will be dropped into a pdb post mortem shell if an exception is raised.
.. code:: bash
senpy --pdb
Where can I find more code examples?
????????????????????????????????????
......
Flask>=0.10.1
gunicorn>=19.0.0
requests>=2.4.1
GitPython>=0.3.2.RC1
gevent>=1.1rc4
PyLD>=0.6.5
six
......@@ -9,4 +7,5 @@ future
jsonschema
jsonref
PyYAML
semver
rdflib
rdflib-jsonld
......@@ -20,21 +20,10 @@ Sentiment analysis server in Python
from __future__ import print_function
from .version import __version__
try:
import semver
__version_info__ = semver.parse_version_info(__version__)
import logging
if __version_info__.prerelease:
import logging
logger = logging.getLogger(__name__)
msg = 'WARNING: You are using a pre-release version of {} ({})'.format(
__name__, __version__)
if len(logging.root.handlers) > 0:
logger.info(msg)
else:
import sys
print(msg, file=sys.stderr)
except ImportError:
print('semver not installed. Not doing version checking')
logger = logging.getLogger(__name__)
logger.info('Using senpy version: {}'.format(__version__))
__all__ = ['api', 'blueprints', 'cli', 'extensions', 'models', 'plugins']
......@@ -74,7 +74,7 @@ def main():
parser.add_argument(
'--host',
type=str,
default="127.0.0.1",
default="0.0.0.0",
help='Use 0.0.0.0 to accept requests from any host.')
parser.add_argument(
'--port',
......@@ -93,8 +93,7 @@ def main():
'-i',
action='store_true',
default=False,
help='Do not run a server, only install plugin dependencies'
)
help='Do not run a server, only install plugin dependencies')
args = parser.parse_args()
logging.basicConfig()
rl = logging.getLogger()
......
......@@ -7,6 +7,31 @@ API_PARAMS = {
"algorithm": {
"aliases": ["algorithm", "a", "algo"],
"required": False,
},
"outformat": {
"@id": "outformat",
"aliases": ["outformat", "o"],
"default": "json-ld",
"required": True,
"options": ["json-ld", "turtle"],
},
"expanded-jsonld": {
"@id": "expanded-jsonld",
"aliases": ["expanded", "expanded-jsonld"],
"required": True,
"default": 0
},
"emotionModel": {
"@id": "emotionModel",
"aliases": ["emotionModel", "emoModel"],
"required": False
},
"conversion": {
"@id": "conversion",
"description": "How to show the elements that have (not) been converted",
"required": True,
"options": ["filtered", "nested", "full"],
"default": "full"
}
}
......@@ -47,13 +72,6 @@ NIF_PARAMS = {
"default": "direct",
"options": ["direct", "url", "file"],
},
"outformat": {
"@id": "outformat",
"aliases": ["outformat", "o"],
"default": "json-ld",
"required": False,
"options": ["json-ld"],
},
"language": {
"@id": "language",
"aliases": ["language", "l"],
......@@ -76,12 +94,12 @@ NIF_PARAMS = {
def parse_params(indict, spec=NIF_PARAMS):
outdict = {}
logger.debug("Parsing: {}\n{}".format(indict, spec))
outdict = indict.copy()
wrong_params = {}
for param, options in iteritems(spec):
if param[0] != "@": # Exclude json-ld properties
logger.debug("Param: %s - Options: %s", param, options)
for alias in options["aliases"]:
for alias in options.get("aliases", []):
if alias in indict:
outdict[param] = indict[alias]
if param not in outdict:
......@@ -95,8 +113,9 @@ def parse_params(indict, spec=NIF_PARAMS):
outdict[param] not in spec[param]["options"]:
wrong_params[param] = spec[param]
if wrong_params:
logger.debug("Error parsing: %s", wrong_params)
message = Error(
status=404,
status=400,
message="Missing or invalid parameters",
parameters=outdict,
errors={param: error
......
......@@ -17,10 +17,11 @@
"""
Blueprints for Senpy
"""
from flask import (Blueprint, request, current_app,
render_template, url_for, jsonify)
from flask import (Blueprint, request, current_app, render_template, url_for,
jsonify)
from .models import Error, Response, Plugins, read_schema
from .api import WEB_PARAMS, parse_params
from .api import WEB_PARAMS, API_PARAMS, parse_params
from .version import __version__
from functools import wraps
import logging
......@@ -29,6 +30,7 @@ logger = logging.getLogger(__name__)
api_blueprint = Blueprint("api", __name__)
demo_blueprint = Blueprint("demo", __name__)
ns_blueprint = Blueprint("ns", __name__)
def get_params(req):
......@@ -43,12 +45,21 @@ def get_params(req):
@demo_blueprint.route('/')
def index():
return render_template("index.html")
return render_template("index.html", version=__version__)
@api_blueprint.route('/contexts/<entity>.jsonld')
def context(entity="context"):
return jsonify({"@context": Response.context})
context = Response._context
context['@vocab'] = url_for('ns.index', _external=True)
return jsonify({"@context": context})
@ns_blueprint.route('/') # noqa: F811
def index():
context = Response._context
context['@vocab'] = url_for('.ns', _external=True)
return jsonify({"@context": context})
@api_blueprint.route('/schemas/<schema>')
......@@ -62,26 +73,39 @@ def schema(schema="definitions"):
def basic_api(f):
@wraps(f)
def decorated_function(*args, **kwargs):
print('Getting request:')
print(request)
raw_params = get_params(request)
web_params = parse_params(raw_params, spec=WEB_PARAMS)
headers = {'X-ORIGINAL-PARAMS': raw_params}
# Get defaults
web_params = parse_params({}, spec=WEB_PARAMS)
api_params = parse_params({}, spec=API_PARAMS)
if hasattr(request, 'params'):
request.params.update(raw_params)
else:
request.params = raw_params
outformat = 'json-ld'
try:
print('Getting request:')
print(request)
web_params = parse_params(raw_params, spec=WEB_PARAMS)
api_params = parse_params(raw_params, spec=API_PARAMS)
if hasattr(request, 'params'):
request.params.update(api_params)
else:
request.params = api_params
response = f(*args, **kwargs)
except Error as ex:
response = ex
in_headers = web_params["inHeaders"] != "0"
headers = {'X-ORIGINAL-PARAMS': raw_params}
in_headers = web_params['inHeaders'] != "0"
expanded = api_params['expanded-jsonld']
outformat = api_params['outformat']
return response.flask(
in_headers=in_headers,
headers=headers,
context_uri=url_for(
'api.context', entity=type(response).__name__, _external=True))
prefix=url_for('.api', _external=True),
context_uri=url_for('api.context',
entity=type(response).__name__,
_external=True),
outformat=outformat,
expanded=expanded)
return decorated_function
......@@ -106,10 +130,11 @@ def plugins():
def plugin(plugin=None):
sp = current_app.senpy
if plugin == 'default' and sp.default_plugin:
response = sp.default_plugin
plugin = response.name
elif plugin in sp.plugins:
response = sp.plugins[plugin]
return sp.default_plugin
plugins = sp.filter_plugins(
id='plugins/{}'.format(plugin)) or sp.filter_plugins(name=plugin)
if plugins:
response = list(plugins.values())[0]
else:
return Error(message="Plugin not found", status=404)
return response
......@@ -6,7 +6,6 @@ logger = logging.getLogger(__name__)
class Client(object):
def __init__(self, endpoint):
self.endpoint = endpoint
......@@ -15,9 +14,7 @@ class Client(object):
def request(self, path=None, method='GET', **params):
url = '{}{}'.format(self.endpoint, path)
response = requests.request(method=method,
url=url,
params=params)
response = requests.request(method=method, url=url, params=params)
try:
resp = models.from_dict(response.json())
resp.validate(resp)
......@@ -30,8 +27,9 @@ class Client(object):
'#### Response:\n'
'\tCode: {code}'
'\tContent: {content}'
'\n').format(error=ex,
url=url,
code=response.status_code,
content=response.content))
'\n').format(
error=ex,
url=url,
code=response.status_code,
content=response.content))
raise ex
"""
Main class for Senpy.
It orchestrates plugin (de)activation and analysis.
"""
from future import standard_library
standard_library.install_aliases()
from .plugins import SentimentPlugin
from .models import Error
from .blueprints import api_blueprint, demo_blueprint
from .plugins import SentimentPlugin, SenpyPlugin
from .models import Error, Entry, Results
from .blueprints import api_blueprint, demo_blueprint, ns_blueprint
from .api import API_PARAMS, NIF_PARAMS, parse_params
from git import Repo, InvalidGitRepositoryError
from threading import Thread
import os
......@@ -30,18 +30,21 @@ class Senpy(object):
def __init__(self,
app=None,
plugin_folder="plugins",
plugin_folder=".",
default_plugins=False):
self.app = app
self._search_folders = set()
self._plugin_list = []
self._outdated = True
self._default = None
self.add_folder(plugin_folder)
if default_plugins:
base_folder = os.path.join(os.path.dirname(__file__), "plugins")
self.add_folder(base_folder)
self.add_folder('plugins', from_root=True)
else:
# Add only conversion plugins
self.add_folder(os.path.join('plugins', 'conversion'),
from_root=True)
if app is not None:
self.init_app(app)
......@@ -60,9 +63,12 @@ class Senpy(object):
else:
app.teardown_request(self.teardown)
app.register_blueprint(api_blueprint, url_prefix="/api")
app.register_blueprint(ns_blueprint, url_prefix="/ns")
app.register_blueprint(demo_blueprint, url_prefix="/")
def add_folder(self, folder):
def add_folder(self, folder, from_root=False):
if from_root:
folder = os.path.join(os.path.dirname(__file__), folder)
logger.debug("Adding folder: %s", folder)
if os.path.isdir(folder):
self._search_folders.add(folder)
......@@ -70,10 +76,9 @@ class Senpy(object):
else:
logger.debug("Not a folder: %s", folder)
def analyse(self, **params):
algo = None
logger.debug("analysing with params: {}".format(params))
def _find_plugin(self, params):
api_params = parse_params(params, spec=API_PARAMS)
algo = None
if "algorithm" in api_params and api_params["algorithm"]:
algo = api_params["algorithm"]
elif self.plugins:
......@@ -97,32 +102,114 @@ class Senpy(object):
status=400,
message=("The algorithm '{}'"
" is not activated yet").format(algo))
plug = self.plugins[algo]
return self.plugins[algo]
def _get_params(self, params, plugin):
nif_params = parse_params(params, spec=NIF_PARAMS)
extra_params = plug.get('extra_params', {})
extra_params = plugin.get('extra_params', {})
specific_params = parse_params(params, spec=extra_params)
nif_params.update(specific_params)
return nif_params
def _get_entries(self, params):
entry = None
if params['informat'] == 'text':
entry = Entry(text=params['input'])
else:
raise NotImplementedError('Only text input format implemented')
yield entry
def analyse(self, **api_params):
logger.debug("analysing with params: {}".format(api_params))
plugin = self._find_plugin(api_params)
nif_params = self._get_params(api_params, plugin)
resp = Results()
if 'with_parameters' in api_params:
resp.parameters = nif_params
try:
resp = plug.analyse(**nif_params)
resp.analysis.append(plug)
entries = []
for i in self._get_entries(nif_params):
entries += list(plugin.analyse_entry(i, nif_params))
resp.entries = entries
self.convert_emotions(resp, plugin, nif_params)
resp.analysis.append(plugin.id)
logger.debug("Returning analysis result: {}".format(resp))
except Error as ex:
logger.exception('Error returning analysis result')
resp = ex
except Exception as ex:
resp = Error(message=str(ex), status=500)
logger.exception('Error returning analysis result')
resp = Error(message=str(ex), status=500)
return resp
def _conversion_candidates(self, fromModel, toModel):
candidates = self.filter_plugins(**{'@type': 'emotionConversionPlugin'})
for name, candidate in candidates.items():
for pair in candidate.onyx__doesConversion:
logging.debug(pair)
if pair['onyx:conversionFrom'] == fromModel \
and pair['onyx:conversionTo'] == toModel:
# logging.debug('Found candidate: {}'.format(candidate))
yield candidate
def convert_emotions(self, resp, plugin, params):
"""
Conversion of all emotions in a response.
In addition to converting from one model to another, it has
to include the conversion plugin to the analysis list.
Needless to say, this is far from an elegant solution, but it works.
@todo refactor and clean up
"""