Developing new plugins
----------------------
This document covers the minimum you need to get started developing new analysis plugins.
For an example of conversion plugins, see :doc:`conversion`.
For a description of definition files, see :doc:`plugins-definition`.

A more step-by-step tutorial with slides is available `here <https://lab.cluster.gsi.dit.upm.es/senpy/senpy-tutorial>`__ 

.. contents:: :local:

What is a plugin?
=================

A plugin is a Python object that can process entries. Given an entry, it will modify it, add annotations to it, or generate new entries.

What is an entry?
=================

Entries are objects that can be annotated.
In general, they will be a piece of text.
By default, entries are `NIF contexts <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/nif-core.html>`_ represented in JSON-LD format.
Each entry is a dictionary/JSON object that looks like this:

  .. code:: python

            {
               "@id": "<unique identifier or blank node name>",
               "nif:isString": "input text",
               "sentiments": [ {
                     ...
               }
               ],
               ...
            }

Annotations are added to the object like this:

.. code:: python

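   from senpy.models import Entry

   entry = Entry()
   # Alternative ways of setting an annotation:
   entry.vocabulary__annotationName = 'myvalue'    # attribute, '__' instead of ':'
   entry['vocabulary:annotationName'] = 'myvalue'  # prefixed key
   entry['annotationNameURI'] = 'myvalue'          # full URI as the key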

Here, ``vocabulary`` is one of the prefixes defined in the default senpy context, and ``annotationNameURI`` is a full URI.
The value may be any valid JSON-LD dictionary.
For simplicity, senpy includes a series of models by default in the ``senpy.models`` module.
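
In the snippet above, the attribute form uses a double underscore where the key form uses a colon.
The sketch below imitates that convention with a plain dictionary subclass.
``PrefixedDict`` is a hypothetical stand-in written for this illustration, not senpy's ``Entry`` implementation.

.. code:: python

   class PrefixedDict(dict):
       """Hypothetical stand-in: maps obj.prefix__name to obj['prefix:name']."""

       def __setattr__(self, name, value):
           # Store attributes as dictionary keys, turning '__' into ':'
           self[name.replace('__', ':')] = value

       def __getattr__(self, name):
           # Only called when the normal attribute lookup fails
           try:
               return self[name.replace('__', ':')]
           except KeyError:
               raise AttributeError(name)


   entry = PrefixedDict()
   entry.marl__hasPolarity = 'marl:Positive'   # attribute form
   entry['nif:isString'] = 'input text'        # key form
   print(entry['marl:hasPolarity'], entry.nif__isString)
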

What are annotations?
=====================
They are objects just like entries.
Senpy ships with several default annotations, including ``Sentiment``, ``Emotion`` and ``EmotionSet``.


What's a plugin made of?
========================

When receiving a query, senpy selects which plugin or plugins should process each entry, and in what order.
It also makes sure that every entry and the parameters provided by the user meet the plugin requirements.

Hence, two parts are necessary: 1) the code that will process the entry, and 2) some attributes and metadata that will tell senpy how to interact with the plugin.

In practice, this is what a plugin looks like, tests included:


.. literalinclude:: ../senpy/plugins/example/rand_plugin.py
   :emphasize-lines: 5-11
   :language: python


The lines highlighted contain some information about the plugin.
In particular, the following information is mandatory:

* A unique name for the class. In our example, ``Rand``.
* The subclass/type of plugin. This is typically either ``SentimentPlugin`` or ``EmotionPlugin``. However, new types of plugin can be created for different annotations. The only requirement is that these new types inherit from ``senpy.Analysis``.
* A description of the plugin. This can be done simply by adding a docstring to the class.
* A version, which should be updated whenever the plugin changes.
* An author name.


Plugins Code
============

The basic methods in a plugin are:

* ``analyse_entry``: called on every user request. It takes two parameters: ``entry``, the entry object, and ``params``, the parameters supplied by the user. It should yield one or more ``Entry`` objects.
* ``activate``: used to load memory-hungry resources, for instance to train a classifier.
* ``deactivate``: used to free up resources when the plugin is no longer needed.

Plugins are loaded asynchronously, so don't worry if the ``activate`` method takes a long time. The plugin will be marked as activated once the method has finished executing.
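
The life cycle of these three methods can be sketched without senpy installed.
``PluginSketch``, its tiny lexicon and the ``score`` annotation are hypothetical names, and the driver code at the bottom only imitates what senpy does on activation and on each request.

.. code:: python

   class PluginSketch:
       """Hypothetical plugin-like class with the three basic methods."""

       def activate(self):
           # Load expensive resources once, before any request is served
           self._lexicon = {'good': 1, 'bad': -1}

       def analyse_entry(self, entry, params):
           # Annotate the entry and yield it back
           words = entry['nif:isString'].lower().split()
           entry['score'] = sum(self._lexicon.get(w, 0) for w in words)
           yield entry

       def deactivate(self):
           # Release the resources loaded in activate
           self._lexicon = None


   plugin = PluginSketch()
   plugin.activate()
   results = list(plugin.analyse_entry({'nif:isString': 'A good day'}, params={}))
   plugin.deactivate()
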


How does senpy find modules?
============================

Senpy looks for files of two types:

* Python files of the form ``senpy_<NAME>.py`` or ``<NAME>_plugin.py``. In these files, it will look for 1) instances of classes that inherit from ``senpy.Plugin``, or 2) subclasses of ``senpy.Plugin`` that can be initialized without a configuration file, i.e. classes that contain all the required attributes for a plugin.
* Plugin definition files (see :doc:`advanced-plugins`)
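
As an illustration of the Python file naming convention only, the two patterns can be checked with the standard ``fnmatch`` module.
This is not senpy's discovery code, and ``looks_like_plugin_file`` is a made-up helper.

.. code:: python

   from fnmatch import fnmatch

   # The two file name patterns described in the list above
   PATTERNS = ('senpy_*.py', '*_plugin.py')


   def looks_like_plugin_file(filename):
       """Return True if the file name matches one of the two patterns."""
       return any(fnmatch(filename, pattern) for pattern in PATTERNS)


   for name in ('senpy_rand.py', 'rand_plugin.py', 'utils.py'):
       print(name, looks_like_plugin_file(name))
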

Defining additional parameters
==============================

Your plugin may ask for additional parameters from the users of the service by using the attribute ``extra_params`` in your plugin definition.
It takes a dictionary where each key is the name of an argument/parameter and each value is a dictionary with the following fields:

* aliases: the different names which can be used in the request to use the parameter.
* required: if set to true, users need to provide this parameter unless a default is set.
* options: the different acceptable values of the parameter (i.e. an enum). If set, the value provided must match one of the options.
* default: the default value of the parameter, if none is provided in the request.

.. code:: python

          # extra_params is declared as an attribute of the plugin
          extra_params = {
              "language": {
                  "aliases": ["language", "lang", "l"],
                  "required": True,
                  "options": ["es", "en"],
                  "default": "es"
              }
          }
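
To make the meaning of the four fields concrete, here is a rough sketch of how a request could be checked against such a definition.
``resolve_params`` is a hypothetical helper written for this example and is not senpy's actual validation code.

.. code:: python

   def resolve_params(extra_params, request):
       """Hypothetical helper: map request values to parameter names."""
       resolved = {}
       for name, spec in extra_params.items():
           # Accept the parameter under its own name or any of its aliases
           names = [name] + list(spec.get('aliases', []))
           value = next((request[n] for n in names if n in request), None)
           if value is None:
               # Nothing in the request: fall back to the default, if any
               value = spec.get('default')
           if value is None:
               if spec.get('required'):
                   raise ValueError('missing parameter: ' + name)
               continue
           options = spec.get('options')
           if options and value not in options:
               raise ValueError('invalid value for ' + name + ': ' + str(value))
           resolved[name] = value
       return resolved


   extra_params = {
       'language': {
           'aliases': ['language', 'lang', 'l'],
           'required': True,
           'options': ['es', 'en'],
           'default': 'es',
       }
   }

   print(resolve_params(extra_params, {'l': 'en'}))  # {'language': 'en'}
   print(resolve_params(extra_params, {}))           # {'language': 'es'}

A request such as ``{'lang': 'fr'}`` raises ``ValueError`` in this sketch, because ``fr`` is not among the options.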



Loading data and files
======================

Most plugins will need access to files (dictionaries, lexicons, etc.).
These files are usually large or under a license that does not allow redistribution.
For this reason, senpy has a ``data_folder`` that is separated from the source files.
The location of this folder is controlled programmatically or by setting the ``SENPY_DATA`` environment variable.

Plugins have a convenience function `self.open` which will automatically prepend the data folder to relative paths:


.. code:: python

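          import os

          from senpy.plugins import AnalysisPlugin


          class PluginWithResources(AnalysisPlugin):
              file_in_data = '<FILE PATH>'     # relative to the data folder
              file_in_sources = '<FILE PATH>'  # relative to the plugin sources

              def activate(self):
                  # Relative paths are looked up in the data folder
                  with self.open(self.file_in_data) as f:
                      # train_from_file stands in for your own training code
                      self._classifier = train_from_file(f)
                  # Build the full path for files shipped with the plugin sources
                  file_in_source = os.path.join(self.get_folder(), self.file_in_sources)
                  with self.open(file_in_source) as f:
                      pass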

         
It is good practice to specify the paths of these files in the plugin configuration, so the same code can be reused with different resources.
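
The path rule can be sketched as follows, assuming relative paths are joined to the data folder and absolute paths are left untouched.
``resolve_data_path`` is a hypothetical helper, and falling back to the current directory when ``SENPY_DATA`` is unset is an assumption made for the example, not documented senpy behaviour.

.. code:: python

   import os


   def resolve_data_path(path, data_folder=None):
       """Hypothetical helper: prepend the data folder to relative paths."""
       if data_folder is None:
           # Assumption: use SENPY_DATA if set, else the current directory
           data_folder = os.environ.get('SENPY_DATA', '.')
       if os.path.isabs(path):
           # Absolute paths are returned unchanged
           return path
       return os.path.join(data_folder, path)


   print(resolve_data_path('lexicon.txt', data_folder='/srv/data'))
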


Docker image
============

Add the following Dockerfile to your project to generate a Docker image with your plugin:

.. code:: dockerfile

   FROM gsiupm/senpy

Once you have checked that your plugin works with a specific version of senpy, pin that version in the file so that your build keeps working even if senpy gets updated, e.g.:


.. code:: dockerfile

   FROM gsiupm/senpy:1.0.1

This will copy your source folder to the image, and install all dependencies.
Now, to build an image:

.. code:: shell

   docker build . -t gsiupm/exampleplugin

And you can run it with:

.. code:: shell

   docker run -p 5000:5000 gsiupm/exampleplugin

If the plugin uses non-source files (:ref:`loading data and files`), the recommended approach is to store them in the ``SENPY_DATA`` folder.
Data can then be mounted in the container or added to the image.
The former is recommended for open source plugins with licensed resources, whereas the latter is the most convenient and can be used for private images.

Mounting data:

.. code:: bash

   docker run -v $PWD/data:/data gsiupm/exampleplugin

Adding data to the image:

.. code:: dockerfile

   FROM gsiupm/senpy:1.0.1
   COPY data /data/

F.A.Q.
======
What annotations can I use?
???????????????????????????

You can add almost any annotation to an entry.
The most common use cases are covered in the :doc:`apischema`.


Why does the analyse function yield instead of return?
??????????????????????????????????????????????????????

This is so that plugins may add new entries to the response or filter some of them.
For instance, a chunker may split one entry into several.
On the other hand, a conversion plugin may leave out those entries that do not contain relevant information.
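
Both behaviours are easy to express with generators.
In this sketch, entries are plain dictionaries, and ``chunk_entry`` and ``drop_short_entries`` are hypothetical functions rather than senpy plugins.

.. code:: python

   def chunk_entry(entry):
       """Split one entry into one entry per sentence."""
       for sentence in entry['nif:isString'].split('.'):
           sentence = sentence.strip()
           if sentence:
               yield {'nif:isString': sentence}


   def drop_short_entries(entries, min_words=2):
       """Yield only the entries with at least min_words words."""
       for entry in entries:
           if len(entry['nif:isString'].split()) >= min_words:
               yield entry


   entry = {'nif:isString': 'Senpy is great. Yes. Plugins are easy to write.'}
   chunks = list(chunk_entry(entry))        # one entry becomes three
   kept = list(drop_short_entries(chunks))  # the one-word entry is left out
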


If I'm using a classifier, where should I train it?
???????????????????????????????????????????????????

Training a classifier can be time consuming. To avoid running the training unnecessarily, you can use ``ShelfMixin`` to store the classifier. For instance:

.. code:: python

          from senpy.plugins import ShelfMixin, AnalysisPlugin

          class MyPlugin(ShelfMixin, AnalysisPlugin):
              def train(self):
                  ''' Code to train the classifier
                  '''
                  # Here goes the code
                  # ...
                  return classifier

              def activate(self):
                  if 'classifier' not in self.sh:
                      classifier = self.train()
                      self.sh['classifier'] = classifier
                  self.classifier = self.sh['classifier']
              
              def deactivate(self):
                  self.close()


By default, ``ShelfMixin`` creates a file based on the plugin name and stores it in that plugin's folder.
However, you can manually specify a ``shelf_file`` in your .senpy file.

Shelves may get corrupted if the plugin exits unexpectedly.
A corrupt shelf prevents the plugin from loading.
If you do not care about the data in the shelf, you can force your plugin to remove the corrupted file and load anyway: set ``force_shelf`` to ``True`` in your plugin and start it again.
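
The same train-once, reload-later pattern, including the option to discard a corrupted file, can be imitated with the standard library.
This is only an analogy built on ``pickle``: it is not how ``ShelfMixin`` is implemented, and ``train``, ``load_or_train`` and the ``force`` argument are placeholder names.

.. code:: python

   import os
   import pickle
   import tempfile


   def train():
       # Placeholder for an expensive training step
       return {'good': 1, 'bad': -1}


   def load_or_train(cache_file, force=False):
       """Load the cached classifier, training and caching it if needed."""
       if os.path.exists(cache_file):
           try:
               with open(cache_file, 'rb') as f:
                   return pickle.load(f)
           except Exception:
               if not force:
                   raise
               # Discard the unreadable cache and train again
               os.remove(cache_file)
       classifier = train()
       with open(cache_file, 'wb') as f:
           pickle.dump(classifier, f)
       return classifier


   cache_file = os.path.join(tempfile.mkdtemp(), 'classifier.pickle')
   first = load_or_train(cache_file)   # trains and writes the cache
   second = load_or_train(cache_file)  # reads the cache
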

How can I turn an external service into a plugin?
?????????????????????????????????????????????????

This example illustrates how to implement a plugin that accesses the Sentiment140 service.

.. code:: python

          import json

          import requests

          from senpy.models import Sentiment
          from senpy.plugins import SentimentPlugin


          class Sentiment140Plugin(SentimentPlugin):
              def analyse_entry(self, entry, params):
                  text = entry.text
                  lang = params.get("language", "auto")
                  # Send the text to the external service as a JSON payload
                  res = requests.post("http://www.sentiment140.com/api/bulkClassifyJson",
                                      json.dumps({"language": lang,
                                                  "data": [{"text": text}]
                                                  }
                                                 )
                                      )

                  p = params.get("prefix", None)
                  # Scale the polarity reported by the service (0 to 4)
                  # to the range used by the plugin
                  polarity_value = self.maxPolarityValue*int(res.json()["data"][0]
                                                             ["polarity"]) * 0.25
                  polarity = "marl:Neutral"
                  neutral_value = self.maxPolarityValue / 2.0
                  if polarity_value > neutral_value:
                      polarity = "marl:Positive"
                  elif polarity_value < neutral_value:
                      polarity = "marl:Negative"

                  sentiment = Sentiment(id="Sentiment0",
                                        prefix=p,
                                        marl__hasPolarity=polarity,
                                        marl__polarityValue=polarity_value)
                  # Record this plugin as the source of the annotation
                  sentiment.prov(self)
                  entry.sentiments.append(sentiment)
                  yield entry
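
The polarity arithmetic in the plugin can be tried in isolation.
The function below repeats the same scaling and thresholds; ``polarity_from_score`` is a hypothetical name, and the default ``max_polarity_value`` of 1.0 is an assumption chosen for the example.

.. code:: python

   def polarity_from_score(raw_score, max_polarity_value=1.0):
       """Scale a raw score in the 0 to 4 range and label it."""
       value = max_polarity_value * int(raw_score) * 0.25
       neutral = max_polarity_value / 2.0
       if value > neutral:
           label = 'marl:Positive'
       elif value < neutral:
           label = 'marl:Negative'
       else:
           label = 'marl:Neutral'
       return label, value


   for raw in (0, 2, 4):
       print(raw, polarity_from_score(raw))

With these assumptions, raw scores of 0, 2 and 4 map to 0.0 (negative), 0.5 (neutral) and 1.0 (positive).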


Can I activate a DEBUG mode for my plugin?
???????????????????????????????????????????

You can activate DEBUG mode from the command-line tool with the ``-d`` option.

.. code:: bash

   senpy -d


Additionally, with the ``--pdb`` option you will be dropped into a pdb post-mortem shell if an exception is raised.

.. code:: bash

   python -m pdb yourplugin.py

Where can I find more code examples?
????????????????????????????????????

See: `<http://github.com/gsi-upm/senpy-plugins-community>`_.