Commit 35797c9d authored by Rodrigo Barbado Esteban

Video and minor changes

parent 8698b6b4
@@ -6,7 +6,7 @@ GSI Crawler is an innovative and useful framework which aims to extract informat
.. image:: images/crawler1.png
:align: center
In this documentation we introduce the framework, detailing the global architecture of the project and explaining the functionality of each module. Finally, we present a case study in order to better understand the system itself.
In this documentation we introduce the framework, detailing the global architecture of the project and explaining the functionality of each module. Finally, we present a case study in order to better understand the system itself. A demo video about GSI Crawler is available `here <https://www.youtube.com/watch?v=x9jzGDZs5hY&feature=youtu.be>`_.
@@ -28,8 +28,7 @@ First of all, you need to clone the repositories:
.. code:: bash
$ git clone https://lab.cluster.gsi.dit.upm.es/sefarad/gsicrawler.git
$ git clone https://lab.cluster.gsi.dit.upm.es/sefarad/dashboard-gsicrawler.git
$ git clone http://lab.cluster.gsi.dit.upm.es/sefarad/gsicrawler.git
Then you need to set up the environment variables. To do so, first create a file named ``.env`` in the root directory of the project (gsicrawler). As shown below, `Twitter <https://developer.twitter.com/en/docs/basics/authentication/guides/access-tokens>`_ and `Meaningcloud <https://www.meaningcloud.com/developer/apis>`_ credentials are needed if you wish to use those services.
@@ -43,23 +42,20 @@ Then, it is needed to set up the environment variables. For this task, first cre
ES_PORT=9200
ES_ENDPOINT_EXTERNAL=localhost:19200
FUSEKI_PASSWORD={YourFusekiPass}
FUSEKI_ENDPOINT_EXTERNAL=fuseki:3030
FUSEKI_ENDPOINT_EXTERNAL=localhost:13030
FUSEKI_ENDPOINT={YourFusekiEndPoint}
API_KEY_MEANING_CLOUD={YourMeaningCloudApiKey, get it on Meaningcloud}
FUSEKI_ENDPOINT_DASHBOARD={YourFusekiEndpoint, e.g. localhost:13030}
FUSEKI_ENDPOINT=fuseki
FUSEKI_PORT=3030
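Docker Compose reads the ``.env`` file in the project root automatically. As an optional sanity check (a sketch, assuming the variables above are referenced in the project's ``docker-compose.yml``), you can print the resolved configuration from the gsicrawler directory:

.. code:: bash

    $ sudo docker-compose config | grep -i fuseki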
Finally, in both repositories execute the following line:
Finally, execute the following lines:
.. code:: bash
$ cd gsicrawler
$ sudo docker-compose up
$ cd ../dashboard-gsicrawler
$ sudo docker-compose up
Information about the initialization is printed to the console. If you wish to see how tasks are being executed, besides reading the logs you can access the Luigi task visualizer at ``localhost:8082``. You will discover more about Luigi in the next steps.
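If you prefer to follow the output of a single service rather than the whole console, something along these lines should work (``gsicrawler`` is the service name used by the tutorials below):

.. code:: bash

    $ sudo docker-compose logs -f gsicrawler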
@@ -133,7 +129,7 @@ Finally, for running the tutorial execute the following line from your repositor
.. code:: bash
$ docker-compose exec luigi python -m crontasks tutorial2
$ sudo docker-compose run gsicrawler tutorial2
|
@@ -181,7 +177,7 @@ For executing this tutorial you should execute the following line:
.. code:: bash
$ docker-compose exec luigi python -m crontasks tutorial3
$ sudo docker-compose run gsicrawler tutorial3
To access the data stored in Elasticsearch, open ``localhost:19200/tutorial/_search?pretty`` in your web browser.
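The same index can also be queried from the command line. For instance, a simple query-string search with ``curl`` (the search term is illustrative):

.. code:: bash

    $ curl 'localhost:19200/tutorial/_search?q=isis&pretty'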
@@ -230,4 +226,7 @@ In the case of seeing it on Fuseki, the address would be ``localhost:13030/tutor
schema:search "\"isis\"" ;
schema:thumbnailUrl "http://i2.cdn.turner.com/cnnnext/dam/assets/171002123455-31-las-vegas-incident-1002-story-body.jpg" .
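The stored triples can also be retrieved with a SPARQL query against Fuseki. A minimal sketch with ``curl``, assuming the default query path ``/tutorial/query`` for the dataset mentioned above:

.. code:: bash

    $ curl --data-urlencode 'query=PREFIX schema: <http://schema.org/> SELECT ?s ?url WHERE { ?s schema:thumbnailUrl ?url }' localhost:13030/tutorial/query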
For developing visual analysis tools, we suggest building a dashboard following this `documentation <http://sefarad.readthedocs.io/en/latest/dashboards-dev.html>`_.
Tutorial IV: Developing your first dashboard
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -44,7 +44,7 @@
<h1>What is GSI Crawler?<a class="headerlink" href="#what-is-gsi-crawler" title="Permalink to this headline"></a></h1>
<p>GSI Crawler is an innovative and useful framework which aims to extract information from web pages and enrich it following semantic approaches. At the moment, there are three available platforms: Twitter, Reddit and News. The user interacts with the tool through a web interface, selecting the type of analysis to carry out and the platform to be examined.</p>
<img alt="_images/crawler1.png" class="align-center" src="_images/crawler1.png" />
<p>In this documentation we introduce the framework, detailing the global architecture of the project and explaining the functionality of each module. Finally, we present a case study in order to better understand the system itself.</p>
<p>In this documentation we introduce the framework, detailing the global architecture of the project and explaining the functionality of each module. Finally, we present a case study in order to better understand the system itself. A demo video about GSI Crawler is available <a class="reference external" href="https://www.youtube.com/watch?v=x9jzGDZs5hY&amp;feature=youtu.be">here</a>.</p>
</div>
@@ -50,6 +50,7 @@
<li class="toctree-l2"><a class="reference internal" href="tutorials.html#tutorial-i-install">Tutorial I: Install</a></li>
<li class="toctree-l2"><a class="reference internal" href="tutorials.html#tutorial-ii-crawling-news">Tutorial II: Crawling news</a></li>
<li class="toctree-l2"><a class="reference internal" href="tutorials.html#tutorial-iii-semantic-enrichment-and-data-storage">Tutorial III: Semantic enrichment and data storage</a></li>
<li class="toctree-l2"><a class="reference internal" href="tutorials.html#tutorial-iv-developing-your-first-dashboard">Tutorial IV: Developing your first dashboard</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="architecture.html">Architecture</a><ul>
@@ -57,8 +57,7 @@
<p>For Docker installation on Ubuntu, visit this <a class="reference external" href="https://store.docker.com/editions/community/docker-ce-server-ubuntu?tab=description">link</a>.</p>
<p>Docker-compose installation detailed instructions are available <a class="reference external" href="https://docs.docker.com/compose/install/">here</a>.</p>
<p>First of all, you need to clone the repositories:</p>
<div class="code bash highlight-default"><div class="highlight"><pre><span></span>$ git clone https://lab.cluster.gsi.dit.upm.es/sefarad/gsicrawler.git
$ git clone https://lab.cluster.gsi.dit.upm.es/sefarad/dashboard-gsicrawler.git
<div class="code bash highlight-default"><div class="highlight"><pre><span></span>$ git clone http://lab.cluster.gsi.dit.upm.es/sefarad/gsicrawler.git
</pre></div>
</div>
<p>Then you need to set up the environment variables. To do so, first create a file named <code class="docutils literal"><span class="pre">.env</span></code> in the root directory of the project (gsicrawler). As shown below, <a class="reference external" href="https://developer.twitter.com/en/docs/basics/authentication/guides/access-tokens">Twitter</a> and <a class="reference external" href="https://www.meaningcloud.com/developer/apis">Meaningcloud</a> credentials are needed if you wish to use those services.</p>
@@ -70,19 +69,16 @@ $ git clone https://lab.cluster.gsi.dit.upm.es/sefarad/dashboard-gsicrawler.git
<span class="n">ES_PORT</span><span class="o">=</span><span class="mi">9200</span>
<span class="n">ES_ENDPOINT_EXTERNAL</span><span class="o">=</span><span class="n">localhost</span><span class="p">:</span><span class="mi">19200</span>
<span class="n">FUSEKI_PASSWORD</span><span class="o">=</span><span class="p">{</span><span class="n">YourFusekiPass</span><span class="p">}</span>
<span class="n">FUSEKI_ENDPOINT_EXTERNAL</span><span class="o">=</span><span class="n">fuseki</span><span class="p">:</span><span class="mi">3030</span>
<span class="n">FUSEKI_ENDPOINT_EXTERNAL</span><span class="o">=</span><span class="n">localhost</span><span class="p">:</span><span class="mi">13030</span>
<span class="n">FUSEKI_ENDPOINT</span><span class="o">=</span><span class="p">{</span><span class="n">YourFusekiEndPoint</span><span class="p">}</span>
<span class="n">API_KEY_MEANING_CLOUD</span><span class="o">=</span><span class="p">{</span><span class="n">YourMeaningCloudApiKey</span><span class="p">,</span> <span class="n">get</span> <span class="n">it</span> <span class="n">on</span> <span class="n">Meaningcloud</span><span class="p">}</span>
<span class="n">FUSEKI_ENDPOINT_DASHBOARD</span><span class="o">=</span><span class="p">{</span><span class="n">YourFusekiEndpoint</span><span class="p">,</span> <span class="n">e</span><span class="o">.</span><span class="n">g</span><span class="o">.</span> <span class="n">localhost</span><span class="p">:</span><span class="mi">13030</span><span class="p">}</span>
<span class="n">FUSEKI_ENDPOINT</span> <span class="o">=</span> <span class="n">fuseki</span>
<span class="n">FUSEKI_PORT</span> <span class="o">=</span> <span class="mi">3030</span>
</pre></div>
</div>
<p>Finally, in both repositories execute the following line:</p>
<p>Finally, execute the following lines:</p>
<div class="code bash highlight-default"><div class="highlight"><pre><span></span>$ cd gsicrawler
$ sudo docker-compose up
$ cd ../dashboard-gsicrawler
$ sudo docker-compose up
</pre></div>
</div>
<p>Information about the initialization is printed to the console. If you wish to see how tasks are being executed, besides reading the logs you can access the Luigi task visualizer at <code class="docutils literal"><span class="pre">localhost:8082</span></code>. You will discover more about Luigi in the next steps.</p>
@@ -141,7 +137,7 @@ $ sudo docker-compose up
</pre></div>
</div>
<p>Finally, to run the tutorial, execute the following line from your repository path:</p>
<div class="code bash highlight-default"><div class="highlight"><pre><span></span>$ docker-compose exec luigi python -m crontasks tutorial2
<div class="code bash highlight-default"><div class="highlight"><pre><span></span>$ sudo docker-compose run gsicrawler tutorial2
</pre></div>
</div>
<div class="line-block">
@@ -183,7 +179,7 @@ $ sudo docker-compose up
</div>
<p>The Luigi pipeline is more complex, as data now has to be stored in Elasticsearch and Fuseki. The code of the pipeline can also be found in <code class="docutils literal"><span class="pre">luigi/scrapers/tutorial3.py</span></code>; the task execution workflow is initiated by <code class="docutils literal"><span class="pre">PipelineTask</span></code>, which is in charge of calling its dependent tasks.</p>
<p>To execute this tutorial, run the following line:</p>
<div class="code bash highlight-default"><div class="highlight"><pre><span></span>$ docker-compose exec luigi python -m crontasks tutorial3
<div class="code bash highlight-default"><div class="highlight"><pre><span></span>$ sudo docker-compose run gsicrawler tutorial3
</pre></div>
</div>
<p>To access the data stored in Elasticsearch, open <code class="docutils literal"><span class="pre">localhost:19200/tutorial/_search?pretty</span></code> in your web browser.</p>
@@ -226,7 +222,9 @@ $ sudo docker-compose up
<span class="n">schema</span><span class="p">:</span><span class="n">thumbnailUrl</span> <span class="s2">&quot;http://i2.cdn.turner.com/cnnnext/dam/assets/171002123455-31-las-vegas-incident-1002-story-body.jpg&quot;</span> <span class="o">.</span>
</pre></div>
</div>
<p>For developing visual analysis tools, we suggest building a dashboard following this <a class="reference external" href="http://sefarad.readthedocs.io/en/latest/dashboards-dev.html">documentation</a>.</p>
</div>
<div class="section" id="tutorial-iv-developing-your-first-dashboard">
<h2>Tutorial IV: Developing your first dashboard<a class="headerlink" href="#tutorial-iv-developing-your-first-dashboard" title="Permalink to this headline"></a></h2>
</div>
</div>
@@ -265,6 +263,7 @@ $ sudo docker-compose up
<li class="toctree-l2"><a class="reference internal" href="#tutorial-i-install">Tutorial I: Install</a></li>
<li class="toctree-l2"><a class="reference internal" href="#tutorial-ii-crawling-news">Tutorial II: Crawling news</a></li>
<li class="toctree-l2"><a class="reference internal" href="#tutorial-iii-semantic-enrichment-and-data-storage">Tutorial III: Semantic enrichment and data storage</a></li>
<li class="toctree-l2"><a class="reference internal" href="#tutorial-iv-developing-your-first-dashboard">Tutorial IV: Developing your first dashboard</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="architecture.html">Architecture</a></li>