Commit 73681502 authored by Rodrigo Barbado Esteban's avatar Rodrigo Barbado Esteban

changes in .env

parent 95614439
......@@ -24,6 +24,8 @@ The following figure describes the architecture from a modular point of view, be
.. image:: images/arch.png
:align: center
Tasks Server
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
......@@ -54,4 +56,4 @@ Once the Luigi orchestrator has been explained, we will conclude this section det
Web App - Polymer Web Components
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The GSI Crawler framework uses a web page based on Polymer web components to interact with all the functionalities offered by the tool. These Polymer Web Components are independent submodules that can be grouped together to build the general dashboard interface.
The GSI Crawler framework uses a web page based on Polymer web components to interact with all the functionalities offered by the tool. These Polymer Web Components are independent submodules that can be grouped together to build the general dashboard interface. For more information, please visit the `Sefarad documentation <http://sefarad.readthedocs.io/en/latest/widgets.html>`_ on web components.
......@@ -6,7 +6,8 @@ GSI Crawler is an innovative and useful framework which aims to extract informat
In this documentation we are going to introduce this framework, detailing the global architecture of the project and explaining the functionality of each module. Finally, we will present a case study in order to better understand the system itself.
.. image:: images/crawler1.png
:align: center
......@@ -2,7 +2,13 @@ Getting started
---------------
First glance into GSI Crawler
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The quickest way of exploring the possibilities offered by GSI Crawler is to access this `demo <https://docs.docker.com/compose/install/>`_. There you can find a dashboard to visualize data collected from different News sources and Twitter. Some examples of the added value offered by this tool are topic and sentiment extraction, identification of people appearing in the scraped data and geolocation of sources.
The quickest way of exploring the possibilities offered by GSI Crawler is to access this `demo <http://dashboard-gsicrawler.cluster.gsi.dit.upm.es//>`_. There you can find a dashboard to visualize data collected from different News sources and Twitter. Some examples of the added value offered by this tool are topic and sentiment extraction, identification of people appearing in the scraped data and geolocation of sources.
.. image:: images/crawler2.png
:align: center
|
.. image:: images/map.jpg
:align: center
......@@ -25,25 +31,21 @@ First of all, you need to clone the repository:
$ git clone https://lab.cluster.gsi.dit.upm.es/sefarad/gsicrawler.git
$ cd gsicrawler
Then, you need to set up the environment variables. To do so, first create a file named ``.env`` in the root directory of the project.
Then, you need to set up the environment variables. To do so, first create a file named ``.env`` in the root directory of the project. Note that Twitter and MeaningCloud credentials are needed if you wish to use those services.
.. code::
LUIGI_ENDPOINT="gsicrawler-luigi"
LUIGI_ENDPOINT_EXTERNAL="gsicrawler-luigi.cluster.gsi.dit.upm.es"
CRAWLER_ENDPOINT="gsicrawler"
CRAWLER_ENDPOINT_EXTERNAL="gsicrawler.cluster.gsi.dit.upm.es"
TWITTER_CONSUMER_KEY={YourConsumerKey}
TWITTER_CONSUMER_SECRET={YourConsumerSecret}
TWITTER_ACCESS_TOKEN={YourAccessToken}
TWITTER_ACCESS_TOKEN_SECRET={YourAccessTokenSecret}
ES_ENDPOINT=elasticsearch
ES_PORT=9200
FUSEKI_PASSWORD=gsi2017fuseki
FUSEKI_PASSWORD={YourFusekiPass}
FUSEKI_ENDPOINT_EXTERNAL=fuseki:3030
FUSEKI_ENDPOINT="gsicrawler-fuseki"
FUSEKI_ENDPOINT={YourFusekiEndPoint}
API_KEY_MEANING_CLOUD={YourMeaningCloudApiKey}
FUSEKI_ENDPOINT_DASHBOARD={YourFusekiEndpoint}
FUSEKI_ENDPOINT_DASHBOARD={YourFusekiEndpoint, e.g. localhost:13030}
Once you have created the file, you should add a new ``env_file`` attribute to the **luigi** service in ``docker-compose.yml``, with ``.env`` as its value, as shown below.
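.. code::

   env_file:
     - .env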
......@@ -61,7 +63,9 @@ Finally, to run the image:
$ sudo docker-compose up
The information related to the initialization can be found in the console, and when the process finishes it is possible to access the Demo dashboard by accessing ``localhost:8080`` from your web browser.
The information related to the initialization can be found in the console. If you wish to see how tasks are being executed, besides checking the logs you can access the Luigi task visualizer at ``localhost:8082``. In the next steps you will discover more about Luigi.
When the process finishes, it is possible to access the Demo dashboard by visiting ``localhost:8080`` in your web browser.
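As a quick sanity check, assuming the default port mappings in ``docker-compose.yml``, you can verify that the main services are responding:

.. code:: bash

   # Demo dashboard
   curl -I http://localhost:8080
   # Luigi task visualizer
   curl -I http://localhost:8082
   # Elasticsearch, if it is exposed externally (e.g. ES_ENDPOINT_EXTERNAL=localhost:19200)
   curl http://localhost:19200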
|
......@@ -75,7 +79,7 @@ We will only obtain the headline and url of each piece of news appearing on the
.. image:: images/cnnsearch.png
:align: center
|
The code of this example can be found in ``luigi/scrapers/tutorial2.py``:
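As an illustration only (a sketch, not the actual contents of ``tutorial2.py``), such a scraper could look like the following, assuming a hypothetical CNN search endpoint and response field names:

.. code:: python

   import json

   import requests

   # Hypothetical endpoint and parameters; the real tutorial2.py may differ.
   SEARCH_URL = "https://search.api.cnn.io/content"


   def scrape_cnn(topic, output_path="cnn_news.json"):
       """Fetch CNN news about ``topic`` and store headline/url pairs as JSON."""
       response = requests.get(SEARCH_URL, params={"q": topic, "size": 20})
       response.raise_for_status()
       items = response.json().get("result", [])

       news = [{"headline": item.get("headline"), "url": item.get("url")}
               for item in items]

       with open(output_path, "w") as f:
           json.dump(news, f, indent=2)
       return news


   if __name__ == "__main__":
       scrape_cnn("climate change")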
......
......@@ -73,7 +73,7 @@
</div>
<div class="section" id="web-app-polymer-web-components">
<h3>Web App - Polymer Web Components<a class="headerlink" href="#web-app-polymer-web-components" title="Permalink to this headline"></a></h3>
<p>The GSI Crawler framework uses a web page based on Polymer web components to interact with all the functionalities offered by the tool. These Polymer Web Components are independent submodules that can be grouped together to build the general dashboard interface.</p>
<p>The GSI Crawler framework uses a web page based on Polymer web components to interact with all the functionalities offered by the tool. These Polymer Web Components are independent submodules that can be grouped together to build the general dashboard interface. For more information, please visit the <a class="reference external" href="http://sefarad.readthedocs.io/en/latest/widgets.html">Sefarad documentation</a> on web components.</p>
</div>
</div>
</div>
......
......@@ -44,6 +44,7 @@
<h1>What is GSI Crawler?<a class="headerlink" href="#what-is-gsi-crawler" title="Permalink to this headline"></a></h1>
<p>GSI Crawler is an innovative and useful framework which aims to extract information from web pages and enrich it following semantic approaches. At the moment, there are three available platforms: Twitter, Reddit and News. The user interacts with the tool through a web interface, selecting the type of analysis to carry out and the platform to be examined.</p>
<p>In this documentation we are going to introduce this framework, detailing the global architecture of the project and explaining the functionality of each module. Finally, we will present a case study in order to better understand the system itself.</p>
<img alt="_images/crawler1.png" class="align-center" src="_images/crawler1.png" />
</div>
......
......@@ -43,7 +43,11 @@
<h1>Getting started<a class="headerlink" href="#getting-started" title="Permalink to this headline"></a></h1>
<div class="section" id="first-glance-into-gsi-crawler">
<h2>First glance into GSI Crawler<a class="headerlink" href="#first-glance-into-gsi-crawler" title="Permalink to this headline"></a></h2>
<p>The quickest way of exploring the possibilities offered by GSI Crawler is to access this <a class="reference external" href="https://docs.docker.com/compose/install/">demo</a>. There you can find a dashboard to visualize data collected from different News sources and Twitter. Some examples of the added value offered by this tool are topic and sentiment extraction, identification of people appearing in the scraped data and geolocation of sources.</p>
<p>The quickest way of exploring the possibilities offered by GSI Crawler is to access this <a class="reference external" href="http://dashboard-gsicrawler.cluster.gsi.dit.upm.es//">demo</a>. There you can find a dashboard to visualize data collected from different News sources and Twitter. Some examples of the added value offered by this tool are topic and sentiment extraction, identification of people appearing in the scraped data and geolocation of sources.</p>
<img alt="_images/crawler2.png" class="align-center" src="_images/crawler2.png" />
<div class="line-block">
<div class="line"><br /></div>
</div>
<img alt="_images/map.jpg" class="align-center" src="_images/map.jpg" />
</div>
<div class="section" id="tutorial-i-install">
......@@ -56,22 +60,18 @@
$ cd gsicrawler
</pre></div>
</div>
<p>Then, you need to set up the environment variables. To do so, first create a file named <code class="docutils literal"><span class="pre">.env</span></code> in the root directory of the project.</p>
<div class="code highlight-default"><div class="highlight"><pre><span></span><span class="n">LUIGI_ENDPOINT</span><span class="o">=</span><span class="s2">&quot;gsicrawler-luigi&quot;</span>
<span class="n">LUIGI_ENDPOINT_EXTERNAL</span><span class="o">=</span><span class="s2">&quot;gsicrawler-luigi.cluster.gsi.dit.upm.es&quot;</span>
<span class="n">CRAWLER_ENDPOINT</span><span class="o">=</span><span class="s2">&quot;gsicrawler&quot;</span>
<span class="n">CRAWLER_ENDPOINT_EXTERNAL</span><span class="o">=</span><span class="s2">&quot;gsicrawler.cluster.gsi.dit.upm.es&quot;</span>
<span class="n">TWITTER_CONSUMER_KEY</span><span class="o">=</span><span class="p">{</span><span class="n">YourConsumerKey</span><span class="p">}</span>
<p>Then, you need to set up the environment variables. To do so, first create a file named <code class="docutils literal"><span class="pre">.env</span></code> in the root directory of the project. Note that Twitter and MeaningCloud credentials are needed if you wish to use those services.</p>
<div class="code highlight-default"><div class="highlight"><pre><span></span><span class="n">TWITTER_CONSUMER_KEY</span><span class="o">=</span><span class="p">{</span><span class="n">YourConsumerKey</span><span class="p">}</span>
<span class="n">TWITTER_CONSUMER_SECRET</span><span class="o">=</span><span class="p">{</span><span class="n">YourConsumerSecret</span><span class="p">}</span>
<span class="n">TWITTER_ACCESS_TOKEN</span><span class="o">=</span><span class="p">{</span><span class="n">YourAccessToken</span><span class="p">}</span>
<span class="n">TWITTER_ACCESS_TOKEN_SECRET</span><span class="o">=</span><span class="p">{</span><span class="n">YourAccessTokenSecret</span><span class="p">}</span>
<span class="n">ES_ENDPOINT</span><span class="o">=</span><span class="n">elasticsearch</span>
<span class="n">ES_PORT</span><span class="o">=</span><span class="mi">9200</span>
<span class="n">FUSEKI_PASSWORD</span><span class="o">=</span><span class="n">gsi2017fuseki</span>
<span class="n">FUSEKI_PASSWORD</span><span class="o">=</span><span class="p">{</span><span class="n">YourFusekiPass</span><span class="p">}</span>
<span class="n">FUSEKI_ENDPOINT_EXTERNAL</span><span class="o">=</span><span class="n">fuseki</span><span class="p">:</span><span class="mi">3030</span>
<span class="n">FUSEKI_ENDPOINT</span><span class="o">=</span><span class="s2">&quot;gsicrawler-fuseki&quot;</span>
<span class="n">FUSEKI_ENDPOINT</span><span class="o">=</span><span class="p">{</span><span class="n">YourFusekiEndPoint</span><span class="p">}</span>
<span class="n">API_KEY_MEANING_CLOUD</span><span class="o">=</span><span class="p">{</span><span class="n">YourMeaningCloudApiKey</span><span class="p">}</span>
<span class="n">FUSEKI_ENDPOINT_DASHBOARD</span><span class="o">=</span><span class="p">{</span><span class="n">YourFusekiEndpoint</span><span class="p">}</span>
<span class="n">FUSEKI_ENDPOINT_DASHBOARD</span><span class="o">=</span><span class="p">{</span><span class="n">YourFusekiEndpoint</span><span class="p">,</span> <span class="n">e</span><span class="o">.</span><span class="n">g</span><span class="o">.</span> <span class="n">localhost</span><span class="p">:</span><span class="mi">13030</span><span class="p">}</span>
</pre></div>
</div>
<p>Once you have created the file, you should add a new <code class="docutils literal"><span class="pre">env_file</span></code> attribute to the <strong>luigi</strong> service in <code class="docutils literal"><span class="pre">docker-compose.yml</span></code>, with <code class="docutils literal"><span class="pre">.env</span></code> as its value.</p>
......@@ -83,7 +83,8 @@ $ cd gsicrawler
<div class="code bash highlight-default"><div class="highlight"><pre><span></span>$ sudo docker-compose up
</pre></div>
</div>
<p>The information related to the initialization can be found in the console, and when the process finishes it is possible to access the Demo dashboard by accessing <code class="docutils literal"><span class="pre">localhost:8080</span></code> from your web browser.</p>
<p>The information related to the initialization can be found in the console. If you wish to see how tasks are being executed, besides checking the logs you can access the Luigi task visualizer at <code class="docutils literal"><span class="pre">localhost:8082</span></code>. In the next steps you will discover more about Luigi.</p>
<p>When the process finishes, it is possible to access the Demo dashboard by visiting <code class="docutils literal"><span class="pre">localhost:8080</span></code> in your web browser.</p>
<div class="line-block">
<div class="line"><br /></div>
</div>
......@@ -93,9 +94,6 @@ $ cd gsicrawler
<p>This second tutorial will show how to build a crawler to gather news from the CNN by extracting data from the CNN News API; in the general case we could use the <a class="reference external" href="https://docs.scrapy.org/en/latest/">Scrapy</a> library, which allows extracting data from web pages.</p>
<p>We will only obtain the headline and url of each piece of news appearing on the CNN related to one topic, storing those fields into a JSON file.</p>
<img alt="_images/cnnsearch.png" class="align-center" src="_images/cnnsearch.png" />
<div class="line-block">
<div class="line"><br /></div>
</div>
<p>The code of this example can be found in <code class="docutils literal"><span class="pre">luigi/scrapers/tutorial2.py</span></code>:</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">requests</span>
<span class="kn">import</span> <span class="nn">json</span>
......
......@@ -24,6 +24,8 @@ The following figure describes the architecture from a modular point of view, be
.. image:: images/arch.png
:align: center
Tasks Server
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
......@@ -54,4 +56,4 @@ Once the Luigi orchestrator has been explained, we will conclude this section det
Web App - Polymer Web Components
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The GSI Crawler framework uses a web page based on Polymer web components to interact with all the functionalities offered by the tool. These Polymer Web Components are independent submodules that can be grouped together to build the general dashboard interface.
The GSI Crawler framework uses a web page based on Polymer web components to interact with all the functionalities offered by the tool. These Polymer Web Components are independent submodules that can be grouped together to build the general dashboard interface. For more information, please visit the `Sefarad documentation <http://sefarad.readthedocs.io/en/latest/widgets.html>`_ on web components.
......@@ -6,7 +6,8 @@ GSI Crawler is an innovative and useful framework which aims to extract informat
In this documentation we are going to introduce this framework, detailing the global architecture of the project and explaining the functionality of each module. Finally, we will present a case study in order to better understand the system itself.
.. image:: images/crawler1.png
:align: center
......@@ -2,7 +2,13 @@ Getting started
---------------
First glance into GSI Crawler
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The quickest way of exploring the possibilities offered by GSI Crawler is to access this `demo <https://docs.docker.com/compose/install/>`_. There you can find a dashboard to visualize data collected from different News sources and Twitter. Some examples of the added value offered by this tool are topic and sentiment extraction, identification of people appearing in the scraped data and geolocation of sources.
The quickest way of exploring the possibilities offered by GSI Crawler is to access this `demo <http://dashboard-gsicrawler.cluster.gsi.dit.upm.es//>`_. There you can find a dashboard to visualize data collected from different News sources and Twitter. Some examples of the added value offered by this tool are topic and sentiment extraction, identification of people appearing in the scraped data and geolocation of sources.
.. image:: images/crawler2.png
:align: center
|
.. image:: images/map.jpg
:align: center
......@@ -18,50 +24,44 @@ For docker installation in Ubuntu, visit this `link <https://store.docker.com/ed
Detailed docker-compose installation instructions are available `here <https://docs.docker.com/compose/install/>`_.
First of all, you need to clone the repository:
First of all, you need to clone the repositories:
.. code:: bash
$ git clone https://lab.cluster.gsi.dit.upm.es/sefarad/gsicrawler.git
$ git clone https://lab.cluster.gsi.dit.upm.es/sefarad/dashboard-gsicrawler.git
$ cd gsicrawler
Then, you need to set up the environment variables. To do so, first create a file named ``.env`` in the root directory of the project.
Then, you need to set up the environment variables. To do so, first create a file named ``.env`` in the root directory of each project (gsicrawler and dashboard-gsicrawler). Note that Twitter and MeaningCloud credentials are needed if you wish to use those services.
.. code::
LUIGI_ENDPOINT="gsicrawler-luigi"
LUIGI_ENDPOINT_EXTERNAL="gsicrawler-luigi.cluster.gsi.dit.upm.es"
CRAWLER_ENDPOINT="gsicrawler"
CRAWLER_ENDPOINT_EXTERNAL="gsicrawler.cluster.gsi.dit.upm.es"
TWITTER_CONSUMER_KEY={YourConsumerKey}
TWITTER_CONSUMER_SECRET={YourConsumerSecret}
TWITTER_ACCESS_TOKEN={YourAccessToken}
TWITTER_ACCESS_TOKEN_SECRET={YourAccessTokenSecret}
ES_ENDPOINT=elasticsearch
ES_PORT=9200
FUSEKI_PASSWORD=gsi2017fuseki
ES_ENDPOINT_EXTERNAL=localhost:19200
FUSEKI_PASSWORD={YourFusekiPass}
FUSEKI_ENDPOINT_EXTERNAL=fuseki:3030
FUSEKI_ENDPOINT="gsicrawler-fuseki"
FUSEKI_ENDPOINT={YourFusekiEndPoint}
API_KEY_MEANING_CLOUD={YourMeaningCloudApiKey}
FUSEKI_ENDPOINT_DASHBOARD={YourFusekiEndpoint}
Once you have created the file, you should add a new ``env_file`` attribute to the **luigi** service in ``docker-compose.yml``, with ``.env`` as its value:
.. code::
env_file:
- .env
FUSEKI_ENDPOINT_DASHBOARD={YourFusekiEndpoint, e.g. localhost:13030}
FUSEKI_ENDPOINT=localhost
FUSEKI_PORT=3030
Finally, to run the image:
Finally, execute the following command in both repositories:
.. code:: bash
$ sudo docker-compose up
The information related to the initialization can be found in the console, and when the process finishes it is possible to access the Demo dashboard by accessing ``localhost:8080`` from your web browser.
The information related to the initialization can be found in the console. If you wish to see how tasks are being executed, besides checking the logs you can access the Luigi task visualizer at ``localhost:8082``. In the next steps you will discover more about Luigi.
When the process finishes, it is possible to access the Demo dashboard by visiting ``localhost:8080`` in your web browser.
|
......@@ -75,7 +75,7 @@ We will only obtain the headline and url of each piece of news appearing on the
.. image:: images/cnnsearch.png
:align: center
|
The code of this example can be found in ``luigi/scrapers/tutorial2.py``:
......