@@ -6,7 +6,7 @@ GSI Crawler is an innovative and useful framework which aims to extract informat
.. image:: images/crawler1.png
:align: center
In this documentation we introduce the framework, detail the global architecture of the project and explain the functionality of each module. Finally, we present a case study to better illustrate the system itself.
In this documentation we introduce the framework, detail the global architecture of the project and explain the functionality of each module. Finally, we present a case study to better illustrate the system itself. A demo video about GSI Crawler is available `here <https://www.youtube.com/watch?v=x9jzGDZs5hY&feature=youtu.be>`_.
Next, set up the environment variables. To do so, create a file named ``.env`` in the root directory of each project (gsicrawler and dashboard-gsicrawler). As you can see, `Twitter <https://developer.twitter.com/en/docs/basics/authentication/guides/access-tokens>`_ and `Meaningcloud <https://www.meaningcloud.com/developer/apis>`_ credentials are needed only if you wish to use those services.
...
...
@@ -43,23 +42,20 @@ Then, it is needed to set up the environment variables. For this task, first cre
ES_PORT=9200
ES_ENDPOINT_EXTERNAL=localhost:19200
FUSEKI_PASSWORD={YourFusekiPass}
FUSEKI_ENDPOINT_EXTERNAL=fuseki:3030
FUSEKI_ENDPOINT_EXTERNAL=localhost:13030
FUSEKI_ENDPOINT={YourFusekiEndPoint}
API_KEY_MEANING_CLOUD={YourMeaningCloudApiKey, get it on Meaningcloud}
FUSEKI_ENDPOINT_DASHBOARD={YourFusekiEndpoint, e.g. localhost:13030}
FUSEKI_ENDPOINT = fuseki
FUSEKI_PORT = 3030
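Before starting the containers, it can help to check that the ``.env`` file defines every variable and that no ``{placeholder}`` value was left unreplaced. The following is a minimal sketch, not part of GSI Crawler: the helper names ``parse_env`` and ``missing_keys`` are hypothetical, and the key list is simply copied from the sample listing above. The parser tolerates spaces around ``=`` because the listing contains entries such as ``FUSEKI_ENDPOINT = fuseki``.

```python
from pathlib import Path

# Keys copied from the sample .env listing; trim this list to the services you use.
EXPECTED_KEYS = [
    "ES_PORT",
    "ES_ENDPOINT_EXTERNAL",
    "FUSEKI_PASSWORD",
    "FUSEKI_ENDPOINT_EXTERNAL",
    "FUSEKI_ENDPOINT",
    "FUSEKI_PORT",
    "API_KEY_MEANING_CLOUD",
]


def parse_env(text):
    """Parse KEY=VALUE lines, ignoring blank lines and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values, expected=EXPECTED_KEYS):
    """Return expected keys that are absent, empty, or still a {placeholder}."""
    missing = []
    for key in expected:
        value = values.get(key, "")
        if not value or (value.startswith("{") and value.endswith("}")):
            missing.append(key)
    return missing


if __name__ == "__main__":
    # Run from the root directory of gsicrawler or dashboard-gsicrawler.
    env_path = Path(".env")
    if env_path.exists():
        print("Missing or placeholder keys:", missing_keys(parse_env(env_path.read_text())))
    else:
        print("No .env file found in", Path.cwd())
```

An empty list means every expected key has a concrete value; it does not verify that the credentials themselves are valid.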
Finally, in both repositories execute the following line:
Finally, execute the following lines:
.. code:: bash
$ cd gsicrawler
$ sudo docker-compose up
$ cd ../dashboard-gsicrawler
$ sudo docker-compose up
The information related to the initialization can be found in the console. If you wish to see how tasks are being executed, apart from seeing the logs you can access the Luigi task visualizer in ``localhost:8082``. In the next steps you will discover more about Luigi.
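Since the containers may take a while to start, a quick way to confirm that the Luigi task visualizer is up is to test whether its port accepts connections. This is a rough sketch under the assumption that the visualizer is published on ``localhost:8082`` as stated above; ``is_port_open`` is a hypothetical helper, and a successful connection only shows that something is listening on the port, not that Luigi is healthy.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False


if __name__ == "__main__":
    # 8082 is the Luigi visualizer port mentioned in the text.
    state = "reachable" if is_port_open("localhost", 8082) else "not reachable yet"
    print("Luigi visualizer on localhost:8082 is", state)
```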
...
...
@@ -133,7 +129,7 @@ Finally, for running the tutorial execute the following line from your repositor
.. code:: bash
$ docker-compose exec luigi python -m crontasks tutorial2
$ sudo docker-compose run gsicrawler tutorial2
|
...
...
@@ -181,7 +177,7 @@ For executing this tutorial you should execute the following line:
.. code:: bash
$ docker-compose exec luigi python -m crontasks tutorial3
$ sudo docker-compose run gsicrawler tutorial3
In order to access the stored data in Elastic Search, access ``localhost:19200/tutorial/_search?pretty`` from your web browser.
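The same query can be issued from a script instead of the browser. The sketch below assumes plain ``http`` on the external endpoint ``localhost:19200`` and relies only on the standard Elasticsearch response layout, in which each hit carries its stored document under ``_source``. The function names are hypothetical, and ``headline`` is an assumed field name that may differ in your data.

```python
import json
from urllib.request import urlopen


def search_url(endpoint="localhost:19200", index="tutorial"):
    """Build the _search URL for an index on the external Elasticsearch endpoint."""
    return f"http://{endpoint}/{index}/_search?pretty"


def extract_sources(response):
    """Return the stored document (_source) of every hit in a search response."""
    return [hit.get("_source", {}) for hit in response.get("hits", {}).get("hits", [])]


if __name__ == "__main__":
    # Requires the tutorial to have run and Elasticsearch to be reachable.
    with urlopen(search_url(), timeout=10) as reply:
        documents = extract_sources(json.load(reply))
    print(f"{len(documents)} documents returned")
    for doc in documents:
        # 'headline' is an assumption about the stored fields.
        print("-", doc.get("headline", "<no headline>"))
```

Keeping URL construction and response parsing in separate functions makes it easy to point the script at another index or endpoint without touching the network code.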
...
...
@@ -230,4 +226,7 @@ In the case of seeing it on Fuseki, the address would be ``localhost:13030/tutor
For developing visual analysis tools, we suggest building a dashboard following this `documentation <http://sefarad.readthedocs.io/en/latest/dashboards-dev.html>`_.
<h1>What is GSI Crawler?<a class="headerlink" href="#what-is-gsi-crawler" title="Permalink to this headline">¶</a></h1>
<p>GSI Crawler is an innovative and useful framework which aims to extract information from web pages enriching following semantic approaches. At the moment, there are three available platforms: Twitter, Reddit and News. The user interacts with the tool through a web interface, selecting the analysis type he wants to carry out and the platform that is going to be examined.</p>
<p>In this documentation we introduce the framework, detail the global architecture of the project and explain the functionality of each module. Finally, we present a case study to better illustrate the system itself.</p>
<p>In this documentation we introduce the framework, detail the global architecture of the project and explain the functionality of each module. Finally, we present a case study to better illustrate the system itself. A demo video about GSI Crawler is available <a class="reference external" href="https://www.youtube.com/watch?v=x9jzGDZs5hY&feature=youtu.be">here</a>.</p>
<li class="toctree-l2"><a class="reference internal" href="tutorials.html#tutorial-iii-semantic-enrichment-and-data-storage">Tutorial III: Semantic enrichment and data storage</a></li>
<li class="toctree-l2"><a class="reference internal" href="tutorials.html#tutorial-iv-developing-your-first-dashboard">Tutorial IV: Developing your first dashboard</a></li>
<p>To install Docker on Ubuntu, visit this <a class="reference external" href="https://store.docker.com/editions/community/docker-ce-server-ubuntu?tab=description">link</a>.</p>
<p>Detailed docker-compose installation instructions are available <a class="reference external" href="https://docs.docker.com/compose/install/">here</a>.</p>
<p>First of all, you need to clone the repositories:</p>
<p>Next, set up the environment variables. To do so, create a file named <code class="docutils literal"><span class="pre">.env</span></code> in the root directory of each project (gsicrawler and dashboard-gsicrawler). As you can see, <a class="reference external" href="https://developer.twitter.com/en/docs/basics/authentication/guides/access-tokens">Twitter</a> and <a class="reference external" href="https://www.meaningcloud.com/developer/apis">Meaningcloud</a> credentials are needed only if you wish to use those services.</p>
<p>Finally, in both repositories execute the following line:</p>
<p>Finally, execute the following lines:</p>
<div class="code bash highlight-default"><div class="highlight"><pre><span></span>$ cd gsicrawler
$ sudo docker-compose up
$ cd ../dashboard-gsicrawler
$ sudo docker-compose up
</pre></div>
</div>
<p>The information related to the initialization can be found in the console. If you wish to see how tasks are being executed, apart from seeing the logs you can access the Luigi task visualizer in <code class="docutils literal"><span class="pre">localhost:8082</span></code>. In the next steps you will discover more about Luigi.</p>
...
...
@@ -141,7 +137,7 @@ $ sudo docker/compose up
</pre></div>
</div>
<p>Finally, for running the tutorial execute the following line from your repository path.</p>
<div class="code bash highlight-default"><div class="highlight"><pre><span></span>$ docker-compose exec luigi python -m crontasks tutorial2
<div class="code bash highlight-default"><div class="highlight"><pre><span></span>$ sudo docker-compose run gsicrawler tutorial2
</pre></div>
</div>
<div class="line-block">
...
...
@@ -183,7 +179,7 @@ $ sudo docker/compose up
</div>
<p>The Luigi pipeline is more complex, as data now has to be stored in Elastic Search and Fuseki. The code of the pipeline can also be found in <code class="docutils literal"><span class="pre">luigi/scrapers/tutorial3.py</span></code>; the task execution workflow is initiated by <code class="docutils literal"><span class="pre">PipelineTask</span></code>, which is in charge of calling its dependent tasks.</p>
<p>To run this tutorial, execute the following line:</p>
<div class="code bash highlight-default"><div class="highlight"><pre><span></span>$ docker-compose exec luigi python -m crontasks tutorial3
<div class="code bash highlight-default"><div class="highlight"><pre><span></span>$ sudo docker-compose run gsicrawler tutorial3
</pre></div>
</div>
<p>In order to access the stored data in Elastic Search, access <code class="docutils literal"><span class="pre">localhost:19200/tutorial/_search?pretty</span></code> from your web browser.</p>
<p>For developing visual analysis tools, we suggest building a dashboard following this <a class="reference external" href="http://sefarad.readthedocs.io/en/latest/dashboards-dev.html">documentation</a>.</p>
<h2>Tutorial IV: Developing your first dashboard<a class="headerlink" href="#tutorial-iv-developing-your-first-dashboard" title="Permalink to this headline">¶</a></h2>
<li class="toctree-l2"><a class="reference internal" href="#tutorial-iii-semantic-enrichment-and-data-storage">Tutorial III: Semantic enrichment and data storage</a></li>
<li class="toctree-l2"><a class="reference internal" href="#tutorial-iv-developing-your-first-dashboard">Tutorial IV: Developing your first dashboard</a></li>