senpy · Commit aa35e62a
Authored Aug 20, 2018 by J. Fernando Sánchez

Avoid duplication in split plugin

Parent: 6dd4a449

Changes: 1 changed file, with 8 additions and 5 deletions (+8 -5)

senpy/plugins/misc/split_plugin.py @ aa35e62a

@@ -9,7 +9,7 @@ class Split(AnalysisPlugin):
     '''description: A sample plugin that chunks input text'''
 
     author = ["@militarpancho", '@balkian']
-    version = '0.2'
+    version = '0.3'
     url = "https://github.com/gsi-upm/senpy"
 
     extra_params = {
@@ -33,12 +33,15 @@ class Split(AnalysisPlugin):
         if chunker_type == "paragraph":
             tokenizer = LineTokenizer()
         chars = list(tokenizer.span_tokenize(original_text))
-        for i, chunk in enumerate(tokenizer.tokenize(original_text)):
-            print(chunk)
+        if len(chars) == 1:
+            # This sentence was already split
+            return
+        for i, chunk in enumerate(chars):
+            start, end = chunk
             e = Entry()
-            e['nif:isString'] = chunk
+            e['nif:isString'] = original_text[start:end]
             if entry.id:
-                e.id = entry.id + "#char={},{}".format(chars[i][0], chars[i][1])
+                e.id = entry.id + "#char={},{}".format(start, end)
             yield e
 
     test_cases = [
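
The change above tokenizes the text once to get character spans, slices each chunk out of the original text with those spans, and returns early when there is only one span. The following is a minimal, standard-library sketch of that pattern, not the plugin's actual code. The names line_spans and split_entry are hypothetical, and line_spans is only a rough stand-in for a span tokenizer such as the LineTokenizer used in the diff.

```python
import re


def line_spans(text):
    """Hypothetical stand-in for a span tokenizer.

    Returns (start, end) character offsets for every non-blank line.
    """
    return [m.span() for m in re.finditer(r"[^\n]*\S[^\n]*", text)]


def split_entry(entry_id, text):
    """Yield (chunk_id, chunk_text) pairs, one per span of the text."""
    spans = line_spans(text)  # tokenize once, reuse the offsets below
    if len(spans) == 1:
        # The text is already a single chunk, so emit nothing.
        return
    for start, end in spans:
        # Build the id and the chunk text from the same offsets.
        chunk_id = None
        if entry_id:
            chunk_id = "{}#char={},{}".format(entry_id, start, end)
        yield chunk_id, text[start:end]


chunks = list(split_entry("doc1", "Hello world.\nSecond line."))
for chunk_id, chunk_text in chunks:
    print(chunk_id, repr(chunk_text))
```

Because both the id and the chunk text come from the same (start, end) pair, the two cannot drift apart, which is presumably the duplication the commit title refers to.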

J. Fernando Sánchez (@balkian) mentioned in commit 4ba30304a4f3b35613102d10800696389702d555 · Dec 07, 2018