Add Lanman to simple-search #18

Open
funderburkjim opened this issue Nov 29, 2020 · 12 comments

@funderburkjim
Contributor

Two additional steps required to make Lanman dictionary available in simple-search:

For example, now this simple-search url works:

https://www.sanskrit-lexicon.uni-koeln.de/simple/lan/shiva

@gasyoun
Member

gasyoun commented Nov 29, 2020

https://www.sanskrit-lexicon.uni-koeln.de/simple/lan/shiva

Hurray. But how can one know from the homepage that such nice URLs exist?

@funderburkjim
Contributor Author

funderburkjim commented Dec 1, 2020

Obviously, they can't know that.

I still don't think the simple search is quite ready for full disclosure.

For instance, I ran into some problems such as

I'm dubious about wide usage of the simple link with such problems still unsolved.
And I currently think they are hard problems, especially the 'long word' problem.
Probably the Devanagari/IAST problems are not too hard.

And there are still a couple of problems you have pointed out (such as use of capital letters).

@gasyoun
Member

gasyoun commented Dec 2, 2020

I still don't think the simple search is quite ready for full disclosure.

Let's fill the gap, so it can finally be done after three years of development.

%E0%A4%AD%E0%A4%A6%E0%A5%8D%E0%A4%B0

So it's a code issue, not Unicode. Same with r%C4%81ma

In this example, 'working' never finishes, or you get 'error Internal Server Error'
which is probably due to a timeout.

This one is harder - any clue?

Probably the Devanagari/IAST problems are not too hard.

Exactly, I'm even eager to hire a developer to solve them, because simple is really important for me.

@funderburkjim
Contributor Author

funderburkjim commented Dec 3, 2020

Use of Devanagari (or IAST) in url now works properly.

The fix was to apply php 'urldecode' to the percent-encoded part of the url (the part containing '%').
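
To illustrate the decoding step, here is a minimal Python sketch. It is not the site's PHP code; `decode_key` is a hypothetical helper name, and `urllib.parse.unquote` stands in for the PHP decoding call. It shows that the two percent-encoded keys quoted earlier in this thread are ordinary UTF-8 text once decoded.

```python
from urllib.parse import unquote


def decode_key(raw_key: str) -> str:
    """Turn a percent-encoded URL segment back into Unicode text.

    unquote() interprets the %XX bytes as UTF-8 by default.
    """
    return unquote(raw_key)


# The two encoded keys mentioned in the thread.
for raw in ("%E0%A4%AD%E0%A4%A6%E0%A5%8D%E0%A4%B0", "r%C4%81ma"):
    print(raw, "->", decode_key(raw))
```

A key that contains no '%' passes through unchanged, so the same call can be applied to every incoming key.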

Examples:

@gasyoun
Member

gasyoun commented Dec 4, 2020

Use of Devanagari (or IAST) in url now works properly.

Perfect.
Can I ask again for

https://www.sanskrit-lexicon.uni-koeln.de/s/lan/bhagnāśa

instead of:

https://www.sanskrit-lexicon.uni-koeln.de/simple/lan/bhagnāśa

@funderburkjim
Contributor Author

I prefer to keep 'simple' only.

  • 'simple' is intuitive -- 's' is not intuitive
  • to make 's' an additional alternative to 'simple' would require
    • adding a line to .htaccess
    • modify php parsing of url which currently happens in
      • list-0.2s_rw.php for cologne
      • list-0.2s_xampp_rw.php for local installations
    • code would need to be refactored to avoid duplication, and
      then modified in regard to the url parsing itself.
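
For anyone picking this up as an enhancement, the url-parsing change could look roughly like the Python sketch below. This is only an illustration under assumptions: the real parsing lives in the PHP files named above, and the names `PREFIXES` and `parse_simple_path` are hypothetical. The idea is that one shared function accepts either prefix, which is also the refactoring that would avoid duplicating the logic.

```python
from urllib.parse import unquote

# 'simple' is the existing prefix; 's' is the proposed (hypothetical) alias.
PREFIXES = ("simple", "s")


def parse_simple_path(path: str):
    """Split a path like /simple/lan/shiva into (dictionary code, key).

    Returns None when the path does not have the expected three segments
    or does not start with one of the accepted prefixes.
    """
    parts = [p for p in path.split("/") if p]
    if len(parts) != 3 or parts[0] not in PREFIXES:
        return None
    _, dict_code, key = parts
    # Decode percent-encoded keys (Devanagari or IAST) to Unicode.
    return dict_code, unquote(key)


print(parse_simple_path("/simple/lan/shiva"))
print(parse_simple_path("/s/lan/bhagn%C4%81%C5%9Ba"))
```

The web server would still need its own rewrite rule so that requests under the short prefix reach the same script.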

Don't want to do this now. You can make a separate 'enhancement' issue request if this detail is important to you.

@gasyoun
Member

gasyoun commented Dec 4, 2020

'simple' is intuitive -- 's' is not intuitive

It's 5 letters shorter. Longer URLs break.

You can make a separate 'enhancement' issue request if this detail is important to you.

It is. Because it could become the default way of quoting Cologne URLs.

@funderburkjim
Contributor Author

Revised the simple-search algorithm. It is now much quicker, and I think it will still always provide the same answers.

In particular, the 'long' word example now is quite speedy:

https://www.sanskrit-lexicon.uni-koeln.de/simple/lan/pratyakzadarSana

@gasyoun
Member

gasyoun commented Dec 5, 2020

In particular, the 'long' word example now is quite speedy

So now we can make the simple URLs public?

@gasyoun
Member

gasyoun commented Dec 9, 2020

cf

In the book there is a blank indent before each paragraph; I guess we can replicate that, @funderburkjim

@funderburkjim
Contributor Author

funderburkjim commented Dec 10, 2020

Given the current markup of lan.txt, this would be fairly difficult.

Currently, the things that look like paragraphs are, in lan.txt, preceded by an empty div: <div n="1"/>.

So consecutive paragraphs look like

<div n="1"/>blah1 blah1 blah1
<div n="2"/>blah2 blah2 blah2
<div n="1"/>blah3 blah3 blah3

We would have to change these to

<div n="1">blah1 blah1 blah1</div>
<div n="2">blah2 blah2 blah2</div>
<div n="1">blah3 blah3 blah3</div>

and then we could add css text-indent for these divs

div {
  text-indent: 50px;
}

The hard part is closing the divs. The example above is over-simplified, as the 'blah...' part is more complicated.
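
For the over-simplified case only, the conversion can be sketched in Python as below. This is a hypothetical illustration, not a tool that exists in the repository: `close_divs` and `EMPTY_DIV` are made-up names, and the sketch assumes each paragraph sits on a single line that begins with its empty div marker. Real lan.txt entries would break that assumption, which is exactly the hard part.

```python
import re

# An empty div marker such as <div n="1"/> at the start of a line,
# followed by the rest of that line (the paragraph text).
EMPTY_DIV = re.compile(r'^<div\s+n="(\d+)"/>(.*)$')


def close_divs(lines):
    """Rewrite '<div n="N"/>text' lines as '<div n="N">text</div>'.

    Lines that do not start with an empty div marker are left unchanged.
    """
    out = []
    for line in lines:
        m = EMPTY_DIV.match(line)
        if m:
            n, text = m.groups()
            out.append(f'<div n="{n}">{text}</div>')
        else:
            out.append(line)
    return out


sample = [
    '<div n="1"/>blah1 blah1 blah1',
    '<div n="2"/>blah2 blah2 blah2',
    '<div n="1"/>blah3 blah3 blah3',
]
for line in close_divs(sample):
    print(line)
```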

Current opinion: Could be done along the lines just mentioned, but not worth the trouble.

@gasyoun
Member

gasyoun commented Dec 11, 2020

Current opinion: Could be done along the lines just mentioned, but not worth the trouble.

Agree, thanks for the detailed layout analysis.
