How to implement Elasticsearch in Nextcloud

This did it for me, works like a charm:

yum install tesseract

Then take your pick of the many languages that can be recognized (English is installed by default). List the available language packs with “yum search all tesseract”

and install the required language(s), e.g. yum install tesseract-langpack-fra

Install the full text search OCR app in Nextcloud, go to Settings → Search, set your installed languages (use the Tesseract language abbreviations, e.g. eng,fra,deu) and enable OCR.
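
To avoid typos in the language field, the installed codes can be read back from Tesseract itself. A minimal sketch, assuming `tesseract --list-langs` prints a header line followed by one code per line (some versions print the list to stderr, hence the 2>&1 in the usage comment). langs_to_csv is a made-up helper name, not part of Tesseract or Nextcloud:

```shell
# Hypothetical helper: turn the output of `tesseract --list-langs`
# into the comma-separated form used in the Nextcloud OCR settings.
langs_to_csv() {
    # Drop the header line and blank lines, then join the rest with commas.
    grep -v -e '^List of' -e '^[[:space:]]*$' | paste -sd, -
}

# Usage on the server (needs tesseract installed):
#   tesseract --list-langs 2>&1 | langs_to_csv
```

Entries that are not real languages (osd may show up, for example) can simply be left out by hand when pasting the result into the settings.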

I'm just not sure at what interval Elasticsearch re-indexes; maybe this is related to the cron job?

Here is a test image to upload and test https://courses.cs.vt.edu/csonline/AI/Lessons/VisualProcessing/OCRscans_files/bowers.jpg

Just as a tip, the Red Hat syntax would be:

sudo -u apache scl enable php71 'php -d memory_limit=512M /usr/share/nextcloud/occ fulltextsearch:index'

scl enable php71 ensures that all php71 related environment variables are being used.

Strangely, the yum search all comes up with many language packs, but none mention English. Even did a grep in case I was inadvertently looking past it. ?? OK, there is -enm for Middle English, but I’m not going to be working with Chaucer.

My bad example.

“THE ENGLISH LANGUAGE, DATAFILES ARE SUPPLIED IN THE STANDARD PACKAGE.”

Adjusted my post above. Sorry

That’s probably something I should have found in the docs myself–thanks!

I do not have the remi-phpscl module installed; my directory is /opt/rh/rh-php71, not /opt/rh/rh-php56/. Do I need to install the remi-phpscl module to run the index command correctly?

The remi set comes from the php-scl module, which can be installed separately. Nextcloud takes care of a different SCL version of PHP. Yes, that is confusing.
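
Since the PHP collection differs between installs (rh-php71, rh-php72 and rh-php73 all show up in this thread), it can help to look up what is actually present instead of guessing. A small sketch, assuming the /opt/rh/rh-phpNN/root/usr/bin/php layout quoted in this thread and GNU sort for the -V option. find_scl_php is a made-up helper name:

```shell
# Hypothetical helper: print the newest rh-php* PHP binary under an SCL root.
# The default root /opt/rh matches the paths quoted in this thread.
find_scl_php() {
    root="${1:-/opt/rh}"
    for bin in "$root"/rh-php*/root/usr/bin/php; do
        # An unmatched glob stays literal and fails the -x test,
        # so nothing is printed when no collection is installed.
        if [ -x "$bin" ]; then
            echo "$bin"
        fi
    done | sort -V | tail -n 1
}

# Usage on the server:
#   php_bin=$(find_scl_php)
#   [ -n "$php_bin" ] && sudo -u apache "$php_bin" -d memory_limit=512M /usr/share/nextcloud/occ fulltextsearch:index
```

Picking the newest collection is only a guess at what Nextcloud uses; if several are installed, check which one the Nextcloud package actually depends on.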

Use this command instead:

How is the experience so far? I am impressed with the results.

Hello, I had to reinstall Elasticsearch. Now when I run

sudo -u apache /opt/rh/rh-php71/root/usr/bin/php -d memory_limit=512M /usr/share/nextcloud/occ fulltextsearch:index

I get the following error:

In IndexService.php line 149:

Check your user/password and the index assigned to that cloud

Did you try to reconfigure the Nextcloud fulltext app in the Nextcloud admin settings?

You may try to reset the index and start over.

sudo -u apache scl enable php71 'php -d memory_limit=512M /usr/share/nextcloud/occ fulltextsearch:reset'

I have the same error.
When I try to run fulltextsearch:reset it gives me:

Check your user/password and the index assigned to that cloud

Welcome to the NethServer Community.

I couldn’t reproduce the issue. Did you reinstall Elasticsearch?

Did you set up the app like this?

Did you add network.host: 127.0.0.1 to /etc/elasticsearch/elasticsearch.yml as described here?

Did you use

sudo -u apache /opt/rh/rh-php72/root/usr/bin/php -d memory_limit=512M /usr/share/nextcloud/occ fulltextsearch:reset

for resetting the index?
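
The network.host question can be checked from a shell. A rough sketch, assuming the config path given above. has_network_host is a hypothetical helper name, and the port 9200 in the usage comment is the Elasticsearch default, which may have been changed on a given system:

```shell
# Hypothetical helper: succeed if the config file has an active
# (uncommented) "network.host: 127.0.0.1" line.
has_network_host() {
    conf="${1:-/etc/elasticsearch/elasticsearch.yml}"
    grep -Eq '^[[:space:]]*network\.host:[[:space:]]*127\.0\.0\.1[[:space:]]*$' "$conf"
}

# Usage on the server:
#   has_network_host && echo "network.host is set" || echo "network.host is missing"
#   curl -s http://127.0.0.1:9200    # should answer with a JSON banner if Elasticsearch is up
```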

Could you please answer @mrkbkr and provide us with some details?

I'm still getting this error when trying to index files with Nextcloud and Elasticsearch:

PHP Fatal error: Allowed memory size of 536870912 bytes exhausted (tried to allocate 57555268 bytes) in /usr/share/nextcloud/apps/files_fulltextsearch/lib/Service/FilesService.php on line 964

I have read here https://help.nextcloud.com/t/allowed-memory-size-of-xxx-bytes-exhausted/39371/10 that /etc/php/cli/php.ini should contain -1 for the memory limit. How do I find this file in a NethServer installation, and how do I change the setting?

On the current php version used by NethServer for Nextcloud, the global php.ini file would be /etc/opt/rh/rh-php73/php.ini, where memory_limit defaults to 128M.
But the PHP error says it exhausted 512MiB. I guess those 512MiB are the ones set in a custom rule at /etc/opt/rh/rh-php73/php-fpm.d/000-nextcloud.conf:

; PHP settings
php_admin_value[memory_limit] = 512M

The latter file comes from nethserver-nextcloud package and can be overridden by updates.
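
As a quick sanity check on the numbers in that error message, the byte counts can be converted to MiB. A minimal sketch (bytes_to_mib is a made-up helper name). Keep in mind that the occ commands earlier in this thread pass -d memory_limit=512M on the command line, which on its own sets a 512 MiB limit for that CLI run, while php-fpm pool settings only apply to php-fpm. So raising the -d value may also be worth a try; that is a suggestion, not something confirmed in this thread:

```shell
# Hypothetical helper: convert a byte count to whole MiB (rounded down).
bytes_to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

# bytes_to_mib 536870912   # the limit from the error message   -> 512
# bytes_to_mib 57555268    # the allocation that failed         -> 54
```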

There is no esmith db prop for it that I recall, so one option could be to drop a fragment file. Untested, but I think something along these lines should work:

cat << 'EOF' >> /etc/opt/rh/rh-php73/php-fpm.d/000-nextcloud-custom.conf

; PHP settings (custom)
php_admin_value[memory_limit] = 768M
EOF

You may also need to restart the corresponding PHP service:

systemctl restart rh-php73-php-fpm

@stephdl, is that right or shall the fragment start with the section header [nethserver-nextcloud] ?

Right

The file is flagged as a configuration file, so you can edit it to adjust the value or add others. During an upgrade this file will not be overwritten by the one from the RPM.

@mrmarkuz

Hi Markus

I installed Elasticsearch according to your HowTo above. After adapting to the newer PHP SCL paths, it worked.

Since the NethServer upgrade to 7.9.2009, the elasticsearch service won’t start.

Any ideas?

Strange, after a couple of restarts, it’s working again!

Found the issue:
Older version of ingest-attachment.

Solution:

sudo /usr/share/elasticsearch/bin/elasticsearch-plugin remove ingest-attachment
sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install ingest-attachment
systemctl restart elasticsearch
systemctl status elasticsearch
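
To confirm the plugin is back after the reinstall, the plugin tool can list what is installed. A sketch, assuming `elasticsearch-plugin list` prints one plugin name per line. plugin_installed is a hypothetical helper name:

```shell
# Hypothetical helper: succeed if the named plugin appears in the plugin list.
# $1 = plugin name, $2 = plugin tool (defaults to the path used above).
plugin_installed() {
    plugin_bin="${2:-/usr/share/elasticsearch/bin/elasticsearch-plugin}"
    # -x matches the whole line, -F treats the name as a fixed string.
    "$plugin_bin" list 2>/dev/null | grep -qxF -- "$1"
}

# Usage on the server:
#   plugin_installed ingest-attachment && echo "ingest-attachment is installed"
```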

The Tesseract module (at the moment again marked “untested”) needs to be updated and reactivated (sometimes you need to press Activate twice…)

Also: in Nextcloud, as admin, you need to reactivate Elasticsearch under Fulltext-search…

Thanks
Andy

I made the changes as suggested. Now, when I run the command “systemctl restart rh-php73-php-fpm”, I get the following:

Job for rh-php73-php-fpm.service failed because the control process exited with error code. See “systemctl status rh-php73-php-fpm.service” and “journalctl -xe”

As Stephane said there’s no need for a custom template. If you created the template file, it can be removed.
You can edit the file /etc/opt/rh/rh-php73/php-fpm.d/000-nextcloud.conf directly, tweaking the value of php_admin_value[memory_limit].
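
For changing that value from the command line rather than in an editor, a sketch along these lines might do. It assumes the line looks like the php_admin_value[memory_limit] = 512M entry quoted earlier. set_fpm_memory_limit is a hypothetical helper name, and the new value must not contain a slash because it is pasted into a sed expression:

```shell
# Hypothetical helper: rewrite the memory_limit value in a php-fpm pool file.
# $1 = config file, $2 = new value (e.g. 768M).
set_fpm_memory_limit() {
    conf="$1"
    new="$2"
    tmp=$(mktemp) || return 1
    # Keep everything up to and including the "=" and replace only the value;
    # writing back with cat preserves the file's owner and permissions.
    sed "s/^\([[:space:]]*php_admin_value\[memory_limit\][[:space:]]*=[[:space:]]*\).*/\1$new/" "$conf" > "$tmp" \
        && cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    return $status
}

# Usage on the server (as root):
#   set_fpm_memory_limit /etc/opt/rh/rh-php73/php-fpm.d/000-nextcloud.conf 768M
#   systemctl restart rh-php73-php-fpm
```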

@mrmarkuz

Hi Markus

Since the update, I’ve noticed that Elasticsearch in Nextcloud doesn’t index the contents of e.g. PDFs. Most PDFs in this case come from Adobe Pro with OCR already “prepared”, so they contain real text rather than just a scanned image of text. There are image-only ones too, but a PDF containing text should be indexed…

A search for documents only shows results if the search term is part of the filename. No results are shown for contents…

It doesn’t even index simple text files (.txt)…

NethServer and Nextcloud, including all installed apps, are fully up to date.

Any ideas?

Thanks
Andy