Search (searchv2/)

Module located in zds/searchv2/.

Models (models.py)

class zds.searchv2.models.AbstractESDjangoIndexable(*args, **kwargs)

Version of AbstractESIndexable for Django objects, with some improvements:

  • Already includes pk in the mapping;

  • Matches the ES _id field with pk;

  • Turns es_already_indexed into a database field;

  • Defines an es_flagged field to restrict the number of objects to be indexed;

  • Overrides save() to manage that field;

  • Defines a get_es_django_indexable() method that can be overridden to change the queryset used to fetch objects.

classmethod get_es_django_indexable(force_reindexing=False)

Method that can be overridden to filter Django objects from the database based on any criterion.

Parameters:

force_reindexing (bool) – force the return of all objects, even those that may already be indexed.

Returns:

the queryset

Return type:

django.db.models.query.QuerySet
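The default filtering can be pictured without Django. The following is a minimal sketch, assuming the default implementation keeps only objects whose es_flagged field is set unless force_reindexing is true; FakeObject and the list comprehension are hypothetical stand-ins for a Django model and its QuerySet (a real override would presumably chain .filter() calls instead).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeObject:
    """Hypothetical stand-in for a Django model instance."""
    pk: int
    es_flagged: bool = True


def get_es_django_indexable(objects: List[FakeObject], force_reindexing: bool = False) -> List[FakeObject]:
    """Return the objects to (re)index.

    Assumption: mirrors a queryset filtered with es_flagged=True,
    the flag being ignored when a full reindexing is forced.
    """
    if force_reindexing:
        return list(objects)
    return [obj for obj in objects if obj.es_flagged]


objects = [FakeObject(1), FakeObject(2, es_flagged=False)]
print([obj.pk for obj in get_es_django_indexable(objects)])
print([obj.pk for obj in get_es_django_indexable(objects, force_reindexing=True)])
```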

classmethod get_es_indexable(force_reindexing=False)

Override get_es_indexable() in order to use Django querysets and batch the objects.

Returns:

a queryset

Return type:

django.db.models.query.QuerySet

classmethod get_es_mapping()

Overridden to add pk to the mapping.

Returns:

mapping object

Return type:

elasticsearch_dsl.Mapping

save(*args, **kwargs)

Override the save() method to flag the object whenever it is saved (saving is assumed to mean the object was modified, hence needs to be reindexed).

Note

Flagging can be prevented using save(es_flagged=False).
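A minimal sketch of the flagging logic, written as a plain Python class so that it runs without Django. It assumes the real save() pops an es_flagged keyword argument (defaulting to True) before delegating to Django's save(); the FlaggedModel class and its save_count attribute are hypothetical.

```python
class FlaggedModel:
    """Hypothetical plain-Python stand-in for an AbstractESDjangoIndexable model."""

    def __init__(self) -> None:
        self.es_flagged = False
        self.save_count = 0  # counts the writes a Django model would perform

    def save(self, *args, **kwargs) -> None:
        # Remove es_flagged from kwargs so it is not forwarded to the parent
        # save(); any save flags the object unless es_flagged=False is passed.
        self.es_flagged = kwargs.pop("es_flagged", True)
        # A Django model would call super().save(*args, **kwargs) here.
        self.save_count += 1


post = FlaggedModel()
post.save()
print(post.es_flagged)  # True: the object now needs to be reindexed
post.save(es_flagged=False)
print(post.es_flagged)  # False: flagging prevented
```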

class zds.searchv2.models.AbstractESIndexable

Mixin for indexable objects.

Define a number of methods that can be overridden to tune how objects are indexed into Elasticsearch.

You (may) need to override:

  • get_es_indexable();

  • get_es_mapping() (not mandatory, but otherwise ES will choose the mapping by itself);

  • get_es_document_source() (not mandatory, but may be useful if the data differ from the mapping or extra processing is needed).

You also need to maintain es_id and es_already_indexed for bulk indexing/updating (if any).

get_es_document_as_bulk_action(index, action='index')

Create a document formatted for a _bulk operation. The formatting depends on action.

See https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html.

Parameters:
  • index (str) – index in which the document will be inserted

  • action (str) – action, either "index", "update" or "delete"

Returns:

the document

Return type:

dict
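To make the three formats concrete, here is a minimal sketch assuming the actions are fed to elasticsearch.helpers.bulk(), which accepts dictionaries carrying _op_type, _index and _id metadata, uses _source as the body of an index action and the doc key as the body of an update. The function name as_bulk_action is hypothetical, and the _type key (see get_es_document_type()) is left out for brevity.

```python
from typing import Any, Dict


def as_bulk_action(index: str, doc_id: Any, source: Dict[str, Any], action: str = "index") -> Dict[str, Any]:
    """Format one document as an action dictionary for elasticsearch.helpers.bulk()."""
    if action not in ("index", "update", "delete"):
        raise ValueError("action must be 'index', 'update' or 'delete'")
    document = {"_op_type": action, "_index": index, "_id": doc_id}
    if action == "index":
        document["_source"] = source  # the whole document replaces any stored version
    elif action == "update":
        document["doc"] = source  # partial document merged into the stored one
    # a "delete" action carries no body
    return document


print(as_bulk_action("zds", 42, {"title": "Hello"}, action="update"))
```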

get_es_document_source(excluded_fields=None)

Create a document from the attributes of the object, based on the mapping.

Attention

You may need to override this method if the data differ from the mapping for some reason.

Parameters:

excluded_fields (list) – fields to exclude from the default document

Returns:

document

Return type:

dict
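The default behavior can be sketched as reading one attribute per mapped field. This is a minimal sketch under stated assumptions: the mapping is represented here by a plain list of field names (the real method presumably works from the mapping returned by get_es_mapping()), the field names are hypothetical, and returning None for a missing attribute is a choice of this sketch.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, Optional


def document_source(obj: Any, mapping_fields: Iterable[str],
                    excluded_fields: Optional[Iterable[str]] = None) -> Dict[str, Any]:
    """Build an ES document from the attributes named in the mapping."""
    excluded = set(excluded_fields or [])
    # Missing attributes become None rather than raising AttributeError.
    return {name: getattr(obj, name, None) for name in mapping_fields if name not in excluded}


post = SimpleNamespace(pk=1, text="Bonjour", text_html="<p>Bonjour</p>")
# An override could exclude a field from the default document and fill it in itself.
print(document_source(post, ["pk", "text", "text_html"], excluded_fields=["text_html"]))
```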

classmethod get_es_document_type()

Return the value of the _type field in the index.

classmethod get_es_indexable(force_reindexing=False)

Return objects to index.

Attention

You need to override this method (otherwise nothing will be indexed).

Parameters:

force_reindexing (bool) – force the return of all objects, even those that may already be indexed.

Return type:

list

classmethod get_es_mapping()

Set up the mapping (data schema).

Note

You will probably want to change the analyzer and boost values. Also consider the index='not_analyzed' option to improve performance.

See https://elasticsearch-dsl.readthedocs.io/en/latest/persistence.html#mappings

Attention

You may want to override this method (otherwise ES chooses the mapping by itself).

Returns:

mapping object

Return type:

elasticsearch_dsl.Mapping

class zds.searchv2.models.ESIndexManager(name, shards=5, replicas=0, connection_alias='default')

Manage a given index with various tailor-made functions.

analyze_sentence(request)

Run the analyzer on a given sentence and get back the list of tokens.

See http://www.elastic.co/guide/en/elasticsearch/reference/current/indices-analyze.html.

This is useful to perform "terms" queries instead of full-text queries.

Parameters:

request (str) – a sentence from user input

Returns:

the tokens

Return type:

list
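A "terms" query is not analyzed at search time, so its values must already look like indexed tokens; this is why the sentence goes through the analyzer first. The sketch below shows how such tokens could be turned into a terms query body; the terms_query helper, the "tags" field and the token values are hypothetical.

```python
from typing import Dict, List


def terms_query(field: str, tokens: List[str]) -> Dict[str, Dict[str, List[str]]]:
    """Build a "terms" query matching documents whose field contains any of the tokens."""
    # The analyzer may return the same token several times; keep the first occurrence only.
    unique_tokens = list(dict.fromkeys(tokens))
    return {"terms": {field: unique_tokens}}


# Tokens as analyze_sentence() might return them (hypothetical values).
print(terms_query("tags", ["python", "c++", "python"]))
```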

clear_es_index()

Clear the index.

clear_indexing_of_model(model)

Nullify the indexing of a given model by setting es_already_indexed=False on all objects.

For AbstractESDjangoIndexable, use a single queryset update instead of saving every object.

Parameters:

model (class) – the model

delete_by_query(doc_type='', query=MatchAll())

Perform a deletion through the _delete_by_query API.

See https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html

Attention

Call to this function must be done with great care!

Parameters:
  • doc_type (str) – the document type

  • query (elasticsearch_dsl.query.Query) – the query matching all documents to be deleted
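At the HTTP level, such a call is a POST to the _delete_by_query endpoint with a query body, and the MatchAll() default explains the warning: without a narrower query, every document goes. The sketch below only builds the request, assuming an Elasticsearch version that still has mapping types (where the type may appear in the path); delete_by_query_request and the topic_pk field are hypothetical.

```python
import json
from typing import Any, Dict, Optional, Tuple


def delete_by_query_request(index: str, doc_type: str = "",
                            query: Optional[Dict[str, Any]] = None) -> Tuple[str, str]:
    """Return the (path, JSON body) of a _delete_by_query call."""
    if query is None:
        query = {"match_all": {}}  # like MatchAll(): matches every document, handle with care
    if doc_type:
        path = "/{}/{}/_delete_by_query".format(index, doc_type)
    else:
        path = "/{}/_delete_by_query".format(index)
    return path, json.dumps({"query": query})


print(delete_by_query_request("zds", "post", {"term": {"topic_pk": 42}}))
```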

delete_document(document)

Delete a given document, based on its es_id.

Parameters:

document (AbstractESIndexable) – the document

es_bulk_indexing_of_model(model, force_reindexing=False)

Perform a bulk action on the documents of a given model. Use the objects_per_batch property to index them in batches.

See http://elasticsearch-py.readthedocs.io/en/master/api.html#elasticsearch.Elasticsearch.bulk and http://elasticsearch-py.readthedocs.io/en/master/helpers.html#elasticsearch.helpers.parallel_bulk

Attention

  • Currently only implemented with "index" and "update"!

  • Currently only working with AbstractESDjangoIndexable.

Parameters:
  • model (class) – the model

  • force_reindexing (bool) – force all documents to be returned

Returns:

the number of documents indexed

Return type:

int
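The batching part can be sketched independently of Elasticsearch: objects are consumed in chunks of at most objects_per_batch, each chunk is sent, and the total count is returned. This is a minimal sketch; batches(), bulk_index() and the send callable are hypothetical, and in the real method the sending step presumably relies on the elasticsearch-py bulk helpers linked above.

```python
from itertools import islice
from typing import Any, Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(objects: Iterable[T], objects_per_batch: int) -> Iterator[List[T]]:
    """Yield successive lists of at most objects_per_batch items."""
    if objects_per_batch < 1:
        raise ValueError("objects_per_batch must be positive")
    iterator = iter(objects)
    while True:
        batch = list(islice(iterator, objects_per_batch))
        if not batch:
            return
        yield batch


def bulk_index(objects: Iterable[T], objects_per_batch: int,
               send: Callable[[List[T]], Any]) -> int:
    """Send each batch with the given callable and return the number of documents indexed."""
    indexed = 0
    for batch in batches(objects, objects_per_batch):
        send(batch)  # e.g. a wrapper around elasticsearch.helpers.bulk (assumption)
        indexed += len(batch)
    return indexed


sent = []
print(bulk_index(range(7), 3, sent.append))
print(sent)
```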

refresh_index()

Force a refresh of the index. The refresh is normally done periodically, but may be forced with this method.

See https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-refresh.html.

Note

This function must be called if you want to search right after indexing.

reset_es_index(models)

Delete the old index and create a new one (with the same name). Set up the number of shards and replicas. Then, set the mappings for the different models.

Parameters:
  • models (list) – list of models

  • number_shards (int) – number of shards

  • number_replicas (int) – number of replicas
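For reference, Elasticsearch's create-index request accepts the shard and replica settings, and optionally the mappings, in a single body. The sketch below builds such a body with defaults mirroring the ESIndexManager constructor (shards=5, replicas=0); the documented method sets the mappings afterwards instead, and the index_creation_body helper and the per-type mapping shown are hypothetical.

```python
from typing import Any, Dict


def index_creation_body(mappings: Dict[str, Any], shards: int = 5, replicas: int = 0) -> Dict[str, Any]:
    """Body of a create-index request carrying settings and mappings at once."""
    return {
        "settings": {
            "number_of_shards": shards,      # fixed once the index is created
            "number_of_replicas": replicas,  # can be changed later
        },
        "mappings": mappings,
    }


body = index_creation_body({"post": {"properties": {"text": {"type": "text"}}}})
print(body["settings"])
```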

setup_custom_analyzer()

Override the default analyzer.

See https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis.html.

Our custom analyzer is based on the "french" analyzer (https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html#french-analyzer), with some differences:

  • "custom_tokenizer", to deal with punctuation and all kinds of (non-breaking) spaces, while keeping dashes and other symbols intact (in order to keep "c++" or "c#", for example).

  • "protect_c_language", a pattern replace filter that prevents "c" from being wiped out by the stop-word filter.

  • "french_keywords", a keyword filter that prevents some programming language names from being stemmed.

Warning

You need to run manage.py es_manager index_all if you modify this!
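To make the intent of custom_tokenizer concrete, here is a pure-Python approximation of its splitting rule. This is not the Elasticsearch configuration used by the project: the regular expression and the split_tokens helper are assumptions, and the real analyzer also applies the filters listed above (stop words, stemming, and so on), which this sketch ignores.

```python
import re
from typing import List

# Hypothetical approximation: split on any Unicode whitespace (which includes
# non-breaking spaces) and common punctuation, but keep "+", "#" and "-"
# so that "c++", "c#" or "peer-to-peer" survive as single tokens.
TOKEN_PATTERN = re.compile(r'[^\s.,;:!?()\[\]{}"«»]+')


def split_tokens(sentence: str) -> List[str]:
    """Return the lowercased tokens of a sentence."""
    return [token.lower() for token in TOKEN_PATTERN.findall(sentence)]


print(split_tokens("J'aime le C++ et le C#,\u00a0pas vous ?"))
```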

Set up the search to use the right index.

Parameters:

request (elasticsearch_dsl.Search) – the search request

Returns:

the formatted search

Return type:

elasticsearch_dsl.Search

update_single_document(document, doc)

Update given fields of a single document.

See https://www.elastic.co/guide/en/elasticsearch/guide/current/partial-updates.html.

Parameters:

exception zds.searchv2.models.NeedIndex

Raised when an action requires an index, but the index has not been created (yet).

zds.searchv2.models.delete_document_in_elasticsearch(instance)

Delete an ESDjangoIndexable from the ES database. Must be implemented by all classes that derive from AbstractESDjangoIndexable.

Parameters:

instance (AbstractESIndexable) – the document to delete

zds.searchv2.models.get_django_indexable_objects()

Return all indexable objects registered in Django.

Views (views.py)

class zds.searchv2.views.SearchView(**kwargs)

Search view.

get(request, *args, **kwargs)

Overridden to catch the request and fill the form.

get_context_data(**kwargs)

Get the context for this view. This method is overridden to modify the paginator and the information passed to the template.

get_queryset()

Return the list of items for this view.

The return value must be an iterable and may be an instance of QuerySet in which case QuerySet specific behavior will be enabled.

get_queryset_chapters()

Search in content chapters.

get_queryset_posts()

Search in posts, and remove a result if its forum is not accessible to the user or if the message is not visible.

Score is modified if:

  • post is the first one in a topic;

  • post is marked as "useful";

  • post has a like/dislike ratio above 1.0 (more likes than dislikes) or below 1.0 (the other way around).
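The effect of these rules on a result's score can be sketched as multiplicative boosts, similar in spirit to an Elasticsearch function_score query whose function weights are multiplied together. This is a minimal sketch: the weights, the PostHit class and adjusted_score() are all hypothetical, as the actual values are not given here.

```python
from dataclasses import dataclass

# Hypothetical weights: the real values are not given in this document.
FIRST_POST_BOOST = 1.5
USEFUL_POST_BOOST = 2.0
LIKE_RATIO_ABOVE_BOOST = 1.5
LIKE_RATIO_BELOW_BOOST = 0.5


@dataclass
class PostHit:
    score: float               # score computed by Elasticsearch
    position: int              # 1 for the first post of a topic
    is_useful: bool
    like_dislike_ratio: float  # likes divided by dislikes


def adjusted_score(hit: PostHit) -> float:
    """Apply each applicable boost to the base score."""
    score = hit.score
    if hit.position == 1:
        score *= FIRST_POST_BOOST
    if hit.is_useful:
        score *= USEFUL_POST_BOOST
    if hit.like_dislike_ratio > 1.0:
        score *= LIKE_RATIO_ABOVE_BOOST
    elif hit.like_dislike_ratio < 1.0:
        score *= LIKE_RATIO_BELOW_BOOST
    return score


print(adjusted_score(PostHit(score=1.0, position=1, is_useful=True, like_dislike_ratio=2.0)))
```

A ratio of exactly 1.0 leaves the score unchanged, matching the "above or below 1.0" wording.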

get_queryset_publishedcontents()

Search in PublishedContent objects.

get_queryset_topics()

Search in topics, and remove a result if its forum is not accessible to the user.

Score is modified if:

  • topic is solved;

  • topic is sticky;

  • topic is locked.

search_form_class

alias of SearchForm

class zds.searchv2.views.SimilarTopicsView(**kwargs)

get(request, *args, **kwargs)

Handle GET requests: instantiate a blank version of the form.

class zds.searchv2.views.SuggestionContentView(**kwargs)

get(request, *args, **kwargs)

Handle GET requests: instantiate a blank version of the form.

zds.searchv2.views.opensearch(request)

Generate OpenSearch Description file.
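An OpenSearch description tells browsers how to query the site's search. The sketch below serializes a minimal OpenSearch 1.1 document with the standard library, assuming only the basic elements (ShortName, Description, InputEncoding, Url) are needed; the real view presumably renders a Django template, and the opensearch_description helper, site name and URL below are hypothetical.

```python
import xml.etree.ElementTree as ET

OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"


def opensearch_description(short_name: str, description: str, url_template: str) -> str:
    """Serialize a minimal OpenSearch 1.1 description document."""
    # Once serialized, the xmlns attribute declares the default namespace for all elements.
    root = ET.Element("OpenSearchDescription", xmlns=OPENSEARCH_NS)
    ET.SubElement(root, "ShortName").text = short_name
    ET.SubElement(root, "Description").text = description
    ET.SubElement(root, "InputEncoding").text = "UTF-8"
    # {searchTerms} is the OpenSearch placeholder that the browser replaces with the query.
    ET.SubElement(root, "Url", type="text/html", template=url_template)
    return ET.tostring(root, encoding="unicode")


xml_text = opensearch_description("Example", "Search the example site",
                                  "https://example.org/search/?q={searchTerms}")
print(xml_text)
```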