Search (searchv2/)

Module located in zds/searchv2/.

Models (models.py)

class zds.searchv2.models.AbstractESDjangoIndexable(*args, **kwargs)

Version of AbstractESIndexable for a Django object, with some improvements:

  • Already includes pk in the mapping;
  • Matches the ES _id field with pk;
  • Overrides es_already_indexed with a database field;
  • Defines an es_flagged field to restrict the number of objects to be indexed;
  • Overrides save() to manage that field;
  • Defines a get_es_django_indexable() method that can be overridden to change the queryset used to fetch objects.
classmethod get_es_django_indexable(force_reindexing=False)

Method that can be overridden to filter Django objects from the database based on any criterion.

Parameters:force_reindexing (bool) – if True, return all objects, even those that may already be indexed.
Returns:query
Return type:django.db.models.query.QuerySet
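
A minimal sketch of the selection rule, written against plain Python objects rather than a real Django queryset so that it runs on its own. The FakeRecord class and the select_indexable function are hypothetical stand-ins, and filtering on es_flagged is an assumption drawn from the class description, not the confirmed implementation.

```python
from dataclasses import dataclass


@dataclass
class FakeRecord:
    """Hypothetical stand-in for a Django model instance."""
    pk: int
    es_flagged: bool = True


def select_indexable(records, force_reindexing=False):
    """Return every record when forced, otherwise only the flagged ones."""
    if force_reindexing:
        return list(records)
    return [r for r in records if r.es_flagged]


records = [FakeRecord(1, es_flagged=False), FakeRecord(2), FakeRecord(3)]
flagged_pks = [r.pk for r in select_indexable(records)]
all_pks = [r.pk for r in select_indexable(records, force_reindexing=True)]
```

A subclass that overrides get_es_django_indexable() would typically add its own criterion on top of this rule, for example excluding objects that should never appear in search results.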
classmethod get_es_indexable(force_reindexing=False)

Override get_es_indexable() in order to use the Django querysets and batch objects.

Returns:a queryset
Return type:django.db.models.query.QuerySet
classmethod get_es_mapping()

Overridden to add pk into mapping.

Returns:mapping object
Return type:elasticsearch_dsl.Mapping
save(*args, **kwargs)

Override the save() method to flag the object whenever it is saved, on the assumption that a save implies a modification and therefore a need to reindex.

Note

Flagging can be prevented using save(es_flagged=False).
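
The flag-on-save behavior can be illustrated without Django. The FlagOnSaveSketch class below is hypothetical: it only mimics how an es_flagged keyword might be removed from kwargs before the real save runs, and it counts calls where the actual model would delegate to the parent save().

```python
class FlagOnSaveSketch:
    """Hypothetical stand-in showing the flag-on-save idea without Django."""

    def __init__(self):
        self.es_flagged = False
        self.save_calls = 0

    def save(self, *args, **kwargs):
        # Pop the keyword so it is not forwarded to the persistence layer.
        self.es_flagged = kwargs.pop("es_flagged", True)
        # Placeholder for super().save(*args, **kwargs) in a real model.
        self.save_calls += 1


obj = FlagOnSaveSketch()
obj.save()
flag_after_plain_save = obj.es_flagged
obj.save(es_flagged=False)
flag_after_opt_out = obj.es_flagged
```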

class zds.searchv2.models.AbstractESIndexable

Mixin for indexable objects.

Define a number of different functions that can be overridden to tune the behavior of indexing into elasticsearch.

You may need to override:

  • get_es_indexable();
  • get_es_mapping() (not mandatory, but otherwise ES will choose the mapping by itself);
  • get_es_document_source() (not mandatory, but may be useful if the data differ from the mapping or extra processing is needed).

You also need to maintain es_id and es_already_indexed for bulk indexing/updating (if any).

get_es_document_as_bulk_action(index, action='index')

Create a document formatted for a _bulk operation. Formatting is done based on action.

See https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html.

Parameters:
  • index (str) – index in which the document will be inserted
  • action (str) – action, either “index”, “update” or “delete”
Returns:

the document

Return type:

dict
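
As an illustration of action-dependent formatting, here is a standalone sketch that builds action dicts in the shape accepted by the elasticsearch-py bulk helpers (_op_type, _index, _type, _id, plus _source or doc). The function name as_bulk_action, its explicit doc_type and es_id parameters, and the index name "zds_search" are assumptions made for the example; the real method reads these values from the object.

```python
def as_bulk_action(index, doc_type, es_id, source, action="index"):
    """Build one action dict in the shape used by elasticsearch.helpers."""
    if action not in ("index", "update", "delete"):
        raise ValueError("unsupported action: {}".format(action))

    document = {
        "_op_type": action,
        "_index": index,
        "_type": doc_type,
        "_id": es_id,
    }
    if action == "index":
        # A full document goes under _source.
        document["_source"] = source
    elif action == "update":
        # A partial update goes under doc.
        document["doc"] = source
    # A delete needs only the metadata above.
    return document


indexed = as_bulk_action("zds_search", "topic", 42, {"title": "Hello"})
updated = as_bulk_action("zds_search", "topic", 42, {"title": "Hi"}, action="update")
deleted = as_bulk_action("zds_search", "topic", 42, None, action="delete")
```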

get_es_document_source(excluded_fields=None)

Create a document from the attributes of the class, based on the mapping.

Attention

You may need to override this method if the data differ from the mapping for some reason.

Parameters:excluded_fields (list) – fields to exclude from the default method
Returns:document
Return type:dict
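
A rough sketch of the idea, assuming the mapping is reduced to a list of field names and that each field matches an attribute of the same name. The document_source function and the TopicSketch class are hypothetical and stand in for the real mapping-driven logic.

```python
def document_source(obj, mapped_fields, excluded_fields=None):
    """Copy each mapped attribute of obj into a dict, skipping exclusions."""
    excluded = set(excluded_fields or [])
    return {
        name: getattr(obj, name, None)
        for name in mapped_fields
        if name not in excluded
    }


class TopicSketch:
    """Hypothetical object whose attributes are named after mapping fields."""
    pk = 7
    title = "Recherche"
    subtitle = "v2"


doc = document_source(
    TopicSketch(), ["pk", "title", "subtitle"], excluded_fields=["subtitle"]
)
```

An override would typically call the default method with the fields it wants to compute itself listed in excluded_fields, then add those fields to the returned dict.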
classmethod get_es_document_type()

Value of the _type field in the index.

classmethod get_es_indexable(force_reindexing=False)

Return objects to index.

Attention

You need to override this method (otherwise nothing will be indexed).

Parameters:force_reindexing (bool) – force to return all objects, even if they may already be indexed.
Return type:list
classmethod get_es_mapping()

Set up the mapping (data schema).

Note

You will probably want to change the analyzer and boost value. Also consider the index='not_analyzed' option to improve performance.

See https://elasticsearch-dsl.readthedocs.io/en/latest/persistence.html#mappings

Attention

You may want to override this method (otherwise ES chooses the mapping by itself).

Returns:mapping object
Return type:elasticsearch_dsl.Mapping
class zds.searchv2.models.ESIndexManager(name, shards=5, replicas=0, connection_alias='default')

Manage a given index with different tailor-made functions.

analyze_sentence(request)

Use the analyzer on a given sentence. Get back the list of tokens.

See http://www.elastic.co/guide/en/elasticsearch/reference/current/indices-analyze.html.

This is useful to perform “terms” queries instead of full-text queries.

Parameters:request (str) – a sentence from user input
Returns:the tokens
Return type:list
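
The _analyze API returns a body with a "tokens" list whose entries each carry a "token" string. The sketch below shows how the token list could be extracted from such a body. The function name and the sample response are illustrative assumptions: the response is hand-written, not produced by a real cluster.

```python
def tokens_from_analyze_response(response):
    """Extract the token strings from an _analyze API response body."""
    return [entry["token"] for entry in response.get("tokens", [])]


# Hand-written sample shaped like an _analyze response; values are illustrative.
sample_response = {
    "tokens": [
        {"token": "tutoriel", "start_offset": 0, "end_offset": 8,
         "type": "word", "position": 0},
        {"token": "c++", "start_offset": 9, "end_offset": 12,
         "type": "word", "position": 1},
    ]
}
tokens = tokens_from_analyze_response(sample_response)
```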
clear_es_index()

Clear the index.

clear_indexing_of_model(model)

Nullify the indexing of a given model by setting es_already_indexed=False on all objects.

For AbstractESDjangoIndexable models, a full update is used instead of saving each object individually.

Parameters:model (class) – the model
delete_by_query(doc_type='', query=MatchAll())

Perform a deletion through the _delete_by_query API.

See https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html

Attention

Call this function with great care!

Parameters:
  • doc_type (str) – the document type
  • query (elasticsearch_dsl.query.Query) – the query matching all documents to be deleted
delete_document(document)

Delete a given document, based on its es_id.

Parameters:document (AbstractESIndexable) – the document
es_bulk_indexing_of_model(model, force_reindexing=False)

Perform a bulk action on documents of a given model. Use the objects_per_batch property to index.

See http://elasticsearch-py.readthedocs.io/en/master/api.html#elasticsearch.Elasticsearch.bulk and http://elasticsearch-py.readthedocs.io/en/master/helpers.html#elasticsearch.helpers.parallel_bulk

Attention

  • Currently only implemented with “index” and “update”!
  • Currently only working with AbstractESDjangoIndexable.
Parameters:
  • model (class) – a model
  • force_reindexing (bool) – force all documents to be returned
Returns:

the number of documents indexed

Return type:

int
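
Batching by objects_per_batch can be sketched in plain Python. The helper names below (batched, index_in_batches) and the send_batch callback are hypothetical: in the real method each batch would be turned into bulk actions and sent to Elasticsearch.

```python
from itertools import islice


def batched(iterable, objects_per_batch):
    """Yield successive lists of at most objects_per_batch items."""
    if objects_per_batch < 1:
        raise ValueError("objects_per_batch must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, objects_per_batch))
        if not batch:
            return
        yield batch


def index_in_batches(objects, objects_per_batch, send_batch):
    """Send objects batch by batch and return how many were sent."""
    indexed = 0
    for batch in batched(objects, objects_per_batch):
        send_batch(batch)  # stand-in for the real bulk request
        indexed += len(batch)
    return indexed


sent = []
count = index_in_batches(range(7), 3, sent.append)
```

Working in fixed-size batches keeps memory use bounded when a model has many rows, at the cost of one bulk request per batch.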

refresh_index()

Force a refresh of the index. The task is normally done periodically, but may be forced with this method.

See https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-refresh.html.

Note

The use of this function is mandatory if you want to use the search right after an indexing.

reset_es_index(models)

Delete the old index and create a new one (with the same name). Set up the number of shards and replicas. Then, set the mappings for the different models.

Parameters:
  • models (list) – list of models
  • number_shards (int) – number of shards
  • number_replicas (int) – number of replicas
setup_custom_analyzer()

Override the default analyzer.

See https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis.html.

Our custom analyzer is based on the “french” analyzer (https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html#french-analyzer) but with some differences:

  • “custom_tokenizer”, to deal with punctuation and all kinds of (non-breaking) spaces while keeping dashes and other characters intact (in order to preserve “c++” or “c#”, for example).
  • “protect_c_language”, a pattern replace filter to prevent “c” from being wiped out by the stop-word filter.
  • “french_keywords”, a keyword filter that prevents some programming language names from being stemmed.

Warning

You need to run manage.py es_manager index_all if you modify this!

Set up the search on the correct index.

Parameters:request (elasticsearch_dsl.Search) – the search request
Returns:formatted search
Return type:elasticsearch_dsl.Search
update_single_document(document, doc)

Update given fields of a single document.

See https://www.elastic.co/guide/en/elasticsearch/guide/current/partial-updates.html.

Parameters:
exception zds.searchv2.models.NeedIndex

Raised when an action requires an index, but it is not created (yet).

zds.searchv2.models.delete_document_in_elasticsearch(instance)

Delete an ESDjangoIndexable from the ES database. Must be implemented by all classes that derive from AbstractESDjangoIndexable.

Parameters:instance (AbstractESIndexable) – the document to delete
zds.searchv2.models.get_django_indexable_objects()

Return all indexable objects registered in Django.

Views (views.py)

class zds.searchv2.views.SearchView(**kwargs)

Search view.

get(request, *args, **kwargs)

Overridden to catch the request and fill the form.

get_queryset_chapters()

Search in content chapters.

get_queryset_posts()

Search in posts, and remove results if the forum is not allowed for the user or if the message is invisible.

Score is modified if:

  • post is the first one in a topic;
  • post is marked as “useful”;
  • post has a like/dislike ratio above 1.0 (more likes than dislikes) or below 1.0 (more dislikes than likes).
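
One way such score adjustments can be expressed in Elasticsearch is a function_score query, where each criterion is a filter paired with a weight. The sketch below builds that query body as a plain dict. Everything specific in it is an assumption made for illustration: the field names (position, is_useful, like_dislike_ratio), the weights, and the use of function_score itself are not confirmed by this documentation.

```python
def boosted_posts_query(base_query, weights):
    """Wrap base_query in a function_score query; field names are hypothetical."""
    functions = [
        # First post of a topic.
        {"filter": {"term": {"position": 1}}, "weight": weights["first_post"]},
        # Post marked as useful.
        {"filter": {"term": {"is_useful": True}}, "weight": weights["useful"]},
        # More likes than dislikes.
        {"filter": {"range": {"like_dislike_ratio": {"gt": 1.0}}},
         "weight": weights["ratio_above_1"]},
        # More dislikes than likes.
        {"filter": {"range": {"like_dislike_ratio": {"lt": 1.0}}},
         "weight": weights["ratio_below_1"]},
    ]
    return {"function_score": {"query": base_query, "functions": functions}}


# Placeholder weights chosen only for the example.
weights = {"first_post": 1.5, "useful": 1.5, "ratio_above_1": 1.2, "ratio_below_1": 0.8}
query = boosted_posts_query({"match": {"text": "python"}}, weights)
```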
get_queryset_publishedcontents()

Search in PublishedContent objects.

get_queryset_topics()

Search in topics, and remove the result if the forum is not allowed for the user.

Score is modified if:

  • topic is solved;
  • topic is sticky;
  • topic is locked.
zds.searchv2.views.opensearch(request)

Generate the OpenSearch description file.