Improving a Django Search Query (GIN Index)

I implemented a search feature in Django using Q objects.

A post should be returned if the search term appears in its title (title) or its body (description).

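# views.py
from django.db.models import Count, Exists, OuterRef, Prefetch, Q

# CrawlingData, Emotion and Bookmark are the app's own models.


def get_queryset(self):
    # Default to "" so that .lower() is never called on None
    # when the "search" parameter is missing.
    q = self.request.query_params.get("search", "").lower()

    queryset = (
        CrawlingData.objects.annotate(
            # Number of non-deleted emotions of type "F" on each post.
            fast_count=Count(
                "emotion",
                filter=Q(
                    emotion__emotion_type="F",
                    emotion__is_deleted=False,
                ),
            ),
            # Whether the current user has an active bookmark on the post.
            is_bookmark=Exists(
                self.request.user.bookmark_set.filter(
                    crawling_data_id=OuterRef("id"),
                    user=self.request.user,
                    is_deleted=False,
                )
            ),
        )
        .filter(
            Q(is_private=True)
            # Substring match: the term may appear in the title or the description.
            & (Q(title__contains=q) | Q(description__contains=q))
        )
        .prefetch_related(
            Prefetch(
                "emotion_set",
                queryset=Emotion.objects.all().select_related("user"),
            ),
            Prefetch(
                "bookmark_set",
                queryset=Bookmark.objects.all().select_related("user"),
            ),
        )
    )

    return queryset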

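Initially the view consisted of the logic above. With a small data set there was no noticeable impact, but once the table grew past 1.5 million rows, a single search took more than 30 seconds. While looking for a fix, I came across the GIN index.

I was also using pagination, and the paginated response included a total count, which costs an extra COUNT query on every request. Switching to Django REST framework's CursorPagination, which does not return a count, improved query time considerably (30 s → 15 s).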
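The difference between the two pagination styles can be sketched outside of Django. The example below is only an illustration using the standard-library sqlite3 module and a hypothetical `post` table; it is not how CursorPagination is implemented, but it shows the general idea: the page-number style needs a COUNT plus an OFFSET scan, while the cursor style just continues from the last row already seen.

```python
import sqlite3

# Hypothetical in-memory table standing in for the real posts table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO post (id, title) VALUES (?, ?)",
    [(i, f"post {i}") for i in range(1, 101)],
)

PAGE_SIZE = 10


def offset_page(page_number):
    """Page-number style: one COUNT(*) query plus a LIMIT/OFFSET query."""
    total = conn.execute("SELECT COUNT(*) FROM post").fetchone()[0]
    rows = conn.execute(
        "SELECT id, title FROM post ORDER BY id DESC LIMIT ? OFFSET ?",
        (PAGE_SIZE, (page_number - 1) * PAGE_SIZE),
    ).fetchall()
    return total, rows


def cursor_page(after_id=None):
    """Cursor style: no COUNT(*); resume below the last id already returned."""
    if after_id is None:
        rows = conn.execute(
            "SELECT id, title FROM post ORDER BY id DESC LIMIT ?",
            (PAGE_SIZE,),
        ).fetchall()
    else:
        rows = conn.execute(
            "SELECT id, title FROM post WHERE id < ? ORDER BY id DESC LIMIT ?",
            (after_id, PAGE_SIZE),
        ).fetchall()
    # The id of the last row acts as the cursor for the next page.
    next_cursor = rows[-1][0] if rows else None
    return rows, next_cursor
```

Both functions return the same rows for the second page; the cursor version simply never asks the database how many rows exist in total, which is the expensive part on a large, filtered table.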

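A GIN index is a very effective way to index data for text search. PostgreSQL offers two index types for speeding up text search: GiST and GIN. (https://www.postgresql.org/docs/9.4/textsearch-indexes.html)

The search in this post is a LIKE pattern match (strictly speaking, this is not the same as PostgreSQL's tsvector-based full text search). LIKE queries are very slow without a suitable index, so an index has to be created. A plain CREATE INDEX, however, builds a B-tree index.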

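Querying {search}% (a prefix match) is fast.
- The lookup can choose a branch starting from the top of the B-tree.
- O(log(n))
Querying %{search}% (a substring match) is slow.
- No branch can be chosen from the top of the B-tree, so the query ends up being a full table scan.
- O(n)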
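The same contrast shows up with a plain sorted list, which is a rough stand-in for the ordered keys of a B-tree. This is only a sketch with made-up sample titles: a prefix lookup can binary-search to its starting point, while a substring lookup gets no help from the ordering and has to check every item.

```python
from bisect import bisect_left

# Hypothetical sample data, kept sorted like the keys of a B-tree.
titles = sorted(["django", "drf", "gin", "gist", "index", "postgres", "search"])


def prefix_search(sorted_items, prefix):
    """Binary-search to the first candidate, then read neighbours: O(log n + k)."""
    matches = []
    for i in range(bisect_left(sorted_items, prefix), len(sorted_items)):
        if not sorted_items[i].startswith(prefix):
            break  # sorted order guarantees no later item can match
        matches.append(sorted_items[i])
    return matches


def infix_search(items, term):
    """Sort order does not help here, so every item is inspected: O(n)."""
    return [item for item in items if term in item]


print(prefix_search(titles, "gi"))  # items starting with "gi"
print(infix_search(titles, "in"))   # items containing "in" anywhere
```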

PostgreSQL provides GIN and GiST indexes for text search.

GIN vs. GiST

GIN index lookups are about three times faster than GiST
GIN indexes take about three times longer to build than GiST
GIN indexes are moderately slower to update than GiST indexes, but about 10 times slower if fast-update support was disabled (docs)
GIN indexes are two-to-three times larger than GiST indexes

How a GIN index works

For the string gabcgaga, an index of the form {'ga': [(0,1), (4,5), (6,7)]} is built in advance, mapping each fragment to the places where it occurs.
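A minimal sketch of that idea in plain Python follows. It is a simplification for illustration only: the function names are made up, and a real GIN index maps each key to the table rows containing it rather than to character offsets. The principle is the same, though: the fragments are indexed up front, so a lookup only has to verify a few candidates instead of scanning everything.

```python
from collections import defaultdict


def build_ngram_index(text, n=2):
    """Map every n-character fragment to its (start, end) positions in text."""
    index = defaultdict(list)
    for start in range(len(text) - n + 1):
        index[text[start:start + n]].append((start, start + n - 1))
    return dict(index)


def find(text, index, term, n=2):
    """Use the positions of the term's first n-gram as candidates, then verify.

    Assumes len(term) >= n; shorter terms cannot be looked up in this index.
    """
    candidates = index.get(term[:n], [])
    return [start for start, _ in candidates if text.startswith(term, start)]


text = "gabcgaga"
index = build_ngram_index(text)
print(index["ga"])                # positions of the fragment "ga"
print(find(text, index, "gag"))   # start offsets where "gag" occurs
```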

How to apply a GIN index

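# models.py
from django.contrib.postgres.indexes import GinIndex
from django.db.models.functions import Lower

# Goes inside the model's Meta class.
indexes = [
    # Expression indexes on the lowercased columns.
    GinIndex(Lower("title"), name="title_lower_gin_idx"),
    GinIndex(Lower("description"), name="description_lower_gin_idx"),
]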

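After creating GIN indexes on the lowercased title and description fields and querying with a lowercased search term, the search time dropped to around 2 seconds. Two points are worth checking when reproducing this: PostgreSQL uses an expression index only when the query filters on the same expression (here, lower(title) and lower(description)), and a LIKE '%term%' match can use a GIN index only through the pg_trgm extension's gin_trgm_ops operator class.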
