[REFACTOR] Redisearch로 매칭 알고리즘 리팩토링#373
Conversation
Summary of ChangesHello @Yoonchulchung, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! 이 Pull Request는 매칭 알고리즘의 핵심 로직을 기존 어플리케이션 서버 기반에서 Redisearch 기반으로 전환하여 성능과 확장성을 크게 개선하는 리팩토링을 포함합니다. 또한, 사용자 태그 데이터를 활용하여 더욱 세분화된 사용자 유형을 분류하는 새로운 알고리즘을 도입하여 매칭의 정확도를 높였습니다. 전반적으로 시스템의 효율성을 증대하고 사용자 경험을 향상시키는 데 중점을 두었습니다. Highlights
Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This PR refactors the existing application server-based matching logic to use Redisearch, improving performance by moving computation to Redis, and also introduces a new user type classification algorithm. While the direction of improving performance with Redisearch is positive, a medium-severity security vulnerability has been identified: the new code constructs RediSearch queries by directly concatenating user-influenced data, creating a potential for Redis Injection. This insecure pattern should be addressed to prevent future vulnerabilities. Furthermore, it is recommended to refactor the complex business logic in MatchServiceImpl's determineUserType method into a separate class for improved maintainability and testability, generalize the fallback logic in RedisDocumentHelper for enhanced scalability, and restore the removed Javadoc for the match method for clarity.
| private String buildTagClause(Set<Integer> tags, String fieldName) { | ||
| if (tags == null || tags.isEmpty()) { | ||
| return null; | ||
| } |
There was a problem hiding this comment.
The buildTagClause method dynamically constructs a RediSearch query by concatenating user-provided tag values directly into the query string. The tags set, which originates from the user's request (MatchRequestDto), is converted to a string and embedded into the query sent to redisTemplate.execute. This pattern of building queries via string concatenation is a classic injection vulnerability.
While the current implementation uses a Set<Integer> for tags, which mitigates the immediate risk as integers cannot contain malicious characters to alter the query structure, this pattern is inherently insecure. Any future change that allows for string-based tags could immediately expose the application to Redis command injection, allowing an attacker to execute arbitrary commands on the Redis server.
Remediation:
Avoid manual string concatenation for building queries. Use a parameterized query builder if the Redis client library supports it. If not, perform strict validation and sanitization on the input tags to ensure they cannot be used to escape the query structure. For example, ensure tags only contain alphanumeric characters. Since the library in use (spring-data-redis) does not have a native RediSearch query builder, the best approach is to add a comment warning future developers about this risk if they consider changing the data type.
| // ******* // | ||
| // 매칭 검사 // | ||
| // ******* // |
| results = parseFtSearchResult(rawResult, clazz); | ||
| log.info("FT.SEARCH found {} candidates. index={}, query={}", results.size(), indexName, query); | ||
| } catch (Exception e) { | ||
| log.error("FT.SEARCH failed. index={}, query={}, error={}", indexName, query, e.getMessage()); | ||
| // FT.SEARCH 실패 시 레거시 방식으로 fallback | ||
| log.warn("Falling back to KEYS scan for index: {}", indexName); | ||
| if (BRAND_INDEX.equals(indexName)) { | ||
| @SuppressWarnings("unchecked") | ||
| List<T> fallback = (List<T>) findAllDocuments(BRAND_PREFIX, BrandTagDocument.class); | ||
| return fallback; | ||
| } else { |
There was a problem hiding this comment.
FT.SEARCH 실패 시 KEYS 스캔으로 fallback하는 로직은 좋은 방어적 프로그래밍입니다. 다만 현재 구현은 인덱스 이름을 문자열로 비교하고 있어, 새로운 검색 인덱스가 추가될 때마다 코드를 수정해야 하는 단점이 있습니다. 이 부분을 제네릭 타입 Class<T> clazz를 활용하여 좀 더 유연하고 타입-안전하게 개선할 수 있습니다.
// FT.SEARCH 실패 시 레거시 방식으로 fallback
log.warn("Falling back to KEYS scan for index: {}", indexName);
String prefix;
if (BrandTagDocument.class.equals(clazz)) {
prefix = BRAND_PREFIX;
} else if (CampaignTagDocument.class.equals(clazz)) {
prefix = CAMPAIGN_PREFIX;
} else {
log.error("Unsupported class for fallback scan: {}", clazz.getName());
return Collections.emptyList();
}
return findAllDocuments(prefix, clazz);이렇게 수정하면 향후 확장 시에도 executeSearch 메서드를 수정할 필요가 없어집니다.
Summary
기존 Redis를 캐싱 형태로 사용하여, 모든 연산을 어플리케이션 서버에서 진행하던 것을 Redisearch를 이용해 Redis에서 연산을 수행하도록 로직을 수정했습니다.
Changes
Type of Change
Related Issues
#372