Self-serve CSV user data export has poor query performance #5954

@bjester

This issue is not open for contribution. Visit Contributing guidelines to learn about the contributing process and how to find suitable issues.

Overview

The generateusercsv_task, which can be triggered from a user's settings page, issues queries that perform poorly, particularly for users with many channels.

Complexity: Medium
Target branch: hotfixes

Context

A single query spawned from the task took nearly 3 hours before it was killed. The task was queued for a user with large upload usage (20GB) and many channels.

The Change

Trace generateusercsv_task to each query it produces and optimize those queries as needed, using techniques that have previously worked for query optimization:

  • aligning filters with indices
  • using CTEs
  • avoiding complex joins

In particular, special attention should be given to file-related queries.
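
As a rough illustration of two of the techniques listed above (filtering on indexed columns and using a CTE in place of an up-front join), here is a minimal sketch. It uses an in-memory SQLite database purely so it can run standalone. The channel_editor and file tables, their columns, and the index names are hypothetical and are not the real schema, and the production database may plan the same SQL differently.

```python
import sqlite3

# Hypothetical schema for illustration only; not the project's actual tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE channel_editor (user_id INTEGER, channel_id INTEGER);
    CREATE TABLE file (id INTEGER PRIMARY KEY, channel_id INTEGER, size INTEGER);
    CREATE INDEX channel_editor_user_idx ON channel_editor (user_id);
    CREATE INDEX file_channel_idx ON file (channel_id);
""")
conn.executemany(
    "INSERT INTO channel_editor VALUES (?, ?)", [(1, 10), (1, 11), (2, 12)]
)
conn.executemany(
    "INSERT INTO file (channel_id, size) VALUES (?, ?)",
    [(10, 100), (10, 400), (11, 200), (12, 300)],
)

# The CTE first narrows to one user's channels with a plain equality filter
# on an indexed column. The outer query then filters files by the indexed
# channel_id column instead of joining both tables up front.
CTE_QUERY = """
    WITH user_channels AS (
        SELECT channel_id FROM channel_editor WHERE user_id = ?
    )
    SELECT COALESCE(SUM(size), 0)
    FROM file
    WHERE channel_id IN (SELECT channel_id FROM user_channels)
"""


def total_file_size(user_id):
    """Return the summed size of files in channels the user can edit."""
    return conn.execute(CTE_QUERY, (user_id,)).fetchone()[0]


def query_plan(sql, params=()):
    """Return the readable detail column of each EXPLAIN QUERY PLAN row."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


print(total_file_size(1))
print("\n".join(query_plan(CTE_QUERY, (1,))))
```

Whether a CTE actually helps depends on the planner, so the printed plan should be checked on the real database rather than assumed.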

How to Get There

The task can be triggered from the /en/settings/#/account page using the EXPORT DATA button.

Out of Scope

Any queries not related to generateusercsv_task

Acceptance Criteria

  • For optimized queries, before and after SQL dumps and EXPLAIN analysis are ideal for communicating improvements
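
One possible way to gather such a before and after dump is sketched below. It records the SQL a piece of code issues and pairs each statement with its query plan. SQLite and its EXPLAIN QUERY PLAN are used only so the sketch is self-contained. The file table, the index, and the before/after functions are hypothetical stand-ins, and on the real database the equivalent would be that database's own EXPLAIN output.

```python
import sqlite3

# Hypothetical table and index, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE file (id INTEGER PRIMARY KEY, uploaded_by INTEGER, size INTEGER);
    CREATE INDEX file_uploaded_by_idx ON file (uploaded_by);
    INSERT INTO file (uploaded_by, size) VALUES (1, 100), (1, 200), (2, 300);
""")


def capture_report(run_queries):
    """Record each SELECT issued by run_queries and pair it with its plan."""
    captured = []
    conn.set_trace_callback(captured.append)
    try:
        run_queries()
    finally:
        conn.set_trace_callback(None)  # stop tracing before running EXPLAIN
    report = []
    for sql in captured:
        if not sql.lstrip().upper().startswith("SELECT"):
            continue
        details = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
        report.append((sql, details))
    return report


def before():
    # Wrapping the column in an expression keeps the index from being used.
    conn.execute("SELECT SUM(size) FROM file WHERE uploaded_by + 0 = 1").fetchall()


def after():
    # A plain equality on the indexed column lets the planner use the index.
    conn.execute("SELECT SUM(size) FROM file WHERE uploaded_by = 1").fetchall()


before_report = capture_report(before)
after_report = capture_report(after)
for label, report in (("BEFORE", before_report), ("AFTER", after_report)):
    for sql, details in report:
        print(label, sql, *details, sep="\n  ")
```

Pasting the two printed sections side by side in a pull request gives reviewers both the SQL and the plan change in one place.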

Testing

Ideally, tests for the task should exist before any changes are made (and be written first if they do not), to ensure the changes do not break the task's functionality.
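
A test of this kind could pin the exported output before any query is touched, so that a later rewrite has to reproduce it exactly. The sketch below shows the general shape only. The export_csv function, the file table, and the CSV columns are hypothetical stand-ins and do not reflect what generateusercsv_task actually writes.

```python
import csv
import io
import sqlite3
import unittest


def make_db():
    """Build a small in-memory fixture (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE file (
            id INTEGER PRIMARY KEY, uploaded_by INTEGER, name TEXT, size INTEGER
        );
        INSERT INTO file (uploaded_by, name, size) VALUES
            (1, 'a.mp4', 100), (1, 'b.pdf', 200), (2, 'c.zip', 300);
    """)
    return conn


def export_csv(conn, user_id):
    """Hypothetical stand-in for the export: one CSV row per uploaded file."""
    rows = conn.execute(
        "SELECT name, size FROM file WHERE uploaded_by = ? ORDER BY name",
        (user_id,),
    )
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "size"])
    writer.writerows(rows)
    return buffer.getvalue()


class ExportCsvTest(unittest.TestCase):
    def setUp(self):
        self.conn = make_db()

    def tearDown(self):
        self.conn.close()

    def test_output_is_pinned(self):
        # Pin the exact output so a query rewrite cannot change it silently.
        self.assertEqual(
            export_csv(self.conn, 1), "name,size\na.mp4,100\nb.pdf,200\n"
        )

    def test_user_without_files_gets_header_only(self):
        self.assertEqual(export_csv(self.conn, 3), "name,size\n")
```

Saved as a module, this can be run with python -m unittest. The same assertions should pass unchanged before and after the queries are optimized.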

References

https://learningequality.slack.com/archives/C0WHZ9FPX/p1780509080212179
