Skip to content

Conversation

@Chenglong-MS
Copy link
Collaborator

  • fix mysql connector issue
  • test pgsql connector issue
  • simplify UI

dependabot bot and others added 11 commits November 17, 2025 19:54
Bumps [vega](https://github.com/vega/vega) from 5.33.0 to 6.2.0.
- [Release notes](https://github.com/vega/vega/releases)
- [Commits](vega/vega@v5.33.0...v6.2.0)

---
updated-dependencies:
- dependency-name: vega
  dependency-version: 6.2.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [validator](https://github.com/validatorjs/validator.js) from 13.15.20 to 13.15.22.
- [Release notes](https://github.com/validatorjs/validator.js/releases)
- [Changelog](https://github.com/validatorjs/validator.js/blob/master/CHANGELOG.md)
- [Commits](validatorjs/validator.js@13.15.20...13.15.22)

---
updated-dependencies:
- dependency-name: validator
  dependency-version: 13.15.22
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements three major changes: fixes MySQL and PostgreSQL database connectors, simplifies the UI by removing derived field functionality, and upgrades Vega from v5 to v6.

  • Database connector fixes: MySQL connector now uses native pymysql instead of DuckDB's MySQL extension; PostgreSQL connector improves parameter validation
  • UI simplification: Removes complex derived field/concept transformation features, simplifies model selection from slot-based to single selection, streamlines dialog interactions
  • Vega upgrade: Major version upgrade from 5.x to 6.x with corresponding dependency updates

Reviewed changes

Copilot reviewed 23 out of 25 changed files in this pull request and generated 13 comments.

Show a summary per file
File Description
py-src/data_formulator/data_loader/mysql_data_loader.py Complete rewrite to use native pymysql connection instead of DuckDB extension
py-src/data_formulator/data_loader/postgresql_data_loader.py Improved parameter validation and error handling
src/components/ComponentType.tsx Removed ConceptTransformation interface and "derived" field source type
src/app/dfSlice.tsx Simplified model selection from slot-based system to single model selection
src/views/VisualizationView.tsx Changed renderer from SVG to canvas, removed derived field UI elements
src/views/ConceptCard.tsx Removed derived concept editing and transformation functionality
src/views/ModelSelectionDialog.tsx Simplified from multi-slot model assignment to single model selection
src/views/DataFormulator.tsx Updated welcome screen layout and model testing logic
package.json & yarn.lock Upgraded Vega from v5.32.0 to v6.2.0 and all related packages
requirements.txt & pyproject.toml Added pymysql dependency, bumped version to 0.5.0.3
Multiple view files Removed derived field functionality, simplified UI interactions, reduced animation durations

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +864 to +866
chartWidth={Math.round(config.defaultChartWidth * localScaleFactor)}
chartHeight={Math.round(config.defaultChartHeight * localScaleFactor)}
scaleFactor={1}
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scaling approach has changed significantly: Instead of applying scaleFactor post-render, it's now applied directly to width/height before rendering (lines 864-865) and scaleFactor is set to 1 (line 866). This is a substantial behavior change that could affect chart rendering quality, especially for high-DPI displays or when users adjust zoom.

The old approach scaled the rendered SVG, while the new approach pre-sizes the chart. These have different implications for resolution and rendering quality.

Suggested change
chartWidth={Math.round(config.defaultChartWidth * localScaleFactor)}
chartHeight={Math.round(config.defaultChartHeight * localScaleFactor)}
scaleFactor={1}
chartWidth={config.defaultChartWidth}
chartHeight={config.defaultChartHeight}
scaleFactor={localScaleFactor}

Copilot uses AI. Check for mistakes.

transform?: ConceptTransformation;
temporary?: true; // the field is temporary, and it will be deleted unless it's saved
}
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ConceptTransformation type and transform property have been removed from FieldItem. This is part of the larger refactoring to simplify the UI by removing derived field functionality. This is a breaking change that affects the data model significantly.

Copilot uses AI. Check for mistakes.
let chartActionItems = isDataStale ? [] : (
<Box sx={{display: "flex", flexDirection: "column", flex: 1, my: 1}}>
{(table.virtual || table.rows.length > 1000) && !(chartUnavailable || encodingShelfEmpty) ? (
{(table.virtual || table.rows.length > 5000) && !(chartUnavailable || encodingShelfEmpty) ? (
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The threshold has been changed from 1000 to 5000 rows without explanation. This affects when the visualization shows a warning about large datasets. This change could impact performance or user experience for users with datasets between 1000-5000 rows.

Consider documenting the rationale for this change.

Suggested change
{(table.virtual || table.rows.length > 5000) && !(chartUnavailable || encodingShelfEmpty) ? (
{
// Threshold for showing large dataset warning changed from 1000 to 5000 rows.
// Rationale: Performance improvements and user feedback indicate that visualizations remain responsive up to 5000 rows.
// This avoids unnecessary warnings for moderately sized datasets and improves user experience.
(table.virtual || table.rows.length > 5000) && !(chartUnavailable || encodingShelfEmpty) ? (

Copilot uses AI. Check for mistakes.
} for _, col_row in columns_df.iterrows()]

# Get sample data
sample_query = f"SELECT * FROM `{schema}`.`{table_name}` LIMIT 10"
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SQL injection vulnerability: The query directly interpolates schema and table_name into SQL without proper escaping or parameterization. Even though backticks are used, this doesn't prevent SQL injection if the values contain backticks.

Use parameterized queries or properly escape table/schema names using pymysql.escape_string() or similar.

Copilot uses AI. Check for mistakes.
sample_rows = json.loads(sample_df.to_json(orient="records", date_format='iso'))

# Get row count
count_query = f"SELECT COUNT(*) as cnt FROM `{schema}`.`{table_name}`"
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SQL injection vulnerability: Same issue as above - schema and table_name are directly interpolated into the SQL query without proper escaping or parameterization.

Copilot uses AI. Check for mistakes.
import CheckCircleOutlineIcon from '@mui/icons-material/CheckCircleOutline';
import ErrorOutlineIcon from '@mui/icons-material/ErrorOutline';
import HelpOutlineIcon from '@mui/icons-material/HelpOutline';
import InfoOutlinedIcon from '@mui/icons-material/InfoOutlined';
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused import InfoOutlinedIcon.

Suggested change
import InfoOutlinedIcon from '@mui/icons-material/InfoOutlined';

Copilot uses AI. Check for mistakes.
Comment on lines 260 to 261
except Exception:
pass
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'except' clause does nothing but pass and there is no explanatory comment.

Suggested change
except Exception:
pass
except Exception as e:
logger.exception("Exception occurred while closing MySQL connection in __del__")

Copilot uses AI. Check for mistakes.
Chenglong-MS and others added 6 commits December 4, 2025 17:41
…tor-13.15.22

Bump validator from 13.15.20 to 13.15.22
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 24 out of 26 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

},
getAllSlotTypes: () : ModelSlotType[] => {
return [...MODEL_SLOT_TYPES];
return state.models.find(m => m.id == state.selectedModelId) || state.models[0];
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect equality operator: Using == for string comparison instead of ===. While JavaScript's == will work for string comparison in this case, the codebase should use strict equality === for consistency and to avoid potential type coercion issues.

Change m.id == selectedModelId to m.id === selectedModelId on line 933, and similar occurrences throughout the file.

Copilot uses AI. Check for mistakes.
Comment on lines 181 to 187
sample_query = f"SELECT * FROM `{schema}`.`{table_name}` LIMIT 10"
sample_df = self._execute_query(sample_query)
sample_rows = json.loads(sample_df.to_json(orient="records", date_format='iso'))

# Get row count
count_query = f"SELECT COUNT(*) as cnt FROM `{schema}`.`{table_name}`"
count_df = self._execute_query(count_query)
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SQL injection vulnerability: The query uses unsanitized schema and table names directly in f-strings without proper escaping or parameterization. Lines 181, 186 use backtick-quoted identifiers, but the values inside the backticks are not sanitized. A malicious schema/table name like `; DROP TABLE users; -- could potentially execute arbitrary SQL.

Consider using parameterized queries or proper escaping mechanisms provided by pymysql.

Copilot uses AI. Check for mistakes.
Comment on lines +53 to +54
password_part = f" password={password}" if password else ""
attach_string = f"host={host} port={port} user={user}{password_part} dbname={database}"
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing password sanitization in log output: The connection string is constructed with the password and could potentially be logged in error messages. On line 54, if an error occurs after this line and the attach_string is included in the error message, it would expose the password.

Consider redacting sensitive information from logs or using a more secure connection method that doesn't expose credentials in string form.

Copilot uses AI. Check for mistakes.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot open a new pull request to apply changes based on this feedback

Comment on lines +82 to +95
self.mysql_conn = pymysql.connect(
host=host,
user=user,
password=password,
database=database,
port=port,
cursorclass=pymysql.cursors.DictCursor,
charset='utf8mb4'
)
self.database = database
logger.info(f"Successfully connected to MySQL database: {self.database}")
except Exception as e:
logger.error(f"Failed to connect to MySQL: {e}")
raise
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resource leak: The MySQL connection is stored in self.mysql_conn but there's no guarantee it will be closed. While close() and context manager methods are implemented (lines 264-278), there's no cleanup in the destructor (__del__), so if the object is destroyed without calling close() or using it as a context manager, the connection will leak.

Consider adding a __del__ method that calls self.close() to ensure cleanup.

Copilot uses AI. Check for mistakes.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot open a new pull request to apply changes based on this feedback

Chenglong-MS and others added 2 commits December 4, 2025 18:53
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link
Collaborator

@Mestway Mestway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good!

@Chenglong-MS Chenglong-MS merged commit 7b6fd1a into main Dec 9, 2025
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants