-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Dev #208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dev #208
Conversation
Chenglong-MS
commented
Dec 4, 2025
- fix mysql connector issue
- test pgsql connector issue
- simplify UI
Bumps [vega](https://github.com/vega/vega) from 5.33.0 to 6.2.0. - [Release notes](https://github.com/vega/vega/releases) - [Commits](vega/vega@v5.33.0...v6.2.0) --- updated-dependencies: - dependency-name: vega dependency-version: 6.2.0 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>
….2.0 Bump vega from 5.33.0 to 6.2.0
Add BigQuery DataLoader
Bumps [validator](https://github.com/validatorjs/validator.js) from 13.15.20 to 13.15.22. - [Release notes](https://github.com/validatorjs/validator.js/releases) - [Changelog](https://github.com/validatorjs/validator.js/blob/master/CHANGELOG.md) - [Commits](validatorjs/validator.js@13.15.20...13.15.22) --- updated-dependencies: - dependency-name: validator dependency-version: 13.15.22 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements three major changes: fixes MySQL and PostgreSQL database connectors, simplifies the UI by removing derived field functionality, and upgrades Vega from v5 to v6.
- Database connector fixes: MySQL connector now uses native
pymysqlinstead of DuckDB's MySQL extension; PostgreSQL connector improves parameter validation - UI simplification: Removes complex derived field/concept transformation features, simplifies model selection from slot-based to single selection, streamlines dialog interactions
- Vega upgrade: Major version upgrade from 5.x to 6.x with corresponding dependency updates
Reviewed changes
Copilot reviewed 23 out of 25 changed files in this pull request and generated 13 comments.
Show a summary per file
| File | Description |
|---|---|
py-src/data_formulator/data_loader/mysql_data_loader.py |
Complete rewrite to use native pymysql connection instead of DuckDB extension |
py-src/data_formulator/data_loader/postgresql_data_loader.py |
Improved parameter validation and error handling |
src/components/ComponentType.tsx |
Removed ConceptTransformation interface and "derived" field source type |
src/app/dfSlice.tsx |
Simplified model selection from slot-based system to single model selection |
src/views/VisualizationView.tsx |
Changed renderer from SVG to canvas, removed derived field UI elements |
src/views/ConceptCard.tsx |
Removed derived concept editing and transformation functionality |
src/views/ModelSelectionDialog.tsx |
Simplified from multi-slot model assignment to single model selection |
src/views/DataFormulator.tsx |
Updated welcome screen layout and model testing logic |
package.json & yarn.lock |
Upgraded Vega from v5.32.0 to v6.2.0 and all related packages |
requirements.txt & pyproject.toml |
Added pymysql dependency, bumped version to 0.5.0.3 |
| Multiple view files | Removed derived field functionality, simplified UI interactions, reduced animation durations |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| chartWidth={Math.round(config.defaultChartWidth * localScaleFactor)} | ||
| chartHeight={Math.round(config.defaultChartHeight * localScaleFactor)} | ||
| scaleFactor={1} |
Copilot
AI
Dec 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Scaling approach has changed significantly: Instead of applying scaleFactor post-render, it's now applied directly to width/height before rendering (lines 864-865) and scaleFactor is set to 1 (line 866). This is a substantial behavior change that could affect chart rendering quality, especially for high-DPI displays or when users adjust zoom.
The old approach scaled the rendered SVG, while the new approach pre-sizes the chart. These have different implications for resolution and rendering quality.
| chartWidth={Math.round(config.defaultChartWidth * localScaleFactor)} | |
| chartHeight={Math.round(config.defaultChartHeight * localScaleFactor)} | |
| scaleFactor={1} | |
| chartWidth={config.defaultChartWidth} | |
| chartHeight={config.defaultChartHeight} | |
| scaleFactor={localScaleFactor} |
|
|
||
| transform?: ConceptTransformation; | ||
| temporary?: true; // the field is temporary, and it will be deleted unless it's saved | ||
| } |
Copilot
AI
Dec 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ConceptTransformation type and transform property have been removed from FieldItem. This is part of the larger refactoring to simplify the UI by removing derived field functionality. This is a breaking change that affects the data model significantly.
| let chartActionItems = isDataStale ? [] : ( | ||
| <Box sx={{display: "flex", flexDirection: "column", flex: 1, my: 1}}> | ||
| {(table.virtual || table.rows.length > 1000) && !(chartUnavailable || encodingShelfEmpty) ? ( | ||
| {(table.virtual || table.rows.length > 5000) && !(chartUnavailable || encodingShelfEmpty) ? ( |
Copilot
AI
Dec 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The threshold has been changed from 1000 to 5000 rows without explanation. This affects when the visualization shows a warning about large datasets. This change could impact performance or user experience for users with datasets between 1000-5000 rows.
Consider documenting the rationale for this change.
| {(table.virtual || table.rows.length > 5000) && !(chartUnavailable || encodingShelfEmpty) ? ( | |
| { | |
| // Threshold for showing large dataset warning changed from 1000 to 5000 rows. | |
| // Rationale: Performance improvements and user feedback indicate that visualizations remain responsive up to 5000 rows. | |
| // This avoids unnecessary warnings for moderately sized datasets and improves user experience. | |
| (table.virtual || table.rows.length > 5000) && !(chartUnavailable || encodingShelfEmpty) ? ( |
| } for _, col_row in columns_df.iterrows()] | ||
|
|
||
| # Get sample data | ||
| sample_query = f"SELECT * FROM `{schema}`.`{table_name}` LIMIT 10" |
Copilot
AI
Dec 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SQL injection vulnerability: The query directly interpolates schema and table_name into SQL without proper escaping or parameterization. Even though backticks are used, this doesn't prevent SQL injection if the values contain backticks.
Use parameterized queries or properly escape table/schema names using pymysql.escape_string() or similar.
| sample_rows = json.loads(sample_df.to_json(orient="records", date_format='iso')) | ||
|
|
||
| # Get row count | ||
| count_query = f"SELECT COUNT(*) as cnt FROM `{schema}`.`{table_name}`" |
Copilot
AI
Dec 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SQL injection vulnerability: Same issue as above - schema and table_name are directly interpolated into the SQL query without proper escaping or parameterization.
| import CheckCircleOutlineIcon from '@mui/icons-material/CheckCircleOutline'; | ||
| import ErrorOutlineIcon from '@mui/icons-material/ErrorOutline'; | ||
| import HelpOutlineIcon from '@mui/icons-material/HelpOutline'; | ||
| import InfoOutlinedIcon from '@mui/icons-material/InfoOutlined'; |
Copilot
AI
Dec 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused import InfoOutlinedIcon.
| import InfoOutlinedIcon from '@mui/icons-material/InfoOutlined'; |
| except Exception: | ||
| pass |
Copilot
AI
Dec 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'except' clause does nothing but pass and there is no explanatory comment.
| except Exception: | |
| pass | |
| except Exception as e: | |
| logger.exception("Exception occurred while closing MySQL connection in __del__") |
…tor-13.15.22 Bump validator from 13.15.20 to 13.15.22
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 24 out of 26 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| }, | ||
| getAllSlotTypes: () : ModelSlotType[] => { | ||
| return [...MODEL_SLOT_TYPES]; | ||
| return state.models.find(m => m.id == state.selectedModelId) || state.models[0]; |
Copilot
AI
Dec 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect equality operator: Using == for string comparison instead of ===. While JavaScript's == will work for string comparison in this case, the codebase should use strict equality === for consistency and to avoid potential type coercion issues.
Change m.id == selectedModelId to m.id === selectedModelId on line 933, and similar occurrences throughout the file.
| sample_query = f"SELECT * FROM `{schema}`.`{table_name}` LIMIT 10" | ||
| sample_df = self._execute_query(sample_query) | ||
| sample_rows = json.loads(sample_df.to_json(orient="records", date_format='iso')) | ||
|
|
||
| # Get row count | ||
| count_query = f"SELECT COUNT(*) as cnt FROM `{schema}`.`{table_name}`" | ||
| count_df = self._execute_query(count_query) |
Copilot
AI
Dec 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SQL injection vulnerability: The query uses unsanitized schema and table names directly in f-strings without proper escaping or parameterization. Lines 181, 186 use backtick-quoted identifiers, but the values inside the backticks are not sanitized. A malicious schema/table name like `; DROP TABLE users; -- could potentially execute arbitrary SQL.
Consider using parameterized queries or proper escaping mechanisms provided by pymysql.
| password_part = f" password={password}" if password else "" | ||
| attach_string = f"host={host} port={port} user={user}{password_part} dbname={database}" |
Copilot
AI
Dec 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing password sanitization in log output: The connection string is constructed with the password and could potentially be logged in error messages. On line 54, if an error occurs after this line and the attach_string is included in the error message, it would expose the password.
Consider redacting sensitive information from logs or using a more secure connection method that doesn't expose credentials in string form.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot open a new pull request to apply changes based on this feedback
| self.mysql_conn = pymysql.connect( | ||
| host=host, | ||
| user=user, | ||
| password=password, | ||
| database=database, | ||
| port=port, | ||
| cursorclass=pymysql.cursors.DictCursor, | ||
| charset='utf8mb4' | ||
| ) | ||
| self.database = database | ||
| logger.info(f"Successfully connected to MySQL database: {self.database}") | ||
| except Exception as e: | ||
| logger.error(f"Failed to connect to MySQL: {e}") | ||
| raise |
Copilot
AI
Dec 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resource leak: The MySQL connection is stored in self.mysql_conn but there's no guarantee it will be closed. While close() and context manager methods are implemented (lines 264-278), there's no cleanup in the destructor (__del__), so if the object is destroyed without calling close() or using it as a context manager, the connection will leak.
Consider adding a __del__ method that calls self.close() to ensure cleanup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot open a new pull request to apply changes based on this feedback
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Add __del__ method to MySQLDataLoader for connection cleanup
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Feat/expand chart support
Add BigQuery DataLoader support
Add MongoDB Support
Mestway
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good!