Conversation

@ankitamk14 (Contributor)

No description provided.

@sunilshetye (Contributor) left a comment


Added comments. Initialise the reader and loop only once; build the cache in that loop if required. Do not use a bulk query.

Comment on lines +701 to +702
decoded_file = csv_file.read().decode('utf-8')
reader = csv.DictReader(io.StringIO(decoded_file))

Suggested change
-decoded_file = csv_file.read().decode('utf-8')
-reader = csv.DictReader(io.StringIO(decoded_file))
+decoded_file = io.TextIOWrapper(csv_file, encoding='utf-8')
+reader = csv.DictReader(decoded_file)

This still reads the entire file into memory. Use something like the suggestion above.
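
For reference, a minimal sketch (untested) of how the streaming version feeds a single pass; csv_file is the uploaded file object from the diff above:

import csv
import io

# TextIOWrapper decodes the upload incrementally, so DictReader pulls
# one row at a time instead of materialising the whole file.
decoded_file = io.TextIOWrapper(csv_file, encoding='utf-8')
reader = csv.DictReader(decoded_file)
for row in reader:  # rows are yielded lazily from the stream
    ...             # process each row here, in this one loop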

Comment on lines +703 to +706
row_count = sum(1 for _ in reader)
if row_count > MAX_ROWS:
    messages.error(request, f"CSV has too many rows ({row_count}). Limit is {MAX_ROWS}.")
    return redirect('events:new_ac')

Remove the row-count check here, as it still reads the entire file.
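
If a limit is still wanted, a rough sketch (untested; MAX_ROWS and the redirect target are the names already used in this PR) that enforces it inside the single processing loop:

for idx, row in enumerate(reader):
    if idx >= MAX_ROWS:
        # abort once the limit is hit; no separate counting pass needed
        messages.error(request, f"CSV has too many rows. Limit is {MAX_ROWS}.")
        return redirect('events:new_ac')
    ...  # process the row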

messages.error(request, f"CSV columns do not match the expected format. Expected columns: {', '.join(EXPECTED_COLUMNS)}")
return redirect('events:new_ac')

reader = csv.DictReader(io.StringIO(decoded_file))

Don't initialise the reader again.

Comment on lines +730 to +742
for idx, row in enumerate(reader, start=2):  # Start from 2nd row
    _states.add(row.get('state').strip())
    _universities.add(row.get('university').strip())
    _institution_types.add(row.get('institution_type').strip())
    _districts.add(row.get('district').strip())
    _cities.add(row.get('city').strip())

# Bulk query
states = {s.name.lower(): s for s in State.objects.filter(name__in=_states)}
universities = {(u.name.lower(), u.state.name.lower()): u for u in University.objects.filter(name__in=_universities).select_related('state')}
districts = {(d.name.lower(), d.state.name.lower()): d for d in District.objects.filter(name__in=_districts).select_related('state')}
cities = {(c.name.lower(), c.state.name.lower()): c for c in City.objects.filter(name__in=_cities).select_related('state')}
institution_types = {i.name.lower(): i for i in InstituteType.objects.filter(name__in=_institution_types)}

This bulk query is also incorrect. You have to loop only once; you may build the cache in that loop.
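
For example, a sketch along these lines (untested; get_state is a hypothetical helper, while State and the column names come from the diff above), with the same pattern repeated for universities, districts, cities, and institution types:

state_cache = {}

def get_state(name):
    # hypothetical helper: fetch each distinct state once, then reuse it
    key = name.strip().lower()
    if key not in state_cache:
        state_cache[key] = State.objects.filter(name__iexact=key).first()
    return state_cache[key]

for row in reader:
    state = get_state(row.get('state', ''))
    # ... build and reuse caches for the other lookups in this same loop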

    return 1 if val == 'yes' else 0
else:
    return 0
reader = csv.DictReader(io.StringIO(decoded_file))

Don't initialise the reader again.

else:
    return 0
reader = csv.DictReader(io.StringIO(decoded_file))
for idx, row in enumerate(reader, start=2):  # Start from 2nd row

Don't specify the start parameter.
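
That is, something like this sketch (assuming the index is only needed for error reporting):

for idx, row in enumerate(reader):
    # if a human-readable CSV line number is needed in a message,
    # report idx + 2 (line 1 is the header, data starts on line 2)
    ...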
