Conversation
class SpeakerWordCounts(models.Model):
    def __str__(self):
        return ",".join([self.crec_id, self.bioguide_id])
    bioguide_id = models.CharField(max_length=7, primary_key=True)
Do I understand correctly that these are named entities and noun chunks within a given document (crec_id) attributed to a particular speaker (bioguide_id)? In that case, should the primary key be a compound key of ('bioguide_id', 'crec_id')? Also, does the data currently come from the segments?
Yes, a single row in this table (or a single instance of this class) contains the noun chunk and named entity counts attributed to a given speaker within a single snippet, or a single document if it is a single-speaker document.
The primary key question is a little tricky. It would need to be a compound of bioguide_id, crec_id, and a sequence number for the attributed segments: a person can speak in several separate segments within a single document, so crec_id + bioguide_id alone may not be unique. We'll also want to make bioguide_id a foreign key (see the legislators models for an example of how to do that in Django's ORM) so we can easily retrieve all the segments/documents for a legislator object via the ORM.
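To make the uniqueness concern concrete, here is a minimal plain-Python sketch (not the project's code) of how a per-speaker sequence number could be assigned so that each attributed segment gets a distinct compound key. The function name `assign_segment_keys` and the sample IDs are hypothetical, chosen only for illustration; in Django, the resulting three fields could then be covered by a uniqueness constraint rather than a single-column primary key.

```python
from collections import defaultdict


def assign_segment_keys(segments):
    """Return (crec_id, bioguide_id, seq) keys for segments in document order.

    `segments` is an iterable of (crec_id, bioguide_id) pairs. `seq` counts
    how many earlier segments share the same document and speaker.
    """
    counters = defaultdict(int)  # (crec_id, bioguide_id) -> next sequence number
    keys = []
    for crec_id, bioguide_id in segments:
        pair = (crec_id, bioguide_id)
        keys.append((crec_id, bioguide_id, counters[pair]))
        counters[pair] += 1
    return keys


# Hypothetical IDs: the first speaker talks twice in the same document.
segments = [
    ("CREC-A", "S000001"),
    ("CREC-A", "J000002"),
    ("CREC-A", "S000001"),
]
keys = assign_segment_keys(segments)
```

Here the first and third segments share the same (crec_id, bioguide_id) pair, so only the added sequence number tells them apart.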
noun_chunks = text_utils.named_entity_dedupe(noun_chunks, named_entity_freqs.keys())
record['noun_chunks'] = str(Counter(noun_chunks).most_common())

if bool(record['speaker_ids']):
@rmangi @will-horning @rappoport @heavi5ide @alberttoledo