[cwapi] adding named entities and noun chunks counts django stuff #26

Open
jeiranj wants to merge 1 commit into master from jj_django

Conversation

@jeiranj jeiranj commented Nov 3, 2017

class SpeakerWordCounts(models.Model):
    def __str__(self):
        return ",".join([self.crec_id, self.bioguide_id])
    bioguide_id = models.CharField(max_length=7, primary_key=True)

Do I understand correctly that these are the named entities and noun chunks within a given document (crec_id) attributed to a particular speaker (bioguide_id)? In that case, should the primary key be a compound key of ('bioguide_id', 'crec_id')? Also, does the data come from the segments right now?

@will-horning will-horning commented Nov 5, 2017

Yes, a single row in this table (or a single instance of this class) contains the noun chunk and named entity counts attributed to a given speaker within a single snippet, or a single document if it is a single-speaker document.

The primary key question is a little tricky. It would need to be a compound of bioguide_id, crec_id, and a sequence number for the attributed segments (a person may speak in separate segments within a single document, so crec_id + bioguide_id alone may not be unique). We'll also want to make bioguide_id a foreign key (see the legislators models for an example of how to do that in Django's ORM) so we can easily retrieve all the segments/documents for a legislator object via the ORM.
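To make the uniqueness concern concrete, here is a minimal plain-Python sketch (no Django required). The `SegmentRow` type, the `segment_seq` field name, the `duplicate_keys` helper, and all ID values are hypothetical and only stand in for rows of `SpeakerWordCounts`; this is not the project's actual schema.

```python
from collections import Counter
from typing import NamedTuple


class SegmentRow(NamedTuple):
    # Hypothetical stand-in for one row of SpeakerWordCounts.
    crec_id: str
    bioguide_id: str
    segment_seq: int  # assumed name for the per-document sequence number


def duplicate_keys(rows, key_fields):
    """Return the key tuples that occur more than once for the given fields."""
    counts = Counter(tuple(getattr(r, f) for f in key_fields) for r in rows)
    return [key for key, n in counts.items() if n > 1]


# Made-up IDs: the same speaker is attributed two separate segments in one document.
rows = [
    SegmentRow("CREC-A", "B000001", 0),
    SegmentRow("CREC-A", "B000002", 1),
    SegmentRow("CREC-A", "B000001", 2),
]

two_part = duplicate_keys(rows, ("crec_id", "bioguide_id"))
three_part = duplicate_keys(rows, ("crec_id", "bioguide_id", "segment_seq"))

print(two_part)    # the (crec_id, bioguide_id) pair collides
print(three_part)  # no collisions once the sequence number is included
```

In the Django model, one way to express the same constraint would be `unique_together` in the model's `Meta` alongside a `models.ForeignKey` for the legislator, though which fields and names to use is a decision for this PR.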

noun_chunks = text_utils.named_entity_dedupe(noun_chunks, named_entity_freqs.keys())
record['noun_chunks'] = str(Counter(noun_chunks).most_common())

if bool(record['speaker_ids']):

Why the bool cast?
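For context on the question, a small sketch showing that an `if` statement already applies Python's truth test, so the explicit `bool(...)` takes the same branch as the bare value. The type of `record['speaker_ids']` is not visible in this diff, so the sample values below (list, string, `None`) are assumptions, and both function names are hypothetical.

```python
def has_speakers_with_cast(record):
    # Mirrors the condition as written in the diff.
    if bool(record['speaker_ids']):
        return True
    return False


def has_speakers_without_cast(record):
    # Same condition, relying on the implicit truth test of `if`.
    if record['speaker_ids']:
        return True
    return False


# Assumed sample values; the real field type is not shown in the PR.
samples = [[], ['B000001'], '', 'B000001', None]

for value in samples:
    record = {'speaker_ids': value}
    assert has_speakers_with_cast(record) == has_speakers_without_cast(record)
```

Under that reading the cast could be dropped without changing behavior, unless it was added deliberately for readability.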

3 participants