@rod-glover rod-glover commented Jan 16, 2025

Resolves #229

NOTE: This branch extends i-226-hxtk-obs_raw but will be rebased when #227 is merged.

TODO:

@rod-glover rod-glover marked this pull request as draft January 16, 2025 20:19
@rod-glover
Contributor Author

There are still some outstanding questions to be answered regarding this design. These are copied directly from the Confluence document Data model(s) for multiple climatological normal periods.

Climatological Station

  1. A Climatological Station, unlike a Station, does not sensibly have a history. Therefore this entity combines select attributes of the existing Station and History tables.
  2. What does the term "base station" mean as used in Multiple climatological normal periods on the PCDS - Conceptual work plan?
  3. Multiple climatological normal periods on the PCDS - Conceptual work plan includes an attribute "Composite (binary): Indicates if station is a long-record or composite station"
    1. What is a long-record station? (And what if any is its relationship to the term "base station"?)
    2. Can the "long-record-ness" of a Climatological Station be deduced from the History records associated to it?
    3. Are there any other types of station we might want to consider? Is there any reason not to replace composite with a more open-ended attribute named "type" or the like?
  4. How should we represent position? With an attribute of type geometry, or with latitude and longitude attributes, or both (which has caused us problems in the existing History table)?
  5. The histories relation will be implemented by a cross-table. That table could also record how each History participates in the Climatological Station; this can be added later if desired.
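
To make the cross-table in item 5 concrete, here is a minimal sketch. It uses the standard-library sqlite3 module only so that it runs anywhere; the real schema would be defined in SQLAlchemy against PostgreSQL. All table and column names (climo_station, climo_station_x_history, station_type, role) are assumptions for illustration, not the names used in this PR.

```python
import sqlite3

# Hypothetical names throughout; SQLite stands in for PostgreSQL.
DDL = """
CREATE TABLE history (
    history_id INTEGER PRIMARY KEY,
    station_name TEXT NOT NULL
);
CREATE TABLE climo_station (
    climo_station_id INTEGER PRIMARY KEY,
    station_type TEXT NOT NULL  -- open-ended, e.g. 'long-record' or 'composite'
);
CREATE TABLE climo_station_x_history (
    climo_station_id INTEGER NOT NULL REFERENCES climo_station (climo_station_id),
    history_id INTEGER NOT NULL REFERENCES history (history_id),
    role TEXT,  -- optional: how this History participates; could be added later
    PRIMARY KEY (climo_station_id, history_id)
);
"""


def histories_for(conn, climo_station_id):
    """Return the history ids associated with one climatological station."""
    rows = conn.execute(
        "SELECT history_id FROM climo_station_x_history "
        "WHERE climo_station_id = ? ORDER BY history_id",
        (climo_station_id,),
    )
    return [history_id for (history_id,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.executemany("INSERT INTO history VALUES (?, ?)", [(1, "A"), (2, "B"), (3, "C")])
conn.execute("INSERT INTO climo_station VALUES (10, 'composite')")
conn.executemany(
    "INSERT INTO climo_station_x_history VALUES (?, ?, ?)",
    [(10, 1, None), (10, 3, None)],
)
```

With this layout, the composite primary key prevents the same History from being attached to a Climatological Station twice, and the nullable role column shows where participation info could go if it is wanted later.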

Climatological Variable

  1. We've followed CF Metadata Standards, 7.4 Climatological Statistics, in using climatology_bounds and value_time to indicate the climatological parameters of the statistic. Is there any reason not to do this?
  2. Given that, duration can be computed from climatology_bounds, hence is redundant and can be removed (normalization). Nevertheless it might be convenient to keep, at the risk of it becoming inconsistent with climatology_bounds. Proposal: omit it.
  3. In the existing Variable table, the precision attribute is a numeric value that is abused: e.g., 3 digits before the decimal and 2 digits after is represented as the real number 3.2. Let's not do that here. How should we do it? Propose either a string (with format restrictions) or a two-element vector of integers.
  4. What if anything is the use of net_var_name in this context? Does it need to be of type citext?
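
For item 3, a rough sketch of the two proposed representations of precision: a format-restricted string that parses into a two-integer pair. The names Precision and parse_precision are hypothetical. The last line shows one reason the real-number encoding is lossy: as floats, "3.10" and "3.1" are indistinguishable, so 10 fractional digits cannot be told apart from 1.

```python
import re
from typing import NamedTuple


class Precision(NamedTuple):
    """Digits before and after the decimal point, stored as two integers."""

    integer_digits: int
    fraction_digits: int


# Format restriction for the string option: "<digits>.<digits>", e.g. "3.2".
_PRECISION_RE = re.compile(r"(\d+)\.(\d+)")


def parse_precision(text: str) -> Precision:
    """Parse a restricted 'before.after' string into a Precision pair."""
    match = _PRECISION_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid precision spec: {text!r}")
    return Precision(int(match.group(1)), int(match.group(2)))


# The float encoding collapses distinct specs into the same value.
floats_collide = float("3.10") == float("3.1")
```

Either option keeps the two counts separate; the string form is easier to read in a table dump, while the integer pair needs no parsing.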

Climatological Value

  1. Multiple climatological normal periods on the PCDS - Conceptual work plan includes an attribute "Years per month: years of data going into each month".
    1. So the number of years going into a month (or other annual subdivision) varies by month? This is a novel datum; I'd like to understand it better.
    2. We should generalize this to arbitrary periods, so "Years per annual subdivision" or other generic term, suitably turned into an identifier, say num_contributing_years.
  2. See remark re. CF Metadata Standards, value_time above.
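
To illustrate how the number of contributing years could differ by month (item 1), here is a small sketch. It assumes, purely for illustration, that a year contributes to a month if it has at least one observation in that month; the actual criterion (e.g., a completeness threshold) is one of the open questions above. The function name follows the num_contributing_years identifier suggested in item 1.2.

```python
from collections import defaultdict
from datetime import date


def num_contributing_years(observation_dates):
    """Map each calendar month (1-12) to the number of distinct years with data."""
    years_by_month = defaultdict(set)
    for d in observation_dates:
        years_by_month[d.month].add(d.year)
    return {month: len(years) for month, years in sorted(years_by_month.items())}


# Example: January has data in 1971 and 1972, February only in 1971.
dates = [date(1971, 1, 15), date(1972, 1, 15), date(1971, 2, 15)]
counts = num_contributing_years(dates)
```

Generalizing to arbitrary annual subdivisions would mean replacing d.month with whatever key identifies the subdivision (season, whole year, and so on).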

@rod-glover rod-glover force-pushed the i-229-multi-climo-normals branch from 2101ba8 to f9ac3e3 Compare January 22, 2025 22:02
@rod-glover
Contributor Author

More questions:

  1. Many columns in the corresponding observation tables are (a) fixed-length strings and (b) nullable. In the new tables they are neither.
    • Fixed-length strings aren't really fixed-length: PG stores them all as variable-length. So there is no particular reason to specify lengths, except perhaps to prevent long strings from being entered erroneously. Questionably, some columns such as comments have fixed lengths, while others that might be expected to be short and possibly fixed, such as country, do not.
    • Why do the original tables allow null values in so many places? E.g., all columns in History are nullable. Nulls in most of these columns cannot be correct.

@Nospamas Nospamas force-pushed the i-229-multi-climo-normals branch from faa40e8 to 02635c0 Compare August 14, 2025 17:14
@rod-glover
Contributor Author

@Nospamas, I have transferred all new info in the questions documented above into the Confluence doc Data model(s) for multiple climatological normal periods.

end_date = Column(DateTime, nullable=False)


class ClimatologicalPeriodHistory(Base):
Contributor Author

Inclusion of this and other history tables means that the ORM will not be valid if the schema is only migrated to rev 758b (support multi climo normals). But that might be a good thing, as we probably never want to rest at that intermediate rev.

Contributor

Is there a way around this issue? Or is it a case of checking out an older tagged version of pycds to work with an older database?

Contributor Author

@rod-glover rod-glover Sep 26, 2025

The way around this issue is to release one ORM compatible with 758b (sans hx tracking tables) and another compatible with 7244 (with hx tracking). I think that will require splitting hx tracking into a separate PR and releasing a version after each one.

display_name = Column(String, nullable=False)
short_name = Column(String, nullable=False)
cell_methods = Column(String, nullable=False)
net_var_name = Column(CIText(), nullable=True)
Contributor Author

Suggest we check with scientist(s) and/or James which of these columns are useful.

Contributor

Agreed. To some degree they're holdovers from the former version, but I might just try to grab what I can during the data import and drop any columns that aren't obviously fillable.

@Nospamas Nospamas force-pushed the i-229-multi-climo-normals branch from a2ad42c to adda2b1 Compare September 29, 2025 16:56
Development

Successfully merging this pull request may close these issues.

Support multiple climatological normal periods

4 participants