Skip to content

Conversation

@yonigottesman
Copy link
Contributor

No description provided.

Ohad Shacham and others added 18 commits February 7, 2018 12:44
…o HBase. In many cases, as in the Phoenix case, these attributes are required and should be propagated to the server side. In Phoenix for example, attributes are used to mark data as one that should propagate to the secondary index.

        This commit propagates the attributes to HBase side.
… in conflict analysis. The purpose of this feature is to let the user decide for each write, whether it should take part in the conflict analysis.

 The motivation infers from Apache Phoenix that utilizes this feature when writing to the secondary index and also when writing to the data table for immutable tables (each key is added once and is not modified).
…write as a write that was done by a specific transaction.

   However, due to lack of shadow cells, getting the commit timestamp of the transaction can be done only by access the commit table.
The motivation of this feature is to add the shadow cells during the write and save the commit table access.
 This feature is required by Apache Phoenix that during index creation adds the data table's entries, appeared before
 creation, to the index. In this case, the version and the commit timestamp should be the fence id and therefore, a direct write to HBase with the
 addition of shadow cells is required.
 incorrect since the family deletion qualifier needs to be added instead of a
 row marker. Therefore, this commit fixes this case. This is crucial since
 the write set information is needed for adding shadow cells, when transaction
 successfully commits, and for garbage collection when transaction aborts.
 metadata (shadow cells) to an existing mutation. This feature is required by
 Apache Phoenix both for local index population and update.
 This commit changes visibility of several function in order to run Omid in
 testing mode from Phoenix testing environment.
…on. This comes in addition to the option of marking each column family using HBase metadata.
…ger also for conflict free writes. This is because fences should also force conflict free transactions to abort.
 found. We need to continue looking until we either find a  committed one in
 the past or no committed family deletion marker for this column is found.
 Otherwise, we might miss committed family deletion markers that exists in a
 transaction snapshot.
 family deletion marker. This is noticeable after a checkpoint.
@yonigottesman yonigottesman changed the title Support for user Filter when using coprocessor for snapshot filtering [OMID-102] Support for user Filter when using coprocessor for snapshot filtering Aug 1, 2018
@yonigottesman yonigottesman reopened this Aug 1, 2018
// Create the required data
final byte[] validShadowCellQualifier =
com.google.common.primitives.Bytes.concat(qualifier, SHADOW_CELL_SUFFIX);
com.google.common.primitives.Bytes.concat(new byte[1], qualifier, SHADOW_CELL_SUFFIX);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Define SHADOW_CELL_PREFIX to "new byte[1]" and replace all the new byte[1].

cell.getQualifierLength());
if (!matchingQualifier(otherCell,
cell.getQualifierArray(), cell.getQualifierOffset(), qualifierLength)) {
cell.getQualifierArray(), cell.getQualifierOffset() + 1, qualifierLength)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will it work when the shadow cell prefix is absent? legacy data.

qualifierLength = qualifierLengthFromShadowCellQualifier(cell.getQualifierArray(),
cell.getQualifierOffset(),
cell.getQualifierLength());
qualifierOffset = qualifierOffset + 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will it work when the shadow cell prefix is absent? legacy data.

}
HBaseTransaction hbaseTransaction = getHBaseTransaction(get.getAttribute(CellUtils.TRANSACTION_ATTRIBUTE));
SnapshotFilterImpl snapshotFilter = getSnapshotFilter(e);
get.setMaxVersions();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add a comment that we set to max versions since we are doing Omid filtering in the VisibilityFilter

package org.apache.omid.transaction;

import com.google.common.base.Optional;
import com.sun.istack.Nullable;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Different Nullable usage

private final SnapshotFilterImpl snapshotFilter;
private final Map<Long ,Long> shadowCellCache;
private final HBaseTransaction hbaseTransaction;
private final Map<String, Long> familyDeletionCache;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add a comment that the row info is redundant in here since reset is called between rows and we clear this map in reset.

if (snapshotFilter.getTSIfInTransaction(v, hbaseTransaction).isPresent()) {
return runUserFilter(v, ReturnCode.INCLUDE);
} else {
return runUserFilter(v, ReturnCode.INCLUDE_AND_NEXT_COL);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add a comment that since v is in snapshot and not in transaction then it is the last result for SNAPSHOT_ALL.

if (isCellInSnapshot(v)) {

if (CellUtils.isTombstone(v)) {
return ReturnCode.NEXT_COL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about tombstones in snapshot_all? I assume it should be SKIP.



private boolean isCellInSnapshot(Cell v) throws IOException {
if (shadowCellCache.containsKey(v.getTimestamp()) &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two collection calls.

@Override
public boolean filterRowKey(byte[] buffer, int offset, int length) throws IOException {
if (userFilter != null) {
return userFilter.filterRowKey(buffer, offset, length);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this function call before each filterRowKey(Cell) ? That's what the documentation say. If it is true then the whole implementation should be different...

"Filters a row based on the row key. If this returns true, the entire row will be excluded. If false, each KeyValue in the row will be passed to filterCell(Cell) below"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this ok (this is what Tephra does).


if (shadowCellCache.containsKey(v.getTimestamp()) &&
hbaseTransaction.getStartTimestamp() >= shadowCellCache.get(v.getTimestamp())) {
familyDeletionCache.put(Bytes.toString(CellUtil.cloneFamily(v)), shadowCellCache.get(v.getTimestamp()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid allocating memory within this call. Either use TreeMap<byte[],Cell> and don't do the copy or use HashMap<ImmutableBytesPtr,Cell>. You can copy/paste ImmutableBytesPtr from Phoenix - it's a wrapper around byte[] that handles equality and hashcode without doing any copying.

}

@Test(timeOut = 60_000)
public void testGetSecondResult() throws Throwable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to have a test that uses a scan with FirstKeyOnlyFilter that would get incorrect results without this new implementation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a test which would have failed without this patch (i.e. one that demonstrates the need for having the visibility filtering done as a pure HBase filter)?


private static final Logger LOG = LoggerFactory.getLogger(CellUtils.class);
static final byte[] SHADOW_CELL_SUFFIX = "\u0080".getBytes(Charsets.UTF_8); // Non printable char (128 ASCII)
static byte[] DELETE_TOMBSTONE = Bytes.toBytes("__OMID_TOMBSTONE__");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will Cells end up with these constant values and if so can we make them shorter?

@JamesRTaylor
Copy link
Contributor

This is great, @yonigottesman! Do the Phoenix unit tests FlappingTransactionIT.testInflightUpdateNotSeen() and testInflightDeleteNotSeen() pass with this change? You can try running them from the omid2 feature branch.

Copy link
Contributor

@ohadshacham ohadshacham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

return result == 0;
}

private static boolean startsWith(byte[] value, int offset, int length, byte[] prefix) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you just use Bytes.startsWith() instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No because Bytes.startWith doesnt work with offset

import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.*;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was the use of wildcarding here intentional? Not sure about Omid, but in Phoenix we always explicitly specify all the imports.

Result shadowCell = snapshotFilter.getTableAccessWrapper().get(get);

if (!shadowCell.isEmpty() &&
Bytes.toLong(CellUtil.cloneValue(shadowCell.rawCells()[0] )) <= hbaseTransaction.getStartTimestamp()){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't repeat Bytes.toLong(CellUtil.cloneValue(shadowCell.rawCells()[0] )) here. You want to minimize heap allocation during filtering.

@JamesRTaylor
Copy link
Contributor

Nice work, @yonigottesman. I made a few minor comments, @ohadshacham. My main question is do the Phoenix unit tests FlappingTransactionIT.testInflightUpdateNotSeen() and testInflightDeleteNotSeen() pass with this change? You can try running them from the omid2 feature branch in Phoenix against the phoenix-integration branch in omid2 with your patch applied.

@yonigottesman
Copy link
Contributor Author

yonigottesman commented Aug 7, 2018

All FlappingTransactionIT tests pass and i added a test that will pass only if snapshot filtering is done in a coprocessor before user filter is called

@asfgit asfgit force-pushed the phoenix-integration branch from 30c49d3 to 1e5ee3f Compare November 14, 2018 13:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants