added support for 1 to 1 relations and added support for n unique key… #3

Merged
tiger5226 merged 2 commits from merge_changes into master 2019-02-08 01:51:36 +01:00
tiger5226 commented 2018-09-03 06:59:41 +02:00 (Migrated from github.com)

added support for 1-to-1 relations and for n unique keys used in conflict resolution during merging.

tiger5226 (Migrated from github.com) reviewed 2018-09-03 07:50:37 +02:00
tiger5226 (Migrated from github.com) commented 2018-09-03 07:50:37 +02:00

Just thought of a case not covered: if the secondary user has a row but the primary does not, the secondary row will still be deleted. We also need to check that the primary has a row before the secondary's row is deleted; otherwise there is no conflict to resolve.

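For the record, the extra guard could be sketched as a tiny query builder. This is illustrative only — `buildGuardedDelete` and the table/column names are hypothetical, not code from this PR:

```go
package main

import "fmt"

// buildGuardedDelete sketches the guard described above: delete the
// secondary user's row only when the primary user also has a row in the
// same table, so nothing is removed unless there is an actual conflict.
func buildGuardedDelete(table, idColumn string) string {
	return fmt.Sprintf(
		"DELETE FROM %s WHERE %s = ? AND EXISTS (SELECT 1 FROM (SELECT 1 FROM %s WHERE %s = ?) AS primary_row)",
		table, idColumn, table, idColumn,
	)
}

func main() {
	// bound args would be (secondaryID, primaryID)
	fmt.Println(buildGuardedDelete("user_settings", "user_id"))
}
```

The `EXISTS` subquery is wrapped in a derived table because MySQL disallows selecting from the same table a `DELETE` is targeting.
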
nikooo777 (Migrated from github.com) reviewed 2018-09-03 23:05:15 +02:00
nikooo777 (Migrated from github.com) commented 2018-09-03 23:05:15 +02:00

would wrapping `%s` in backticks help? I'm thinking about column names that might have spaces or reserved keywords

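As a point of reference, MySQL-style identifier quoting is only a couple of lines — a sketch (the `quoteIdent` helper is hypothetical, not something this PR or SQLBoiler provides under that name):

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent wraps a name in backticks and doubles any embedded backtick,
// so reserved words or unusual column names cannot break the generated SQL.
func quoteIdent(name string) string {
	return "`" + strings.ReplaceAll(name, "`", "``") + "`"
}

func main() {
	fmt.Printf("UPDATE %s SET %s = ?\n", quoteIdent("user"), quoteIdent("order"))
}
```
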
nikooo777 (Migrated from github.com) reviewed 2018-09-03 23:09:08 +02:00
nikooo777 (Migrated from github.com) left a comment

I don't fully understand this PR, but I took a look at it and it seems to logically make sense.

Still I don't feel experienced enough to say anything else.

tiger5226 (Migrated from github.com) reviewed 2018-09-04 00:08:26 +02:00
tiger5226 (Migrated from github.com) commented 2018-09-04 00:08:26 +02:00

We discussed. If a column name has spaces, SQLBoiler would not accept it anyway; and if that ever happens we should just fix the column name, since it should be snake_case.

lyoshenka (Migrated from github.com) requested changes 2018-09-04 21:30:39 +02:00
lyoshenka (Migrated from github.com) left a comment

i recommend splitting the two cases out into separate functions. it will be far easier, and i don't think there's much overlap

lyoshenka (Migrated from github.com) commented 2018-09-04 20:46:05 +02:00

i'd prefer not to have large comments like this in the code. we can either use godoc-style comments at the top of the function which describe what the function does, or write up the docs somewhere else. other than that, the code should speak for itself as much as possible

lyoshenka commented 2018-09-04 21:31:16 +02:00 (Migrated from github.com)

some code i threw together real quick

```go
func mergeModels(tx boil.Executor, primaryID uint64, secondaryID uint64, foreignKeys []foreignKey, conflictingKeys []conflictingUniqueKey) error {
    if len(foreignKeys) < 1 {
        return nil
    }
    var err error

    // resolve unique-key conflicts before repointing the foreign keys
    for _, conflict := range conflictingKeys {
        if len(conflict.columns) == 1 && conflict.columns[0] == conflict.objectIdColumn {
            err = deleteOneToOneConflictsBeforeMerge(tx, conflict, primaryID, secondaryID)
        } else {
            err = deleteOneToManyConflictsBeforeMerge(tx, conflict, primaryID, secondaryID)
        }
        if err != nil {
            return err
        }
    }

    // repoint every foreign key from the secondary user to the primary user
    for _, fk := range foreignKeys {
        // TODO: use NewQuery here, not plain sql
        query := fmt.Sprintf(
            "UPDATE %s SET %s = %s WHERE %s = %s",
            fk.foreignTable, fk.foreignColumn, strmangle.Placeholders(dialect.IndexPlaceholders, 1, 1, 1),
            fk.foreignColumn, strmangle.Placeholders(dialect.IndexPlaceholders, 1, 2, 1),
        )
        _, err = tx.Exec(query, primaryID, secondaryID)
        if err != nil {
            return errors.Err(err)
        }
    }
    return checkMerge(tx, foreignKeys)
}

// deleteOneToOneConflictsBeforeMerge deletes the secondary user's row when
// both users have a row in a table whose unique key is the object id itself.
func deleteOneToOneConflictsBeforeMerge(tx boil.Executor, conflict conflictingUniqueKey, primaryID uint64, secondaryID uint64) error {
    query := fmt.Sprintf(
        "SELECT COUNT(*) FROM %s WHERE %s IN (%s)",
        conflict.table, conflict.objectIdColumn,
        strmangle.Placeholders(dialect.IndexPlaceholders, 2, 1, 1),
    )

    var count int
    err := tx.QueryRow(query, primaryID, secondaryID).Scan(&count)
    if err != nil {
        return errors.Err(err)
    }

    if count > 2 {
        return errors.Err("it should not be possible to have more than two rows here")
    } else if count != 2 {
        return nil // no conflicting rows
    }

    query = fmt.Sprintf(
        "DELETE FROM %s WHERE %s = %s",
        conflict.table, conflict.objectIdColumn, strmangle.Placeholders(dialect.IndexPlaceholders, 1, 1, 1),
    )

    _, err = tx.Exec(query, secondaryID)
    return errors.Err(err)
}

// deleteOneToManyConflictsBeforeMerge deletes the secondary user's rows for
// two-column unique keys where both users share the same non-id value.
func deleteOneToManyConflictsBeforeMerge(tx boil.Executor, conflict conflictingUniqueKey, primaryID uint64, secondaryID uint64) error {
    conflictingColumns := strmangle.SetComplement(conflict.columns, []string{conflict.objectIdColumn})

    if len(conflictingColumns) < 1 {
        return nil
    } else if len(conflictingColumns) > 1 {
        return errors.Err("this doesn't work for unique keys with more than two columns (yet)")
    }

    query := fmt.Sprintf(
        "SELECT %s FROM %s WHERE %s IN (%s) GROUP BY %s HAVING count(distinct %s) > 1",
        conflictingColumns[0], conflict.table, conflict.objectIdColumn,
        strmangle.Placeholders(dialect.IndexPlaceholders, 2, 1, 1),
        conflictingColumns[0], conflict.objectIdColumn,
    )

    rows, err := tx.Query(query, primaryID, secondaryID)
    if err != nil {
        return errors.Err(err)
    }
    defer rows.Close() // deferred only after Query succeeds, so rows is never nil

    args := []interface{}{secondaryID}
    for rows.Next() {
        var value string
        err = rows.Scan(&value)
        if err != nil {
            return errors.Err(err)
        }
        args = append(args, value)
    }
    if err = rows.Err(); err != nil {
        return errors.Err(err)
    }

    // if no rows found, no need to delete anything
    if len(args) < 2 {
        return nil
    }

    query = fmt.Sprintf(
        "DELETE FROM %s WHERE %s = %s AND %s IN (%s)",
        conflict.table, conflict.objectIdColumn, strmangle.Placeholders(dialect.IndexPlaceholders, 1, 1, 1),
        conflictingColumns[0], strmangle.Placeholders(dialect.IndexPlaceholders, len(args)-1, 2, 1),
    )

    _, err = tx.Exec(query, args...)
    if err != nil {
        return errors.Err(err)
    }
    return nil
}
```
tiger5226 commented 2018-09-05 02:14:34 +02:00 (Migrated from github.com)

@lyoshenka I split it out into the two methods per your suggestion. I also tested the merge for the flagged edge case and it passes.

lyoshenka (Migrated from github.com) reviewed 2018-09-05 17:45:16 +02:00
lyoshenka (Migrated from github.com) left a comment

i'm having a bit of trouble following some of this. what's an example in lbry where we'd use this now?

also, could you add some tests for this behavior? sqlboiler already has a lot of features for testing. it would be good to keep using them.

finally, comments like `//Grab scanned values for query arguments` are unnecessary, and may be a sign that the code itself is not readable. please consider rewriting the code instead so that it reads better.

lyoshenka (Migrated from github.com) commented 2018-09-05 17:43:39 +02:00

`log.Fatal`?

@ -98,0 +127,4 @@
defer rows.Close()
//This query will adjust dynamically depending on the number of conflicting keys, adding AND expressions for each
// key to ensure the right conflicting rows are deleted.
lyoshenka (Migrated from github.com) commented 2018-09-05 17:43:58 +02:00

why don't we know in advance how many columns will be returned? isn't it just `len(conflictingColumns)`?

tiger5226 (Migrated from github.com) reviewed 2018-09-06 01:49:18 +02:00
@ -98,0 +127,4 @@
defer rows.Close()
//This query will adjust dynamically depending on the number of conflicting keys, adding AND expressions for each
// key to ensure the right conflicting rows are deleted.
tiger5226 (Migrated from github.com) commented 2018-09-06 01:49:18 +02:00

We do, but we can't predict the columns in the code, so we have to be dynamic. One query might have 1 column, another might have 5, and I can't create a generic struct for scanning purposes. From the research I did online, the suggested approach is this pointer solution for dynamic scans.

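The pointer-based dynamic scan mentioned here can be sketched in a few lines (illustrative; `scanTargets` is a hypothetical helper, not code from this PR): allocate one value slot per column and hand `Rows.Scan` a parallel slice of pointers into those slots.

```go
package main

import "fmt"

// scanTargets builds the two slices needed to scan a row when the number
// of columns is only known at runtime: values holds the scanned data, and
// ptrs holds a pointer to each slot, as database/sql's Rows.Scan expects.
func scanTargets(n int) (values []interface{}, ptrs []interface{}) {
	values = make([]interface{}, n)
	ptrs = make([]interface{}, n)
	for i := range values {
		ptrs[i] = &values[i]
	}
	return values, ptrs
}

func main() {
	values, ptrs := scanTargets(3)
	// rows.Scan(ptrs...) would fill values; simulate one column here:
	*(ptrs[0].(*interface{})) = "abc"
	fmt.Println(len(ptrs), values[0])
}
```
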
tiger5226 (Migrated from github.com) reviewed 2018-09-08 17:52:47 +02:00
@ -98,0 +127,4 @@
defer rows.Close()
//This query will adjust dynamically depending on the number of conflicting keys, adding AND expressions for each
// key to ensure the right conflicting rows are deleted.
tiger5226 (Migrated from github.com) commented 2018-09-08 17:52:47 +02:00

@lyoshenka is this acceptable, or were you just curious? If you approve, I would like to deploy this fix.

lyoshenka (Migrated from github.com) approved these changes 2018-09-18 22:11:04 +02:00
lyoshenka commented 2018-09-18 22:11:39 +02:00 (Migrated from github.com)

Feel free to merge, though you should still remove the `log.Fatal` line.

lyoshenka commented 2018-09-18 22:12:15 +02:00 (Migrated from github.com)

Also, would still love tests for this if possible

lyoshenka commented 2018-09-26 15:11:28 +02:00 (Migrated from github.com)

@tiger5226 should i merge this?

tiger5226 commented 2018-09-27 04:28:40 +02:00 (Migrated from github.com)

No, I was talking with @tzarebczan and we will need to make changes to the schema. There are certain conflicts where we don't want to delete. For example, youtube data. So in this example we need to change the index to not be unique.

lyoshenka commented 2018-09-27 23:19:38 +02:00 (Migrated from github.com)

If it's not on github, it didn't happen 😉

What schema changes are you planning to make?

I think youtube_data should probably still be unique. All the code Niko wrote depends on that - it would be a lot of work to change it. I'd prefer allowing Merge to error if there's a conflict, rather than making youtube_data not unique. Then we can handle the merge manually.

tiger5226 commented 2018-09-28 01:21:32 +02:00 (Migrated from github.com)

In my defense I posted that about 2 minutes after we discussed it :) to make sure it was on github haha. I was actually planning on merging it when Tom flagged his concern based on your comment to merge it.

tiger5226 commented 2018-09-28 04:33:46 +02:00 (Migrated from github.com)

I talked with Tom about this more tonight. This PR fixes the fact that manual merging is currently required, which is what we want for things like stripe. However, youtube data is potentially a special exception. @tzarebczan mentioned that he thinks the youtube data is actually unique per channel, not per user. He plans on double checking with @nikooo777 tomorrow.

nikooo777 commented 2018-09-28 05:31:23 +02:00 (Migrated from github.com)

what do you mean by unique per channel and not per user?

the youtube_data table is strictly related to the user table, a youtube_data row can only exist if there is a user associated with it, and only one row can exist per user.

i don't think we have a concept of channels in our database, do we?

tiger5226 commented 2018-09-29 02:41:22 +02:00 (Migrated from github.com)

@nikooo777 I will let @tzarebczan answer your question, but based on what you said, what is the expected behavior when we merge 2 users that each have a youtube account? If two records conflict (meaning the uniqueness keys all match), we delete the secondary user's record. The main concern is that we would have two accounts for two users that request to be merged into one. I think we should delete the secondary youtube_data until we support multiple youtube accounts per user. What impacts could this have?

nikooo777 commented 2018-10-01 04:38:33 +02:00 (Migrated from github.com)

I don't think deleting data is acceptable in this case. I'd much rather drop the uniqueness constraint as I had previously proposed in an issue on internal-apis.
What this implies is that we'll have to rework rewards to allow a user to claim the YouTube rewards multiple times and verify that nowhere else we depend on that constraint.

tiger5226 commented 2018-10-02 01:52:48 +02:00 (Migrated from github.com)

> the youtube_data table is strictly related to the user table, a youtube_data row can only exist if there is a user associated with it, and only one row can exist per user.

I was referring to this comment. I would prefer that we drop the constraint as well and allow multiple channels. Does this mean that this PR is blocked by that, or can it be merged until that is handled? How common is it that we merge two users with different youtube channels? Also, is there an issue for this?

nikooo777 commented 2018-10-03 22:45:58 +02:00 (Migrated from github.com)

probably not my call to make, but I'd vote for blocked. I don't agree with ever dropping any youtube_data rows.

Opinions @lyoshenka and @kauffj ?

tiger5226 commented 2018-10-04 00:16:45 +02:00 (Migrated from github.com)

Per @nikooo777 from slack, the issue for multiple youtube accounts is https://github.com/lbryio/internal-apis/issues/323, so I will block this for now. @alyssaoc just want to make sure this is on your radar.

tiger5226 commented 2018-11-12 03:17:04 +01:00 (Migrated from github.com)

We have a PR in review https://github.com/lbryio/internal-apis/pull/639. Once this PR is merged we can potentially merge this as well.

tzarebczan commented 2019-01-25 12:41:43 +01:00 (Migrated from github.com)

@tiger5226 how comfortable do you feel merging this? Should we still run through a few scenarios locally before doing so?

tiger5226 commented 2019-01-26 01:27:16 +01:00 (Migrated from github.com)

Yes! This can be merged now! I am really confident. Nonetheless, I would still want to retest it, since it's a months-old issue.

Reference: LBRYCommunity/sqlboiler#3