I have a query generated from an MSTR report which needs to be run against 300+ product IDs. The product ID is a prompt in the report, and hence always appears in a WHERE condition, in several different passes like this:
SQL
where (a19.list_nbr in (1696079, 1696078)
 and a11.gtz_purch_amt > 0))

or
SQL
from T5FDY1URMMD002	pa11
	join	list_active_trade_item_grp_v	a12
	  on 	(pa11.grp_nbr = a12.seq_grp_sort_nbr)
where	(a12.list_nbr in (1696079, 1696078)
 and ((pa11.combined_lgl_entity_nbr not in (35)
 and pa11.GODWFLAG5_1 = 1))))

And so on with different passes. One option I have in mind is to put all of the product IDs into the WHERE clause with an IN condition, but with 300+ IDs I don't think a WHERE ... IN condition would be very well optimized. Is there a better way to do this?
We use a Netezza database to run the queries.

What I have tried:

I haven't tried anything yet. As I mentioned, one option is to use the WHERE condition, perhaps in batches, but if I want to run everything together that could be a problem.

1 solution

Unfortunately I'm not familiar with Netezza, but I'd like to share some thoughts in general.

Very often an IN comparison has a maximum number of items you can place inside it. While 300 doesn't sound like many, I'd run some tests just to make sure that you're not hitting any limit, or getting close to one.

An IN comparison is basically just another way of writing multiple ORs, so performance-wise what applies to ORs typically applies to IN, provided that sub-queries are not involved.
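
For example, these two predicates are logically equivalent (the column name is taken from your query, the values are just for illustration):
SQL
where a19.list_nbr in (1696079, 1696078)

-- ...is just shorthand for:

where (a19.list_nbr = 1696079 or a19.list_nbr = 1696078)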

From an optimization point of view, since you have many values inside the IN comparison, I would hope that the optimizer chooses a full scan of the table. An indexed approach would probably suffer from performance issues if the selectivity of the comparison (as a whole) is low. An efficient, sequential full scan often performs very well, since the database engine only needs to decide, for each row it reads, whether the condition applies, and then move on to the next row. Of course, if another column in your conditions has high selectivity, then it's a different story.

Using bind variables would be preferred. Besides their many other benefits, they also make it easier for the optimizer to reuse execution plans when the same number of variables is used.
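
As a sketch, assuming your client or driver uses the common ? placeholder syntax (ODBC/JDBC style; the exact marker depends on your tooling), the statement would be prepared once and then executed with the prompted IDs bound as parameters:
SQL
-- prepared once; the prompted ids are supplied as bound parameters at execute time
where (a19.list_nbr in (?, ?)
 and a11.gtz_purch_amt > 0)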

If you hit performance issues, or want to prevent them, I would concentrate on the join order: which table drives the query, and in what order the other tables are joined to it. Depending on the cardinality of the filtered column, it could be efficient to start from the table that contains it and then join the other tables to it.
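
One way to put that into practice is to load the prompted IDs into a small table once and let it drive the query, instead of repeating the 300+ literals in every pass. A minimal sketch, assuming standard CREATE TEMPORARY TABLE syntax is accepted (the prompt_ids table is made up for illustration; the other names come from your query):
SQL
create temporary table prompt_ids (list_nbr integer);

insert into prompt_ids values (1696079);
insert into prompt_ids values (1696078);
-- ...one insert (or a single bulk load) per prompted product id

select pa11.grp_nbr, a12.list_nbr
from prompt_ids p
join list_active_trade_item_grp_v a12
  on (a12.list_nbr = p.list_nbr)
join T5FDY1URMMD002 pa11
  on (pa11.grp_nbr = a12.seq_grp_sort_nbr)
where (pa11.combined_lgl_entity_nbr not in (35)
 and pa11.GODWFLAG5_1 = 1);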

Hope these help.
 
Comments
mrinalbh 26-Apr-21 16:10pm    
Hey, thanks so much for the extra info! Yeah, I don't think IN would be a smart move. What I'm trying now is using the table that has these IDs and joining it with the main report to get the results. While doing that I have also broken down the metrics, making sure I'm not running all of them together. So far it seems to be doing just fine. I'm running more tests with different metrics to see how well it works. Really appreciate your inputs!
Richard Deeming 27-Apr-21 4:27am    
Bear in mind that if you store the chosen values in a regular table, and multiple users run the report at the same time, their selections could overwrite each other. You'd either need to use some sort of key to identify the session, use a temp table to store the selected values, or pass the values in a table-valued parameter.
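For example, the session-key variant might look like this (a sketch only; the report_prompt_ids table and the session value are purely illustrative):
SQL
create table report_prompt_ids (
    session_id varchar(64) not null,
    list_nbr   integer     not null
);

-- each user writes their selections under their own key
insert into report_prompt_ids values ('session-abc123', 1696079);
insert into report_prompt_ids values ('session-abc123', 1696078);

-- the report then joins only on that user's rows
select a12.list_nbr
from report_prompt_ids p
join list_active_trade_item_grp_v a12
  on (a12.list_nbr = p.list_nbr)
where p.session_id = 'session-abc123';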
