Hi !!
We are doing a project where we have a table with 800,000 records. We need to implement a search on these 800,000 records. How do we do that?
Assume State = NJ
Criteria: Select County, Select City, Enter Zip Code, First Name and Last Name
This should bring up information about the person.
How can we implement it efficiently?
Thanks
Things that will increase performance of a query with a lot of records:
- More CPU power / Ram / faster hard drives on the database server
- Proper use of indexes on the affected table.
- Use of stored procedures or other method that results in only the specific page of the record you are looking for being sent from the database server to the web server - for example -- if you wanted records 1-10 of a 800,000 record query, make sure whatever method you use only returns 10 records.
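The last point can be sketched with nested TOP queries, the usual approach on SQL Server 2000 (which has no ROW_NUMBER yet). The table and column names are borrowed from later in this thread, and the page size and filter are only illustrative assumptions:

```sql
-- Hypothetical sketch: return "page 3" (rows 21-30) of a name search
-- without shipping all 800,000 matching rows to the web server.
SELECT TOP 10 ContactInfoID, FirstName, LastName, City, Zip
FROM ContactInfo
WHERE LastName = 'Smith'
  AND ContactInfoID NOT IN (
      -- skip the keys of the first 20 rows of the same ordered result
      SELECT TOP 20 ContactInfoID
      FROM ContactInfo
      WHERE LastName = 'Smith'
      ORDER BY LastName, FirstName
  )
ORDER BY LastName, FirstName
```

Only the ten rows of the requested page cross the wire; the inner TOP runs entirely on the database server.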
The web site sql-server-performance.com has an abundance of tips that might help.
Jason
|||Jason,
We have a dedicated database server with good RAM, CPU, HDD, etc.
It's like a people search...
I showed you my selection/search criteria. Based on that criteria, we will get one or more records. I don't want the query to take a long time.
Where should we have our indexes? How should we write the query?
For example, would a query like this help us?
SELECT ContactInfoID, FirstName, LastName, HouseNo, StreetName, City, State, Zip, County, Municipality, HomePhone
FROM ContactInfo
WHERE County = COALESCE(@County, County) AND
City = COALESCE(@City, City) AND
Zip = COALESCE(@ZipCode, Zip) AND
FirstName = COALESCE(@FirstName, FirstName) AND
LastName = COALESCE(@LastName, LastName)
Hello,
Basically, if that's the query that generates the results you want, then that's the query you should use. Focus on getting the results you want first, then use the "Index Tuning Wizard" with your query in SQL Query Analyzer to determine what indexes would be helpful.
Jason
|||Hey !!
I showed one possible query... if there is a better way to write it, I would like to know.
If this query matches multiple people, we ask one more question, like DOB, and that narrows down the records.
|||Basically you need to figure out which queries you'll most commonly be running against this table of your database. As you mentioned, you'll supply info as you have it (DOB, etc.), but are there some fields that you'll almost always be using? If so, definitely place indexes on those fields; this will speed up your query performance. If you find there's one that you use predominantly, I would make it a clustered index, which will further speed up your performance.
Also, I would write the application query as best you can. Try not to use double wildcards, i.e. SELECT * FROM table WHERE myfield LIKE '%somevalue%'; queries like this will blow past the indexes and do full table scans. Likewise, if you place an index across more than one field but do not always query by the leading fields of that index, this also "blows" the index and results in a full table scan. Moral of the story: use the indexes as you create them (in the same fashion, etc.).
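As a sketch of the advice above, assuming the ContactInfo table from earlier in the thread (index names are invented):

```sql
-- Clustered index on the predominant access path; here assumed to be
-- the surrogate key, but a commonly filtered column works too.
CREATE CLUSTERED INDEX IX_ContactInfo_ID
    ON ContactInfo (ContactInfoID)

-- Composite index: only helps when the leading column (County) is part
-- of the filter, per the "use indexes as you create them" rule above.
CREATE NONCLUSTERED INDEX IX_ContactInfo_CountyCityZip
    ON ContactInfo (County, City, Zip)

-- Name lookups filter on last name first, so LastName leads.
CREATE NONCLUSTERED INDEX IX_ContactInfo_Name
    ON ContactInfo (LastName, FirstName)
```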
Once your app is built, I would use SQL Profiler to watch the queries, and be sure to grab the query plans from SQL Profiler. Anywhere a plan shows a Table Scan, go back and apply an index if possible to eliminate it.
Hope this helps
|||
If these are the results you are looking for, then there is no better query! :) If these aren't the results you are looking for, then the simplest query (uses the least functions, joins, etc) that returns the least # of results is the best.
Jason
|||Open the database, then click Tools, then Wizards, then Index Tuning Wizard; it will give you hints on the indexes you need on the table. Keep the size of your clustered index small; you can create up to 249 nonclustered indexes on the table. The wizards menu is context sensitive, so you may not see it if your database is not open. Hope this helps.
Kind regards,
Gift Peddie
|||I have read the article and implemented the dynamic SQL. We are giving users the capability of running ad hoc queries on 800,000+ records. Sometimes a query takes a long time to return results and we get a timeout, or sometimes it returns too many records (150,000).
We designed our ad hoc query tool along the lines of
http://www.aspose.com/Products/Aspose.AdHoc/Demos/
Since it is dynamic SQL, I can't have indexes. Can someone help with how to reduce execution time on dynamic SQL?
Our Dynamic SQL Looks like this
CREATE PROCEDURE TEST
(
@whereClause varchar(4000)
)
AS
DECLARE @sql varchar(8000)
SET @sql = 'SELECT FirstName, LastName, Address, City, State, Zip, County, Township, Phone
FROM VotersProfile
WHERE ' + @whereClause
EXEC (@sql)
"whereClause" is build into C# code based on the columns user select in Tool
For eg @.whereClause is FirstName = 'John' AND 'County = 'Bergen' AND (Gender = 'M' OR Gender = 'F')
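One caution worth noting: concatenating a WHERE clause built from user input executes whatever the user typed, which is open to SQL injection. If the tool's filters come from a known set of columns, a safer sketch (not the poster's code; parameter names and sizes are assumptions) builds the clause server-side from optional parameters and runs it through sp_executesql, so values travel as parameters rather than as SQL text:

```sql
-- Sketch: optional typed parameters instead of a raw WHERE string.
-- Only the filters the user supplied are appended to the statement;
-- the values themselves are bound by sp_executesql.
CREATE PROCEDURE SearchVoters
    @FirstName varchar(50) = NULL,
    @County    varchar(50) = NULL,
    @Gender    char(1)     = NULL
AS
DECLARE @sql nvarchar(4000)
SET @sql = N'SELECT FirstName, LastName, Address, City, State, Zip, County, Township, Phone
FROM VotersProfile WHERE 1 = 1'
IF @FirstName IS NOT NULL SET @sql = @sql + N' AND FirstName = @FirstName'
IF @County    IS NOT NULL SET @sql = @sql + N' AND County = @County'
IF @Gender    IS NOT NULL SET @sql = @sql + N' AND Gender = @Gender'

EXEC sp_executesql @sql,
    N'@FirstName varchar(50), @County varchar(50), @Gender char(1)',
    @FirstName, @County, @Gender
```

A side benefit: because the parameter values are not embedded in the statement text, the handful of distinct statement shapes get cached and reused plans.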
Can a partitioned view or some other technique help me? It's like a search... can we use some other search techniques?
Thanks
Tip: Use FieldName IN ('Value1', 'Value2') instead of multiple ORs...|||Ok, let's see...
Sometimes a query takes a long time to return results and we get a timeout, or sometimes it returns too many records (150,000).
Your problem, really. If you get too many results, limit the number of results you get. Say you want to show only 500 rows: limit the returned rows to 501, and if you get 501, tell the user "more than 500 items were returned" and be done. One of the classical SQL problems - people not limiting their results to what they need.
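The "501 trick" above, sketched against the VotersProfile table from this thread (the filter is just an example):

```sql
-- Ask for one row more than the page you will display. If exactly 501
-- rows come back, the application shows "more than 500 items were
-- returned" and asks the user to narrow the search.
SELECT TOP 501 FirstName, LastName, Address, City, State, Zip
FROM VotersProfile
WHERE County = 'Atlantic'
```

The check for the 501st row happens in the application; the database never materializes the other 89,000+ matches for the client.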
If query processing takes too long then put indices on the table, and/or normalize the data into multiple tables.
Since it is dynamic SQL I can't have indexes.
Ok, where exactly did you get this idea? The whole genius of SQL databases is that you define queries and indices separately, and the query optimizer then decides which indices to use or not to use. In the times before SQL, people had to manually select which indices to use (and in which order).
That you cannot have indices because you use dynamic SQL is simply not true. Get a beginner book about SQL and read it. You CAN have as many indices as you want, and the system will use as many as it thinks make sense (sometimes an arguable decision).
Now, going down to your problems....
There is a lot you can do to speed up your queries. First, do you use the proper database structure? For a fully query-optimized system I would probably use four to six different tables, at least to identify a person, putting it into NF5 (5th normal form). This would significantly cut down the size of the data and make processing much faster by requiring fewer string comparisons. It is a terrible mess to update on most databases, but on SQL Server an INSTEAD OF UPDATE trigger on a view can handle the updates nicely without the obvious use of a stored procedure.
Then, second, can you identify "main query parameters"? Given your six parameters - are there some that people will always use? This can be important for the main selection axis of the indices.
Or, just being lazy, run 10000 queries and just let query analyzer make recommendations. It is not perfect, though.
Your static SQL Statement that you gave, though, is a classical beginner mistake example:
WHERE County = COALESCE(@County, County) AND
City = COALESCE(@City, City) AND
Zip = COALESCE(@ZipCode, Zip) AND
FirstName = COALESCE(@FirstName, FirstName) AND
LastName = COALESCE(@LastName, LastName)
See, the problem is your use of COALESCE with the field on the right side. Take "City = COALESCE(@City, City)". This expression has to be evaluated for every row, regardless of whether @City has a value or not. To optimize it away, SQL Server's query optimizer would have to recognize that:
* COALESCE returns the same value for the same input. It probably can do this. Note, though, that this would still mean one evaluation for every distinct city name stored, at least.
* COALESCE returns the same value regardless of the field in the second parameter when @City is not null. This it definitely does not do.
Result: bluntly put, you force a table scan.
Beginner rule: try to formulate your queries so that database row fields appear on only ONE side of a condition (normally the left) and are not wrapped in ANY processing.
You break this rule by having field references on both sides, which is logically equivalent but forces a table scan and makes any index totally useless.
And you do so because you do not use dynamic SQL. Static SQL and dynamic query conditions simply do not match.
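A small illustration of that rule, using the DOB column this thread's table actually has (the age filter is invented):

```sql
-- Forces a scan: every row's DOB must be run through DATEDIFF before
-- the comparison, so an index on DOB cannot be seeked.
SELECT FirstName, LastName
FROM VotersProfile
WHERE DATEDIFF(yy, DOB, GETDATE()) = 30

-- Index-friendly: DOB stands alone on the left, compared against
-- bounds that are computed once, so an index on DOB can be used.
SELECT FirstName, LastName
FROM VotersProfile
WHERE DOB >  DATEADD(yy, -31, GETDATE())
  AND DOB <= DATEADD(yy, -30, GETDATE())
```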
I can see you already got rid of this in the dynamic SQL you use (shown later).
Well, all I can suggest, then, is to dump the table and come up with a normalized form. Last time I had to do something like this, we reduced the database size by about 50% and performance really took off. Keeping all the data in one table does not follow proper procedure for query-optimized databases - that simple. The data is not normalized, and with unnormalized data the query optimizer is really limited in what it can do.
|||Thanks for the suggestions. Our problem is that the Voters Profile table comes to us every year from the political parties, always with the same structure. As a result we can't normalize it.
Table has about 25+ columns:
In the GXX columns, G = General Voting and XX = year; in the PXX columns, P = Primary Voting and XX = year.
We created two computed columns, G00_04 & P00_04, i.e. how many times the person voted in the last four years.
FirstName, LastName, Address1, Address2, City, State, Zip, Zip4, County, Township, district, ward, HomePhone, Ethnicity, Gender, DOB, Age, Registered Party, VotingID, MailID, G04, P04, G03, P03, .... G96, P96, VotingHistory
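For what it's worth, the two computed columns described above might be defined along these lines (a sketch only; it assumes the GXX/PXX columns hold 1 for "voted" and 0 otherwise, which the post does not state):

```sql
-- Assumed definition of the computed vote-count columns for 2000-2004.
ALTER TABLE VotersProfile
    ADD G00_04 AS (G00 + G01 + G02 + G03 + G04),
        P00_04 AS (P00 + P01 + P02 + P03 + P04)
```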
Out of this, we let the user run an "Ad Hoc Search" based on any or all combinations of County, Township, Zip, Gender, Age, General Voting and Primary Voting.
In the result set we display Name, Address, City, State, Zip, HomePhone, General Voting and Primary Voting.
As I mentioned, the where clause is composed in C# code and passed to the dynamic SQL.
CREATE PROCEDURE TEST
(
@whereClause varchar(4000)
)
AS
DECLARE @sql varchar(8000)
SET @sql = 'SELECT FirstName, LastName, Address, City, State, Zip, County, Township, Phone, G00_04, P00_04
FROM VotersProfile
WHERE ' + @whereClause
EXEC (@sql)
So if they select only County, the where condition is
@whereClause = 'County = ''Atlantic''' and the result may be 90,000 records.
But if they add more conditions, like 1) Age between 25 and 35, 2) Gender = 'M', etc., they get fewer records because the query is more specific.
We need a general way to optimize the performance. I know about COALESCE, but if some values are NULL, those rows will not show up in the result set.
After reading the dynamic SQL articles we decided to follow them and implement it. Now we need to know how to do performance tuning.
I have seen that some queries result in four table scans. How can we reduce that?
I hope this helps you understand our problem so you can give suggestions!
Thanks,
|||Why not use DTS to take the supplied data and normalize it upon import? That way your queries will execute faster. :)
Jason
|||
Our need is growing from 800K records to 4 million records. Is there a better way to search?
How about a partitioned view/table?
|||Change your table layout to 5th normal form.
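As a sketch of one possible normalized split (invented table names, not the poster's actual schema): the yearly GXX/PXX flags become rows in a history table, so each year's incoming file can be appended as rows instead of requiring new columns.

```sql
-- Core identity data, one row per voter.
CREATE TABLE Person (
    VotingID  int PRIMARY KEY,
    FirstName varchar(50),
    LastName  varchar(50),
    Gender    char(1),
    DOB       datetime
)

-- One row per election a person voted in, replacing G96..G04, P96..P04.
CREATE TABLE VotingHistory (
    VotingID     int     NOT NULL REFERENCES Person (VotingID),
    ElectionYear int     NOT NULL,
    ElectionType char(1) NOT NULL,  -- 'G' = general, 'P' = primary
    PRIMARY KEY (VotingID, ElectionYear, ElectionType)
)
```

With this shape, "how many times did this person vote since 2000" is a COUNT(*) over VotingHistory with ElectionYear >= 2000 rather than a computed column that must be redefined every year.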