Hi everyone - first post and first question about SSAS, so be gentle.
I am attempting to build a cube from sample warehouse data with a handful of dimensions (including date) and one fact table. From what I've read (although I could be wrong), a popular way to load a data warehouse is an initial load of data followed by incremental updates based on the changes and the date of change. This would then present the problem of duplicate rows in the warehouse, and therefore in the cube?
One of the reporting requirements for the cube is to report, for a given date, the total financials for a given selection of claims, grouped by area. If an initial load followed by ongoing incremental loads is used, how would I be able to query by date (today or otherwise)? Because of the duplicate rows in the cube, would a claim's financial value be counted twice?
Any clarification needed, please let me know.
I would assume your fact table has a primary key; even if you are using aggregated values you still need a unique key. Just insert any records that do not exist yet. Or, if you are working by date, query the cube for existing dates.
Hi - thanks for your reply.
The fact table has a composite PK - FactDateID (the date of insertion) and DimRepairID (each repair is unique in the OLTP system). These combined will give a unique reference for a row.
My problem is how to display the correct data in a cube when I choose a date. For example:
RepairID 1 with a date of 20140615 is inserted into the fact table. The financial value associated with this is 10. No new row is inserted for the 16th of June, as there was no activity on this repair. However, on the 17th of June, the financial value was updated to 20 by a user in the OLTP system and therefore transferred to the DW. A new row of 20140617 with RepairID of 1 and value of 20 is then inserted.
Bearing in mind that there will be many RepairIDs in the fact table, and some will have changed recently and others not, how do I enable users to pick a date and see which rows had which financial values at that point in time?
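One way to think about the requirement, sketched on the relational side rather than in the cube: for each repair, take the row with the latest FactDateID on or before the chosen date. The example below uses Python's sqlite3 as a stand-in for the warehouse. The table name FactRepair, the Amount column, and the row for RepairID 2 are made up for illustration; only FactDateID, DimRepairID and the RepairID 1 values come from the post.

```python
import sqlite3

# In-memory stand-in for the warehouse fact table (names are hypothetical).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FactRepair (
    FactDateID  INTEGER NOT NULL,   -- date of insertion, yyyymmdd
    DimRepairID INTEGER NOT NULL,
    Amount      REAL    NOT NULL,   -- hypothetical name for the financial value
    PRIMARY KEY (FactDateID, DimRepairID)
);
INSERT INTO FactRepair VALUES
    (20140615, 1, 10),   -- from the post
    (20140617, 1, 20),   -- from the post
    (20140616, 2, 5);    -- invented second repair
""")

# For each repair, keep only the most recent row dated on or before the cutoff.
AS_OF_SQL = """
SELECT f.DimRepairID, f.Amount
FROM FactRepair AS f
WHERE f.FactDateID = (
    SELECT MAX(g.FactDateID)
    FROM FactRepair AS g
    WHERE g.DimRepairID = f.DimRepairID
      AND g.FactDateID <= ?
)
ORDER BY f.DimRepairID
"""

def amounts_as_of(date_id):
    """Return (DimRepairID, Amount) pairs as they stood on date_id."""
    return conn.execute(AS_OF_SQL, (date_id,)).fetchall()
```

With this shape each repair contributes exactly one row for any chosen date, so a value is not counted twice; a view like this could feed the cube, though whether that suits a given SSAS design is a separate question.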
At the end of the '90s we somehow got into ORM. At the time - no matter how hard we tried - we always ended up with messy, complicated code and filthy compromises. And it hasn't changed. But somehow frameworks like NHibernate and Entity Framework are becoming very popular.
So here is my rant.
Some time ago I worked on large projects and things were pretty predictable. You've got a DB model. You generate your SQL procedure layer and your C# layer. Then you create your component / web service / WCF service / RESTful service and serialize the results of your C# calls as POCOs.
Now try to do this with fancy NHibernate objects with auto-resolving proxy objects for related entities. It won't work, because when objects are serialized, their auto-resolving, lazily evaluated proxies aren't. To solve it you duplicate /I'll write it again, for dramatic effect: duplicate/ your objects to create serializable POCOs. And then you create CRUD functions for them on top of the object models. Or even separate the entities /drama: dereference them so that no two entities are connected/... And, hey, you are back where you were with the stored procedures - only with lousier performance and three layers of crap on top of it.
So next time someone comes along with a fancy-schmancy ORM wrapper, it had better already include a web service / WCF service / RESTful service abstraction and work on top of it, rather than below it! Because otherwise we just off-load the drudgery to the web services and call it a "business layer" when in fact it is really a freaking "ORM back to stored procedures layer."
In 2005, I inherited a project which made use of NHibernate. Wow, did that look fancy! How easily you could add data to a datagrid and move along foreign key relationships...
But there was a catch (no, it did not return 22): every foreign key relationship was treated with referential integrity by NHibernate, even when there was no referential integrity designed into the database. And working in a really-existing company of today's capitalism, reality did not always match theory (that's why we could not use referential integrity in the database). We had big problems when data had to be inserted into a "dependent" table while the corresponding data in the "master" table were not yet present...
With that experience, I still do avoid ORM, though I believe things might be less cruel nowadays.
I've been messing with this for hours today. I'm trying to do this in 1 call to the database. Normally I would just get a list of unique items first, and then go back and get the numbers. Perhaps I'm trying to do something that can't be done, or it's just beyond my knowledge level; I'll go with the latter.
I need to get the qty sum and the records from a sales history file.
So I wrote this to get the count of unique items which is 2. I'm not sure if my code returns the right number or not, because the database file contains thousands of records. I get 14, not reflective of the example above.
SELECT COUNT(FITEMNO) AS cCount
GROUP BY FITEMNO
SELECT COUNT(FITEMNO) AS hCount
GROUP BY FITEMNO
I wrote this to get the records, and it returns 36. I'm just testing on the history file, so there is no union to join the 2 database files yet. I keep getting 36 records with duplicates instead of 14 unique records.
SELECT FITEMNO, FSHIPQTY AS hItems"
SELECT h.FITEMNO, h.FSHIPQTY FROM ARTRS01H.dbf h WHERE h.FCUSTNO=@FCUSTNO GROUP BY FITEMNO, FSHIPQTY
I tried that at first but got an error
(missing operator in query expression 'COUNT(distinct FITEMNO)')
And did research and ended up with the example in my post.
So I thought it was too advanced for the old FoxPro, or that it was an OLE DB thing.
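For what it's worth, when a provider rejects COUNT(DISTINCT ...), a common workaround is to count the rows of a SELECT DISTINCT derived table, and to group by the item alone when summing quantities. Grouping by both FITEMNO and FSHIPQTY yields one row per distinct item/quantity pair, which would explain the duplicates. The sketch below uses Python's sqlite3 with an invented SalesHistory table and made-up rows standing in for ARTRS01H.dbf; whether the FoxPro OLE DB provider accepts the derived-table form is an assumption to verify.

```python
import sqlite3

# Invented stand-in for the history file; rows are sample data only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SalesHistory (FCUSTNO TEXT, FITEMNO TEXT, FSHIPQTY INTEGER);
INSERT INTO SalesHistory VALUES
    ('C1', 'A100', 2), ('C1', 'A100', 3), ('C1', 'B200', 1), ('C2', 'A100', 7);
""")

def count_distinct_items(custno):
    """Count unique items without COUNT(DISTINCT ...), via a derived table."""
    sql = """
    SELECT COUNT(*) AS cCount
    FROM (SELECT DISTINCT FITEMNO FROM SalesHistory WHERE FCUSTNO = ?) AS d
    """
    return conn.execute(sql, (custno,)).fetchone()[0]

def qty_per_item(custno):
    """One row per item: group by FITEMNO only and sum the shipped quantity."""
    sql = """
    SELECT FITEMNO, SUM(FSHIPQTY) AS hItems
    FROM SalesHistory
    WHERE FCUSTNO = ?
    GROUP BY FITEMNO
    ORDER BY FITEMNO
    """
    return conn.execute(sql, (custno,)).fetchall()
```

Note that COUNT(FITEMNO) combined with GROUP BY FITEMNO returns one count per item, not a single number of unique items, which is a different result from the one the derived table gives.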
I ended up doing something similar. I wrote one function to get the distinct items, and went back and got the sums with the distinct item list. I don't know what I was thinking, was trying to do it all in one shot.
I finally got it in 1 shot. Runs super fast now.
Customer complained about the 5 minute run time, so I took another stab at it.
Don't know why I got it this time, perhaps the nap time and the beers!
, SUM(v.FSHIPQTY * v.FPRICE)
, (SELECT FDESCRIPT FROM ICITM01.dbf WHERE FITEMNO=v.FITEMNO) AS FREALDESC
FROM ARTRS01H.dbf v
GROUP BY v.FITEMNO
A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 0 - No connection could be made because the target machine actively refused it.) (Microsoft SQL Server, Error: 10061)
Simply put, the error means that the machine you are trying to connect to exists, but no SQL Server (service) can be found on it...
1. Check that the machine name/IP address is the right one.
2. Is SQL Server installed as a named instance? In that case you may need to add the instance name to your address.
3. Is SQL Server using the default port (1433), or was it installed with a different one?
4. You may have a firewall between you and the SQL Server; check it and open ports as needed...
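A quick way to test points 1, 3 and 4 from the client side is to try opening a plain TCP connection to the server's port. This is only a rough sketch using Python's standard socket module; the function name can_connect is made up, and 1433 is just the default port mentioned above.

```python
import socket

def can_connect(host, port=1433, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections (error 10061 on Windows), timeouts,
        # and name-resolution failures.
        return False
```

A False result only says that nothing accepted the connection on that port from where you are; it does not tell you whether the cause is a stopped service, a different port, or a firewall. A named instance usually listens on a port other than 1433, so a failed check against 1433 is expected in that case.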
With SQL, people often say you shouldn't do 'SELECT *'. I tend to write highly optimized and selective queries. Do others do selective queries, or do you think this is not necessary? The benefit will obviously vary depending on the size and usage of the table. I'm considering simplifying my architecture by doing all of the SELECTing on the web server. There will always be special cases, e.g. massive tables, but everything is fairly small in this particular application.
There are two main benefits I can see from selective queries: less data is transferred, and covering indexes can be much smaller. I'm not sure that the amount of data makes much difference when we'll only be getting one screen (e.g. 10-50 records) of data at a time.
It would be great to hear what others do and think.
If you want all the columns, then I see no problem with using *. The primary issue is when you use * even when you want only a few columns, and some of the unneeded columns contain large data. This can also happen when a new column is added.
Additionally, there may be times when a column is removed or renamed -- this will likely cause a problem, but do you want the problem to be reported when the data is queried or farther downstream? Early detection is probably better.
Someone here (other than me) wrote a good rant against * some years back, but I'm having trouble finding it.
To add to the other comments, using 'select *' can also cause issues if columns are added or their order is rearranged. If your application is expecting data in a particular column, then it may not be there; if your application is not expecting the columns that have been added, why bother spending the effort to retrieve the data and parse out the unwanted columns?
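To make the column-order and early-detection points concrete, here is a small sketch using Python's sqlite3. The Item table, its columns and the helper names are all invented for illustration; the same effect applies to any client code that reads 'select *' results by position.

```python
import sqlite3

def make_db(columns):
    """Create an in-memory database with an Item table in the given column order."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE Item ({', '.join(columns)})")
    return conn

# Original layout, and a later layout where a column was inserted before name.
old = make_db(["id", "name", "price"])
old.execute("INSERT INTO Item (id, name, price) VALUES (1, 'widget', 9.5)")
new = make_db(["id", "sku", "name", "price"])
new.execute("INSERT INTO Item (id, sku, name, price) VALUES (1, 'W-1', 'widget', 9.5)")

def name_by_position(conn):
    # Relies on 'name' being the second column of SELECT *.
    return conn.execute("SELECT * FROM Item").fetchone()[1]

def name_by_column_list(conn):
    # The explicit list fixes the result shape regardless of table layout.
    return conn.execute("SELECT id, name FROM Item").fetchone()[1]

def query_fails_early(conn):
    # Naming a column that no longer exists is rejected by the database itself.
    try:
        conn.execute("SELECT id, title FROM Item")
    except sqlite3.OperationalError:
        return True
    return False
```

Against the new layout, name_by_position silently returns the sku value while name_by_column_list still returns the name, and a query naming a missing column fails at the database rather than somewhere downstream.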
With SQL people often say you shouldn't do 'SELECT *'.
I also tell people not to run their stupid queries without starting a transaction that can be safely rolled back.
Again, you DON'T do a SELECT *. It's not that you save a lot by omitting a DateTime column - but it would prevent that blob field of 2 GB each that was added last month from being pulled over the network with each and every friggin' request, killing the network and the database server. Or a nice calculated field that cripples the DB server.
It doesn't take much time, and makes the application a bit more robust. Makes it easier for me to debug when I get thrown in your team as a maintenance-programmer.
Yes, it takes extra time, but it has a good ROI. It's not a religious thing - I won't go medieval if you do a simple "SELECT * FROM". Still, if you do it in a query that contains several joins you'll get this lecture, as each extra table means another chance at pulling columns you don't need.