-
Flexviews 1.6.0-RC1 is released
Whats new in Flexviews 1.6.0RC1This is the first release candidate before the final release. If no major bugs are uncovered, then the next release will be the first GA release. Flexviews now has a test suite for all major features. The creation of these tests uncovered a number of issues which have been resolved in this release. All MySQL aggregate functions except GROUP_CONCAT are now supported. A special aggregate function called PERCENTILE is now also supported. The calculation uses a modified version of the GROUP_CONCAT based solution suggested by Ronald Bouman for percentiles. This function should be considered experimental. Please report bugs if you find any.You can add indexes to enabled materialized views using SQL_API/add_exprAdding PRIMARY KEY indexes is no longer supported. All views get an auto_incrementing primary key. You can add additional UNIQUE indexes instead.There is an upgrade process from 1.5.3b (see UPGRADE and upgrade.sql)Significant bug fixesViews with aggregate functions but no GROUP BY columns now work properly for all supported aggregate function typesNULL values in GROUP BY columns are now properly supportedNULL values now work properly with distributive aggregate functions There is a wrapper script around run_consumer.php which can restart the consumer if it stops runningAs always, get it at:http://sourceforge.net/projects/Flexviews
-
Getting temporal configuration values into date-range value equivalents
I collect a lot of configuration values from my database servers and most of these values are stored by date.
So often I end up with values such as:
config_date config_value
2010-09-01 value_1
2010-09-02 value_1
2010-09-03 value_2
2010-09-04 value_3
2010-09-05 value_3
2010-09-06 value_3
2010-09-07 value_3
2010-09-08 value_4
2010-09-09 value_4
2010-09-10 value_1
2010-09-11 value_5
2010-09-12 value_5
2010-09-13 value_5
I´ve been unsuccessfully been trying to figure out how to convert this in SQL into something like the following:
config_from config_to config_value
2010-09-01 2010-09-02 value_1
2010-09-03 2010-09-03 value_2
2010-09-04 2010-09-07 value_3
2010-09-08 2010-09-09 value_4
2010-09-10 2010-09-10 value_1
2010-09-11 2010-09-13 value_5
The second format is often much shorter if configuration changes are not that frequent. While this is straight forward to do in a programming language in a single pass once you have the config_date ordered data, I can’t figure out how to do this with SQL and no programming. Celko´s books haven’t enlightened me either.
So do you know how to do this?
-
A gentle introduction to CouchDB for relational practitioners
CouchDB is a document-oriented database written in Erlang that addresses a particular “sweet spot” in data storage and retrieval needs. This blog post is an introduction to CouchDB for those of us who have a relational database background.
A CouchDB database doesn’t have tables. It has a collection of documents, stored in a B+Tree. A document is a collection of attributes and values. Values can be atomic, or complex nested structures such as arrays and sub-documents. When you add a document to a database, CouchDB stores it in the B+Tree, indexed by two attributes with special meaning: _id and _rev.
CouchDB lets you store related data together even if it isn’t all the same type of data; you can store documents representing blog posts, users, and comments — all in the same database. This is not as chaotic as it sounds. To get your data back out of CouchDB in sensible ways, you define views over the database. A view stores a subset of the database’s documents. You can think of them as materialized partial indexes. You can create a view of blog posts, and a view of comments, and so on. Each view is another B+Tree. It stays up-to-date with the changes you make to the database.
You can structure your documents anyway you want. There is no fixed schema. If you decide after a while that you want to add tags to your blog posts, you can simply write new posts with a collection of tags and save them into the database. Old posts won’t have tags, but that’s OK; if your application code can read the old format and write the new format, you have an application that doesn’t need a fixed schema.
Updates are never done in-place. Everything is copy-on-write. New revisions are saved into the database as new documents, obsoleting old ones, and CouchDB increments the _rev property each time. To update a document, you fetch it, change it, and send it back, specifying the _id and the most recent _rev. If someone else changed the document in the meantime, your _rev is stale, and your update fails. You must re-fetch and re-save; you can’t lock a document.
CouchDB runs on HTTP and JSON. All of its operations, such as store and retrieve, are standard HTTP requests. The documents themselves are represented in JSON. You can talk directly to CouchDB with curl, Ajax, and anything else that can speak HTTP. There is no “protocol” other than this. CouchDB isn’t just Web-friendly, it is actually made of the same technologies that the Web is made of. You query CouchDB by specifying the database, document ID, view name, and so forth directly in the URL. For example, to fetch a blog post document from the “blog” database, you might issue a GET /blog/helloworld. Queries against views and other objects have simple clean URLs, too.
CouchDB uses special documents, called “design documents,” to store JavaScript code in the database. The code defines the views I mentioned earlier. Another thing you can store is validation functions. This is code that CouchDB executes when you save a document to the database. It accepts a document as input, and can reject it, so you do have control over the schema of documents — it doesn’t have to be a free-for-all. In the blog application, you can have a validation function that starts by enforcing “every document must have a ‘type’ property, and its content must be one of (post,user,comment).” Then you can have separate validation logic for each type of document.
Design documents can also contain something called “show functions.” CouchDB will execute the function’s code in response to HTTP requests to that URL, and send the resulting data back as an HTTP response (as usual). With show functions, you can store entire applications inside the database. Your browser might never even know that it’s talking to a database directly, instead of a web server with a database behind it.
CouchDB isn’t designed for arbitrary queries at runtime. You can only query one view, show function, or database at a time. You can’t do joins. You can’t do arbitrary GROUP BY and ORDER BY. You have to decide in advance what operations you’re going to need, and build views for them. You can then issue requests to those views, essentially the equivalent of key lookups and range scans with a few basic options such as an offset, limit, and reverse order. Now, having said that, you can define views that reduce the database down to aggregates, create a custom ordering, and so on. You can define the equivalent of the relational “project” operation inside your view code.
Here’s how: a view is a map-reduce operation. A view is defined in two parts, the map and the reduce. The map is not optional; it generates the contents of the view. It is a JavaScript function. CouchDB iterates over the database and feeds each document into the function, collects the results, and inserts them into the view’s B+Tree index. Inside the view function’s code, you emit key-value 2-tuples.
The key will identify the tuple in the index that’s built to store this view. It can be simple or complex, so you can create a view that’s keyed by [this,that,the_other_thing]. The view will be ordered by the same thing; that’s how B+Trees work.
The value you emit is whatever you want the B+Tree to store at its leaf nodes, and can also be complex (it’s a document, like any other).
The “reduce” part of the operation is optional. It computes what is stored in the non-leaf nodes of the B+Tree index. For example, you can use it to create aggregates, such as summing up counts of comments. In addition to the reduce part of the code, the is a “rereduce”. The rereduce is called as the operation is invoked on higher and higher non-leaf nodes, all the way to the root of the tree. CouchDB knows how to take advantage of the data that’s stored by these reduce and rereduce operations, so for example, it doesn’t necessarily have to descend all the way to the leaf nodes and scan in order to count how many documents match a particular query.
An important thing to know about all this code is that nothing is allowed to have side effects. You can’t modify the database in a view definition, for example. Documents are immutable; it’s all copy-on-write. You get input; you can specify output; that’s it, period. It’s a form of functional programming. Why do we care? Because it keeps things simple and elegant, and enables all kinds of nice properties and functionality, such as replication and eventual consistency and cache expiry and scaling to multiple nodes and so on.
The database file is append-only. Old versions don’t automatically get cleaned up. The database grows forever until you compact it. This process builds a new database and then does a swap-and-discard. The append-only, copy-on-write design makes backups easy, and data corruption unlikely.
CouchDB comes with a “graphical user interface” called Futon. It’s built right into the database, and surprise! — it works through HTTP and Ajax. You just fire up CouchDB, point your Web browser to /_utils, and go. It’s a fun way to explore CouchDB.
With all that in mind, why would you want to use CouchDB instead of a relational database? For most things I’m involved with, I want a relational database. But I got asked recently to help with a database that’ll store records about people. Although nobody has implemented anything yet, it’s a terrible match for a relational database, and an excellent fit for a document-oriented one. The inputs are going to be arbitrary documents with different structures, such as census records, birth records, tax records, estate and probate records, marriage records, and so on. Nobody knows what it’s going to store in the future. When people build “flexible schemas” in relational databases, they usually go for the so-called EAV or EBLOB models. In other words, they aren’t using the database relationally at all, and it simply doesn’t work well. This type of project needs a document-oriented database.
I’ve left out a lot of important details, but the point of this post is to understand the high-level CouchDB concepts and how they’re implemented, so you can reason for yourself about it. If you’ve read this far and you think that CouchDB might be a good fit for your needs, I encourage you to take a look at CouchDB, The Definitive Guide.
Related posts:NoSQL doesn’t mean non-relationalA review of SQL and Relational Theory by C. J. DateKickfire: relational algebra in a chipAn introduction to InnoDB error handlingHow to use extended properties as documentation with sp_showdoc
-
MySQL GIS – Part 3
TweetWhat data is available?
GEO data is expensive to create, so has been created by governments. In the past governments charged for this data. In 1980 the USGS was charging $300 (usd) per county for Oklahoma GEO data. (I complained to my congressman.) Today, a quick Internet search turns up lots of free GIS data.
I was hoping to find a neat collection of basic GEO data. It would be nice if there was one place you could get world political borders (Polygons), postal codes (Polygons) and points of interest like hospitals and airports. What you can find is lots of lists, often collections of odd data created for a virility of complex political purpose. For example, The Global Change Master Directory is a large list of data sources on earth and climate change, but you will not find the data here.
Remember, as you dig for data there are two types Vector (text) and Raster (pictures). Most sites don’t distinguish between them or combine them for you making them less useful your your own uses.
If you find good sources of GEO data PLEASE share them with me so I can share them with everyone else.
Here is a short list of the sites I found and used to create my test data in my series of posts.
www.cloudmade.com – has shape files for the entire world with administrative, natural, coastline, water and points of interest. The data some from Open Street Maps and is available under the Creative Commons Attribution-ShareAlike 2.0 license.
data.geocomm.com -
gisdata.blogspot.com – Public domain GIS data and Free GIS data repositories and clearinghouses.
collinssoftware.com – http://www.collinssoftware.com/freegis_by_region.htm
census.gov/geo/www/tiger – http://www.census.gov/geo/www/tiger/index.html
http://www.hostip.info/dl/index.html
I live in the state of Oklahoma. Because I know it well, I’m using it for my examples. My search for Oklahoma GIS data turned up these sources. This should give you some idea of the data you might find in your searching. I found county borders (polygons), a list of hospitals (points), city borders (polygons), points of interest (points) and zip codes (polygons). Each of these came as shape files so the process was simple to get the data into MySQL.
libremap.org/data/state/oklahoma/ – View the entire list of USGS Oklahoma Digital Raster Graphic Maps
geo.ou.edu – County Boundary, Municipal Boundaries, Voting Precincts, School Districts, State House Districts, State Senate Districts
tin.er.usgs.gov/geology/state/ – Oklahoma geologic map data
okmaps.onenet.net – 25 digital-map data sets, known as the Digital Atlas of Oklahoma
Adding these to my ‘geo’ database was simple. I ran each .shp file through the ogr2ogr program.
wget http://www.okladot.state.ok.us/hqdiv/p-r-div/maps/shp-files/munibnd.zip
unzip munibnd.zip
ogr2ogr -f "MySQL" MySQL:"geo,user=root,host=localhost,password=" -nln oklahoma_cities -lco engine=MYISAM munibnd.shp
More examples on what you can do with GIS data and MySQL.
Viewing our GIS data.
How to collect your own GIS data.
Good and bad examples of searching GIS data.
Optimizing MySQL GIS. Is it really worth using?
Data sources shared by users.
-
dbbenchmark.com – MySQL (basic) connection pool support added
In this latest release I’ve added a basic MySQL connection pool to the benchmarking script which improves the method in which connections to MySQL are handled and reused. In addition, there have been some optimizations made to the thread handler functions for better debug reporting. Download the latest release now and see how your MySQL server performs against the rest of the community! Download here: download page.
|