I needed to compare Lotus Notes data with other data residing outside of Lotus Notes. Except for the comparison each matching record also had to be updated. The amount of data was rather big. The non-notes data consisted of 1,15 million records and the Lotus Notes data of 76000 records. I needed a union of the data on a specific id that both sources had.
I came up with two approaches.
#1 Read the data from excel files or an ODBC-connection and compare each of these rows with each of the Lotus Notes rows until there is a match. This algorithm can of course be tuned and and simplified to make it go faster. Nevertheless I would say that Lotus Notes is not the master in working with large amounts of data and this operation would probably take days. And although it would work and I would get both the comparison and the update at the same time it has to many cons:
- It would be too time-consuming
- It would block my entire Lotus Notes client.
- It would make it difficult and time-consuming to run several different comparisons
#2 Get both of the different sources into a relational database and do the comparison there. Then either take the data directly from there to where you want it or export the matches and import into Lotus Notes again to perform the updates (using approach #1). This is a two-stage process but the pros for this is that:
- Working with data is easy in a relational database
- It is fast
- Due to the above bullets I can make several different processing to reach the best result.
- I can still work with my Lotus Notes client.
So I used approach #2.
I imported the data from Lotus Notes and the other source into a MySQL-database that resides locally on my computer. For the import I tried out different tools like and . Both are good tools but to be able to use the import feature you would have to do it either in the trial period or to buy the product.
Instead it is both cheaper and pretty easy to use the built-in functionality of . You can run the queries below in the MySQL Query Browser.
Both of the data sources in my example were currently in csv-format. It is for example easy to get data out of Lotus Notes in csv-format using the built-in export functionality on a view. So now we use the 'LOAD DATA INFILE' command for both of the csv-sources to get the data into the relational database. For this example assume we create two tables named 'table_name_1' and 'table_name_2' and read data into them.
| LOAD DATA INFILE 'file_to_import.csv' INTO TABLE table_name FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\\' (field1, field2, field3) |
This command reads all records from the file 'file_to_import.csv' to MySQL table 'table_name' to the specificed fields 'field1', 'field2' and 'field3'. We are also specifying that column values in the csv-file are separated by commas (,) and if a column value consists of more than one word it will be incapsulated by apostrophes (").
So we do this for both data sources and these will both be in one MySQL table each - 'table_name_1' and 'table_name_2', Now we come to the actual data processing. We want to compare/join data between the two tables and also being able to handle the result in some way. This is done using MySQL's INNER JOIN command:
INSERT INTO result_table (result_field1, result_field2, result_field3) table_name_1.field1, table_name_2.field1, table_name_1.field3 FROM table_name_1 INNER JOIN table_name_2 ON table_name_1.someId = table_name_2.someId WHERE table_name_1.someId != ""
|
We are inserting the results of an inner joined select query into a results table 'result_table'. This table will contain all the matches. Processing this query with 1,15 million record in one table and 76000 records in the other table took about 5 hours on my local machine. However if you index the fields in both of the tables that you need to compare to each other, the processing is much faster than that.
Now we want to get the data out of the database to get it into Lotus Notes and update the data there. Use the 'OUTFILE' command for this.
SELECT result_field1, result_field2, result_field3 INTO OUTFILE 'results.csv' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '" LINES TERMINATED BY '\n' FROM result_table' |
This command writes the specified fields into the file 'results.csv'. Each field is separated with a comma (,) and any fields containing more than one word will be enclosed by apostrophes ("). When all fields for a record has been written to the file it will be ended with a newline character (\n).
In both cases when we read and write to files the files are by default on a windows system located in the mysql data directory for the specific database that you're working on. So in this example if the database's name is 'db3' the default path would be 'C:\Program Files\MySQL\MySQL Server 5.0\data\db3\'.
As another sidenote. Both getting the data into MySQL from a file and writing it from MySQL to a file is a very fast process.
Now that we have this file we can do several things with it. Assuming we want to use it to update Lotus Notes data we can create a rather simple Lotus Notes agent that reads a csv- or excelfile and compares and updates for each record. I will provide this agent in a future blog post.