Our last post started to scratch the surface of what a state-of-the-art Smart Storage system can provide in terms of integration with other systems. In our next couple of posts we will look at how to send data back and forth between systems, starting here with the tried-and-true method of sending data: file transfers.
The old standby
Transferring files is, and has been, one of the most common methods of exporting data from one system to another. There just needs to be a network location that is accessible to both sides, and an agreed-upon file extension and format. The types of files sent back and forth range from simple .txt or .csv files to more structured files like .xml. The data inside the files is organized in a machine-readable way, and one system exports files to the shared folder and the other system reads and processes the files.
This methodology is simple to implement, but it is not the fastest or most efficient. You are limited by the constraints of working with file systems: machines need access and rights to shared folders, different systems cannot access the same file(s) simultaneously, writing to and reading from files can be slow, and folders can balloon in size as more and more files are sent back and forth.
File transfer recommendations
First, each time data is exported from one system it should be written to a new file with a unique name, preferably one that includes a date and timestamp. Some file-based integrations append data to a single file as a sort of database, but that individual file gets more and more time-consuming to parse as it grows. Separate files also reduce the likelihood that two machines will try to access the same file at once. There are more technical reasons to prefer this approach as well, like easier testing, troubleshooting, and auditing.
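As a rough illustration of the unique-name recommendation, here is a minimal Python sketch. The function name, the "export" prefix, and the name layout are our own placeholders rather than part of any particular system; the only idea taken from the text above is that every export gets its own timestamped file.

```python
import uuid
from datetime import datetime, timezone
from pathlib import Path


def export_to_new_file(payload: str, folder: Path,
                       prefix: str = "export", ext: str = ".xml") -> Path:
    """Write payload to a brand-new, uniquely named file in folder.

    The prefix and name layout are hypothetical; adapt them to whatever
    both systems have agreed on.
    """
    # UTC timestamp down to the microsecond keeps names sortable by time.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    # A short random suffix guards against two exports sharing a timestamp.
    name = f"{prefix}_{stamp}_{uuid.uuid4().hex[:8]}{ext}"
    target = folder / name
    # Mode "x" raises FileExistsError instead of overwriting an existing file.
    with open(target, "x", encoding="utf-8") as fh:
        fh.write(payload)
    return target
```

Because the timestamp comes first in the name, sorting the folder by file name also sorts it roughly by export time, which helps with the troubleshooting and auditing mentioned above.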
Second, we recommend using separate folders for sending and receiving files. Machine A should write files to Folder A, and Machine B should write files to Folder B. This, of course, means that Machine A reads files from Folder B, and Machine B reads files from Folder A. This is similar in concept to the Tx and Rx lines in some types of serial communications. It keeps the reading and writing processes fully separate and makes it easier to keep track of exactly what each system is doing.
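The two-folder arrangement can be sketched in a few lines of Python. The TransferFolders class and its method names are hypothetical and only illustrate the idea that each machine writes to one folder and reads from the other.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class TransferFolders:
    """One machine's view of the shared location (hypothetical helper)."""
    outbox: Path  # this machine only ever writes here
    inbox: Path   # this machine only ever reads from here

    def swapped(self) -> TransferFolders:
        """Return the other machine's view: its outbox is our inbox."""
        return TransferFolders(outbox=self.inbox, inbox=self.outbox)

    def pending(self) -> list[Path]:
        """List the files waiting in the inbox, sorted by name."""
        return sorted(p for p in self.inbox.iterdir() if p.is_file())
```

With this in place, Machine A would be configured as TransferFolders(outbox=Path("FolderA"), inbox=Path("FolderB")), and Machine B's configuration is simply machine_a.swapped(), much like crossing the Tx and Rx lines.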
Third, after a file is read and processed we recommend moving it to a separate folder as a way to archive it. This helps keep your transfer folder clean while keeping a record of past transactions. As a rule, it means that if there is a file in the transfer folder, it has not been processed yet. Conversely, if the transfer folder is empty, there is no data waiting to be processed. Just note that it's best to have some sort of garbage collection policy in place so that the archive folder does not simply grow in size endlessly.
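Here is one way the process-then-archive step and a simple cleanup policy might look in Python. This is only a sketch: the function names are ours, the handle callback stands in for whatever processing your system does, and deleting by file age is just one possible garbage collection policy.

```python
import shutil
import time
from pathlib import Path
from typing import Callable


def process_and_archive(inbox: Path, archive: Path,
                        handle: Callable[[Path], None]) -> int:
    """Process every file in inbox, then move it to archive.

    Returns the number of files processed. If handle raises, the file
    stays in the inbox, so "still in the inbox" means "not processed".
    """
    archive.mkdir(parents=True, exist_ok=True)
    count = 0
    # Sorted by name, so timestamp-first names are handled oldest first.
    for path in sorted(inbox.iterdir()):
        if not path.is_file():
            continue
        handle(path)
        shutil.move(str(path), str(archive / path.name))
        count += 1
    return count


def purge_archive(archive: Path, max_age_days: float) -> int:
    """Delete archived files last modified more than max_age_days ago."""
    cutoff = time.time() - max_age_days * 86400
    removed = 0
    for path in archive.iterdir():
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed += 1
    return removed
```

A scheduled job could call purge_archive(archive, max_age_days=90) once a day; the 90-day figure is only an example, and the right retention period depends on your auditing needs.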
We also recommend using a more structured file format like XML instead of simple text or CSV files. While text/CSV files can work just fine (and we have done a lot of text/CSV integrations), something like XML makes it much easier to encode more complicated data with nested values and relationships.
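To make the nesting point concrete, here is a small Python sketch using the standard library's xml.etree.ElementTree. The order, line, quantity, and location element names are invented for this example and do not come from any real interface; the point is only that one order can carry any number of nested lines, which is awkward to express in a flat CSV row.

```python
import xml.etree.ElementTree as ET


def build_order_xml(order_id: str, lines: list[dict]) -> str:
    """Encode one order with nested line items (hypothetical schema)."""
    root = ET.Element("order", attrib={"id": order_id})
    for line in lines:
        item = ET.SubElement(root, "line", attrib={"sku": line["sku"]})
        ET.SubElement(item, "quantity").text = str(line["quantity"])
        ET.SubElement(item, "location").text = line["location"]
    return ET.tostring(root, encoding="unicode")


def parse_order_xml(text: str) -> tuple[str, list[dict]]:
    """Decode the XML produced by build_order_xml back into Python data."""
    root = ET.fromstring(text)
    lines = [
        {
            "sku": el.get("sku"),
            "quantity": int(el.findtext("quantity")),
            "location": el.findtext("location"),
        }
        for el in root.findall("line")
    ]
    return root.get("id"), lines
```

The receiving system can read the file's contents and pass them to parse_order_xml to get the same order ID and list of lines back, without either side needing to agree on column positions or delimiters.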
Tried-and-true for a reason
File transfers are so common because the method just works. While it isn't the fastest or most flexible solution, it is easy to understand and not overly complicated to set up. For common data in a straightforward process, transferring files can be a great way to get started with data integration.
In our next post we'll touch on using web services to transfer data, which can be a better way to transmit data in more complex systems, so stay tuned! And don't forget to reach out if you have any questions or want to discuss any of these points in more detail!