Properly Configuring Your Web Server and PHP Will Help Your Location Imports

Some our our Store Locator Plus customers are loading huge lists of locations.   Power 4.9 was reworked to make this process far more efficient and prevent browser connections from being weak point in the process.  This has helped thousands of customers import larger lists than ever before.   Configuring your web server and PHP may be the “secret sauce” you need if your site is still having import issues.

Luckily our MySLP users do not have to worry about this as our MySLP servers are already tuned to handle large location lists.    For those of you on WordPress and managing your own plugins , it may be time to talk to your hosting company.

Import processing in 4.9

With Power 4.9 , which is bundled in our year-end update for MySLP users, launched with an entirely new process for importing locations.    Previously the import was a one-step process which tied up your browser.   If your network connection went down, you stopped the browser page loading for any reason, or your allowed server-to-browser response time (usually 120 seconds) was exceeded the import would stop.   That was not a good thing.    In Power 4.9 we broke the process into 3 steps to give the server far more time to deal with the slower parts of the import process such as getting the data from the CSV file into the database and asking Google for latitude & longitude data for every site.    It is not only faster but more resilient.   The files typically upload to the server quickly and then our data-loading process takes over.   It continually checks to make sure the import is still running and if not auto-restarts.

Power 4.9.2 Import Processing Progress
Power 4.9.2 Import Processing Progress Bar

This process works well for most servers and most location lists.   Customers are easily importing thousands of locations without any issues.  Even on under-powered servers.     Some customers, however, are still having issues when importing tens-of-thousands of locations.     After a lot of trial-and-error we finally reproduced the problem on our internal development servers and learned a lot in the process.

Importing locations background

A customer was having issues importing a list of 25,000 locations on their server.    Many of the improvements in Power 4.9 came about as we worked with this customer and several others processing tens-of-thousands of locations on a daily basis.    It did not take long to refine the import process and get more than 10,000 locations to load into the server in less than the 30-second PHP maximum execution time allowed by some hosting services.   A problem was still lurking in the background.  One that cannot be fixed by clever coding alone.

It turns out that if we used the default tuning parameters for our nginx server on our local VVV development box we could simulate what the customer was seeing on their import of 25,000 locations.    We did have to set our PHP time limit to 30 seconds as forcing the fatal PHP error for Maximum Execution Time Exceeded was a critical trigger for this process.   It also turns out we needed to be running the slower PHP 5.6 version with FPM enabled as well.

The problem?

When the PHP maximum execution time is exceeded the program, in this case WordPress processing a file import, crashes with a fatal error.   Since fatal errors have no way to be caught, you just have to let it happen.  Luckily Store Locator Plus is smart enough to check in on import progress every few minutes and pick up the pieces after a crash like this.

Sometimes this would not happen.   In fact when the import did not continue the entire web server would stop responding.  Sometimes permanently. Sometimes for 10 minutes.    It was odd.

What has happening?   The web server and PHP were fighting.    Turns out that the fatal crash from maximum execution time was throwing the web server into  a tailspin because the web server would keep trying to run more background processes to serve more web requests while PHP was trying to get out of the mess that was created.

Web server and PHP limits

We quickly learned that if we restarted the PHP 5.6 FPM process on our server everything would come back to life.  Almost instantaneously.   The web pages would start coming back up and , as if  by magic, the location import process would continue right where it left off.     Running this simple server command fixed the problem every time:

# sudo service php5.6-fpm restart

But why?  Why do we need to restart PHP every time an import stops?

It is not the import itself but a lower-level PHP issue.

Nearly all production servers set a fixed amount of time that a program is allowed to use up in CPU resources.  Smaller shared hosts may give your program 30 seconds of allowed CPU time to finish all the work it needs to do.    Our development server is only for us so it allows 600 seconds which is eons in computer time (enough to process a 500,000 location import with time to spare).     This is the “safety valve” for your PHP applications to ensure that a single program cannot take over an entire server.  It could be a coding error running in an infinite loop or a rogue application doing nefarious things.  Either way you don’t want one program taking over the universe , so the limit prevents that.     It is also a HARD limit meaning once it is reached PHP , quite abruptly, pulls the plug.  NO warning.  No “hey, you may want to stop now”.   BOOM.  Done.   The program crashes hard at whatever point it happens to be at during the import.

Properly configuring your web server and PHP can help

Normally when this happens the process ends and all other programs go on their merry way.     However this requires some choreography between the web server and PHP to make this work well.     One of the issues that often comes up when you reach a maximum execution time error is that one of your web server’s PHP processes has been busy for quite some time.   That means one process has been taken up an not allowed to send web pages to people.    Your web server, as you can guess, does not rely on a single process but several processes to send pages.  It uses one of the backups.     But what happens when 10 people show up at once?  For example when Jetpack runs its every-5-minute background checks?   Or someone runs a social media publishing campaign?    All the extra processes available to serve a web page can be quickly used up.

All web servers have a process manager that handles these things.    They also have memory limitation settings for each process.   Your server also has a limited amount of memory available to do all this work.       If all 3 elements are not configured intelligently all hell can break loose.

nginx and web server config examples
nginx and web server config examples

Checking the nginx process manager is critical.     The process manager (pm in apache or worker in nginx) determines how many connections are “ready at all times”.    This is your initial connections allowed.     There is also a setting that determines how many “children” or extra processes can be started up as needed.    There are even special modes to tell the process manager about how to behave in general such as “worker mode” or “on demand”.      Your system administrator should know enough about your server to ensure that there are enough processes ready at all times without using up all the server memory, which will make everything ground to a halt in a hurry.    They also should know how much overhead your server needs for “breathing room” for “janitorial stuff” which they factor into how many extra children can be created to handle a web request.

Knowing your memory limits and usage per web request is also important.   This is where your PHP configuration and nginx configuration need to be intelligent about what is going on.  Many system administrators run into a limitation and “crank up” the memory limit to “10”.    On many servers this is an issue.   If your server has 16GB of RAM, which is a BIG hosting plan with most web hosts, you cannot tell PHP to allow 512MB per process and then tell the nginx process manager to start 16 processes but allow it to grow to 48 as needed.    If your processes each used up all of the allowable 512MB of RAM then your server runs out of memory and everything stops working.

Your system administrator should know how much memory your typical web requests uses.  They should know how much peak memory is used by the most intensive operations you may run, such as a nightly import or backup process.    They should set your PHP memory limits to a reasonable value based on this information.   They can work backwards from there to determine the maximum number of processes your web server should allow EVER.

They will also need to have some general information about your typical web traffic.  How many simultaneous visitors do you have on average and how long does it take to serve up a web page to them?   If you get 10 visitors per minute but each page only takes 1 second to render then having a handful of processes ready at all times is all you need.   If you are getting 100 visitors per minute and you take 5 seconds to show them the page you’ll need a lot more processes ready at the start.      Do you have high fluctuation where you get 10 visitors per hour most days but your coffee shop locator page is slammed with 300 visits per minute at 7AM?   You need to configure your starting processes at a lower value but allow a large number of children to be spawned.

Want to get “all techie” about it?  Keep an eye on my personal blog over at Lance.BIO where I share more details tech-geek stuff.

Why is all of this important?  How does it relate to imports?

If you are doing an import of tens-of-thousands of locations and your server hits that maximum execution time limit it should go unnoticed.   If everything is tuned properly your server will stop the import process when the resource limit is reached and Store Locator Plus will start a new process within a minute or two and continue where it left off.   If things are NOT configured properly then your web server and PHP start fighting and everything hangs.    Your admin pages stop responding and your import never finishes.      If you ask your web host to restart the web server and the import continues it is a good indicator your web server is misconfigured.

Workarounds

How do you get around the issue if your do not have a managed hosting plan or a cooperative ISP?

For Store Locator Plus imports you can cut your file size down into smaller chunks.      Only import one at a time and when it finished go on to the next piece.

You can also ask your hosting provider to increase the maximum execution time for your hosting plan.   You should be allowed to have at least 60 seconds to process files and not have a significant impact on the server.   60 seconds on a decent web host should allow 20,000 locations to be imported.

Still having issues? Consider changing hosts to someone with a bit more experience or considered our MySLP SaaS service.

Improved Import With Power 4.9.2

Power 4.9.2 for our WordPress plugins was released tonight with improved import processing.  Several updates are included as we progress towards improved large location import processing.    All of the upgrades are coming to the MySLP service for our non-WordPress users on services like Wix, Weebly, and Squarespace but we have a bit more testing and integration to ensure 99.999% up time on our SaaS service.

Fixing large location imports

If you were importing thousands of locations and found the import process would get partway through then stop, the Power 4.9.2 patch should help.    If your file could not be processed in one pass by the background import processor, Store Locator Plus would detect the unfinished import and pick up where it left off.  Problem was it would not re-read the header row to match up data to locator fields and no additional locations would come into the location list.    This issue, while leaving the new 4.9 import architecture no better but no worse than the 4.8 and earlier releases, has now been resolved.     The second , and third or fourth passes if necessary, properly pick up where things left off.

The second fix is to prevent that second or third location import pass from starting prematurely when the first pass slows down.    Power 4.9.1 would sometimes think the first import stopped processing and start where the first pass paused.   This slowed EVERYTHING down and if Locations > Import > File Settings > Duplicates Handling was set to Add versus Update it may have duplicated records.

The import issue could manifest for smaller lists of locations if your server is underpowered or your hosting service is severely restricting PHP time slices (processing time) on your account.   A typical moderately-loaded server should be able to easily process 2500 locations within a 60-second PHP time slice with time to spare.     For servers that cannot process imports that quickly this patch will help keep the process going.

Improved import for all list sizes

While traveling the path of fixing the large list imports we also found several ways to improve import performance.  Where we would run through around 600 locations being imported in a 15 second processing time limit ( Settings > General > Server > PHP Time Limit ) with Power 4.9.1 the same file would load all 1,000 locations before the time ran out.     Our 25,000 location test would load nearly 10,000 locations with a typical 60-second PHP time limit.

As an aside, the default time limit setting for SLP is 600 seconds.   We were turning this DOWN to try to force our servers to behave like a severely taxed web server or a severely underpowered server.    Many shared hosting plans will NOT honor this setting and changing it will not have any impact on most server.     MySLP users will not see this setting in Advanced Options as our servers are already tuned to allow maximum PHP processing for large location lists.

Keeping data intact with slashes in CSV files

The most commonly referenced specification for CSV file processing, RFC 4180, dictates what most computer scientist agree is a valid CSV file format.   One part of that specification is how to deal with data that has a quote in it.   Since CSV data that has any special character within is to be wrapped in quotes (“) there has to be a special way to deal with things like a store name My “Special” Store.    RFC 4180 says that the CSV file should use double-quotes to manage this.

Rather than code all of this ourselves we use the built-in PHP CSV file processing functions.    It is far faster than running custom PHP code.

Until our 4.9 releases we had been trying to keep all of our code compatible with PHP 5.2.4 due to the WordPress minimum requirement being 5.2.4 despite their recommendation to use PHP 7.     We finally decided we could not longer support the 11-year old version if we were going to write efficient applications.   Our plugins no longer support PHP 5.2.4 and now require PHP 5.3.

The benefit of this requirement is we can now support the proper RFC 4180 specification instead of the PHP “special Sybase compatible” version that sees the \” character combination as an escape sequence.   This messes up the import with any field that ends with a \ and starts “shifting fields” into the wrong columns.

As of Power 4.9.2 the \ character is treated like any other character which makes the imports a lot happier with files that Excel is mangling.

Power 4.9.2 still requires proper line endings, so for our Mac users running older versions of Excel DO NOT export in Mac compatible mode.   For some reason Microsoft things Mac computers still use the 2003 standard return (\r) line ending instead of the modern standard CRLF (\r\n) line endings that are required for a proper CSV file formatting.    Why Excel allows a CSV export of ANY format that does NOT have a CRLF is a mystery as this is not part of ANY CSV standard including the “special Sybase compatible” version that PHP allows.

Improved import instructions and reporting

One of the obvious changes is some notes are on the Locations > Import tab to help guide you to some of the updates in the 4.9 import process.  This includes explaining the basic 3-step process and the fact that you can find your location CSV file in the Media > Library of WordPress.

The Power 4.9.2 release has simplified the message logging on the bottom of the import page timestamping only the start and end of processing and reporting problem records.   Added, updated, and not updated records are no longer reported on an individual basis in order to improve performance and consume far less memory on large file imports.

PHP Versions
PHP Versions

In addition to the improved import reporting,  the summary with a count of how many records in the CSV file has added, updated, not updated, were malformed and other summary data is now visible on a per-file basis in the Media Library file details for location CSV files.    This is in addition to the Power 4.9 standard file status , file size, current processing offset, and record count (when completed).

Other tweaks and updates

We snuck in a few smaller things into 4.9.2.

The Identifier column has always had special meaning in CSV files; enabled in General > Data > Add On Data Extensions > Enable Contact Fields, this special column override the sl_id column to match up records to be updated.  While useful for aligning Store Locator Plus with third party data systems it does add some extra overhead to the file processing.    Power will no longer add the overhead to check for that column if the Enable Contact Fields setting is not enabled.  It also means updates will no longer work if you DO include that field in your data source but have not ensured the setting is enabled.   Most sites will not need the Identifier column and update feature.

We fixed the Remote File Retrieval settings.  Somewhere during our updates over the past year these settings stopped saving when you changed them.   We also happened to create a better system for marking “quick save” fields like these the auto-save when changed. You may start seeing quick-save options in more places as they reduce button clicks and are faster than saving entire pages of settings to change one thing.

One last improved import addition — we now report on CSV files that look like they have too many columns.  A CSV file with more than 256 columns is a good indicator that the file is corrupt and the line endings (see Excel and MacOS above) are not set properly.   Rather than create a huge string that includes the entire file and may crash the server, Power 4.9.2 flags this potential error and truncates the logging.   The number of columns (aka fields) is shown in the Media Library details as well and provides more insight into what is going on “behind the scenes” should an import fail.

 

Location Import Process Improved With Power 4.9

With the release of the WordPress plugins version 4.9 of Store Locator Plus and the Power add on we focused on improving the location import process for large lists.      MySLP Power users will see this release by the end of the year after it has passed full testing on our SaaS platform.

Why Did We Change The Process?

Prior to version 4.9 there were a number of limitations that could prevent larger lists of locations from importing properly.    For the fastest servers with lots of memory and super-fast networking as well as a premium tier on the Google API service, Store Locator Plus could import 40,000 locations without any problem.

As WordPress started adding more network overhead, all those update notifications and the news feed in the WordPress dashboard comes from somewhere, the network started to become an issue.   On top of that Google started tracking every single map request so they could start charging people using their maps on a per-lookup basis.     Both of these changes increased the network traffic, slowing down how quickly Store Locator Plus could get a latitude/longitude for those new locations being imported.

Often your server would give up, saying “you had your 30 seconds of my time” and stopping the import mid-stream.    The PHP time limitation on processing is often the first thing that “gives out” when importing large lists of locations.   For most shared hosts the limit is 30 seconds.   Even on a dedicated host it is good practice to not let a PHP process consume more than 120 seconds of CPU time.  This prevents “runaway” processes that take over the entire server which can grind EVERYTHING to a halt forcing a server restart.

The bad news is that if you are importing 10,000 locations and the process gives up after 9,800 locations you have to start over.   At least that is how it USED TO BE with SLP 4.8 and earlier.

How Imports Improved In Power 4.9

With version 4.8 the location import process was “tied to the browser” until it was complete.    When you clicked the “upload CSV” button your browser would sit there attached to the website while the file was uploaded, read line-by-line, each location sent to Google to get a latitude and longitude and then get added to the Store Locator Plus locations list.     If the network was slow this could take a long time, often 10 minutes or more.   That is FOREVER to keep a browser window open these days.

With Power 4.9 we split this process into 3 stages.

Stage 1 : Get The File On The server

This is the only step of the process that stays “attached to your browser”.    The browser will bar processing the page while the file is copied from your laptop to the web server.   This is typically a quick process even for a file with 250,000 locations.    The time it takes depends on how fast your local network and ISP connection is.

Previous versions stored  the uploaded location files in a “hidden” SLP-specific directory under WordPress uploads.   Version 4.9 stores them in the media library.     This gives you an easy way to see that the file was uploaded properly, see what the current import state is, and later download or delete the files when the import is complete.

Stage 2 : Import The Locations Into SLP

After the file is uploaded a detached process starts to work on the actual location import. This ensures your browser doesn’t time out as this part happens in the background.    Your browser is “free to go” while your server does the work of loading in the locations.

It runs a WP Cron job in the background to import the locations WITHOUT geocoding. This is much faster and ensures the import doesn’t disconnect partway through.

As part of improving visibility into this process, Power 4.9.1 will be out soon and will be included in our end-of-year MySLP update.  It includes a cool interactive progress bar as well showing you how many records have been read.

Power 4.9.2 Import Processing Progress
Power 4.9.2 Import Processing Progress Bar

If your server times out before our background location loader finishes reading the file , SLP remembers the last record read and re-spawns a new background job to pick up where it left off. It will do this until it reaches the end of the file no matter how often PHP times out.

Stage 3: Geocode The Locations

AFTER the file has been loaded it will then geocode all uncoded locations.   That means you may catch SLP with a list of locations none of which are geocoded and not showing up on the map.  This is a temporary state while SLP talks to Google.    Since that is always the slowest part of an import we decided to do this in a separate process so at least SOME locations start showing up right away and eventually of them are loaded.

Power 4.9.2 Geocoding Progress
Power 4.9.2 Geocoding Progress

This is similar to the import. It does it in the background and if PHP times out it will re-spawn itself until it has tried to geocode very location at least once. This is a big step toward not having an import time out. It also keeps you from manually having to re-geocode if you hit your daily 2500 location limit and didn’t opt for a paid API key.