PHP Script Timeout Issue

  • 13-08-2014 8:24am
    #1
    Registered Users Posts: 1,987 ✭✭✭Ziycon


    I have a PHP script which runs every evening at 19:00 and takes about 20 minutes to run. The script reads in a CSV file of numbers, queries an external API for each one and inserts the returned data into a local database table.

    If I process a file with a small number of entries, say 10, everything works fine and the timestamp is updated as expected in the database.

    The issue I'm having is with a large CSV file of over 3,000 entries. The script processes the CSV file completely (I can see the same number of inserts as there are entries in the file, which is correct), but the final steps don't seem to happen: the timestamp in the other table isn't updated once the file is processed, and the notification email saying the file has been processed is never sent.

    I'm setting the PHP timeout to a longer execution time than the default and resetting it to the default after the script runs.
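
    Roughly how I'm raising the limit (simplified sketch; the real value is longer than this):
    // Raise the execution time limit for this run, then put it back afterwards
    $vDefaultTimeout = ini_get('max_execution_time');
    set_time_limit(3600); // allow up to an hour for the import
    
    // ... file processing happens here ...
    
    ini_set('max_execution_time', $vDefaultTimeout); // restore the default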

    Anyone ever come across an issue like this when a script runs longer than the default timeout?

    A shortened-down version of the script:
    $dbObj = new db();
    
    if (!empty($pOauthKey) || !empty($pOauthSecret))
    {
    	$this->mOauth_key = $pOauthKey;
    	$this->mOauth_secret = $pOauthSecret;
    }
    
    $vFiles = array();
    $vPendingFilePath = $this->mFileStore.'pending/';
    
    // Get list of files to process
    $vDir = scandir($vPendingFilePath);
    foreach ($vDir as $vKey => $vValue)
    {
    	if (!in_array($vValue, array(".","..")))
    	{
    		$vValue = explode("_", $vValue);
    		$vFiles[] = array($vValue[0], $vValue[1]);
    	}
    }
    
    for ($i=0;$i < count($vFiles);$i++)
    {
    	// Update file start time
    	$dbObj->update("UPDATE tbl_hub_files 
    		SET huf_started = {started_timestamp} 
    		WHERE huf_file = {file} 
    		LIMIT 1;"
    		, array("started_timestamp" => date("Y-m-d G:i:s")
    			, "file" => $vFiles[$i][0]."_".$vFiles[$i][1]
    		)
    	);
    
    	// Get file details
    	$vResult = $dbObj->select("SELECT huf_id
    		, huf_uid
    		, huf_file
    		, huf_source
    		, huf_report 
    		FROM tbl_hub_files 
    		WHERE huf_uid = [uid]
    		AND huf_file = {file} 
    		AND huf_started <> '0000-00-00 00:00:00' 
    		LIMIT 1;"
    		, array("uid" => $vFiles[$i][0]
    			, "file" => $vFiles[$i][0]."_".$vFiles[$i][1]
    		)
    	);
    	if ($vResult)
    		$this->mFileDetails = $vResult[0];
    				
    	$vResponse = null;
    	$vPendingFilePath = $this->mFileStore.'pending/'.$this->mFileDetails->huf_file;
    				
    	$vCSVData = self::getCSVData($vPendingFilePath);
    	
    	foreach ($vCSVData AS $vKey => $vValue)
    	{
    		$j = 1;
    		switch ($this->mFileDetails->huf_source)
    		{
    			case 1:
    				if (is_numeric($vValue))
    					break;
    				
    				$vCompanyNumber = self::getCompanyNumber($vValue);
    				
    				switch ($this->mFileDetails->huf_report)
    				{
    					case 1:
    						foreach ($vCompanyNumber AS $vSubKey => $vSubValue)
    						{
    							// API Request
    						}
    						break;
    					case 2:
    						foreach ($vCompanyNumber AS $vSubKey => $vSubValue)
    						{
    							// API Request
    						}
    						break;
    					case 3:
    						foreach ($vCompanyNumber AS $vSubKey => $vSubValue)
    						{
    							// API Request
    						}
    						break;
    					case 4:
    						foreach ($vCompanyNumber AS $vSubKey => $vSubValue)
    						{
    							// API Request
    						}
    						break;
    					case 5:
    						foreach ($vCompanyNumber AS $vSubKey => $vSubValue)
    						{
    							// API Request
    						}
    						break;
    				}
    				
    				break;
    			case 2:
    					
    				$vCompanyNumber = self::formatCompanyNumber($vValue);
    				
    				switch ($this->mFileDetails->huf_report)
    				{
    					case 1:
    						// API Request
    						break;
    					case 2:
    						// API Request
    						break;
    					case 3:
    						// API Request
    						break;
    					case 4:
    						// API Request
    						break;
    					case 5:
    						// API Request
    						break;
    				}
    				
    				break;
    		}
    		$j++;
    	}
    	// Never runs after here when a large number of CSV entries is processed
    	unset($vCSVData);
    	self::hubFileSuccess();
    }
    


Comments

  • Registered Users Posts: 7,410 ✭✭✭jmcc


    Ziycon wrote: »
    Anyone ever come across an issue like this when a script runs longer than the default timeout?
    Some things are better left to a cron job, if possible. A rough guess: when you reset the PHP execution time, did you restart the webserver?

    Regards...jmcc


  • Registered Users Posts: 1,987 ✭✭✭Ziycon


    I have it running via a cron job and I've restarted Apache. The part that's not running is at the end of the same function that processes the file; it's almost as if, once the for loop that goes through all the CSV entries completes, the script stops and never calls the success function that comes right after the loop.


  • Registered Users Posts: 68,317 ✭✭✭✭seamus


    Are you sure it's the timeout that's killing the process? Ensure error logging is configured in your php.ini and set error_reporting to E_ALL at the top of your script. If the script is killed you should then see a log entry along the lines of "Script exceeded maximum execution time. Script terminated."

    If that is it, you can use

    max_execution_time = -1

    in your php.ini to switch off the maximum execution time. Not recommended in general, but it will rule the timeout in or out as the cause of this.
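
    Equivalently, at the top of the script itself, something along these lines (a sketch; the log path is just an example):
    // While debugging: report everything, log it to a file, and remove the time limit for this run
    error_reporting(E_ALL);
    ini_set('display_errors', '0'); // keep errors out of the output
    ini_set('log_errors', '1');
    ini_set('error_log', '/tmp/csv_import.log'); // example log path
    set_time_limit(0); // 0 = no execution time limit for this run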

    Also, are you running this script directly, or calling it using wget or similar?

    And lastly, are you operating on the downloaded CSV in real time or caching it first? That is, do you read a line, process it through the API and move on to the next line, or download the entire file, save it locally and then process it?


  • Registered Users Posts: 7,410 ✭✭✭jmcc


    Ziycon wrote: »
    I have it running via a cron job and I've restarted Apache. The part that's not running is at the end of the same function that processes the file; it's almost as if, once the for loop that goes through all the CSV entries completes, the script stops and never calls the success function that comes right after the loop.
    Sounds a bit strange. Did you test the success function part of the PHP on its own or confirm that it is being called?

    Regards...jmcc


  • Registered Users Posts: 1,987 ✭✭✭Ziycon


    jmcc wrote: »
    Sounds a bit strange. Did you test the success function part of the PHP on its own or confirm that it is being called?

    Regards...jmcc
    Yes, it works fine on small CSV files with only a few entries. I just can't see what's stopping it, or why, after the CSV file is processed.

    I'm trying to get my hands on the logs, but that's proving a bit difficult at present.


  • Registered Users Posts: 1,987 ✭✭✭Ziycon


    OK, I've only just found out that when running the script via the CLI there is no execution time limit (it defaults to 0, i.e. unlimited), so I'm leaning more towards a memory issue: it's a larger file, which in turn requires more memory depending on how well the code is written.
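
    To check that, I'm going to log the memory side of things with something like this (rough sketch):
    // Quick memory check while the file is being processed
    echo 'memory_limit:  '.ini_get('memory_limit').PHP_EOL;
    echo 'current usage: '.round(memory_get_usage(true) / 1048576, 1).'M'.PHP_EOL;
    echo 'peak usage:    '.round(memory_get_peak_usage(true) / 1048576, 1).'M'.PHP_EOL;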


  • Registered Users Posts: 7,410 ✭✭✭jmcc


    Ziycon wrote: »
    Yes, it works fine on small CSV files with only a few entries. I just can't see what's stopping it, or why, after the CSV file is processed.

    I'm trying to get my hands on the logs, but that's proving a bit difficult at present.
    If it can echo some kind of debug information (start/line count/success) to a webpage as the script executes, it might be possible to see what's going on without the log files. (Iffy on a production server, though.)

    Also, (not having used PHP for a while) this looks strange:
    case 1:
    				if (is_numeric($vValue))
    					break;
    
    It should execute without the curly brackets, but does the break statement have to be on the same line? Also, if the CSV data is going into a database, a bulk insert might be faster than a line-by-line insert; see the sketch below.
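
    By a bulk insert I mean building one multi-row statement, roughly like this (a sketch only; I don't know your db class, so PDO and the table/column names are made up):
    // One multi-row INSERT instead of one INSERT per CSV line
    // (illustrative only - connection, table and column names are invented)
    $vPdo = new PDO('mysql:host=localhost;dbname=hub', 'user', 'pass');
    $vPlaceholders = array();
    $vParams = array();
    foreach ($vCSVData as $vValue)
    {
    	$vPlaceholders[] = '(?, NOW())';
    	$vParams[] = $vValue;
    }
    $vStmt = $vPdo->prepare('INSERT INTO tbl_company_numbers (con_number, con_created) VALUES '.implode(', ', $vPlaceholders));
    $vStmt->execute($vParams);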

    Regards...jmcc


  • Registered Users Posts: 1,987 ✭✭✭Ziycon


    I've set up the cron job to output to a log file and I've enabled all error reporting.
    if (is_numeric($vValue))
           break;
    
    That's just there to break out of the switch statement if $vValue is not a numeric value, so it's OK; it's more validation than any real processing.

    The only issue with a bulk insert is that it would have to hold all 3,000+ entries before running the insert, which could cause more memory problems if the underlying issue is memory related.

    Would it make any difference if I were to read the CSV file a line at a time instead of reading the whole file in and looping through the entries?
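
    Roughly this, instead of the current getCSVData() call (just a sketch):
    // Read and process the CSV one line at a time rather than loading it all into memory
    $vHandle = fopen($vPendingFilePath, 'r');
    if ($vHandle !== false)
    {
    	while (($vRow = fgetcsv($vHandle)) !== false)
    	{
    		// fgetcsv returns array(null) for a blank line - skip those
    		if ($vRow === array(null))
    			continue;
    
    		// ... existing per-line processing / API call would go here ...
    	}
    	fclose($vHandle);
    }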


  • Registered Users Posts: 7,410 ✭✭✭jmcc


    Ziycon wrote: »
    The only issue with a bulk insert is that it would have to hold all 3,000+ entries before running the insert, which could cause more memory problems if the underlying issue is memory related.
    I was thinking that it could be database-index related rather than a PHP issue.
    Would it make any difference if I were to read the CSV file a line at a time instead of reading the whole file in and looping through the entries?
    It depends on the indexes used in the database. Bulk inserts tend to be more efficient because they generally involve a single index update (or the indexes only have to be updated once) rather than an index update each time a row is inserted. Normally, with a block of data, it is better to adopt an ETL (extract/transform/load) approach to make the insertion into the database efficient. (The data here generally tends to be in the millions of rows per table, so it is extracted and transformed into bulk loads (more precisely, loaded using "load data infile", which takes raw data rather than bulk inserts as such), which are then loaded into the dbs.) But that doesn't rely on APIs, so it is probably a completely different approach to your PHP script.

    If the db is MySQL based this might provide some background on insert speeds:
    http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html
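
    For illustration, a "load data infile" from PHP looks roughly like this (a sketch only: connection details, file path and staging table are made up, and local_infile has to be enabled on the server):
    // Bulk-load a prepared CSV straight into a staging table (MySQL)
    $vPdo = new PDO('mysql:host=localhost;dbname=hub', 'user', 'pass',
    	array(PDO::MYSQL_ATTR_LOCAL_INFILE => true));
    $vPdo->exec("LOAD DATA LOCAL INFILE '/tmp/company_numbers.csv'
    	INTO TABLE tbl_staging_numbers
    	FIELDS TERMINATED BY ','
    	LINES TERMINATED BY '\\n'");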

    Regards...jmcc


  • Registered Users Posts: 68,317 ✭✭✭✭seamus


    Performance-wise, reading each line in separately means a separate read call for every line, whereas pulling it all in at once is faster. There are pros and cons to each method depending on the nature of the box. For your purposes, however, you probably won't see a major difference between the two.

    Unless the lines are huge chunks of binary data or the memory on the machine is particularly scant, I can't see a 3,000 line CSV presenting much difficulty.

    Check to see that the memory limit directive in php.ini hasn't been set to something stupidly small.

    I wrote something very similar in a previous job. It was in C#, though the theory was the same. It typically processed about 100,000 lines in each run, moving them from a CSV file to an MSSQL DB. What I found was that the biggest sticking point was preparing the data to send to SQL. Obviously a 100,000-line bulk insert wasn't possible, so I had to periodically flush the bulk insert and commit it to the server before parsing the next batch of rows and firing them across.
    You would think that doing the maximum bulk insert (1,000 rows) would be the way to go here, as that would only require 100 inserts every time the application ran.

    But it wasn't. It ran like a pig and consumed stupid amounts of memory. On some machines it crashed. So I dialled the number of lines before a commit up and down and did some tests to find the sweet spot, which in that case was 50. So every fifty rows, flush the bulk insert to the SQL Server and begin building a new statement. The memory footprint was low and the application rocketed along, even though it required 2,000 separate BULK INSERT statements. A rough PHP equivalent of the pattern is sketched below.
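
    Something along these lines (a sketch only; the batch size, connection and table/column names are placeholders):
    // Flush the accumulated rows every $vBatchSize rows, rather than one statement
    // per row or one enormous statement at the end (names are invented for illustration)
    function flushBatch(PDO $pPdo, array $pBatch)
    {
    	$vPlaceholders = implode(', ', array_fill(0, count($pBatch), '(?)'));
    	$vStmt = $pPdo->prepare('INSERT INTO tbl_company_numbers (con_number) VALUES '.$vPlaceholders);
    	$vStmt->execute($pBatch);
    }
    
    $vPdo = new PDO('mysql:host=localhost;dbname=hub', 'user', 'pass');
    $vBatchSize = 50;
    $vBatch = array();
    
    foreach ($vCSVData as $vValue)
    {
    	$vBatch[] = $vValue;
    	if (count($vBatch) >= $vBatchSize)
    	{
    		flushBatch($vPdo, $vBatch);
    		$vBatch = array();
    	}
    }
    if (!empty($vBatch))
    	flushBatch($vPdo, $vBatch); // don't forget the final partial batch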


  • Registered Users Posts: 1,987 ✭✭✭Ziycon


    It seems to be processing the whole file of 3,752 entries and inserting them into the database with no problem since I upped the memory limit from 128M to 256M, but the original issue still stands: it won't run the final success function after the for loop, and only with large CSV files. :confused:


  • Closed Accounts Posts: 5,857 ✭✭✭professore


    Log everything inside the for loop. It could be something funny in one of the CSV lines, causing it to choke in a way you don't see in a small run. The API calls could also be consuming lots of memory.
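
    Even something as crude as this inside the loop would do (a sketch; the log path is arbitrary, using the existing $vKey/$vValue loop variables):
    // Crude per-row trace: line number, value and current memory usage
    error_log(date('c').' line '.$vKey.' value='.var_export($vValue, true)
    	.' mem='.memory_get_usage(true).PHP_EOL, 3, '/tmp/csv_import_trace.log');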


  • Registered Users Posts: 7,410 ✭✭✭jmcc


    Ziycon wrote: »
    It seems to be processing the whole file of 3,752 entries and inserting them into the database with no problem since I upped the memory limit from 128M to 256M, but the original issue still stands: it won't run the final success function after the for loop, and only with large CSV files. :confused:
    Probably a bit of a wild goose chase, but I'm still trying to understand the API section and whether $j is relevant there. The $j value is set to 1 at the start and then incremented at the end of the loop; could one of the API connections be left hanging? Is the last line in the large CSV file blank while the last line in the small CSV is not blank?

    Regards...jmcc


  • Registered Users Posts: 68,317 ✭✭✭✭seamus


    jmcc wrote: »
    Is the last line in the large CSV file blank while the last line in the small CSV is not blank?
    Looking at the foreach loop, my guess was that there's a tonne of CRLFs at the end of the file, or possibly null characters, that are causing the loop to go on for years.

    Open up the downloaded file in something like Notepad++ and see if it looks odd at the bottom.


  • Registered Users Posts: 1,987 ✭✭✭Ziycon


    I've added checks for null entries, as I thought null entries could cause the API call to fail and in turn cause the script to crash out. I haven't been able to test it just yet because of a third-party script (legacy code, and badly written) being used for the API calls, so I have to clean up the database connections first as they are being dropped after a certain number of connections are opened/closed.

    I'll post back once I have a more definite answer/cause.


  • Registered Users Posts: 7,410 ✭✭✭jmcc


    Interested to know how the problem was solved. Was it junk at the end of the file?

    Regards...jmcc


  • Registered Users Posts: 1,987 ✭✭✭Ziycon


    Ah yes, I meant to post back: it was a mixture of two things related to the same issue. The script was timing out because the API call was failing, which was caused by blank lines in the CSV leading to API calls with a null/empty value. I increased the timeout to a massive value (for testing only) and then found that the empty values caused a fatal error, due to the way the code in the API example they supply is written.

    To get around this I rewrote the API call code to stop fatal errors being thrown and also added a check to ignore any null/blank values before the API call is made.
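
    The guard itself is roughly this (simplified; apiRequest() is a stand-in for the rewritten third-party wrapper, which now throws exceptions instead of fatalling):
    // Skip blank/null CSV values before they reach the API, and stop a single
    // bad call from killing the whole run (apiRequest() is a stand-in)
    foreach ($vCSVData as $vKey => $vValue)
    {
    	$vValue = trim((string)$vValue);
    	if ($vValue === '')
    		continue; // ignore blank lines instead of calling the API with nothing
    
    	try
    	{
    		$vResponse = apiRequest($vValue);
    	}
    	catch (Exception $e)
    	{
    		error_log('API call failed for line '.$vKey.': '.$e->getMessage());
    		continue; // carry on with the rest of the file
    	}
    }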

    Thanks for everyone's help and advice.


  • Registered Users Posts: 7,410 ✭✭✭jmcc


    Ziycon wrote: »
    Ah yes, I meant to post back: it was a mixture of two things related to the same issue. The script was timing out because the API call was failing, which was caused by blank lines in the CSV leading to API calls with a null/empty value. I increased the timeout to a massive value (for testing only) and then found that the empty values caused a fatal error, due to the way the code in the API example they supply is written.

    To get around this I rewrote the API call code to stop fatal errors being thrown and also added a check to ignore any null/blank values before the API call is made.
    Sounds like the original API coders never designed for error conditions. When working with external data (data not locally generated), it is always a good idea to defang it before processing.

    Regards...jmcc

