Get Folder Contents for Very Large Folder

Hi,

I’m building a process to read files from a large folder (280k+ files). It should look for a specific file daily, but I will never have the complete file name, only a partial name. Get Folder Contents with a regex name filter works well for smaller folders, but with this many files it times out and fails.

Is there another way to read a specific file when all I have is the folder and a partial file name (a custom script, maybe)? FYI, the partial file name will match only one file.

Any help is appreciated, thanks!

Hi @wwylie.

By “partial file name”, do you mean a file name mask, for example Test_file_[MMDDYYYY].txt? What does the partial file name look like?

The file names include a date and an identifier, but also random text (e.g. MMDDYYYY_AIDKTHE1823AHD_TEST_FILE_SUEJ9834_97834).

So here, I would be looking for MMDDYYYY*TEST_FILE* with “*” being any text.
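For illustration, here is a minimal Java sketch of turning that mask into a glob and an equivalent regex for a given day. It assumes the date part is the day being processed, formatted as MMddyyyy; the class and method names (DailyMask, glob, regex) are hypothetical. Java is used because WorkFusion custom scripts are Groovy, which runs on the JVM and can call the same java.time and java.util.regex classes (assuming Java 8 or later).

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.regex.Pattern;

class DailyMask {
    // Date prefix used in the file names, e.g. 01152024 for January 15, 2024
    private static final DateTimeFormatter MMDDYYYY = DateTimeFormatter.ofPattern("MMddyyyy");

    /** Glob form of the mask MMDDYYYY*TEST_FILE*, where "*" matches any text. */
    static String glob(LocalDate day) {
        return day.format(MMDDYYYY) + "*TEST_FILE*";
    }

    /** Regex form of the same mask: the date is quoted literally and "*" becomes ".*". */
    static Pattern regex(LocalDate day) {
        return Pattern.compile(Pattern.quote(day.format(MMDDYYYY)) + ".*TEST_FILE.*");
    }

    public static void main(String[] args) {
        LocalDate day = LocalDate.of(2024, 1, 15); // fixed date for the example
        String name = "01152024_AIDKTHE1823AHD_TEST_FILE_SUEJ9834_97834";
        System.out.println(glob(day));                          // 01152024*TEST_FILE*
        System.out.println(regex(day).matcher(name).matches()); // true
    }
}
```

In a daily run you would pass LocalDate.now() (or whichever business date applies) instead of the fixed example date.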

Thank you. One more question: is this file the most recently modified one? If so, you can try this custom script sample: https://kb.workfusion.com/display/RPAe/Code+Samples#CodeSamples-FindinglastmodifiedfileYellowWorkfusion

Unfortunately no, it will not be the latest modified.

Hi @wwylie,

Please find below a sample Custom Action script that works fine with a large number of files in a folder.

Skip the regex filter in the Get Folder Contents action and do the filtering in the script instead.

Samples.zip (249.2 KB)

Hope this resolves your problem. Have a great day :slight_smile:


Thank you for the response. However, my issue is getting the Get Folder Contents action to work at all: your example still uses Get Folder Contents to build a list of all the files, and that is where my process times out. It’s a network drive with 290k+ files, which takes a while to load even when just opening it in File Explorer.
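One option to try, sketched here under assumptions rather than as a tested fix: instead of building the full list first, open the folder as a java.nio DirectoryStream filtered by the daily glob and stop at the first regular file that matches. Entries are read one at a time and the method returns on the first hit, so the 290k+ names are never collected into a list. The share may still have to enumerate many entries before the match turns up, so this is not guaranteed to be fast enough on a slow network drive. FirstMatch and find are hypothetical names; the folder path and glob are passed in as arguments.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

class FirstMatch {
    /**
     * Returns the first regular file in folder whose name matches the glob.
     * The glob filter is applied entry by entry as the directory is read, so
     * no full listing is built, and the method returns on the first hit.
     */
    static Optional<Path> find(Path folder, String glob) throws IOException {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(folder, glob)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)) { // skip sub-folders that happen to match
                    return Optional.of(entry);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: FirstMatch <folder> <glob>");
            return;
        }
        Optional<Path> hit = find(Paths.get(args[0]), args[1]);
        System.out.println(hit.map(Path::toString).orElse("no match"));
    }
}
```

Because the file name is unique, returning the first match is enough; a list-returning version would have to read the whole folder again.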

Hi @wwylie.

I’ve created a small script that reads the files, compares each name with the mask (thanks @aravindhan_mr for the regex :slight_smile:) and returns a List variable. It works for me, but I only tested it on a small folder with 20 files. Recorder variables in Studio: folder (String) and file_list (List).
Please see below:

```groovy
import com.workfusion.studio.rpa.recorder.utils.ListComparator

@CustomScriptAction(
    input = ['folder'],
    output = 'file_list'
)
def customScript() {
    // listFiles() returns null if the path is not a directory or an I/O error occurs
    File[] files = new File(folder.toString()).listFiles()
    List<String> temp_list = new ArrayList<String>()

    if (files != null) {
        for (File file : files) {
            // Keep regular files whose name matches the mask, e.g. 12345_ABC_1.txt
            if (file.isFile() && file.getName().matches("[0-9]+_[a-zA-Z0-9_]+\\.txt")) {
                temp_list.add(file.getName())
            }
        }
    }
    file_list = RList.of(temp_list.asList())
}
```

Thanks for the help. Unfortunately, the folder is too big for these custom scripts. Instead, I used a dir command to get a text file with all of the file names (e.g. “C:\ >dir”), which gives me what I need.
Thanks all!
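Once the dir output is saved to a text file, the process only needs to scan that file for the day’s mask. Below is a minimal Java sketch of that step. It assumes file names contain no spaces and that you pass the encoding the listing was saved in (cmd redirection typically uses the console code page). It reads the listing line by line and returns the first whitespace-delimited token that starts with the date and contains TEST_FILE, so it should work for either the default dir layout or a bare (dir /b) listing. ListingSearch and findName are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ListingSearch {
    /**
     * Scans a saved directory listing and returns the first token that starts
     * with datePrefix (e.g. "01152024") and contains TEST_FILE. Using find()
     * rather than matches() handles both bare names and the default dir layout
     * (date, time, size, then name). Assumes names contain no whitespace.
     */
    static Optional<String> findName(Path listing, Charset charset, String datePrefix) throws IOException {
        // (?<!\S) means the token starts at the line start or right after whitespace
        Pattern mask = Pattern.compile("(?<!\\S)" + Pattern.quote(datePrefix) + "\\S*TEST_FILE\\S*");
        try (BufferedReader reader = Files.newBufferedReader(listing, charset)) {
            String line;
            while ((line = reader.readLine()) != null) {
                Matcher m = mask.matcher(line);
                if (m.find()) {
                    return Optional.of(m.group());
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: ListingSearch <listing.txt> <MMddyyyy>");
            return;
        }
        Optional<String> name = findName(Paths.get(args[0]), Charset.defaultCharset(), args[1]);
        System.out.println(name.orElse("no match"));
    }
}
```

The returned name can then be joined with the folder path to open the file directly, without listing the share again.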


Thank you for the update! Glad to know that you resolved your issue :+1: