2.3.18.2. Content
2.3.18.2.1. File utilities
- This module provides various file-related features:
 - filesystem traversal with depth support
 - file search, wildcard or regex based
 - file rollover (backup)
 - size parsing and formatting
 - directory creation without error on existing directory
 
- 
mkdirs(d)
Create a directory, including parents.
Params: - d: the directory, or list of directories, that may be created
 
Return: - created: For a single directory: d if directory has been created, ‘’ otherwise (already exists). For a list of directories, the list of directories which have been created.
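The return convention described above can be illustrated with a minimal sketch built on the standard library only. The name mkdirs_sketch is hypothetical and this is not the module's implementation; it only mirrors the documented behaviour (the path when created, '' when it already exists, a filtered list for list input).

```python
import os
import tempfile


def mkdirs_sketch(d):
    """Create directory d (or each directory of a list), including parents.

    Hypothetical stand-in for the documented behaviour: returns d if it was
    created and '' if it already existed; for a list, returns the list of
    directories that were actually created.
    """
    if isinstance(d, (list, tuple)):
        created = [mkdirs_sketch(item) for item in d]
        return [item for item in created if item]
    if os.path.isdir(d):
        return ''
    os.makedirs(d)
    return d


# Usage in a throwaway directory
root = tempfile.mkdtemp()
target = os.path.join(root, 'a', 'b')
first = mkdirs_sketch(target)    # created: returns the path
second = mkdirs_sketch(target)   # already exists: returns ''
```

Returning an empty string rather than raising lets callers test the result directly in a condition.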
 
- 
mkfdirs(f)
Create the directory of a file, including parents. This may be used before writing to a file to ensure that its parent directories exist.
Params: - f: the file, or list of files, for which the directory may be created
 
Return: - created: For a single file: the directory of f if it has been created, ‘’ otherwise (already exists). For a list of files, the list of directories which have been created.
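A typical use is to prepare the parent directory just before opening an output file. The sketch below assumes nothing about the module's internals: mkfdirs_sketch is a hypothetical stand-in written with os.path.dirname and os.makedirs, and the file names are made up for the illustration.

```python
import os
import tempfile


def mkfdirs_sketch(f):
    """Create the parent directory of file path f, including parents.

    Hypothetical stand-in: returns the directory if it was created,
    '' if it already existed or if f has no directory part.
    """
    parent = os.path.dirname(f)
    if not parent or os.path.isdir(parent):
        return ''
    os.makedirs(parent)
    return parent


root = tempfile.mkdtemp()
out_file = os.path.join(root, 'results', '2010', 'summary.txt')
created = mkfdirs_sketch(out_file)   # creates <root>/results/2010
with open(out_file, 'w') as handle:  # the parent directory now exists
    handle.write('done\n')
```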
 
- 
rollover(filepath, count=1, suffix='.%d', keep=True, verbose=False)
Make a rollover of the specified file: keep a limited number of backups of the file by renaming them with a numbered suffix.
Params: - filepath: the file to make a backup of
 - count: maximum number of backup files
 - suffix: suffix to use when renaming files, must contain a ‘%d’ marker which is replaced by the backup number
 - keep: whether to keep existing file in addition to the backup one
 
Return: True if a backup occurred, False otherwise (count is 0 or filepath does not exist)
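The rotation scheme can be sketched as follows. This is an illustration under stated assumptions, not the module's code: rollover_sketch is a hypothetical name, backup number 1 is assumed to be the most recent, and keep=True is assumed to mean that the original is copied rather than moved.

```python
import os
import shutil
import tempfile


def rollover_sketch(filepath, count=1, suffix='.%d', keep=True):
    """Keep up to `count` numbered backups of filepath.

    Assumption: backup 1 is the most recent, backup `count` the oldest.
    """
    if count <= 0 or not os.path.isfile(filepath):
        return False
    # Shift existing backups up by one, starting with the oldest,
    # so that the backup beyond `count` is dropped.
    for n in range(count, 1, -1):
        src = filepath + suffix % (n - 1)
        dst = filepath + suffix % n
        if os.path.exists(src):
            if os.path.exists(dst):
                os.remove(dst)
            os.rename(src, dst)
    newest = filepath + suffix % 1
    if os.path.exists(newest):
        os.remove(newest)
    if keep:
        shutil.copy2(filepath, newest)   # the original stays in place
    else:
        os.rename(filepath, newest)      # the original is moved away
    return True


root = tempfile.mkdtemp()
log = os.path.join(root, 'run.log')
with open(log, 'w') as handle:
    handle.write('first')
done = rollover_sketch(log, count=2)     # creates run.log.1
```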
- 
strfsize(size, fmt=None, unit=None, si=False, suffix=True)
Format a size in bytes using the appropriate unit multiplier (Ko, Mo, Kio, Mio).
Params: - size: the size in bytes
 - fmt: the format to use, which will receive size and unit arguments. If None, “%(size).3f %(unit)s” or “%(size)d %(unit)s” is used automatically.
 - unit: the unit to use, among K, M, G, T, P, E, Z, Y. If None, the unit is determined automatically.
 - si: whether to use SI (International System) units (10^3, …) or binary units (2^10, …)
 
Return: a string
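The unit selection can be sketched with a few lines of arithmetic. The helper below, format_size_sketch, is hypothetical and only approximates the documented behaviour: it assumes that plain byte counts use the integer format, that scaled values use three decimals, and it leaves out the fmt and suffix parameters.

```python
def format_size_sketch(size, unit=None, si=False):
    """Format a byte count with an 'o' (SI) or 'io' (binary) unit."""
    prefixes = ['K', 'M', 'G', 'T', 'P', 'E', 'Z', 'Y']
    base = 1000.0 if si else 1024.0
    tail = 'o' if si else 'io'
    if unit is None:
        # Pick the largest prefix that keeps the scaled value at or above 1.
        power = 0
        while power < len(prefixes) and abs(size) >= base ** (power + 1):
            power += 1
    else:
        power = prefixes.index(unit) + 1
    if power == 0:
        return '%d o' % size
    return '%.3f %s%s' % (size / base ** power, prefixes[power - 1], tail)


binary = format_size_sketch(1536)             # binary units by default
decimal = format_size_sketch(1500, si=True)   # SI units
forced = format_size_sketch(1048576, unit='K')
```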
- 
strpsize(size, si=False)
Parse a size in Ko, Mo, Kio, Mio, …
Params: - size: the size string (e.g. “1Ko”, “1Kio”, “2 Mo”, “ 10 Go”)
 - si: when the unit does not end with ‘io’, force interpretation as International System units (10^3, …) instead of binary units (2^10, …)
 
Return: the float number of bytes
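The interplay between the ‘io’ ending and the si flag can be illustrated with a small parser. parse_size_sketch and its regular expression are assumptions made for this example; the accepted syntax of the real function may differ.

```python
import re

# number, optional prefix letter, optional 'io' or 'o' ending
_SIZE_RE = re.compile(r'^\s*([0-9]*\.?[0-9]+)\s*([KMGTPEZY]?)(io|o)?\s*$')


def parse_size_sketch(size, si=False):
    """Return the number of bytes (float) described by a size string."""
    match = _SIZE_RE.match(size)
    if match is None:
        raise ValueError('cannot parse size: %r' % size)
    number, prefix, tail = match.groups()
    # An 'io' ending always means binary units; otherwise si decides.
    base = 1024.0 if (tail == 'io' or not si) else 1000.0
    power = 'KMGTPEZY'.index(prefix) + 1 if prefix else 0
    return float(number) * base ** power


default_ko = parse_size_sketch('1Ko')        # binary unless si=True
si_ko = parse_size_sketch('1Ko', si=True)    # 10^3 based
kio = parse_size_sketch('1Kio', si=True)     # 'io' stays binary
```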
- 
tfind(regex, path=None, fmt='%Y-%m-%dT%H:%M:%SZ', min=None, max=None, group=None, getdate=False, getmatch=False, xmin=False, xmax=True, **kwargs)
Find timestamped paths (e.g. files having a date string in their paths).
See find() for the regex, path and kwargs arguments. The regex regular expression must define at least one group which describes the location of the date string in paths.
Params: - fmt: (python) date format
 - min: minimum date filter: a datetime object or a date string in fmt format. None means no min date filtering.
 - max: maximum date filter: a datetime object or a date string in fmt format. None means no max date filtering.
 - group: the regex group number(s) or name(s): a single string or integer, or a list of them. None means all groups.
 - xmin: if True, min is exclusive
 - xmax: if True, max is exclusive
 
The group(s) can be specified either by their number or name. These groups are concatenated to form the date string that is parsed.
Examples: Assuming we are looking for the following files:
- path/to/data/data_2010-01-01T00H.nc
 - path/to/data/data_2010-01-01T12H.nc
 - path/to/data/data_2010-01-02T00H.nc
 - path/to/data/data_2010-01-02T12H.nc
 
The commands below will have the same result:
>>> items = tfind(r'data_(.*)\.nc', 'path/to', '%Y-%m-%dT%HH', depth=2)
>>> items = tfind(r'data_(....-..-..T..H)\.nc', 'path/to/data', '%Y-%m-%dT%HH')
Same but more precise / advanced examples:
>>> items = tfind(r'data_([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}H)\.nc', 'path/to/data', '%Y-%m-%dT%HH')
>>> items = tfind(r'data_([0-9]{4})-([0-9]{2})-([0-9]{2})T([0-9]{2})H\.nc', 'path/to/data', '%Y%m%d%H')
>>> items = tfind(r'(data)_(?P<y>[0-9]{4})-([0-9]{2})-([0-9]{2})T([0-9]{2})H\.nc', 'path/to/data', '%Y%m%d%H', group=('y', 3, 4, 5))
Return: Depending on getdate and getmatch, a list in the form:
- If getdate=False and getmatch=False: [path1, path2, …]
 - If getdate=False and getmatch=True: [(path1, matchobj1), (path2, matchobj2), …]
 - If getdate=True and getmatch=False: [(path1, datetime1), (path2, datetime2), …]
 - If getdate=True and getmatch=True: [(path1, matchobj1, datetime1), (path2, matchobj2, datetime2), …]
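The group concatenation and the min/max filtering can be sketched on a plain list of paths, without touching the filesystem. tfind_sketch and its min_date and max_date parameters are hypothetical names, and the sketch always returns (path, datetime) pairs, which corresponds to getdate=True and getmatch=False.

```python
import re
from datetime import datetime


def tfind_sketch(paths, regex, fmt, min_date=None, max_date=None,
                 xmin=False, xmax=True):
    """Return (path, datetime) pairs for paths whose date is in range."""
    pattern = re.compile(regex)
    found = []
    for path in paths:
        match = pattern.search(path)
        if match is None:
            continue
        # Concatenate all groups to rebuild the date string, then parse it.
        date = datetime.strptime(''.join(match.groups()), fmt)
        if min_date is not None and (
                date < min_date or (xmin and date == min_date)):
            continue
        if max_date is not None and (
                date > max_date or (xmax and date == max_date)):
            continue
        found.append((path, date))
    return found


paths = [
    'path/to/data/data_2010-01-01T00H.nc',
    'path/to/data/data_2010-01-01T12H.nc',
    'path/to/data/data_2010-01-02T00H.nc',
    'path/to/data/data_2010-01-02T12H.nc',
]
selected = tfind_sketch(
    paths,
    r'data_([0-9]{4})-([0-9]{2})-([0-9]{2})T([0-9]{2})H\.nc',
    '%Y%m%d%H',
    min_date=datetime(2010, 1, 1, 12),   # inclusive (xmin=False)
    max_date=datetime(2010, 1, 2, 12),   # exclusive (xmax=True)
)
```

With these defaults the selection is a half-open interval, which avoids picking the same file twice when consecutive time windows are processed.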
 
- 
walk(top, topdown=True, onerror=None, followlinks=False, depth=None, onfile=None, ondir=None, _depth=0)
New implementation of os.walk with depth support to avoid unnecessarily large scans. Each iteration yields a supplementary depth value: (top, dirs, nondirs, depth).
Params: - depth: Limit the depth of walk:
 - None: no limit
 - 0: limited to top directory entries
 - 1: limited to the first level of directories under the top directory
 - N: limited to the Nth level of directories under the top directory
 
Warning
Do not use the _depth argument: it is used internally to track the current depth during iteration.
See os.walk() for more details on the other parameters.
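The depth limit can be sketched with a short recursive generator. walk_depth_sketch is a hypothetical, simplified stand-in: it supports only the depth argument, never follows symbolic links, and uses an internal _level argument in the same spirit as _depth.

```python
import os
import tempfile


def walk_depth_sketch(top, depth=None, _level=0):
    """Yield (top, dirs, nondirs, level), descending at most `depth` levels."""
    dirs, nondirs = [], []
    for name in sorted(os.listdir(top)):
        if os.path.isdir(os.path.join(top, name)):
            dirs.append(name)
        else:
            nondirs.append(name)
    yield top, dirs, nondirs, _level
    if depth is not None and _level >= depth:
        return  # depth limit reached: do not scan subdirectories
    for name in dirs:
        sub = os.path.join(top, name)
        if os.path.islink(sub):
            continue  # symbolic links are not followed in this sketch
        for item in walk_depth_sketch(sub, depth, _level + 1):
            yield item


root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'a', 'b'))
open(os.path.join(root, 'a', 'b', 'deep.txt'), 'w').close()
visited = [(os.path.relpath(t, root), level)
           for t, d, f, level in walk_depth_sketch(root, depth=1)]
```

Because the limit is checked before listing subdirectories, directories below the limit are never read, which is where the saving over a full scan comes from.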
- 
xefind(regex, path=None, depth=0, files=True, dirs=False, matchall=False, abspath=True, exclude=None, followlinks=False, expandpath=True, onerror=None, onfile=None, ondir=None, onmatch=None, getmatch=False, rexflags=None, xrexflags=None)
Find paths matching the regex regular expression.
Params: - regex: the file regular expression
 - path: if not None, entries are searched from this location, otherwise current directory is used
 - depth: if not None, it designates the recursion limit (0 based, None for no limit, see walk function)
 - files: if False, file entries will not be returned
 - dirs: if False, directory entries will not be returned
 - matchall: if False, only file/directory names are evaluated, entire path otherwise
 - abspath: if True, returned paths are absolute
 - exclude: if not None, it designates a regular expression which will be used to exclude files or directories
 - getmatch: if True, return a list of (path, match_object) pairs
 - followlinks: if True, symbolic links will be walked (see walk function)
 - rexflags: if not None, it will be used as regex compile flags
 - xrexflags: if not None, it will be used as exclude regex compile flags
 - expandpath: if True, environment variables and special character ~ will be expanded in the passed search path
 
Example:
>>> find(r'.*\.nc', '/path/to/data')
['/path/to/data/data_2010-01-01.nc', '/path/to/data/data_2010-01-02.nc', ...]
>>> filelist = find(r'data_([0-9]{4})-([0-9]{1,2})-([0-9]{1,2})\.nc', 'data', getmatch=True, abspath=False)
>>> for filepath, matchobj in filelist:
...     print('%s : %s' % (filepath, matchobj.groups()))
...
data/data_2010-01-1.nc : ('2010', '01', '1')
data/data_2010-01-10.nc : ('2010', '01', '10')
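How the regex, exclude, depth and getmatch parameters combine can be sketched with os.walk and re. regex_find_sketch is a hypothetical generator covering files only; it assumes that names are tested with re.match and that excluded names are tested the same way, which may not match the module's exact rules.

```python
import os
import re
import tempfile


def regex_find_sketch(regex, path='.', depth=0, getmatch=False, exclude=None):
    """Yield files under path whose names match regex, down to `depth`."""
    pattern = re.compile(regex)
    xpattern = re.compile(exclude) if exclude is not None else None
    path = os.path.abspath(path)
    for top, dirs, files in os.walk(path):
        if top == path:
            level = 0
        else:
            level = os.path.relpath(top, path).count(os.sep) + 1
        if depth is not None and level >= depth:
            del dirs[:]  # stop os.walk from descending further
        for name in sorted(files):
            match = pattern.match(name)
            if match is None:
                continue
            if xpattern is not None and xpattern.match(name):
                continue
            full = os.path.join(top, name)
            yield (full, match) if getmatch else full


root = tempfile.mkdtemp()
for name in ('data_2010-01-1.nc', 'data_2010-01-10.nc', 'notes.txt'):
    open(os.path.join(root, name), 'w').close()
hits = list(regex_find_sketch(
    r'data_([0-9]{4})-([0-9]{1,2})-([0-9]{1,2})\.nc', root, getmatch=True))
groups = [m.groups() for _, m in hits]
```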
- 
xfind(pattern, path=None, depth=0, files=True, dirs=False, matchall=False, abspath=True, exclude=None, followlinks=False, expandpath=True, onerror=None, onfile=None, ondir=None, onmatch=None)
Find paths matching the pattern wildcard.
Params: - pattern: pattern or list of patterns using special characters *,?,[seq],[!seq] (see standard module fnmatch)
 - path: if not None, entries are searched from this location, otherwise current directory is used
 - depth: if not None, it designates the recursion limit (0 based, None for no limit, see walk function)
 - files: if False, file entries will not be returned
 - dirs: if False, directory entries will not be returned
 - matchall: if False, only file/directory names are evaluated, entire path otherwise
 - abspath: if True, returned paths are absolute
 - exclude: if not None, it designates a pattern or list of patterns which will be used to exclude files or directories
 - followlinks: if True, symbolic links will be walked (see walk function)
 - expandpath: if True, environment variables and special character ~ will be expanded in the passed search path
 
Example:
>>> find('*.nc', '/path/to/data')
['/path/to/data/data_2010-01-01.nc', '/path/to/data/data_2010-01-02.nc', ...]
>>> find(('*.nc', '*.grb'), '/path/to/data', depth=1, exclude=('*-01.nc', '*02.grb'))
['/path/to/data/data_2010-01-02.nc', '/path/to/data/grib/data_2010-01-01.grb', ...]
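The way include and exclude patterns interact can be shown on a plain list of names with the standard fnmatch module. wildcard_filter_sketch is a hypothetical helper that leaves out directory traversal; it assumes that a name is kept when it matches at least one pattern and no exclude pattern.

```python
import fnmatch


def wildcard_filter_sketch(names, pattern, exclude=None):
    """Keep names matching any pattern and none of the exclude patterns."""
    patterns = [pattern] if isinstance(pattern, str) else list(pattern)
    if exclude is None:
        excludes = []
    elif isinstance(exclude, str):
        excludes = [exclude]
    else:
        excludes = list(exclude)
    kept = []
    for name in names:
        # fnmatchcase is used so that results do not depend on the platform
        if not any(fnmatch.fnmatchcase(name, p) for p in patterns):
            continue
        if any(fnmatch.fnmatchcase(name, x) for x in excludes):
            continue
        kept.append(name)
    return kept


names = [
    'data_2010-01-01.nc', 'data_2010-01-02.nc',
    'data_2010-01-01.grb', 'data_2010-01-02.grb',
    'readme.txt',
]
kept = wildcard_filter_sketch(
    names, ('*.nc', '*.grb'), exclude=('*-01.nc', '*02.grb'))
```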