- OleFileIO
- OleMetadata
class OleFileIO
OLE container object
This class encapsulates the interface to an OLE 2 structured
storage file. Use the listdir and openstream methods to
access the contents of this file.
Object names are given as a list of strings, one for each subentry
level. The root entry should be omitted. For example, the following
code extracts all image streams from a Microsoft Image Composer file::
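    from olefile import OleFileIO

    ole = OleFileIO("fan.mic")
    for entry in ole.listdir():
        # Each entry is a list of path components; compare against a list.
        if entry[1:2] == ["Image"]:
            fin = ole.openstream(entry)
            # Name the output file after the first path component.
            with open(entry[0], "wb") as fout:
                while True:
                    s = fin.read(8192)
                    if not s:
                        break
                    fout.write(s)
    ole.close()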
You can use the viewer application provided with the Python Imaging
Library to view the resulting files (which happen to be standard
TIFF files).
Methods defined here:
- __init__(self, filename=None, raise_defects=40, write_mode=False, debug=False, path_encoding='utf-8')
- Constructor for the OleFileIO class.
:param filename: file to open.
- if filename is a string shorter than 1536 bytes, it is the path
of the file to open. (bytes or unicode string)
- if filename is a string longer than 1535 bytes, it is parsed
as the content of an OLE file in memory. (bytes type only)
- if filename is a file-like object (with read, seek and tell methods),
it is parsed as-is.
:param raise_defects: minimal level for defects to be raised as exceptions.
(use DEFECT_FATAL for a typical application, DEFECT_INCORRECT for a
security-oriented application, see source code for details)
:param write_mode: bool, if True the file is opened in read/write mode instead
of read-only by default.
:param debug: bool, set debug mode
:param path_encoding: None or str, name of the codec to use for path
names (streams and storages), or None for Unicode.
Unicode by default on Python 3+, UTF-8 on Python 2.x.
(new in olefile 0.42, was hardcoded to Latin-1 until olefile v0.41)
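The three accepted forms of `filename` described above can be summarized in a short sketch. The helper `classify_source` and the constant `MIN_CONTENT_SIZE` are hypothetical names used for illustration; the code only mirrors the rule as documented here and is not the library's own dispatch logic.

```python
import io

# Threshold stated in the documentation: strings shorter than 1536 bytes
# are treated as paths, longer byte strings as in-memory file content.
MIN_CONTENT_SIZE = 1536  # hypothetical name for the documented threshold


def classify_source(source):
    """Return how the documented rule would interpret `source`."""
    # File-like objects (with read, seek and tell methods) are parsed as-is.
    if all(hasattr(source, name) for name in ("read", "seek", "tell")):
        return "file-like"
    # Long byte strings are parsed as the content of an OLE file in memory.
    if isinstance(source, bytes) and len(source) >= MIN_CONTENT_SIZE:
        return "content"
    # Anything else is taken to be the path of the file to open.
    return "path"


print(classify_source("report.doc"))             # path
print(classify_source(b"\x00" * 2048))           # content
print(classify_source(io.BytesIO(b"\x00" * 8)))  # file-like
```

Whichever form is used, the same value would be passed as the first argument of `OleFileIO(...)`.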
- close(self)
- Close the OLE file and release the file object.
- dumpdirectory(self)
- Dump directory (for debugging only)
- dumpfat(self, fat, firstindex=0)
- Display part of the FAT in human-readable form, for debugging purposes.
- dumpsect(self, sector, firstindex=0)
- Display a sector in human-readable form, for debugging purposes.
- exists(self, filename)
- Test if given filename exists as a stream or a storage in the OLE
container.
Note: filename is case-insensitive.
:param filename: path of stream in storage tree. (see openstream for syntax)
:returns: True if the object exists, else False.
- get_metadata(self)
- Parse standard properties streams, return an OleMetadata object
containing all the available metadata.
(also stored in the metadata attribute of the OleFileIO object)
new in version 0.25
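As a minimal sketch of how the returned OleMetadata object might be consumed, the hypothetical helper below collects only the summary attributes that are actually present. It assumes `ole` is an already opened OleFileIO instance and relies on the documented convention that missing attributes are None.

```python
def present_summary_metadata(ole):
    """Return the non-empty SummaryInformation attributes as a dict."""
    meta = ole.get_metadata()
    found = {}
    for name in meta.SUMMARY_ATTRIBS:
        value = getattr(meta, name)
        if value is not None:  # None means the property is absent
            found[name] = value
    return found


# Possible use, assuming the olefile package is installed and that
# "document.doc" (a hypothetical file name) is an OLE file:
#   import olefile
#   ole = olefile.OleFileIO("document.doc")
#   print(present_summary_metadata(ole))
#   ole.close()
```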
- get_rootentry_name(self)
- Return root entry name. Should usually be 'Root Entry' or 'R' in most
implementations.
- get_size(self, filename)
- Return size of a stream in the OLE container, in bytes.
:param filename: path of stream in storage tree (see openstream for syntax)
:returns: size in bytes (long integer)
:exception IOError: if file not found
:exception TypeError: if this is not a stream.
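Because get_size raises for missing entries and for storages, callers may want to guard the call. The following sketch combines it with exists; the helper name `stream_size_or_none` is an assumption made for illustration, and `ole` is taken to be an open OleFileIO instance.

```python
def stream_size_or_none(ole, path):
    """Return the size of a stream in bytes, or None if unavailable."""
    # exists() is case-insensitive and accepts the same path syntax
    # as openstream().
    if not ole.exists(path):
        return None
    try:
        return ole.get_size(path)
    except TypeError:
        # The entry exists but is a storage, not a stream.
        return None
```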
- get_type(self, filename)
- Test if given filename exists as a stream or a storage in the OLE
container, and return its type.
:param filename: path of stream in storage tree. (see openstream for syntax)
:returns: False if object does not exist, its entry type (>0) otherwise:
- STGTY_STREAM: a stream
- STGTY_STORAGE: a storage
- STGTY_ROOT: the root entry
- getctime(self, filename)
- Return creation time of a stream/storage.
:param filename: path of stream/storage in storage tree. (see openstream for
syntax)
:returns: None if creation time is null, a Python datetime object
otherwise (UTC timezone)
new in version 0.26
- getmtime(self, filename)
- Return modification time of a stream/storage.
:param filename: path of stream/storage in storage tree. (see openstream for
syntax)
:returns: None if modification time is null, a Python datetime object
otherwise (UTC timezone)
new in version 0.26
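Both getctime and getmtime can return None, so a caller has to handle the null case. A small sketch, assuming `ole` is an open OleFileIO instance; the helper name `entry_times` is hypothetical.

```python
def entry_times(ole, path):
    """Return creation and modification times as ISO 8601 strings or None."""
    times = {}
    for label, getter in (("created", ole.getctime),
                          ("modified", ole.getmtime)):
        stamp = getter(path)  # datetime object (UTC) or None
        times[label] = None if stamp is None else stamp.isoformat()
    return times
```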
- getproperties(self, filename, convert_time=False, no_conversion=None)
- Return properties described in substream.
:param filename: path of stream in storage tree (see openstream for syntax)
:param convert_time: bool, if True timestamps will be converted to Python datetime
:param no_conversion: None or list of int, timestamps not to be converted
(for example total editing time is not a real timestamp)
:returns: a dictionary of values indexed by id (integer)
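The interplay of convert_time and no_conversion can be sketched as follows. The helper `readable_properties` is hypothetical, `ole` is assumed to be an open OleFileIO instance, and the default id 10 is an assumption: it is the property id commonly used for total editing time in the SummaryInformation stream, which is a duration and not a real timestamp.

```python
def readable_properties(ole, stream_path, not_timestamps=(10,)):
    """Return (id, value) pairs of a property stream, sorted by id."""
    props = ole.getproperties(
        stream_path,
        convert_time=True,                   # timestamps become datetime
        no_conversion=list(not_timestamps),  # ids left unconverted (assumed)
    )
    # The result is a dictionary indexed by integer property id.
    return sorted(props.items())
```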
- getsect(self, sect)
- Read given sector from file on disk.
:param sect: int, sector index
:returns: a string containing the sector data.
- listdir(self, streams=True, storages=False)
- Return a list of streams and/or storages stored in this file
:param streams: bool, include streams if True (True by default) - new in v0.26
:param storages: bool, include storages if True (False by default) - new in v0.26
(note: the root storage is never included)
:returns: list of stream and/or storage paths
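Since each returned path is a list of strings, one per level, it is often convenient to join the components for display. A minimal sketch, assuming `ole` is an open OleFileIO instance; the helper name `list_paths` is hypothetical.

```python
def list_paths(ole, include_storages=False):
    """Return all entry paths as sorted, slash-separated strings."""
    entries = ole.listdir(streams=True, storages=include_storages)
    # Each entry is a list of path components, without the root entry.
    return sorted("/".join(parts) for parts in entries)
```

The joined strings use the same Unix path syntax that openstream accepts.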
- loaddirectory(self, sect)
- Load the directory.
:param sect: sector index of directory stream.
- loadfat(self, header)
- Load the FAT table.
- loadfat_sect(self, sect)
- Adds the indexes of the given sector to the FAT
:param sect: string containing the first FAT sector, or array of long integers
:returns: index of last FAT sector.
- loadminifat(self)
- Load the MiniFAT table.
- open(self, filename, write_mode=False)
- Open an OLE2 file in read-only or read/write mode.
Read and parse the header, FAT and directory.
:param filename: string-like or file-like object, OLE file to parse
- if filename is a string shorter than 1536 bytes, it is the path
of the file to open. (bytes or unicode string)
- if filename is a string longer than 1535 bytes, it is parsed
as the content of an OLE file in memory. (bytes type only)
- if filename is a file-like object (with read, seek and tell methods),
it is parsed as-is.
:param write_mode: bool, if True the file is opened in read/write mode instead
of read-only by default. (ignored if filename is not a path)
- openstream(self, filename)
- Open a stream as a read-only file object (BytesIO).
Note: filename is case-insensitive.
:param filename: path of stream in storage tree (except root entry), either:
- a string using Unix path syntax, for example:
'storage_1/storage_1.2/stream'
- or a list of storage filenames, path to the desired stream/storage.
Example: ['storage_1', 'storage_1.2', 'stream']
:returns: file object (read-only)
:exception IOError: if filename not found, or if this is not a stream.
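Because openstream returns a read-only file object, a stream can be copied to disk with the standard library. The sketch below assumes `ole` is an open OleFileIO instance; `extract_stream` and its parameters are hypothetical names.

```python
import shutil


def extract_stream(ole, path, destination):
    """Copy one stream of the OLE container to a file on disk."""
    # `path` may be a Unix-style string or a list of components.
    source = ole.openstream(path)
    with open(destination, "wb") as target:
        # Copy in 8192-byte chunks rather than reading everything at once.
        shutil.copyfileobj(source, target, 8192)
```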
- sect2array(self, sect)
- Convert a sector to an array of 32-bit unsigned integers,
swapping bytes on big-endian CPUs such as PowerPC (old Macs).
- write_sect(self, sect, data, padding='\x00')
- Write given sector to file on disk.
:param sect: int, sector index
:param data: bytes, sector data
:param padding: single byte, padding character if data < sector size
- write_stream(self, stream_name, data)
- Write a stream to disk. For now, it is only possible to replace an
existing stream by data of the same size.
:param stream_name: path of stream in storage tree (except root entry), either:
- a string using Unix path syntax, for example:
'storage_1/storage_1.2/stream'
- or a list of storage filenames, path to the desired stream/storage.
Example: ['storage_1', 'storage_1.2', 'stream']
:param data: bytes, data to be written, must be the same size as the original
stream.
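Given the same-size restriction, it can help to check the length before writing. A minimal sketch, assuming `ole` is an OleFileIO instance opened with write_mode=True; the helper name `replace_stream` is hypothetical.

```python
def replace_stream(ole, path, data):
    """Overwrite an existing stream with data of exactly the same size."""
    expected = ole.get_size(path)
    if len(data) != expected:
        raise ValueError(
            "new data is %d bytes, but the stream holds %d bytes"
            % (len(data), expected)
        )
    ole.write_stream(path, data)
```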
class OleMetadata
Class to parse and store metadata from standard properties of OLE files.
Available attributes:
codepage, title, subject, author, keywords, comments, template,
last_saved_by, revision_number, total_edit_time, last_printed, create_time,
last_saved_time, num_pages, num_words, num_chars, thumbnail,
creating_application, security, codepage_doc, category, presentation_target,
bytes, lines, paragraphs, slides, notes, hidden_slides, mm_clips,
scale_crop, heading_pairs, titles_of_parts, manager, company, links_dirty,
chars_with_spaces, unused, shared_doc, link_base, hlinks, hlinks_changed,
version, dig_sig, content_type, content_status, language, doc_version
Note: an attribute is set to None when not present in the properties of the
OLE file.
References for SummaryInformation stream:
- http://msdn.microsoft.com/en-us/library/dd942545.aspx
- http://msdn.microsoft.com/en-us/library/dd925819%28v=office.12%29.aspx
- http://msdn.microsoft.com/en-us/library/windows/desktop/aa380376%28v=vs.85%29.aspx
- http://msdn.microsoft.com/en-us/library/aa372045.aspx
- http://sedna-soft.de/summary-information-stream/
- http://poi.apache.org/apidocs/org/apache/poi/hpsf/SummaryInformation.html
References for DocumentSummaryInformation stream:
- http://msdn.microsoft.com/en-us/library/dd945671%28v=office.12%29.aspx
- http://msdn.microsoft.com/en-us/library/windows/desktop/aa380374%28v=vs.85%29.aspx
- http://poi.apache.org/apidocs/org/apache/poi/hpsf/DocumentSummaryInformation.html
new in version 0.25
Methods defined here:
- __init__(self)
- Constructor for OleMetadata
All attributes are set to None by default
- dump(self)
- Dump all metadata, for debugging purposes.
- parse_properties(self, olefile)
- Parse standard properties of an OLE file, from the streams
"SummaryInformation" and "DocumentSummaryInformation",
if present.
Properties are converted to strings, integers or Python datetime objects.
If a property is not present, its value is set to None.
Data and other attributes defined here:
- DOCSUM_ATTRIBS = ['codepage_doc', 'category', 'presentation_target', 'bytes', 'lines', 'paragraphs', 'slides', 'notes', 'hidden_slides', 'mm_clips', 'scale_crop', 'heading_pairs', 'titles_of_parts', 'manager', 'company', 'links_dirty', 'chars_with_spaces', 'unused', 'shared_doc', 'link_base', ...]
- SUMMARY_ATTRIBS = ['codepage', 'title', 'subject', 'author', 'keywords', 'comments', 'template', 'last_saved_by', 'revision_number', 'total_edit_time', 'last_printed', 'create_time', 'last_saved_time', 'num_pages', 'num_words', 'num_chars', 'thumbnail', 'creating_application', 'security']