Package: ff 4.5.0

ff: Memory-Efficient Storage of Large Data on Disk and Fast Access Functions

The ff package provides data structures that are stored on disk but behave (almost) as if they were in RAM by transparently mapping only a section (pagesize) into main memory; this section is the effective virtual memory consumption per ff object. ff supports R's standard atomic data types 'double', 'logical', 'raw' and 'integer' and the non-standard atomic types boolean (1 bit), quad (2 bit unsigned), nibble (4 bit unsigned), byte (1 byte signed with NAs), ubyte (1 byte unsigned), short (2 byte signed with NAs), ushort (2 byte unsigned) and single (4 byte float with NAs). For example, 'quad' allows efficient storage of genomic data as an 'A','T','G','C' factor. The unsigned types support 'circular' arithmetic. There is also support for the close-to-atomic types 'factor', 'ordered', 'POSIXct', 'Date' and custom close-to-atomic types.

ff has native C support for vectors, matrices and arrays with flexible dimorder (major column-order, major row-order and generalizations for arrays). There is also an ffdf class, not unlike data.frame, and import/export filters for csv files. ff objects store raw data in binary flat files in native encoding, and complement this with metadata stored in R as physical and virtual attributes. ff objects have well-defined hybrid copying semantics, which gives rise to certain performance improvements through virtualization.

ff objects can be stored and reopened across R sessions. ff files can be shared by multiple ff R objects (using different data en/de-coding schemes) in the same process or from multiple R processes to exploit parallelism. A wide choice of finalizer options makes it possible to work with 'permanent' files as well as to create and remove 'temporary' ff files completely transparently to the user. On certain OS/filesystem combinations, creating the ff files works without notable delay thanks to sparse file allocation.

Several access optimization techniques such as Hybrid Index Preprocessing and Virtualization are implemented to achieve good performance even with large datasets, for example a virtual matrix transpose without touching a single byte on disk. Further, to reduce disk I/O, 'logicals' and non-standard data types are stored natively and compactly in binary flat files, i.e. logicals take up exactly 2 bits to represent TRUE, FALSE and NA. Beyond basic access functions, the ff package also provides compatibility functions that facilitate writing code for both ff and ram objects, and support for batch processing on ff objects (e.g. as.ram, as.ff, ffapply). ff interfaces closely with functionality from package 'bit': chunked looping, fast bit operations and coercions between different objects that can store subscript information ('bit', 'bitwhich', ff 'boolean', ri range index, hi hybrid index). This makes it possible to work interactively with selections of large datasets and to quickly modify selection criteria. Further high-performance enhancements can be made available upon request.
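
As a rough illustration of the compact storage modes and the virtual transpose described above, the following sketch creates two small temporary ff objects. It assumes the ff package is installed; the object names (x, m, tm) and the sizes are arbitrary choices for the example, not anything prescribed by the package.

```r
library(ff)

# A one-million-element vector stored on disk in the 2-bit 'quad' vmode.
# No filename is given, so ff creates a temporary file.
x <- ff(vmode = "quad", length = 1e6)
x[1:4] <- 0:3            # quad holds the unsigned values 0..3
print(vmode(x))
print(x[1:4])

# Bytes per element on disk for a few of the compact modes, as reported by ff.
print(.ffbytes[c("boolean", "logical", "quad")])

# A small ff matrix and its virtual transpose.
m  <- ff(1:12, dim = c(3, 4))
tm <- vt(m)              # changes dim and dimorder only; the file is not rewritten
print(dim(tm))
print(tm[2, 1] == m[1, 2])

# Empty subscripts read the whole object back into RAM as an ordinary matrix.
r <- m[]
print(class(r))
```

Here m and tm refer to the same file on disk, so the transpose costs no I/O until elements are actually read.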

Authors: Daniel Adler [aut], Christian Gläser [ctb], Oleg Nenadic [ctb], Jens Oehlschlägel [aut, cre], Martijn Schuemie [ctb], Walter Zucchini [ctb]

ff_4.5.0.tar.gz
ff_4.5.0.zip (r-4.5) | ff_4.5.0.zip (r-4.4) | ff_4.5.0.zip (r-4.3)
ff_4.5.0.tgz (r-4.4-x86_64) | ff_4.5.0.tgz (r-4.4-arm64) | ff_4.5.0.tgz (r-4.3-x86_64) | ff_4.5.0.tgz (r-4.3-arm64)
ff_4.5.0.tar.gz (r-4.5-noble) | ff_4.5.0.tar.gz (r-4.4-noble)
ff.pdf | ff.html
ff/json (API)
NEWS

# Install 'ff' in R:
install.packages('ff', repos = c('https://truecluster.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/truecluster/ff/issues

Uses libs:
  • c++: GNU Standard C++ Library v3

165 exports | 25 stars | 6.10 score | 1 dependency | 68 dependents | 14 mentions | 740 scripts | 11.4k downloads

Last updated 8 hours ago from: 03df454dd9. Checks: OK: 9. Indexed: yes.

Target | Result | Date
Doc / Vignettes | OK | Sep 17 2024
R-4.5-win-x86_64 | OK | Sep 17 2024
R-4.5-linux-x86_64 | OK | Sep 17 2024
R-4.4-win-x86_64 | OK | Sep 17 2024
R-4.4-mac-x86_64 | OK | Sep 17 2024
R-4.4-mac-aarch64 | OK | Sep 17 2024
R-4.3-win-x86_64 | OK | Sep 17 2024
R-4.3-mac-x86_64 | OK | Sep 17 2024
R-4.3-mac-aarch64 | OK | Sep 17 2024

Exports: .ffbytes .ffmode .rambytes .rammode .vcoerceable .vimplemented .vmax .vmin .vmode .vNA .vunsigned .vvalues add appendLevels array2vector arrayIndex2vectorIndex as.boolean as.byte as.ff as.ffdf as.hi as.nibble as.quad as.ram as.short as.ubyte as.ushort as.vmode bigsample boolean byte ccbind cfun clength clone.ff clone.ffdf cmean cmedian cquantile crbind csum csummary delete deleteIfOpen dforder dfsort dimorder dimorder<- dimorderCompatible dimorderStandard dummy.dimnames ff ffapply ffcolapply ffconform ffdf ffdfindexget ffdfindexset ffdforder ffdfsort ffdrop ffindexget ffindexorder ffindexordersize ffindexset ffinfo ffload fforder ffreturn ffrowapply ffsave ffsave.image ffsort ffsuitable ffsymmxtensions fftempfile ffvecapply ffxtensions file.move file.resize filename filename<- finalize finalizer finalizer<- fixdiag fixdiag<- get.ff getalignedpagesize getdefaultpagesize geterror.ff geterrstr.ff getpagesize getset.ff hi hiparse is.factor is.ff is.ffdf is.open is.ordered is.readonly matcomb matprint maxffmode maxlength mismatch ncol<- nibble nrow<- pagesize pattern pattern<- quad ram2ffcode ram2ramcode ramattribs ramclass ramdforder ramdfsort read.csv.ffdf read.csv2.ffdf read.delim.ffdf read.delim2.ffdf read.ff read.table.ffdf readwrite.ff recodeLevels regtest.fforder regtest.vmode repnam set.ff short sortLevels splitPathFile standardPathFile subscript2integer swap symmetric symmIndex2vectorIndex tempPathFile ubyte unclass<- undim unsort unsplitPathFile ushort vecprint vector.vmode vector2array vectorCompatible vectorIndex2arrayIndex vectorStandard vmode vmode<- vt vw vw<- write.csv write.csv.ffdf write.csv2 write.csv2.ffdf write.ff write.table.ffdf ymismatch

Dependencies: bit

Readme and manuals

Help Manual

Help page | Topics
Incrementing an ff or ram object | add add.default add.ff
Array: make vector from array | array2vector
Array: make vector positions from array index | arrayIndex2vectorIndex
Coercing ram to ff and ff to ram objects | as.ff as.ff.default as.ff.ff as.ram as.ram.default as.ram.ff
Conversion between bit and ff boolean | as.bit.ff as.ff.bit
Coercing to ffdf and data.frame | as.data.frame.ffdf as.ffdf as.ffdf.data.frame as.ffdf.ff_matrix as.ffdf.ff_vector
Hybrid Index, coercion to | as.hi as.hi.( as.hi.bit as.hi.bitwhich as.hi.call as.hi.character as.hi.double as.hi.hi as.hi.integer as.hi.logical as.hi.matrix as.hi.name as.hi.NULL as.hi.ri as.hi.which
Hybrid Index, coercing from | as.bit.hi as.bitwhich.hi as.character.hi as.integer.hi as.logical.hi as.matrix.hi as.which.hi
Coercing to virtual mode | as.boolean as.boolean.default as.byte as.byte.default as.nibble as.nibble.default as.quad as.quad.default as.short as.short.default as.ubyte as.ubyte.default as.ushort as.ushort.default as.vmode as.vmode.default as.vmode.ff
Sampling from large pools | bigsample bigsample.default bigsample.ff
Collapsing functions for batch processing | ccbind CFUN cfun clength cmean cmedian cquantile crbind csum csummary
Chunk ff_vector and ffdf | chunk.ffdf chunk.ff_vector
Cloning ff and ram objects | clone.ff
Cloning ffdf objects | clone.ffdf
Closing ff files | close.ff close.ffdf close.ff_pointer
Deleting the file behind an ff object | delete delete.default delete.ff delete.ffdf delete.ff_pointer deleteIfOpen deleteIfOpen.ff deleteIfOpen.ff_pointer
Getting and setting dim and dimorder | dim.ff dim.ffdf dim<-.ff dim<-.ffdf dimorder dimorder.default dimorder.ffdf dimorder.ff_array dimorder<- dimorder<-.ffdf dimorder<-.ff_array
Getting and setting dimnames | dimnames.ff dimnames.ff_array dimnames<-.ff_array
Getting and setting dimnames of ffdf | dimnames.ffdf dimnames<-.ffdf names.ffdf names<-.ffdf row.names.ffdf row.names<-.ffdf
Test for dimorder compatibility | dimorderCompatible dimorderStandard vectorCompatible vectorStandard
Array: make dimnames | dummy.dimnames
Reading and writing vectors and arrays (high-level) | Extract.ff [.ff [.ff_array [<-.ff [<-.ff_array [[.ff [[<-.ff
Reading and writing data.frames (ffdf) | $.ffdf $<-.ffdf Extract.ffdf [.ffdf [<-.ffdf [[.ffdf [[<-.ffdf
ff classes for representing (large) atomic data | ff ff_pointer
Apply for ff objects | ffapply ffcolapply ffrowapply ffvecapply
Get most conforming argument | ffconform
ff class for data.frames | ffdf
Reading and writing ffdf data.frame using ff subscripts | ffdfindexget ffdfindexset
Sorting: convenience wrappers for data.frames | dforder dfsort ffdforder ffdfsort ramdforder ramdfsort
Delete an ffarchive | ffdrop
Reading and writing ff vectors using ff subscripts | ffindexget ffindexset
Sorting: chunked ordering of integer subscript positions | ffindexorder ffindexordersize
Inspect content of ff saves | ffinfo
Reload ffSaved Datasets | ffload
Sorting: order from ff vectors | fforder
Return suitable ff object | ffreturn
Save R and ff objects | ffsave ffsave.image
Sorting of ff vectors | ffsort
Test ff object for suitability | ffsuitable ffsuitable_attribs
Test for availability of ff extensions | ffsymmxtensions ffxtensions
Change size of or move an existing file | file.move file.resize
Get or set filename | filename filename.default filename.ffdf filename.ff_pointer filename<- filename<-.ff pattern pattern.ff pattern<- pattern<-.ff pattern<-.ffdf
Call finalizer | finalize finalize.ff finalize.ffdf finalize.ff_pointer
Get and set finalizer (name) | finalizer finalizer.ff finalizer<- finalizer<-.ff
Test for fixed diagonal | fixdiag fixdiag.default fixdiag.dist fixdiag.ff fixdiag<-
Get error and error string | geterror.ff geterrstr.ff
Get page size information | getalignedpagesize getdefaultpagesize getpagesize
Reading and writing vectors of values (low-level) | get.ff getset.ff set.ff
Hybrid index class | hi print.hi str.hi
Hybrid Index, parsing | hiparse
Test for class ff | is.ff
Test for class ffdf | is.ffdf
Test if object is opened | is.open is.open.ff is.open.ffdf is.open.ff_pointer
Get readonly status | is.readonly is.readonly.ff
Getting and setting 'is.sorted' physical attribute | is.sorted.default is.sorted<-.default
Getting and setting length | length.ff length<-.ff
Getting length of an ffdf dataframe | length.ffdf
Hybrid Index, querying | length.hi maxindex.hi poslength.hi
Getting and setting factor levels | is.factor is.factor.default is.factor.ff is.ordered is.ordered.default is.ordered.ff levels.ff levels<-.ff
ff Limitations and Warnings | LimWarn
Array: make matrix indices from row and column positions | matcomb
Print beginning and end of big matrix | matprint print.matprint
Lossless vmode coercibility | maxffmode
Get physical length of an ff or ram object | maxlength maxlength.default maxlength.ff
Test for recycle mismatch | mismatch ymismatch
Getting and setting 'na.count' physical attribute | na.count.default na.count.ff na.count<-.default na.count<-.ff
Getting and setting names | names.ff names.ff_array names<-.ff names<-.ff_array
Assigning the number of rows or columns | ncol<- nrow<-
Opening an ff file | open.ff open.ffdf
Pagesize of ff object | pagesize pagesize.ff
Getting and setting physical and virtual attributes of ff objects | physical.ff physical<-.ff virtual.ff virtual<-.ff
Getting physical and virtual attributes of ffdf objects | physical.ffdf virtual.ffdf
Print and str methods | print.ff print.ffdf print.ff_matrix print.ff_vector str.ff str.ffdf
Factor codings | ram2ffcode ram2ramcode
Get ramclass and ramattribs | ramattribs ramattribs.default ramattribs.ff ramattribs_excludes ramclass ramclass.default ramclass.ff ramclass_excludes
Sorting: order R vector in-RAM and in-place | keyorder.default mergeorder.default radixorder.default ramorder.default shellorder.default
Sorting: sort R vector in-RAM and in-place | keysort.default mergesort.default radixsort.default ramsort.default shellsort.default
Importing csv files into ff data.frames | read.csv.ffdf read.csv2.ffdf read.delim.ffdf read.delim2.ffdf read.table.ffdf
Reading and writing vectors (low-level) | read.ff readwrite.ff write.ff
Sorting: regression tests | regtest.fforder
Replicate with names | repnam
Factor level manipulation | appendLevels recodeLevels recodeLevels.factor recodeLevels.ff sortLevels sortLevels.factor sortLevels.ff sortLevels.ffdf
Analyze pathfile-strings | fftempfile splitPathFile standardPathFile tempPathFile unsplitPathFile
Reading and writing in one operation (high-level) | swap swap.default swap.ff swap.ff_array
Test for symmetric structure | symmetric symmetric.default symmetric.dist symmetric.ff
Array: make vector positions from symmetric array index | symmIndex2vectorIndex
Unclassed assignment | unclass<-
Undim | undim
Hybrid Index, internal utilities | subscript2integer unsort unsort.ahi unsort.hi
Update ff content from another object | update.ff update.ffdf
Print beginning and end of big vector | print.vecprint vecprint
Create vector of virtual mode | boolean byte nibble quad short ubyte ushort vector.vmode vector.vmode.default vector.vmode.ff
Array: make array from vector | vector2array
Array: make array from index vector positions | vectorIndex2arrayIndex
Virtual storage mode | .ffbytes .ffmode .rambytes .rammode .vcoerceable .vimplemented .vmax .vmin .vmode .vNA .vunsigned .vvalues regtest.vmode vmode vmode.default vmode.ff vmode<- vmode<-.default vmode<-.ff
Virtual storage mode of ffdf | vmode.ffdf
Virtual transpose | t.ff vt vt.default vt.ff
Getting and setting virtual windows | vw vw.default vw.ff vw<- vw<-.ff_array vw<-.ff_vector
Exporting csv files from ff data.frames | write.csv write.csv.ffdf write.csv2 write.csv2.ffdf write.table.ffdf
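
To give a feel for how a few of the topics listed above fit together, here is a minimal sketch that imports a csv file into an ffdf and sums one column chunk by chunk. It assumes ff (and therefore bit) is installed; the temporary file, the data.frame df and its column names id and value are made up for the example.

```r
library(ff)   # attaching ff also attaches bit, which provides chunk()

# Write a small data.frame to a temporary csv file with base R.
# utils::write.csv is called explicitly because ff exports its own write.csv.
csv <- tempfile(fileext = ".csv")
df  <- data.frame(id = 1:1000, value = (1:1000) / 10)
utils::write.csv(df, csv, row.names = FALSE)

# Import the file as an ffdf: each column is kept in an ff file on disk.
fd <- read.csv.ffdf(file = csv, header = TRUE)
print(dim(fd))
print(vmode(fd))

# Batch processing: chunk() yields range indices that cover all rows.
total <- 0
for (i in chunk(fd)) {
  total <- total + sum(fd$value[i])
}
print(total)

# Ordinary subscripting returns a regular in-RAM data.frame.
print(fd[1:3, ])

unlink(csv)
```

With a dataset this small chunk() will typically return a single range, but the same loop applies unchanged when the ffdf is far larger than RAM.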