Simian runs under any Java 2 (1.4 or higher) Java Virtual Machine (JVM) and any .NET 1.1 or higher environment, meaning Simian can be run on anything from Windows, macOS and Linux to z/OS.
The distribution contains everything you need to be up and running in minutes:
Aslak Hellesoy has kindly donated a Maven plugin.
Neil Bartlett has kindly donated an Eclipse plugin.
Simian fully supports the following languages:
with partial support for the following languages:
If the file is not of a supported type, it is treated as plain text. This means that you can usually run Simian on just about any type of human-readable file with good results.
Ignores whitespace, curly braces, comments, imports, includes, package declarations, etc.
Supports the following processing options:
Option | Languages | Default | Possible values | Description |
---|---|---|---|---|
formatter | all | none | plain, xml, emacs, vs (visual studio), yaml, null | Specifies the format in which processing results will be produced. |
threshold | all | 6 | integer >= 2 | Matches will contain at least the specified number of lines. |
language | n/a | none | java, c#, cs, csharp, c, c++, cpp, cplusplus, js, javascript, cobol, abap, rb, ruby, vb, jsp, html, xml, groovy, asm390 | Assumes all files are in the specified language. |
defaultLanguage | n/a | none | java, c#, cs, csharp, c, c++, cpp, cplusplus, js, javascript, cobol, abap, rb, ruby, vb, jsp, html, xml, groovy, asm390 | Assumes files are in the specified language if none can be inferred. |
failOnDuplication | all | true | boolean | Causes the checker to fail the current process if duplication is detected. |
reportDuplicateText | all | false | boolean | Prints the duplicate text in reports. |
ignoreBlocks | all | none | string | Ignores all lines between specified START/END markers. |
ignoreCurlyBraces | Java, C#, C, C++, JavaScript, Ruby, Groovy | false | boolean | Curly braces are ignored. |
ignoreIdentifiers | Java, C#, C, C++, JavaScript, COBOL, Ruby, Groovy | false | boolean | Completely ignores all identifiers. |
ignoreIdentifierCase | Java, C#, C, C++, JavaScript, COBOL, Ruby, Groovy | true | boolean | Matches identifiers irrespective of case. E.g. MyVariableName and myvariablename would both match. |
ignoreRegions | C# | false | boolean | Ignore lines between #region/#endregion. |
ignoreStrings | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL, Groovy | false | boolean | "abc" and "def" would both match. |
ignoreStringCase | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL, Groovy | true | boolean | "Hello, World" and "HELLO, WORLD" would both match. |
ignoreNumbers | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL, Groovy | false | boolean | int x = 1; and int x = 576; would both match. |
ignoreCharacters | Java, C#, C, C++, JavaScript, COBOL, Ruby, Groovy | false | boolean | 'A' and 'Z' would both match. |
ignoreCharacterCase | Java, C#, C, C++, JavaScript, COBOL, Ruby, Groovy | true | boolean | 'A' and 'a' would both match. |
ignoreLiterals | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL, Groovy | false | boolean | 'A', "one" and 27.8 would all match. |
ignoreSubtypeNames | Java, C, Groovy | false | boolean | BufferedReader, StringReader and Reader would all match. |
ignoreModifiers | Java, C#, C, C++, JavaScript, Groovy | true | boolean | Ignores modifiers such as public, protected, static, etc. |
ignoreVariableNames | Java, C, Groovy | false | boolean | Completely ignores variable names (field, parameter and local). E.g. int foo = 1; and int bar = 1; would both match. |
balanceParentheses | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL, Groovy | false | boolean | Ensures that expressions inside parentheses that are split across multiple physical lines are considered as one. |
balanceCurlyBraces | Ruby | false | boolean | Ensures that expressions inside curly braces that are split across multiple physical lines are considered as one. |
balanceSquareBrackets | Java, C#, C, C++, JavaScript, Ruby, Groovy | false | boolean | Ensures that expressions inside square brackets that are split across multiple physical lines are considered as one. |
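To make the effect of the `ignore*` options more concrete, here is a minimal Python sketch of line normalisation before comparison. It is not Simian's implementation: the `normalise` function, the regular expressions and the handling of each option are assumptions chosen only to reproduce the examples in the table (for instance, `int x = 1;` and `int x = 576;` matching when `ignoreNumbers` is on).

```python
import re

# Option names mirror the table above; the defaults are copied from it.
# How each option is applied below is a simplified assumption.
DEFAULTS = {
    "ignoreStrings": False,
    "ignoreStringCase": True,
    "ignoreNumbers": False,
    "ignoreIdentifierCase": True,
}

STRING = re.compile(r'"(?:[^"\\]|\\.)*"')      # double-quoted string literal
NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")      # integer or decimal literal


def normalise(line, options=None):
    """Reduce a source line to a canonical form so similar lines compare equal."""
    opts = {**DEFAULTS, **(options or {})}

    # Split the line into string literals and the code between them,
    # so that string options and code options never interfere.
    parts, pos = [], 0
    for match in STRING.finditer(line):
        parts.append(("code", line[pos:match.start()]))
        parts.append(("string", match.group()))
        pos = match.end()
    parts.append(("code", line[pos:]))

    out = []
    for kind, text in parts:
        if kind == "string":
            if opts["ignoreStrings"]:
                text = '""'                     # every string looks the same
            elif opts["ignoreStringCase"]:
                text = text.lower()
        else:
            if opts["ignoreNumbers"]:
                text = NUMBER.sub("0", text)    # every number looks the same
            if opts["ignoreIdentifierCase"]:
                text = text.lower()             # crude: also lowercases keywords
            text = re.sub(r"\s+", "", text)     # crude: drops all whitespace
        out.append(text)
    return "".join(out)


# Two lines "match" when their normalised forms are equal.
print(normalise("int x = 1;", {"ignoreNumbers": True})
      == normalise("int x = 576;", {"ignoreNumbers": True}))
```

A real tool would tokenise each language properly; the point here is only that each option widens the set of lines that reduce to the same canonical form.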
Recognises the following file extensions/language options:
Language | Extensions |
---|---|
java | java |
c sharp | cs, c#, csharp |
c | c, h, m |
cpp | cpp, c++, hpp, cplusplus, inl |
ruby | rb, ruby |
cobol | cobol |
abap | abap |
xml | xml, xsl, xsd |
jsp | jsp |
asp | asp |
javascript | js, javascript |
html | html, htm |
vb | vb, bas, cls, frm |
lisp | lisp, lsp |
groovy | groovy |
text | this is the default when no appropriate language can be determined |
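The mapping above can be sketched as a simple lookup with a plain-text fallback. The Python below is illustrative only: `infer_language` is a hypothetical helper, and treating extensions case-insensitively is an assumption, not documented Simian behaviour.

```python
from pathlib import Path

# Language -> extensions, copied from the table above.
EXTENSIONS = {
    "java": ["java"],
    "c sharp": ["cs", "c#", "csharp"],
    "c": ["c", "h", "m"],
    "cpp": ["cpp", "c++", "hpp", "cplusplus", "inl"],
    "ruby": ["rb", "ruby"],
    "cobol": ["cobol"],
    "abap": ["abap"],
    "xml": ["xml", "xsl", "xsd"],
    "jsp": ["jsp"],
    "asp": ["asp"],
    "javascript": ["js", "javascript"],
    "html": ["html", "htm"],
    "vb": ["vb", "bas", "cls", "frm"],
    "lisp": ["lisp", "lsp"],
    "groovy": ["groovy"],
}

# Invert the table once so each lookup is a single dictionary access.
BY_EXTENSION = {ext: lang for lang, exts in EXTENSIONS.items() for ext in exts}


def infer_language(filename):
    """Return the language for a file name, or "text" if none can be determined."""
    # Assumption: extensions are compared case-insensitively.
    extension = Path(filename).suffix.lstrip(".").lower()
    return BY_EXTENSION.get(extension, "text")


for name in ["Foo.java", "style.xsl", "notes.txt", "README"]:
    print(name, "->", infer_language(name))
```

Files with an unknown or missing extension fall through to `text`, which mirrors the plain-text default described in the last row of the table.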
Here is an example of the standard output produced by Simian (version 2.6.0) when run against the JDK 9 source code:
    Similarity Analyser 2.6.0 - http://www.harukizaemon.com/simian
    Copyright (c) 2003-2018 Simon Harris. All rights reserved.
    Simian is not free unless used solely for non-commercial or evaluation purposes.
    {failOnDuplication=true, ignoreCharacterCase=true, ignoreCurlyBraces=true, ignoreIdentifierCase=true, ignoreModifiers=true, ignoreStringCase=true, threshold=6}
    Found 6 duplicate lines with fingerprint 2340e9b1e2419bcb5516a5a1d9037271 in the following files:
     Between lines 70 and 82 in com/sun/corba/se/PortableActivationIDL/_ServerProxyImplBase.java
     Between lines 70 and 82 in com/sun/corba/se/spi/activation/_ServerImplBase.java
     Between lines 90 and 102 in org/omg/CosNaming/BindingIteratorPOA.java
    Found 6 duplicate lines with fingerprint e94fb8a8017a3d05048dcdfb8bce8dff in the following files:
     Between lines 101 and 111 in javax/swing/plaf/synth/SynthOptionPaneUI.java
     Between lines 96 and 106 in javax/swing/plaf/synth/SynthMenuBarUI.java
    Found 6 duplicate lines with fingerprint 16485a9bd0994dc56f52735c2395a7b2 in the following files:
     Between lines 290 and 295 in java/time/zone/ZoneRules.java
     Between lines 234 and 239 in java/time/zone/ZoneRules.java
    Found 6 duplicate lines with fingerprint 7ca74bcd5707431bd195c0d867f5767e in the following files:
     Between lines 380 and 398 in org/omg/DynamicAny/_DynFixedStub.java
     Between lines 463 and 481 in org/omg/DynamicAny/_DynSequenceStub.java
    ...
    Found 233 duplicate lines with fingerprint 8bc044fa6e21987c76424535dbc1fe47 in the following files:
     Between lines 77 and 377 in javax/swing/plaf/nimbus/TextFieldPainter.java
     Between lines 77 and 377 in javax/swing/plaf/nimbus/PasswordFieldPainter.java
     Between lines 77 and 377 in javax/swing/plaf/nimbus/FormattedTextFieldPainter.java
    Found 382 duplicate lines with fingerprint 922ba26b84cbbf0edfabb0e25189c3b4 in the following files:
     Between lines 81 and 482 in com/sun/org/apache/xalan/internal/res/XSLTErrorResources_sv.java
     Between lines 81 and 482 in com/sun/org/apache/xalan/internal/res/XSLTErrorResources_es.java
     Between lines 81 and 482 in com/sun/org/apache/xalan/internal/res/XSLTErrorResources_fr.java
     Between lines 81 and 482 in com/sun/org/apache/xalan/internal/res/XSLTErrorResources.java
     Between lines 81 and 482 in com/sun/org/apache/xalan/internal/res/XSLTErrorResources_ko.java
     Between lines 81 and 482 in com/sun/org/apache/xalan/internal/res/XSLTErrorResources_zh_CN.java
     Between lines 81 and 482 in com/sun/org/apache/xalan/internal/res/XSLTErrorResources_pt_BR.java
     Between lines 81 and 482 in com/sun/org/apache/xalan/internal/res/XSLTErrorResources_zh_TW.java
     Between lines 81 and 482 in com/sun/org/apache/xalan/internal/res/XSLTErrorResources_de.java
     Between lines 81 and 482 in com/sun/org/apache/xalan/internal/res/XSLTErrorResources_it.java
     Between lines 81 and 482 in com/sun/org/apache/xalan/internal/res/XSLTErrorResources_ja.java
    Found 141070 duplicate lines in 12134 blocks in 2406 files
    Processed a total of 775314 significant (2402974 raw) lines in 7714 files
    Processing time: 4.818sec
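For post-processing, a report in this plain format can be picked apart with a few regular expressions. The Python below is a sketch under the assumption that the wording of the report matches the example above exactly; `parse_report` and the dictionary keys are hypothetical names, and the sample text is an excerpt of that example.

```python
import re

FOUND = re.compile(r"Found (\d+) duplicate lines with fingerprint ([0-9a-f]+)")
BETWEEN = re.compile(r"Between lines (\d+) and (\d+) in (\S+)")
TOTAL = re.compile(r"Found (\d+) duplicate lines in (\d+) blocks in (\d+) files")
PROCESSED = re.compile(
    r"Processed a total of (\d+) significant \((\d+) raw\) lines in (\d+) files"
)


def parse_report(text):
    """Return (blocks, summary) extracted from a plain-format report."""
    blocks = []
    headers = list(FOUND.finditer(text))
    for i, header in enumerate(headers):
        # Locations for this block run up to the next "Found ... fingerprint" header.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        locations = [
            (m.group(3), int(m.group(1)), int(m.group(2)))
            for m in BETWEEN.finditer(text, header.end(), end)
        ]
        blocks.append({
            "lines": int(header.group(1)),
            "fingerprint": header.group(2),
            "locations": locations,
        })

    summary = {}
    total, processed = TOTAL.search(text), PROCESSED.search(text)
    if total and processed:
        duplicate = int(total.group(1))
        significant = int(processed.group(1))
        summary = {
            "duplicate_lines": duplicate,
            "significant_lines": significant,
            "ratio": duplicate / significant,   # share of significant lines duplicated
        }
    return blocks, summary


SAMPLE = """\
Found 6 duplicate lines with fingerprint 16485a9bd0994dc56f52735c2395a7b2 in the following files:
 Between lines 290 and 295 in java/time/zone/ZoneRules.java
 Between lines 234 and 239 in java/time/zone/ZoneRules.java
Found 141070 duplicate lines in 12134 blocks in 2406 files
Processed a total of 775314 significant (2402974 raw) lines in 7714 files
Processing time: 4.818sec
"""

blocks, summary = parse_report(SAMPLE)
print(len(blocks), "block(s);", f"{summary['ratio']:.1%} of significant lines duplicated")
```

For anything beyond quick inspection, one of the structured formatters listed in the options table (such as xml or yaml) is likely a more robust input than scraping the plain text.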
* Results may vary depending on factors such as hardware used, number of duplicate lines, etc.
Java and all Java-based marks are trademarks or registered trademarks of Oracle in the United States and other countries.
.NET and all .NET-based marks are trademarks or registered trademarks of Microsoft® in the United States and other countries.
Copyright (c) 2003-2018 Simon Harris. All rights reserved.